Autonomous Autoresearch

Auto research is a technique in the AI industry in which coding agents autonomously edit the code used to train an LLM. The process runs iteratively, and the agents make direct changes to any part of the training pipeline, from parameter selection to data and more. I’m particularly interested in generalizing this to everything the human race does. We can start with the work that humans do, since that is the first area where auto research could be applied. To generalize, we need to define a few things: the environment the agents live in, the tools they have, the goals they pursue, and the state of the world they are manipulating. We then need to distinguish the environment the agents are manipulating, which contains that state, from the environment the agents live in, because the agents likely live in a higher-dimensional state space.
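
One way to make those four pieces concrete is to write them down as a small data structure. The following is only a sketch under my own assumptions: the names AutoResearchSetup, apply, and set_learning_rate are hypothetical, and the goal function is a toy stand-in for a real training metric.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict

State = Dict[str, Any]


@dataclass
class AutoResearchSetup:
    """Hypothetical container for the four pieces: environment, tools, goal, state."""
    environment: str                        # where the agent lives, e.g. "terminal"
    tools: Dict[str, Callable[..., State]]  # actions the agent can take on the state
    goal: Callable[[State], float]          # scores a state; higher is better
    state: State = field(default_factory=dict)  # the world being manipulated

    def apply(self, tool_name: str, **kwargs: Any) -> float:
        """Run one tool against the current state and return the new goal score."""
        self.state = self.tools[tool_name](self.state, **kwargs)
        return self.goal(self.state)


def set_learning_rate(state: State, value: float) -> State:
    """Toy tool (an assumption): return a copy of the state with a new learning rate."""
    new_state = dict(state)
    new_state["learning_rate"] = value
    return new_state


setup = AutoResearchSetup(
    environment="terminal",
    tools={"set_learning_rate": set_learning_rate},
    # Toy goal (an assumption): pretend the ideal learning rate is 0.01.
    goal=lambda s: -abs(s.get("learning_rate", 0.0) - 0.01),
    state={"learning_rate": 0.1},
)
score = setup.apply("set_learning_rate", value=0.01)
```

The agent itself sits outside this structure: it chooses which tool to call and reads the score, which is the sense in which it lives in a larger space than the state it edits.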

We’ll start with two examples: coding and the browser. In the coding environment, the agent is updating a piece of software. That software has some state at any point in time, and it completes some task or provides some service. What the agent is ultimately manipulating is the measured quality of that service. The agent itself lives in a higher dimension: your computer or your terminal. There it can take essentially any action: write any code, change anything, and measure anything it newly needs in order to meaningfully improve the software it is auto-researching. With these measurements, we create a loop in which the agent optimizes the set of metrics that the user, whether a human or another agent, deems important.
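
The loop described here can be sketched as propose, measure, keep if better. This is a minimal illustration, not how any particular auto research system works: propose_change and measure_quality are hypothetical stand-ins for "the agent edits the software" and "the agent measures the service", and the quality function is a toy with a known best point.

```python
import random
from typing import Callable, Dict, Tuple

Config = Dict[str, float]


def auto_research_loop(
    config: Config,
    propose: Callable[[Config, random.Random], Config],
    measure: Callable[[Config], float],
    steps: int,
    seed: int = 0,
) -> Tuple[Config, float]:
    """Greedy loop: try a change, measure it, and keep it only if the metric improves."""
    rng = random.Random(seed)
    best = dict(config)
    best_score = measure(best)
    for _ in range(steps):
        candidate = propose(best, rng)
        score = measure(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score


def propose_change(config: Config, rng: random.Random) -> Config:
    """Hypothetical edit: nudge one randomly chosen setting by a small amount."""
    candidate = dict(config)
    key = rng.choice(sorted(candidate))
    candidate[key] += rng.uniform(-0.5, 0.5)
    return candidate


def measure_quality(config: Config) -> float:
    """Toy metric (an assumption): quality peaks at x = 2, y = -1."""
    return -((config["x"] - 2.0) ** 2 + (config["y"] + 1.0) ** 2)


start = {"x": 0.0, "y": 0.0}
best, best_score = auto_research_loop(start, propose_change, measure_quality, steps=500)
```

A real agent would replace the random nudge with reasoning about the code and would be free to add new measurements as it goes; the accept-only-if-better rule is the simplest possible policy and is used here only to keep the sketch short.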

The browser example is quite similar. The agent controls a browser and fulfills the user’s goal for a task, whether that is to research, to extract information, or to perform state-changing actions in the browser. To properly auto-research and improve a browser agent’s ability to do the task, an agent needs measurements such as the cost of each run, the amount of context and input pushed through the browser agent, and the output quality: whether it succeeded or failed at completing the original task or meeting the intent of the prompt. In the browser agent case (at least as I think about it), a coding agent is optimizing and auto-researching a browser agent. In this world, one agent is controlling or updating another agent. This pattern of one agent auto-researching another is what we ultimately want to generalize as well. In this model, one agent, the coding agent, can influence how another agent is built, optimized, and measured. That could be generalized to an arbitrary task with some success/failure criteria.
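
To make the measurement side concrete, here is a small sketch of how a coding agent might compare two versions of a browser agent using the signals mentioned above: cost per run, context size, and success or failure. The record fields, the numbers, and the comparison rule (success rate first, then cost) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RunRecord:
    """One run of the browser agent (hypothetical fields)."""
    cost_usd: float
    context_tokens: int
    success: bool


@dataclass
class Summary:
    success_rate: float
    mean_cost_usd: float
    mean_context_tokens: float


def summarize(runs: List[RunRecord]) -> Summary:
    """Aggregate per-run measurements into the metrics the outer agent optimizes."""
    if not runs:
        raise ValueError("need at least one run to summarize")
    n = len(runs)
    return Summary(
        success_rate=sum(r.success for r in runs) / n,
        mean_cost_usd=sum(r.cost_usd for r in runs) / n,
        mean_context_tokens=sum(r.context_tokens for r in runs) / n,
    )


def is_better(candidate: Summary, baseline: Summary) -> bool:
    """Assumed rule: prefer a higher success rate, and break ties on lower mean cost."""
    if candidate.success_rate != baseline.success_rate:
        return candidate.success_rate > baseline.success_rate
    return candidate.mean_cost_usd < baseline.mean_cost_usd


# Made-up example data for two versions of the browser agent.
baseline_runs = [RunRecord(0.10, 8000, True), RunRecord(0.12, 9000, False)]
candidate_runs = [RunRecord(0.08, 6000, True), RunRecord(0.09, 7000, True)]

keep_candidate = is_better(summarize(candidate_runs), summarize(baseline_runs))
```

The same shape works for any task with success/failure criteria: only the RunRecord fields and the comparison rule change.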

A large amount of human labor has been abstracted away through the invention of machines, and soon it will be replaced by robots that can perform tasks at a human level. This will make it possible to auto-research human labor tasks, meaning manual labor tasks, once we have the infrastructure and capabilities to integrate robots into these kinds of loops, if they aren’t already integrated. Following our previous case study, this will still look like coding agents auto-researching robotic agents. By coding agent, what I really mean is an agent in a computer with access to any ability it can create through software, any tool that is readily available or open source, and any data accessible on the internet, whether public or behind a protected API that the agent has access to. A coding agent in this framing, or more accurately a computer agent, can reason about how robots are operating and build software to improve that reasoning. Suppose we connect the auto research loop to the physical output of the robot in a measurable way, such as whether the robot succeeded or failed at a task like moving a box from one point to another, or operating a machine to build a part in a manufacturing facility. If we can then assess the quality of these outputs both qualitatively and quantitatively, and feed them back through the auto research loop that our computer agent has access to, then the computer agent, given enough time, will search through this space for better ways to run the robot: reprogramming it, improving its prompts, and training different subsystems that learn properly from tasks, potentially through few-shot learning.
If the robot has this kind of auto research system built in and actively running, exploring on its own and generating potentially a million times more data through its optimization pipelines than the robot itself experiences or ingests, perhaps this will create actual innovation in robotics.
