Earlier this year, the analyst firm Forrester revealed its list of the top 10 emerging technologies of 2024, and several of the technologies on the list related to AI agents – models that don’t just generate information but can perform complex tasks, make decisions and act autonomously.
“Earlier AIs that could go do things were narrow and constrained to a particular environment, using things like reinforcement learning. What we’re seeing today is taking the capabilities of large language models to break those instructions into specific steps and then go execute those steps with different tools,” Brian Hopkins, VP of the Emerging Tech Portfolio at Forrester, said during an episode of our podcast, “What the Dev?”
When it comes to software development, generative AI has commonly been used to generate code or provide code completions, saving developers time. Agentic AI will take this further, assisting developers with more tasks throughout the software development life cycle, such as brainstorming, planning, building, testing, running code, and implementing fixes, explained Shuyin Zhao, VP of product at GitHub.
“Agents serve as an additional partner for developers, taking care of mundane and repetitive tasks and freeing developers to focus on higher-level thinking. At GitHub, we think of AI agents as being a lot like LEGOs – the building blocks that help develop more advanced systems and change the software development process for the better,” Zhao explained.
An example of an AI agent for software development is IBM’s recently released series of agents that can automatically resolve GitHub issues, freeing up developers to work on other things instead of getting stuck fixing their backlog of bugs. The IBM SWE-Agent suite includes a localization agent that finds the file and line of code causing the issue, an agent that edits lines of code based on developer requests, and an agent that can develop and execute tests.
Other examples of AI agents in software development include Devin and GitHub Copilot agents, and it’s been reported that OpenAI and Google are both developing agents of their own.
While this technology is still relatively new, Gartner recently predicted that 33% of enterprise software will contain agentic AI capabilities by 2028 (compared to under 1% in 2024), and these capabilities will allow 15% of day-to-day decisions to be made autonomously.
“By giving artificial intelligence agency, organizations can increase the number of automatable tasks and workflows. Software developers are likely to be some of the first affected, as existing AI coding assistants gain maturity,” Gartner wrote in its prediction.
Specialization and multi-agent architectures
Current LLMs like GPT-4o or Claude are “jacks-of-all-trades, masters of none,” meaning that they do a wide range of tasks satisfactorily, from writing poetry to generating code to solving math problems, explained Ruchir Puri, chief scientist at IBM. AI agents, on the other hand, need to be trained to do a particular task, using a particular tool. “This tool is certified for doing that manual process today, and if I’m going to introduce an agent, it should use that tool,” he said.
Given that each agent is highly specialized, the question then becomes, how do you get many of them to work together to tackle complex problems? According to Zhao, the answer is a multi-agent architecture, which is a network of many of these specialized agents that interact with each other and collaborate on a larger goal. Because each agent is highly specialized to a particular task, together they are collectively able to solve more complex problems, she said.
“At GitHub, our Copilot Workspace platform uses a multi-agent architecture to help developers go from idea to code entirely in natural language. In simple terms, they’re a combination of specialized agents that, when combined, can help developers solve complex problems more efficiently and effectively,” Zhao explained as an example.
Puri believes that a multi-agent system operates much the way a human team comes together to solve complex problems.
“You have somebody who is a software engineer, somebody who’s an SRE, somebody who does something else,” Puri explained. “That is the way we humans have learned to do complex tasks, with a mixture of skills and people who are experts in different areas. That is how I foresee these agents evolving as well, as we continue forward with multi-agent coordination and multi-agent complex behavior.”
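To make the idea concrete, here is a minimal Python sketch of that kind of specialized-agent pipeline, loosely echoing the localization/edit/test split in IBM’s SWE-Agent suite described above. The agent roles and the stand-in functions are hypothetical placeholders for real model and tool calls, not any vendor’s actual API.

```python
# A minimal sketch of multi-agent coordination. The roles and stand-in
# functions below are hypothetical; a real system would delegate each
# step to an LLM or an external tool.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Agent:
    role: str                  # what this agent specializes in
    run: Callable[[str], str]  # takes a task, returns its result

def localize(task: str) -> str:
    return f"[localizer] found file/line implicated in: {task}"

def edit(task: str) -> str:
    return f"[editor] proposed a patch for: {task}"

def test(task: str) -> str:
    return f"[tester] generated and ran tests for: {task}"

# The coordinator chains specialized agents, each handling one step,
# much as a human team divides up a complex task.
PIPELINE = [Agent("localizer", localize), Agent("editor", edit), Agent("tester", test)]

def resolve_issue(issue: str) -> list[str]:
    return [agent.run(issue) for agent in PIPELINE]

for step in resolve_issue("NullPointerException in checkout flow"):
    print(step)
```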
Given generative AI’s reputation for hallucinating, one might expect that increasing the number of agents working together would amplify the impact of hallucinations: as the number of decisions being made goes up, so does the potential for a wrong decision at some point in the chain. However, there are ways to mitigate this, according to Loris Degioanni, CTO and founder of Sysdig, a security company that has developed its own AI agents for security.
“There are structures and layers that we can put together to increase accuracy and decrease mistakes, especially when these mistakes are important and critical,” he said. “Agentic AI can be structured so that there’s different layers of LLMs, and some of these layers are there, essentially, to provide validation.”
He also explained that the safeguards for multi-agent architectures can mimic the safeguards a team of humans has. For instance, in a security operations center, entry-level workers who are less skilled can surface suspicious things to a second tier of more experienced workers, who can distinguish between things that need to be investigated further and those that can be safely disregarded.
“In software development, or even in cybersecurity, there are tiers, there are layers of redundancy when you have people doing this kind of stuff, so that one person can check what the prior person has done,” Degioanni said.
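As a rough illustration of that layered approach, the sketch below puts a second, validating agent between a first-tier agent and any action being taken. Both functions are hypothetical stand-ins for separate LLM calls; the point is the structure, not the logic inside them.

```python
# A minimal sketch of layered validation: a first tier proposes, a
# second tier reviews before anything is escalated. Both tiers are
# hypothetical stand-ins for separate model calls.
def first_tier(alert: str) -> str:
    # Entry-level worker: flags anything that looks suspicious.
    return f"possible incident: {alert}"

def second_tier(finding: str) -> bool:
    # More experienced reviewer: validates the finding before escalation.
    # In a real system this would be a separate model prompted to critique.
    return "incident" in finding

def triage(alert: str) -> str:
    finding = first_tier(alert)
    return f"escalate: {finding}" if second_tier(finding) else "dismissed after review"

print(triage("unusual outbound traffic from the build server"))
```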
AI agents are still building trust with developers
Just as there was skepticism about how well generative AI could write code, there will likely be a period during which AI agents need to earn trust before they are sent off to make decisions on their own, without human input. According to Puri, people will probably need to see consistent output from agents over a long period of time before they’re entirely comfortable with this.
He likened it to the trust people place in their cars every day: you get in every morning and it takes you from point A to point B, and even though the average person doesn’t know how an internal combustion engine works, they trust it to get them to their destination safely. And if it doesn’t, they know whom to take it to for repairs.
“You put your life or your family’s life in that car, and you say it should work,” Puri said. “And that, to me, is the level of trust you need to get in these technologies, and that is the journey you are on. But you are at the beginning of the journey.”
Challenges that need to be solved before implementation
In addition to building trust, there are still a number of other challenges to address. One is that AI agents need to be augmented with enterprise data, and that data needs to be up to date and accurate, explained Ronen Schwartz, CEO of the data company K2view.
“Access to this information, the critical backbone of the organization, is really at the core of making any AI work,” said Schwartz.
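One common way to ground an agent in that backbone is to fetch relevant records, and check their freshness, before the model acts. The sketch below is a hypothetical illustration of that pattern; `fetch_records` stands in for a real enterprise data layer, and the record shown is fabricated.

```python
# A minimal sketch of grounding an agent in enterprise data and
# enforcing freshness. `fetch_records` is a hypothetical stand-in
# for a real data layer.
from datetime import datetime, timedelta

def fetch_records(query: str) -> list[dict]:
    return [{"text": "customer 42 downgraded their plan last week",
             "updated": datetime.now() - timedelta(hours=1)}]

def answer_with_context(query: str, max_age: timedelta) -> str:
    fresh = [r for r in fetch_records(query)
             if datetime.now() - r["updated"] <= max_age]   # drop stale data
    if not fresh:
        return "insufficient fresh data; refusing to act"
    context = "; ".join(r["text"] for r in fresh)
    return f"answer for {query!r}, grounded in: {context}"

print(answer_with_context("why did customer 42 downgrade?", timedelta(days=1)))
```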
Cost is another issue: every query is an expense, and costs climb further when working over a large dataset because of the compute required.
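A back-of-the-envelope calculation shows why. The token prices below are hypothetical round numbers, not any vendor’s actual rates, but the shape of the math holds: large contexts multiplied by high query volumes dominate the bill.

```python
# Rough per-query cost from token counts; the prices are hypothetical.
def query_cost(input_tokens: int, output_tokens: int,
               usd_per_1k_in: float = 0.005, usd_per_1k_out: float = 0.015) -> float:
    return input_tokens / 1000 * usd_per_1k_in + output_tokens / 1000 * usd_per_1k_out

per_query = query_cost(100_000, 2_000)        # large-context query
print(f"${per_query:.2f} per query")          # $0.53
print(f"${per_query * 10_000:,.0f} for 10,000 queries a day")  # $5,300
```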
Similarly, an agent’s speed and interactivity are important. It’s not acceptable to wait two hours for a query to be answered, so low latency is needed, Schwartz explained.
Data privacy and security also need to be considered, especially when a system contains multiple agents interacting with each other. It’s important to ensure that one agent isn’t sharing information that another isn’t supposed to have access to, he said.
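A simple way to enforce that boundary is to scope each agent to the record types it is cleared for and check every hand-off against that scope. The agent names and scopes in the sketch below are hypothetical.

```python
# A minimal sketch of per-agent data scoping in a multi-agent system.
# The agents and record types are hypothetical.
AGENT_SCOPES = {
    "billing_agent": {"invoices", "payments"},
    "support_agent": {"tickets"},
}

def share(receiver: str, record_type: str, payload: str) -> str:
    # Block the hand-off unless the receiving agent is cleared for this data.
    if record_type not in AGENT_SCOPES.get(receiver, set()):
        return f"blocked: {receiver} is not authorized for {record_type}"
    return f"delivered {record_type} to {receiver}: {payload}"

print(share("support_agent", "payments", "card ending in 4242"))  # blocked
print(share("billing_agent", "payments", "card ending in 4242"))  # delivered
```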
“Be very, very thoughtful when evaluating tools and only deploy tools from vendors that are clearly prioritizing privacy and security,” said GitHub’s Zhao. “There should be clear documentation explaining exactly how a vendor is processing your company’s data in order to provide the service, what security measures they have in place, including filters for known vulnerabilities, harmful content, etc. If you can’t find this information clearly documented, that’s a red flag.”
And finally, AI agents need to be reliable since they are acting on someone else’s behalf. If the data they are operating on isn’t reliable, then “that can create a whole chain of action that is not necessary, or the wrong set of actions,” Schwartz explained.
Predictions for what’s to come
Jamil Valliani, head of AI product at Atlassian, believes that 2025 will be the year of the AI agent. “Agents are already quite good at augmenting and accelerating our work — in the next year, they will get even better at performing highly specific tasks, taking specialized actions, and integrating across products, all with humans in the loop,” he said. “I’m most excited to see agents becoming exponentially more sophisticated in how they can collaborate with teams to handle complex tasks.”
He believes that AI agents are benefiting from the fact that foundation models are evolving and are now able to reason over increasingly rich datasets. These advancements will not only improve the accuracy of agents, but also allow them to continuously learn from experiences, much like a human teammate might.
“Our relationship with them will evolve, and we’ll see new forms of collaboration and communication on teams develop,” he said.
Steve Lucas, the CEO of Boomi, predicts that within the next three years, AI agents will outnumber humans. This doesn’t mean that agents will necessarily eliminate human jobs, because as the number of agents increases, so does the need for human oversight and maintenance.
“In this evolution, transparent protocols and governance are important for AI success and will become more significant as agents become embedded in the future of work,” he said.
K2view’s Schwartz agrees that the future workplace is not one in which agents do everything, but rather a place where humans and agents work alongside each other.
“I think sometimes people make a mistake in thinking that the humans will trigger the agent and the agent will do the work. I think the world will be more of a balanced one where agents also trigger humans to do certain work,” he said.