Mustafa Suleyman’s Modern Test for AI Capability
Microsoft AI CEO Mustafa Suleyman has proposed a benchmark meant to measure the practical progress of AI agents rather than their theoretical promise. Suleyman, a co-founder of DeepMind who now heads Microsoft's AI division, calls the concept Artificial Capable Intelligence, or ACI. Instead of testing how well a system imitates human responses, it measures an AI's actual ability to plan, reason, make decisions, and operate within legal and financial systems. The challenge: can an AI turn $100,000 into $1 million while staying within the law? Doing so would require strategic thinking, planning, flexibility, and working within the rules. Suleyman calls this the “modern Turing Test.”
He believes this is a better measure of intelligence than current tests, which focus mainly on language ability and pattern recognition. According to Suleyman, the ability to function effectively in the real world is a key indicator that a system is approaching artificial general intelligence, commonly referred to as AGI.
The Wider Industry Debate Regarding AI Agents
Suleyman’s proposal coincides with growing industry interest in AI agents. Key figures, including OpenAI CEO Sam Altman and Salesforce’s Marc Benioff, have voiced their enthusiasm for AI agents, calling them the next big thing in enterprise software. They envision agents that can eventually handle difficult tasks, including ones that demand human-like cognition. Altman believes agents will soon be capable of discovering knowledge and handling tough tasks, much like a junior worker maturing with experience.
Not all AI experts share this optimism, however. Some argue that the autonomous systems being built and deployed today are still immature and bug-prone, a view that calls for a more tempered assessment of what these systems can currently do.
Suleyman’s ACI benchmark therefore serves not just as a test but as a provocative conversation starter: what would meaningful progress in AI look like? By centering it on financial and legal autonomy, he challenges the industry to get beyond hype and focus on real-world impact and responsibility.




