I run AI agents in my own companies every day — research agents, content pipelines, multi-step workflows across several frontier models. I watch what they do well and I watch where they break, and I redesign around both weekly. This article is the read I would give a fellow operator over coffee: no vendor gloss, current as of June 2026, with the survey numbers quoted honestly — including their sample frames, because most of the statistics you see quoted are enterprise surveys being passed off as facts about all businesses.
What agents genuinely do well today: bounded knowledge work with clear success criteria. Research and synthesis across large document sets. First drafts of almost anything: reports, analyses, communications, code. Data extraction, reconciliation, and triage. Multi-step workflows where the steps can be specified and the output can be checked. The current frontier models ship million-token context windows, which in practice means an agent can hold an entire year of reports, a sizable codebase, or a complete policy manual in view at once. That single capability change makes whole task classes viable that were not viable eighteen months ago.
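A quick way to sanity-check whether a document set fits that window is back-of-envelope token arithmetic. The sketch below is a minimal illustration, assuming roughly four characters per token (a common rule of thumb for English prose, not a tokenizer) and a hypothetical 50,000-token reserve for instructions and output; the constant names and the report sizes are illustrative, not figures from any vendor.

```python
# Back-of-envelope check: does a document set fit in a long context window?
# ASSUMPTION: ~4 characters per token, a rough rule of thumb for English prose.
# Real counts depend on the model's tokenizer and the content.

CHARS_PER_TOKEN = 4             # heuristic, not a real tokenizer
CONTEXT_WINDOW = 1_000_000      # the "million-token" window from the text
RESERVED_TOKENS = 50_000        # hypothetical headroom for instructions and output


def estimated_tokens(text: str) -> int:
    """Estimate the token count from character length, rounding up."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_in_context(documents: list[str]) -> tuple[bool, int]:
    """Return (fits, estimated_total_tokens) for a set of documents."""
    total = sum(estimated_tokens(doc) for doc in documents)
    return total <= CONTEXT_WINDOW - RESERVED_TOKENS, total


if __name__ == "__main__":
    # Hypothetical "year of reports": 52 weekly reports of ~30,000 characters each.
    year_of_reports = ["x" * 30_000 for _ in range(52)]
    print(fits_in_context(year_of_reports))  # (True, 390000)
```

If the estimate lands anywhere near the limit, count with the model's actual tokenizer before relying on it; code and tables often tokenize less efficiently than prose.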
What is still not solid: agents doing loosely specified, open-ended work, and agents driving desktop software the way a person does. The computer-use agents (the ones that point, click, and navigate a screen) remain research previews or early releases at the major labs, not mature products. I have tested those loops in my own operation, and they are not yet reliable enough to carry a process that matters. My rule: never build a load-bearing workflow on a research preview. The capability will mature; your process should adopt it when it does, not before.
And what agents cannot do at all: own judgment. An agent does not carry your context, your relationships, or your accountability. It will execute a flawed instruction flawlessly. It will not push back on a bad plan unless you build the pushback in. Every reliable agent workflow I run has a human decision at the points where being wrong is expensive — and deciding where those points are is, I would argue, the actual skill of the agentic era.
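Building the pushback in can be as simple as a gate in front of every action the agent proposes. Here is a minimal sketch, assuming a hypothetical per-workflow dollar threshold and a hypothetical `ask_human` callback that stands in for whatever approval channel you actually use; none of these names come from a real framework.

```python
from dataclasses import dataclass
from typing import Callable

# ASSUMPTION: a hypothetical per-workflow threshold, in dollars, above which
# a person must approve the action before the agent executes it.
APPROVAL_THRESHOLD = 1_000.0


@dataclass
class ProposedAction:
    description: str
    cost_if_wrong: float  # rough estimate of what a mistake would cost
    reversible: bool      # can the action be cleanly undone?


def needs_human(action: ProposedAction, threshold: float = APPROVAL_THRESHOLD) -> bool:
    """Flag actions where being wrong is expensive or hard to undo."""
    return action.cost_if_wrong >= threshold or not action.reversible


def run_with_checkpoint(
    action: ProposedAction,
    execute: Callable[[ProposedAction], None],
    ask_human: Callable[[ProposedAction], bool],
) -> str:
    """Execute the action, pausing for human approval at expensive decision points."""
    if needs_human(action) and not ask_human(action):
        return "rejected by human"
    execute(action)
    return "executed"


if __name__ == "__main__":
    done: list[ProposedAction] = []
    decline_all = lambda action: False  # stand-in for a real approval channel
    draft = ProposedAction("Draft weekly summary", cost_if_wrong=50.0, reversible=True)
    refund = ProposedAction("Issue customer refund", cost_if_wrong=5_000.0, reversible=False)
    print(run_with_checkpoint(draft, done.append, decline_all))   # executed
    print(run_with_checkpoint(refund, done.append, decline_all))  # rejected by human
```

The design choice is that the gate keys on what a mistake would cost and whether it can be undone, not on how confident the agent sounds.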
Now the market evidence, with frames attached. As of McKinsey's mid-2025 State of AI fieldwork (a self-selected sample of roughly 2,000 executives, skewed toward large organizations), 62% of respondents said their organizations were at least experimenting with AI agents, but in any given business function, no more than 10% said they were scaling them. Dynatrace's January 2026 survey of 919 senior leaders at enterprises above $100 million in revenue (a vendor survey; read it that way) found about half of agentic projects still in proof-of-concept or pilot, while 74% expected agentic budgets to rise regardless. Broad experimentation, thin scaling, money still flowing: that is the shape of this market.
The value evidence is genuinely contested, and I will give you both sides rather than resolve it for you. MIT NANDA's July 2025 report — non-peer-reviewed, methodology criticized — reported that roughly 95% of custom enterprise GenAI pilots showed little or no measurable P&L impact within about six months. PwC's May 2025 AI Agent Survey (308 US senior executives, self-reported) found 79% saying agents were already being adopted at their companies and 66% of adopters reporting measurable productivity value. My operator's read of the contradiction: individual productivity gains are real and broad — I experience them daily — but P&L-level impact requires redesigning how work flows through the organization, and almost nobody has done that part. The tools are ahead of the org charts.
So which tasks actually move first? From my own daily use, the first wave is: scheduled reporting and summarization; first-draft production of documents and code; classification and triage of inbound anything; data reconciliation across systems; monitoring with structured escalation. The common thread — high volume, clear specification, checkable output, low blast radius when wrong.
And the tasks that should not move yet: exception handling without precedent, negotiation and relationship work, anything where the decision is the product, and any task where an error is expensive and hard to detect. Note that these split task by task, not job by job. Inside a single analyst role, the weekly report moves and the judgment call on the anomaly stays. This is why role-level analyses — "X% of jobs will be automated" — mislead. The task is the unit of analysis. Map at the task level or you will get the answer wrong in both directions.
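Mapping at the task level can start as a plain list of tasks with a few yes/no attributes drawn from the first-wave criteria above, with volume captured as weekly hours so the movable work can be sized. The sketch below is illustrative only: the field names, the rule in `moves_first`, and the hours in the analyst example are hypothetical assumptions, and a real assessment would weigh these attributes with far more nuance.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    hours_per_week: float      # volume, used to size the opportunity
    clearly_specified: bool
    output_checkable: bool
    error_expensive: bool      # high blast radius when wrong
    decision_is_product: bool  # the judgment itself is what is delivered


def moves_first(task: Task) -> bool:
    """Hypothetical first-wave rule: clear spec, checkable output, low blast radius, not a judgment call."""
    return (
        task.clearly_specified
        and task.output_checkable
        and not task.error_expensive
        and not task.decision_is_product
    )


def segment(tasks: list[Task]) -> tuple[list[str], list[str], float]:
    """Split tasks into (moves, stays) and total the weekly hours that move."""
    moving = [t for t in tasks if moves_first(t)]
    staying = [t for t in tasks if not moves_first(t)]
    return (
        [t.name for t in moving],
        [t.name for t in staying],
        sum(t.hours_per_week for t in moving),
    )


if __name__ == "__main__":
    # One analyst role, split task by task (hours are made up for illustration).
    analyst_role = [
        Task("Weekly report", 6.0, True, True, False, False),
        Task("Judgment call on the anomaly", 2.0, False, False, True, True),
    ]
    print(segment(analyst_role))
    # (['Weekly report'], ['Judgment call on the anomaly'], 6.0)
```

The unit here is the `Task`, not the role: the same analyst shows up on both sides of the split.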
The gating capability for all of it is governance, and the data says it is most organizations' weakest muscle. Deloitte's 2026 State of AI in the Enterprise survey (3,235 business and IT leaders, director level and up) found only 21% reporting a mature governance model for agentic AI, meaning roughly four in five have not yet reached that maturity in agent decision boundaries, real-time monitoring, and audit trails. Microsoft's 2026 Work Trend Index (a platform vendor's framing; weigh accordingly) describes the needed posture as treating agents like managed entities: identities, permissions, policy enforcement, lifecycle management, monitoring, auditability. In my own companies, every agent runs with explicit boundaries and verification of its work. Not because the tools demand it, but because Gartner's June 2025 forecast (over 40% of agentic AI projects canceled by end of 2027, on cost, unclear value, and inadequate risk controls) describes exactly what happens to deployments that skip it. That is a forecast, not a fact. I would still plan as if it lands.
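To make "explicit boundaries" and "audit trails" concrete, here is a minimal sketch of an agent identity with an action allowlist and a log entry for every request, allowed or not. It is loosely in the spirit of the managed-entity posture described above, but the class, field, and action names are hypothetical; it is not Microsoft's or any vendor's API, and a production setup would sit on your identity provider and a durable log store rather than an in-memory list.

```python
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class AgentIdentity:
    agent_id: str
    allowed_actions: frozenset[str]  # explicit decision boundary (allowlist)
    audit_log: list[dict] = field(default_factory=list)  # in-memory for the sketch

    def request(self, action: str, detail: str) -> bool:
        """Check an action against the allowlist and log the decision either way."""
        allowed = action in self.allowed_actions
        self.audit_log.append(
            {
                "time": datetime.now(timezone.utc).isoformat(),
                "agent": self.agent_id,
                "action": action,
                "detail": detail,
                "allowed": allowed,
            }
        )
        return allowed


if __name__ == "__main__":
    reporter = AgentIdentity(
        "weekly-report-agent", frozenset({"read_sales_data", "draft_report"})
    )
    reporter.request("draft_report", "week 14 summary")  # allowed
    reporter.request("send_email", "to: all-staff")      # blocked, but still logged
    print(json.dumps(reporter.audit_log, indent=2))
```

Logging denials as well as approvals is deliberate: the blocked attempts are often the most useful part of the audit trail.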
A note on how to read capability claims generally, because the marketing volume in this category is deafening. I do not trust vendor marketing by default — I treat it as a trigger to go find practitioners who are actually using the tool and reporting honestly, edge cases included. Updates ship literally every day in this space. Any static capability claim, including this article, has a shelf life measured in months. The discipline that does not expire: run your own evaluations, on your own work, against your own bar, and re-run them as the models move.
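Running your own evaluations does not need heavy tooling to start. A minimal sketch, assuming you can wrap your agent as a function from prompt to text: each case pairs an input from your own work with a check that encodes your bar, and you re-run the same cases whenever the underlying model changes. The case contents, the stub agent, and the 0.9 bar are hypothetical placeholders.

```python
from typing import Callable

# Each case pairs an input from your own work with a check that encodes your bar.
EvalCase = tuple[str, Callable[[str], bool]]


def pass_rate(run_agent: Callable[[str], str], cases: list[EvalCase]) -> float:
    """Fraction of cases whose agent output passes its check."""
    if not cases:
        raise ValueError("need at least one eval case")
    passed = sum(1 for prompt, check in cases if check(run_agent(prompt)))
    return passed / len(cases)


def meets_bar(run_agent: Callable[[str], str], cases: list[EvalCase], bar: float) -> bool:
    """True if the agent clears your own threshold on your own cases."""
    return pass_rate(run_agent, cases) >= bar


if __name__ == "__main__":
    cases: list[EvalCase] = [
        ("Summarize: revenue up 4%", lambda out: "4%" in out),
        ("Summarize: churn flat", lambda out: "churn" in out.lower()),
    ]
    # Stand-in for a real model call: echoes the text after the colon.
    stub_agent = lambda prompt: prompt.split(": ", 1)[1]
    print(pass_rate(stub_agent, cases))        # 1.0
    print(meets_bar(stub_agent, cases, 0.9))   # True
```

Keep the cases fixed between runs so a change in pass rate reflects the model, not the test.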
What this means if you run a 500-to-2,500-employee organization: do not start with "which platform should we buy?" Start with "which of our tasks are in the first wave?" That question is answerable with a structured look at how work actually flows through your teams — and the answer is different for every company, which is why generic adoption playbooks underdeliver. Map the tasks, segment what moves, size the hours, govern what you deploy, and measure against a called shot.
That sequence is precisely what an agentic workforce assessment does — five weeks, published pricing that starts at $25,000, run by someone who uses these tools for real every day rather than surveying people who do. If this article's read on capability matches what you are seeing in your own experiments, the assessment is how you convert that read into a map of your own organization.