From Demos to Infrastructure: Why AI Agents Need Governance in Production (feat. Logan Kelly)
As AI agents move from demos into production, their probabilistic behavior introduces cost, security, and compliance risks. Observability alone isn’t enough.
In this conversation, Krish Palaniappan and Logan Kelly, CEO of Waxell AI, discuss the evolving landscape of AI agents, focusing on the importance of governance and orchestration in managing these technologies. They explore the challenges and risks associated with deploying AI agents in production, the onboarding process for governance platforms, and the technological advancements that are shaping the future of enterprise software. The discussion highlights the need for effective governance policies to mitigate risks and ensure safe operations of AI agents in various business contexts.
AI agents are no longer confined to controlled demos. They are sending outreach emails, updating CRM records, retrieving internal documents through RAG systems, writing code, and orchestrating multi-step workflows. In many organizations, they are already interacting with production data and external users. What once felt experimental is now operational.
As companies deploy agents at scale, one uncomfortable truth is becoming clear: most teams have observability for software, but very few have governance for agents. That gap matters more than most people realize.
Podcast
The Governance Gap in Modern AI Systems — on Apple and Spotify.
Introduction: Agents Are Already in Production
AI agents are no longer experimental side projects. They are sending emails, updating CRM records, retrieving internal data, writing code, and triggering workflows across production systems. What started as demos has quietly become operational infrastructure.
As this shift accelerates, one gap is becoming increasingly visible: most companies have observability for software, but very few have governance for agents. That distinction is subtle but critical.
Agents Are Fundamentally Different from Traditional Software
Traditional software is largely deterministic. Given the same inputs and logic, it produces the same outputs. Once thoroughly tested, behavior is predictable within defined constraints.
AI agents behave differently. Large language models are probabilistic. Even with identical prompts and context, outputs can vary. When agents incorporate retrieval systems, tool calling, memory, and multi-step reasoning, variability compounds. Small deviations can cascade across workflows.
In a demo, this feels like flexibility. In production, it introduces operational risk.
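The compounding effect is easy to underestimate. As a purely illustrative back-of-the-envelope sketch (the per-step reliability and workflow length here are assumed numbers, not measurements), even a highly reliable step degrades quickly when chained:

```python
# Illustrative only: per-step reliability compounds across a multi-step workflow.
per_step_success = 0.98   # assumed probability one agent step behaves as intended
steps = 10                # assumed number of chained steps in the workflow

workflow_success = per_step_success ** steps
print(f"{workflow_success:.3f}")  # roughly 0.817
```

A step that succeeds 98% of the time, chained ten times, yields a workflow that behaves as intended only about 82% of the time.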
Observability vs. Governance
Engineering teams are comfortable with observability. They monitor logs, latency, error rates, and token usage. Observability answers a retrospective question: what happened?
Governance answers a forward-looking one: should this be allowed to happen?
An agent exceeding its token budget is observable. An agent inserting sensitive data into an external API call is observable. But unless there is a policy engine capable of intervening, the system continues executing.
Observability reports. Governance enforces.
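The distinction can be sketched in a few lines. This is a minimal illustration, not a real policy engine: the SSN regex and the function names are assumptions chosen to make the report-versus-enforce contrast concrete.

```python
import re

# Example sensitive-data pattern (US SSN format); real systems would use
# a proper classification service, not a single regex.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def observe(payload: str) -> None:
    # Observability: record what happened, after the fact.
    print(f"[log] outbound payload: {len(payload)} chars")

def govern(payload: str) -> str:
    # Governance: decide whether the call is allowed before it executes.
    if SSN_PATTERN.search(payload):
        raise PermissionError("policy violation: sensitive data in outbound call")
    return payload

def call_external_api(payload: str) -> None:
    observe(payload)           # a log entry is written either way
    payload = govern(payload)  # but only governance can stop the call
    # ... the actual API request would go here ...
```

With observability alone, the leaked SSN shows up in the logs after the request has already left the building; the `govern` gate is what turns detection into prevention.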
What Breaks When Agents Scale
When agents move into real workflows, failure modes change.
Costs can spike due to unexpected loops or aggressive tool usage. Sensitive data can leak through prompts in retrieval-augmented systems. Agents acting on behalf of users may overstep access boundaries if identity controls are weak. Some actions, such as sending external communications or modifying financial records, cannot be reversed.
Unlike deterministic systems, where errors are usually isolated, probabilistic systems can amplify mistakes.
The Deterministic Guardrail Principle
A practical architectural principle emerges: anything that can be deterministic should not depend on an LLM.
Budget thresholds, access control rules, scheduling logic, and validation checks should remain deterministic. LLMs should handle reasoning where variability is valuable.
Blending probabilistic reasoning with deterministic safeguards reduces risk without sacrificing capability.
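One way to apply the principle is to gate every model call behind checks that never consult the model. The thresholds and the `call_llm` client below are placeholders, a sketch of the pattern rather than a specific implementation:

```python
def call_llm(prompt: str) -> str:
    # Stand-in for a real model client; only reached after the gates pass.
    return f"response to: {prompt[:20]}"

def guarded_completion(prompt: str, tokens_used_today: int,
                       daily_token_budget: int = 100_000,
                       max_prompt_chars: int = 8_000) -> str:
    # Deterministic gates run first; no model judgment is involved here.
    if tokens_used_today >= daily_token_budget:
        raise RuntimeError("budget exceeded: refusing LLM call")
    if len(prompt) > max_prompt_chars:
        raise ValueError("prompt too large: refusing LLM call")
    # Probabilistic reasoning runs only inside the deterministic envelope.
    return call_llm(prompt)
```

The budget check, length check, and any access-control rule are ordinary code with ordinary test coverage; the LLM is never asked to police itself.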
Governance Across the Agent Lifecycle
Effective governance operates in three phases.
Before execution, systems verify authorization, configuration, and budget limits. During execution, policies monitor tool calls, data flows, and token usage. After execution, outputs and anomalies are reviewed, and alerts are triggered if necessary.
Governance mechanisms may warn, block, or redact depending on severity. The key is that intervention happens in real time, not after damage occurs.
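The three phases can be expressed as hooks around each action. Everything here is an assumed sketch: the allow-list, the token ceiling, and the anomaly check are stand-ins for whatever policies a real deployment would define.

```python
from dataclasses import dataclass, field

@dataclass
class AgentAction:
    tool: str
    tokens: int
    output: str = ""
    alerts: list = field(default_factory=list)

def before(action: AgentAction) -> None:
    # Pre-execution: verify authorization before anything runs.
    if action.tool not in {"search", "crm_read"}:  # assumed tool allow-list
        raise PermissionError(f"tool not authorized: {action.tool}")

def during(action: AgentAction) -> None:
    # In-flight: monitor resource usage as the action executes.
    if action.tokens > 4_000:  # assumed per-call token ceiling
        raise RuntimeError("token ceiling exceeded mid-run")

def after(action: AgentAction) -> None:
    # Post-execution: review output and raise alerts on anomalies.
    if "password" in action.output.lower():
        action.alerts.append("possible credential leak in output")

def run_governed(action: AgentAction) -> AgentAction:
    before(action)
    during(action)
    action.output = f"{action.tool} completed"  # stand-in for real execution
    after(action)
    return action
```

The point of the structure is that `before` and `during` can stop execution outright, while `after` feeds review and alerting; the same action passes through all three gates.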
From Single Agents to Agent Fleets
Most organizations do not run a single agent. They run fleets: sales agents, support agents, coding agents, finance agents. Policies must scale accordingly.
Some governance rules apply globally, such as maximum daily spend. Others are scoped to specific agent types, user groups, or operational tiers. This moves governance from purely engineering ownership to shared operational responsibility.
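Global and scoped rules can coexist in one policy table, with the more specific scope winning. The scopes, limit names, and values below are invented for illustration:

```python
POLICIES = [
    # (scope, limit_name, value): "global" applies to every agent;
    # a named scope applies only to that agent type and overrides global.
    ("global",  "max_daily_spend_usd", 500),
    ("sales",   "max_emails_per_day",  50),
    ("finance", "max_daily_spend_usd", 100),  # tighter override for finance agents
]

def effective_limit(agent_type: str, limit_name: str):
    # Most specific scope wins; fall back to the global rule if none matches.
    scoped = [v for s, n, v in POLICIES if s == agent_type and n == limit_name]
    if scoped:
        return scoped[0]
    global_ = [v for s, n, v in POLICIES if s == "global" and n == limit_name]
    return global_[0] if global_ else None
```

A finance agent resolves to the tighter $100 spend limit while a sales agent inherits the global $500 rule, which is exactly the shared-ownership model: operators edit the table, engineering enforces it.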
As agents become infrastructure, governance becomes an operational discipline.
Infrastructure Is Changing
Agent-based systems differ from traditional SaaS systems. Instead of being activated only by human interaction, agents may run continuously, trigger other agents, and execute asynchronous tasks.
This requires stronger telemetry, better traceability, event-driven architectures, and fine-grained identity management. Agents are no longer just application features; they are long-running system components.
The Enterprise Software Question
There is growing concern that AI agents will displace traditional enterprise software. A more realistic outcome is interface evolution rather than elimination.
Instead of humans navigating dashboards, agents will increasingly interact with enterprise platforms programmatically. The core systems—CRM, finance, collaboration—remain essential. What changes is how they are consumed.
The Near-Term Risk
The most significant risk today is overconfidence. Many agents that perform well in controlled environments are deployed into production as if they were hardened systems.
In hindsight, some of today's production agents may come to be seen as experiments that were treated as infrastructure.
Without governance, autonomy scales risk faster than it scales value.
Conclusion: Governance Is Not Optional
AI agents introduce probabilistic behavior into operational systems. They are cost amplifiers, security surfaces, and autonomous decision-makers.
Observability tells you what happened. Governance determines what is allowed to happen.
As agents move from demos to infrastructure, governance shifts from a nice-to-have feature to a foundational requirement.

