Beyond the Chatbot: Orchestrating "Thinking" State with A2A and Gemini Interactions
The era of the "Stateless Wrapper" is ending, and the current generation of agent frameworks isn't ready for what comes next.
Tools like LangChain or CrewAI have focused heavily on orchestrating prompts—wrapping the "Goldfish Model" in layers of abstraction to simulate memory and planning. But they largely rely on the client (or a fragile Python script) to be the "brain" that holds the state.
This architecture collapses in the face of Reasoning Models (like Gemini 3). When an AI takes 10 minutes to "think," plan, and execute deep research, you cannot rely on an open HTTP connection or a transient Python process to hold the line. The state needs to move from the client to the infrastructure.
We need a shift from Stateless Transactions to Managed State.
This post explores a new architectural pattern: combining the server-side session management of the Gemini Interactions API with the distributed orchestration of the A2A (Agent-to-Agent) Protocol. Together, they allow us to treat an AI "thought process" as a durable, addressable, and portable resource.
Note: This post is based on the reference implementation in the a2a-experiments repository. All examples below can be run locally using the project's CLI.
The Evolution: From "Chat" to "Interactions"
Traditional GenAI SDKs are designed for transactions. You send context; you get text.
The Gemini Interactions API represents a fundamental shift. It treats the conversation as a Resource (Interaction), not a payload.
- Server-Side State: The history lives in the cloud. You pass a previous_interaction_id, not a 100k-token transcript.
- Thinking as a Process: With background=true, the API acknowledges that "reasoning" is a temporal activity. It decouples the request from the result (see the sketch below).
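To make that concrete, here is a minimal Go sketch of the request shape this implies. The client interface, field names (PreviousInteractionID, Background), and the "gemini-3" model string are illustrative stand-ins, not the official Gemini SDK types.

```go
// Package interactions sketches an Interactions-style API call.
// All names here are illustrative, not the official SDK.
package interactions

// InteractionRequest carries a handle to server-side history instead of a transcript.
type InteractionRequest struct {
	Model                 string `json:"model"`
	Input                 string `json:"input"`
	PreviousInteractionID string `json:"previous_interaction_id,omitempty"` // server-side history handle
	Background            bool   `json:"background,omitempty"`              // decouple the request from the result
}

// Interaction is the addressable resource the server hands back.
type Interaction struct {
	ID     string // handle to the server-side state
	Status string // e.g. "in_progress", "completed"
	Output string
}

// Client is a stand-in for whatever SDK or HTTP client you actually use.
type Client interface {
	Create(req InteractionRequest) (*Interaction, error)
	Get(id string) (*Interaction, error)
}

// ContinueResearch resumes a prior thought without resending its history.
func ContinueResearch(c Client, prevID string) (*Interaction, error) {
	return c.Create(InteractionRequest{
		Model:                 "gemini-3", // illustrative model name
		Input:                 "Continue the research and draft a summary.",
		PreviousInteractionID: prevID, // the handle, not a 100k-token transcript
		Background:            true,   // the "thinking" happens server-side; poll later
	})
}
```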
But while the Interactions API solves the storage of state, it doesn't solve the ownership and observability of that state across a distributed system. An InteractionID is just a string. It has no owner, no "working" status visible to the outside world, and no standard way to be shared.
A2A: The Operating System for State
This is where A2A comes in. If the Interactions API is the "Process" (the CPU and RAM), A2A is the "Window Manager."
A2A wraps the raw Interaction in a Task. This simple wrapper adds the semantic layer that raw APIs lack:
- Lifecycle: A2A defines working, completed, and failed states that are broadcast to any observer.
- Persistence: The A2A TaskID is a permanent handle. Even if the Interactions API retention policy expires, the A2A Task remains as the record of what happened.
- Discovery: A2A allows clients to negotiate how to talk to this stateful entity before sending the first byte. (A minimal sketch of the wrapper follows this list.)
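Here is a minimal sketch of what that wrapper could look like in Go. The types and field names are illustrative, not the actual structures from the A2A spec or the a2a-experiments repo.

```go
// Package a2abridge sketches the A2A Task wrapper around a raw Interaction.
package a2abridge

import (
	"sync"
	"time"
)

type TaskState string

const (
	TaskWorking   TaskState = "working"
	TaskCompleted TaskState = "completed"
	TaskFailed    TaskState = "failed"
)

// Task is the durable, addressable record that outlives the raw Interaction.
type Task struct {
	ID            string    // permanent A2A handle
	InteractionID string    // pointer into the Interactions API
	State         TaskState // lifecycle visible to any observer
	Artifacts     []string  // e.g. "deep_research_report.md"
	UpdatedAt     time.Time
}

// TaskStore is the "Task Registry": an in-memory map here,
// something shared like Redis in production.
type TaskStore struct {
	mu    sync.RWMutex
	tasks map[string]*Task
}

func NewTaskStore() *TaskStore { return &TaskStore{tasks: make(map[string]*Task)} }

func (s *TaskStore) Put(t *Task) {
	s.mu.Lock()
	defer s.mu.Unlock()
	t.UpdatedAt = time.Now()
	s.tasks[t.ID] = t
}

func (s *TaskStore) Get(id string) (*Task, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	t, ok := s.tasks[id]
	return t, ok
}
```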
Pattern 1: The Resilient Session
The most immediate benefit is resilience. In a standard setup, if a mobile client disconnects during a 5-minute "Deep Research" step, the response is lost.
By wrapping the Interaction in an A2A Task, we decouple the Observer from the Worker.

This isn't just theory. In the a2a-experiments CLI, you can see this "detach/reattach" flow in action with the ai_researcher skill.
Example: The Asynchronous Workflow
- Initiation: You kick off a research task. The server starts the Interaction and immediately returns a Task ID.

```
./bin/client invoke "Research the history of the Go language" --skill ai_researcher
# Output: Task Started (ID: 0194...) - Status: Working
```

- Disconnect: You can safely close your laptop or kill the CLI process. The server keeps polling the Gemini Interaction in the background.
- Resumption: Hours (really, minutes) later, you "resume" the specific task to get the result.

```
./bin/client resume <TASK_ID>
# Output: [Status Update] Research Complete.
# [Artifact] deep_research_report.md
```


The Interactions API provided the "Thinking"; A2A provided the "Patience."
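Under the hood, this works because the server owns a worker that outlives any client connection. Here is a minimal sketch of that loop, assuming hypothetical getInteraction and markTask helpers rather than the repo's actual functions.

```go
// Package worker sketches the server-side loop behind the detach/reattach flow:
// it owns the long-running Interaction, so no client connection needs to stay open.
package worker

import (
	"context"
	"time"
)

// InteractionStatus is a simplified view of a polled background Interaction.
type InteractionStatus struct {
	Done   bool
	Output string
}

// WatchInteraction polls a background Interaction until it finishes, then
// records the result against the A2A Task ID. Clients resume by reading the
// task store, not by holding an HTTP connection open.
func WatchInteraction(
	ctx context.Context,
	taskID, interactionID string,
	getInteraction func(ctx context.Context, id string) (InteractionStatus, error),
	markTask func(taskID, state, artifact string),
) {
	ticker := time.NewTicker(5 * time.Second)
	defer ticker.Stop()

	markTask(taskID, "working", "")
	for {
		select {
		case <-ctx.Done():
			markTask(taskID, "failed", "")
			return
		case <-ticker.C:
			st, err := getInteraction(ctx, interactionID)
			if err != nil {
				markTask(taskID, "failed", "")
				return
			}
			if st.Done {
				// e.g. persist "deep_research_report.md" as an A2A Artifact
				markTask(taskID, "completed", st.Output)
				return
			}
		}
	}
}
```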

Pattern 2: The Reference Pattern (zero-copy handoff)
The most thought-provoking pattern emerges when we look at inter-agent collaboration.
In a stateless world, if Agent A (Researcher) wants to pass its work to Agent B (Writer), it must serialize the entire conversation history and send it over the wire. This is slow, expensive, and fragile.
With A2A + Interactions, we enable Zero-Copy Handoffs.

This is implemented in a2a-experiments using the --ref flag. This flag tells the new agent, "Don't just listen to me; look at that task for context."
Example: Cross-Skill Chaining
- Step 1 (Research): You generate a dense, machine-readable report using the Researcher agent.

```
./bin/client invoke "Research A2A protocol standards" --skill ai_researcher
# Returns Task ID: task_123
```

- Step 2 (Summarize): You ask a different skill to summarize it, passing task_123 as a reference.

```
./bin/client invoke "Summarize this for a slide deck" --skill summarize --ref task_123
```

What happened here?
- The client did not download and re-upload the report.
- The summarize skill received the ReferenceTaskID.
- The server looked up task_123, found the InteractionID (or the Report Artifact), and loaded it directly into the context of the new skill.
Agent B has effectively "downloaded" Agent A's brain state without transferring a single byte of context. They are sharing the same "Mind" (the Interactions Session) but applying different "Skills" (Prompts/Tools).
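Here is a rough sketch of what that --ref resolution could look like server-side. The types and helper signatures are illustrative, not the repo's actual code.

```go
// Package refpattern sketches the zero-copy handoff: resolve the referenced
// Task to its InteractionID and start the new skill from that shared context.
package refpattern

import "fmt"

// Task is a minimal view of the registry record (illustrative fields only).
type Task struct {
	ID            string
	InteractionID string
}

// TaskLookup resolves a TaskID in the shared registry.
type TaskLookup func(taskID string) (*Task, bool)

// StartInteraction stands in for the Interactions API call; it receives the
// previous interaction handle instead of a serialized transcript.
type StartInteraction func(skill, prompt, previousInteractionID string) (newInteractionID string, err error)

// InvokeWithRef chains a new skill onto an existing task's context.
// No conversation history crosses the wire between agents.
func InvokeWithRef(lookup TaskLookup, start StartInteraction, refTaskID, skill, prompt string) (string, error) {
	ref, ok := lookup(refTaskID)
	if !ok {
		return "", fmt.Errorf("referenced task %q not found", refTaskID)
	}
	// Agent B "mounts" Agent A's brain state via the shared InteractionID.
	return start(skill, prompt, ref.InteractionID)
}
```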
A Note on Architecture: Intra- vs. Inter-Agent
In this implementation, both the Researcher and Summarizer skills live within the same server process, sharing a local memory store. This makes the "handoff" instantaneous.
However, this Reference Pattern is designed to scale horizontally. In a production system, Agent A and Agent B could be entirely different microservices, er, agents. As long as they share a common "Task Registry" (like a Redis backend or a centralized A2A Control Plane), the pattern holds: the TaskID is the universal pointer that allows any authorized agent to "mount" the context of another's work.
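A minimal sketch of such a registry abstraction, with hypothetical names; swapping the in-memory map for a Redis-backed implementation would not change the pattern.

```go
// Package registry sketches the shared control plane behind the Reference Pattern.
package registry

import "context"

// TaskRecord is the minimal record any agent needs to mount another's work.
type TaskRecord struct {
	TaskID        string
	InteractionID string
	State         string
	Owner         string // which agent created it, for authorization checks
}

// TaskRegistry can be backed by an in-memory map, Redis, or a central A2A
// control plane; any authorized agent that reaches it can resolve a TaskID.
type TaskRegistry interface {
	Save(ctx context.Context, rec TaskRecord) error
	Resolve(ctx context.Context, taskID string) (TaskRecord, error)
}
```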
Why This Matters
We are moving toward a world of Agentic Mesh—systems where specialized agents collaborate to solve complex problems.
- Interactions API gives us the "Brain" that can hold a thought for days.
- A2A gives us the "Protocol" to pass that thought around safely.
This combination allows us to build systems that are not just "chatbots that remember," but distributed applications that think.
Areas for Exploration
This architecture opens several doors we intend to explore further:
- The State Bridge Implementation: A technical deep dive into mapping A2A Tasks to Interaction IDs in Go.
- Impedance Matching Protocols: How to bridge gRPC and JSON-RPC for high-performance agents.
- Agentic UX: Designing user interfaces that can visualize and manage these long-running, "thinking" states.
Try It Yourself
Explore the patterns discussed here in the reference implementation:
- Repo: https://github.com/ghchinoy/a2a-experiments
- Patterns Doc: docs/agentic_patterns.md
- Scenarios: docs/scenarios.md
- Code Walkthrough: docs/code_walkthrough.md
