There is a mistake I see over and over in LLM projects.
Teams assume that once they pick a model with a large context window, memory is basically solved, as if the hard part were buying enough room. It usually is not. The hard part is deciding what deserves to stay in that room.
That is why context window compaction matters.
In a real agent, conversation history does not grow in a clean way. It fills up with retries, tool traces, repeated clarifications, partial reasoning, and old facts that are no longer useful. The model may still accept all of it, but that does not mean the agent will perform well. More context is often just more noise.
Microsoft Agent Framework is valuable here because it gives C# teams a structured way to manage that problem. You are not limited to endlessly replaying the full transcript. You can run conversations through an AgentSession, apply compaction strategies through a CompactionProvider, and decide how older turns should be reduced before they are sent back to the model. That is the difference between a chat log and an engineered memory system.
The real problem is not length. It is signal decay.
Once a conversation gets long enough, the issue stops being token count alone.
The real issue is that the ratio of useful information to irrelevant information gets worse with every turn. An unresolved production incident, a business constraint, and an approved decision may be buried under twenty messages of tool chatter and polite follow-up text. That is not a model problem. That is an application design problem.
Compaction solves that by reducing the weight of older exchanges while preserving their meaning.
In Microsoft Agent Framework, this is handled as part of the context pipeline. Instead of treating history as sacred, you can actively reshape it. You can summarize older sections, keep only the most recent turns, compress tool-heavy exchanges, or combine multiple strategies into a pipeline. That changes the agent from a passive receiver of history into a system that curates its own working context.
Why this matters more in C# enterprise systems
This gets even more important in .NET environments because many C# agent implementations sit inside business workflows, not simple chat apps.
They support service operations, internal copilots, regulated processes, incident handling, document review, or enterprise task orchestration. In those settings, the conversation is rarely just conversational. It is operational. It is carrying facts, outcomes, approvals, and unresolved issues.
If that history is handled poorly, the failure mode is not just “the answer feels weaker.” The failure mode is that the agent forgets a real business constraint, repeats already completed work, or responds confidently from a distorted version of the past.
That is why I think compaction should be treated as part of the system architecture, not as a prompt optimization trick.
The better design: preserve intent, compress residue
The right goal is not to preserve every message.
The right goal is to preserve intent, commitments, relevant facts, and recent local coherence.
Everything else is negotiable.
That means a strong agent design usually keeps four things in play:
- stable instructions that do not drift
- session state that survives across turns
- recent messages in raw form
- older context reduced into a smaller, more useful representation
Microsoft Agent Framework gives you the right hooks for this. AgentSession holds the live conversation state. CompactionProvider lets you transform that state before model invocation. Built-in strategies let you choose whether older context should be summarized, windowed, truncated, or compressed around tool usage.
This is a much better model than pretending all context has equal value.
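As a mental model, the four layers above can be sketched as plain data. Everything in this sketch is illustrative: none of the variable names are Microsoft Agent Framework types, and the assembly order is one reasonable convention, not a framework rule.

```csharp
using System;
using System.Collections.Generic;

// Illustrative sketch only — none of these names are framework APIs.
// A compacting agent keeps four layers of context in play:
var instructions = "You are a precise enterprise operations agent.";             // stable, never compacted
var facts = new Dictionary<string, string>                                       // durable session state
{
    ["constraint"] = "Never expose internal employee IDs."
};
var recentTurns = new List<string> { "User: The payroll export failed again." }; // raw recent messages
var compactedHistory = "Earlier: recurring export failure after tax validation."; // reduced older context

// Assemble the working context: stable instructions, then compacted history,
// then raw recent turns, so the freshest material sits closest to the new turn.
var prompt = string.Join("\n", instructions, compactedHistory, string.Join("\n", recentTurns));
Console.WriteLine(prompt.Split('\n').Length);
```

The point of the sketch is the separation of concerns: only the last two layers are ever candidates for compaction.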
A practical C# example
Below is an example that shows how to build a compacting agent in Microsoft Agent Framework using Azure OpenAI.
The structure is intentionally simple:
- one primary model for the agent
- one cheaper model for summary work
- one session reused across turns
- one compaction pipeline that becomes more aggressive as history grows
1) Create the model clients
```csharp
using System;
using Azure.AI.OpenAI;
using Azure.Identity;
using Microsoft.Agents.AI;
using Microsoft.Extensions.AI;

var endpoint = Environment.GetEnvironmentVariable("AZURE_OPENAI_ENDPOINT")
    ?? throw new InvalidOperationException("Missing AZURE_OPENAI_ENDPOINT.");
var mainDeployment = Environment.GetEnvironmentVariable("AZURE_OPENAI_DEPLOYMENT_NAME") ?? "gpt-4o";
var summaryDeployment = Environment.GetEnvironmentVariable("AZURE_OPENAI_SUMMARIZER_DEPLOYMENT_NAME") ?? "gpt-4o-mini";

var azureClient = new AzureOpenAIClient(
    new Uri(endpoint),
    new AzureCliCredential());

IChatClient mainChatClient = azureClient.GetChatClient(mainDeployment).AsIChatClient();
IChatClient summaryChatClient = azureClient.GetChatClient(summaryDeployment).AsIChatClient();
```
The first practical decision here is to separate the main reasoning model from the summarization model. That keeps compaction cheap. Summarization is important, but it usually does not need your best and most expensive deployment.
2) Define a staged compaction strategy
```csharp
using Microsoft.Agents.AI.Context;
using Microsoft.Agents.AI.Context.Compaction;

var compactionStrategy = new PipelineCompactionStrategy(
    new ToolResultCompactionStrategy(
        CompactionTriggers.TokensExceed(400)),
    new SummarizationCompactionStrategy(
        summaryChatClient,
        CompactionTriggers.TokensExceed(1200),
        minimumPreserved: 4),
    new SlidingWindowCompactionStrategy(
        CompactionTriggers.TurnsExceed(10)),
    new TruncationCompactionStrategy(
        CompactionTriggers.TokensExceed(4000),
        minimumPreserved: 8));
```
This pattern reflects a more realistic production mindset.
First, compress tool-heavy noise. Then summarize older history when it starts to become expensive. Then keep a recent working window. Finally, use truncation as the last line of defense. That progression is usually much healthier than going straight to blunt truncation.
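The escalation logic can be made concrete with a small framework-free sketch. The thresholds mirror the pipeline above, but the function and stage names here are illustrative assumptions, not framework behavior.

```csharp
using System;

// Hypothetical sketch of the staged escalation: the most aggressive applicable
// stage wins. Thresholds mirror the pipeline in the article; names are illustrative.
static string ChooseStage(int historyTokens, int turns)
{
    if (historyTokens > 4000) return "truncate";             // last line of defense
    if (turns > 10) return "sliding-window";                 // keep a recent working window
    if (historyTokens > 1200) return "summarize";            // older history is getting expensive
    if (historyTokens > 400) return "compact-tool-results";  // squeeze tool-heavy noise first
    return "no-op";                                          // short conversations pass through
}

Console.WriteLine(ChooseStage(300, 2));    // under every threshold
Console.WriteLine(ChooseStage(900, 4));    // tool-result compaction fires first
Console.WriteLine(ChooseStage(2500, 8));   // summarization territory
Console.WriteLine(ChooseStage(5000, 30));  // hard ceiling reached
```

The useful property is that cheap, low-loss stages fire long before lossy ones do.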
3) Build the agent with compaction in the pipeline
```csharp
var agent = mainChatClient
    .AsBuilder()
    .UseAIContextProviders(new CompactionProvider(compactionStrategy))
    .BuildAIAgent(
        new ChatClientAgentOptions
        {
            Name = "OperationsAgent",
            ChatOptions = new()
            {
                Instructions = """
                    You are a precise enterprise operations agent.
                    Preserve constraints and decisions.
                    Do not over-index on stale details.
                    Use recent conversation for local continuity.
                    If earlier context has likely been compacted, be explicit about uncertainty.
                    Prefer concise, operationally useful responses.
                    """
            }
        });
```
There is an important design choice here. The compaction behavior is part of the builder pipeline, not bolted awkwardly onto the side. That means context is being shaped as part of the runtime flow, not treated as an afterthought.
4) Run the agent through a persistent session
```csharp
AgentSession session = await agent.CreateSessionAsync();

var response1 = await agent.RunAsync(
    "We have a recurring payroll export failure after the tax validation tool runs.",
    session);
Console.WriteLine(response1);

var response2 = await agent.RunAsync(
    "Important constraint: this only affects German employees and we cannot expose internal employee IDs.",
    session);
Console.WriteLine(response2);

var response3 = await agent.RunAsync(
    "Given everything so far, identify the likely cause and propose the next operational step.",
    session);
Console.WriteLine(response3);
```
This is where compaction becomes meaningful. A session gives the agent continuity across turns, and the compaction provider stops that continuity from turning into bloat.
What to pay attention to
The first thing to watch is what kind of content is accumulating.
If your agent uses tools heavily, raw tool results often become the biggest source of waste. In that case, tool-result compaction usually pays off early. If the conversation is advisory or investigative, summarization matters more because the value is in preserving conclusions, not raw exchange volume.
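A quick way to see this in your own transcripts is to measure where the bulk actually sits. The diagnostic below is a rough, illustrative sketch (not a framework API) that uses character length as a crude proxy for token count.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Rough diagnostic sketch: what share of the history does each role consume?
// Character length stands in for token count; the data here is invented.
var history = new List<(string Role, string Content)>
{
    ("user", "The payroll export failed again."),
    ("assistant", "Running the tax validation tool."),
    ("tool", new string('x', 4000)),                 // raw tool payloads dominate quickly
    ("assistant", "Validation flagged 12 records."),
};

double toolShare = (double)history.Where(m => m.Role == "tool").Sum(m => m.Content.Length)
                 / history.Sum(m => m.Content.Length);

Console.WriteLine($"tool share: {toolShare:P0}");
```

When one role owns most of the volume, that is where the first compaction stage should aim.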
The second thing to watch is when compaction begins.
Many implementations compact too late. They wait until the agent is already near the token ceiling. By then the agent has already spent too many turns dragging excess context around. The better pattern is to start compacting when the conversation is merely getting noisy, not only when it is about to fail.
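One simple way to encode "compact early" is to trigger on a fraction of the budget rather than the ceiling itself. The numbers and names below are assumptions for illustration, not framework defaults.

```csharp
using System;

// Illustrative sketch: start compacting at a fraction of the model's budget,
// not when the ceiling is already in sight. Both numbers are assumptions.
const int ContextBudgetTokens = 8000;
const double CompactionRatio = 0.6;   // begin compacting when 60% full

static bool ShouldCompact(int historyTokens, int budget, double ratio) =>
    historyTokens >= (int)(budget * ratio);

Console.WriteLine(ShouldCompact(5000, ContextBudgetTokens, CompactionRatio)); // past 60%: compact now
Console.WriteLine(ShouldCompact(3000, ContextBudgetTokens, CompactionRatio)); // still headroom
```

The exact ratio matters less than the principle: leave room for the model to work before the history forces your hand.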
The third thing to watch is what should never be compacted into ambiguity.
Instructions should stay explicit. Hard constraints should remain legible. Important decisions should not dissolve into soft narrative language. If the agent must remember that “internal IDs must never appear in user-facing output,” that should survive compaction with precision.
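A common way to enforce this is to pin hard constraints so they bypass summarization entirely. The pattern below is a generic, framework-free sketch; the tuple shape and the `Pinned` flag are illustrative assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative pattern: tag hard constraints so they are carried forward
// verbatim, while everything else stays eligible for summarization.
var history = new List<(string Text, bool Pinned)>
{
    ("We have a recurring payroll export failure.", false),
    ("Internal IDs must never appear in user-facing output.", true),
    ("Thanks, looking into it now.", false),
    ("Only German employees are affected.", true),
};

var pinned = history.Where(m => m.Pinned).Select(m => m.Text).ToList();      // survives compaction verbatim
var compactable = history.Where(m => !m.Pinned).Select(m => m.Text).ToList(); // may be summarized away

Console.WriteLine(pinned.Count);
Console.WriteLine(compactable.Count);
```

Whatever mechanism you use, the invariant is the same: a summarizer should never be the only thing standing between a hard constraint and oblivion.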
The fourth thing to watch is the boundary between compaction and retrieval.
These are not the same thing. Compaction manages conversation memory. Retrieval brings in external knowledge. If your agent needs product policy, contract terms, or historical incident records, that belongs in retrieval. If your agent needs to stay coherent across a long-running thread, that belongs in compaction. Mature systems use both.
Where this becomes strategically important
This design becomes especially valuable when agents stop being novelty interfaces and start becoming part of business execution.
That includes:
- operational support agents
- investigation assistants
- workflow copilots
- compliance review tools
- service-delivery copilots
- internal engineering and incident agents
In all of these cases, the cost of forgotten context is real. The agent is not just chatting. It is participating in work. Once that happens, context cannot be treated as leftover prompt text. It becomes part of the control surface of the system.
That is why I think context window compaction deserves more attention than it gets. It sounds small. It is not small. It is one of the places where agent engineering stops being a prompt-writing exercise and becomes actual software design.
Conclusion
Most teams still frame long context as a model selection question. I think it is a memory design question.
The winning systems will not be the ones that merely buy the biggest window. They will be the ones that decide, with discipline, what the model should keep, what it should compress, and what it should go fetch fresh.
Microsoft Agent Framework gives C# teams the right primitives to do that well. Sessions keep continuity. Compaction shapes history. Workflows can impose control when the process matters more than improvisation.
That is the real lesson. The context window is not your memory strategy. It is just the space you have to execute one.