---
title: "Arxiv digest: Agents, tokenization, and latent communication"
description: "This week's arxiv highlights self-evolving agents, efficient tokenization via convex optimization, and safe KV-cache sharing in multi-agent LLM systems."
tldr: "MOSS enables source-level code rewriting for self-evolving agents; ConvexTok improves tokenization via linear programming; LCGuard guards sensitive information in latent multi-agent communication through KV caches."
url: "https://aigentic.blog/arxiv-digest-agents-tokenization-latent-communication"
publishedAt: "2026-05-23T13:00:39.315Z"
updatedAt: "2026-05-23T13:00:39.315Z"
category: "arxiv-digest"
tags: ["arxiv","research","multi-agent-systems","agent-architecture","llm-optimization"]
---

# Arxiv digest: Agents, tokenization, and latent communication

> MOSS enables source-level code rewriting for self-evolving agents; ConvexTok improves tokenization via linear programming; LCGuard guards sensitive information in latent multi-agent communication through KV caches.

This week's arxiv output spans self-improving agent systems, foundational NLP improvements, and the emerging infrastructure needed to scale stateful multi-agent coordination safely.

## Top 5

| Rank | Title | Authors | Score | Why it matters |
|------|-------|---------|-------|----------------|
| 1 | [MOSS: Self-Evolution through Source-Level Rewriting in Autonomous Agent Systems](https://arxiv.org/abs/2605.22794v1) | Cai, Zhang, Jia, et al. | 9/10 | Unlocks structural fixes in agent code post-deployment without human intervention |
| 2 | [LCGuard: Latent Communication Guard for Safe KV Sharing in Multi-Agent Systems](https://arxiv.org/abs/2605.22785v1) | Asif, Amiri, Abbas, et al. | 8/10 | Prevents sensitive leakage through transformer KV caches in agent coordination |
| 3 | [DeltaBox: Scaling Stateful AI Agents with Millisecond-Level Sandbox Checkpoint/Rollback](https://arxiv.org/abs/2605.22781v1) | Dong, He, Hou, et al. | 8/10 | Enables high-frequency state exploration for agents via delta-based snapshots |
| 4 | [Tokenisation via Convex Relaxations](https://arxiv.org/abs/2605.22821v1) | Tempus, Whittington, Schmidt, et al. | 7/10 | Replaces greedy tokenization with a linear-programming formulation whose solution is certifiably near-optimal |
| 5 | [Vector Policy Optimization: Training for Diversity Improves Test-Time Search](https://arxiv.org/abs/2605.22817v1) | Bahlous-Boldi, Puri, Shenfeld, et al. | 7/10 | Trains LLM policies to produce diverse outputs for downstream search procedures |

## Flagship: MOSS: Self-Evolution through Source-Level Rewriting in Autonomous Agent Systems

[MOSS](https://arxiv.org/abs/2605.22794v1), authored by Cai, Zhang, Jia, and colleagues, addresses a hard limitation in deployed autonomous agent systems: they remain static after launch. Current self-evolving agent frameworks restrict adaptation to text-mutable artifacts (prompts, skill files, memory schemas, workflow graphs), leaving the agent harness code itself untouched. This means routing logic, hook ordering, state invariants, and dispatch routines cannot be modified without human intervention, even when they are the root cause of recurring failures.

The core insight is that source-level code adaptation is a fundamentally more general medium than text adaptation. It is Turing-complete, subsumes every text-mutable scope, takes effect deterministically rather than through LLM compliance with prompts, and does not degrade under long-context drift. MOSS implements this principle by enabling self-modifying agents to rewrite their own source code.

The system operates in a feedback loop: the agent observes its own failures, proposes code edits to fix them, validates the edits in a sandbox, and deploys only those that improve performance. Critically, the authors frame this not as prompting an LLM to generate code, but as a systematic optimization process where the agent's code becomes a learnable artifact. Because edits are source-level and deterministic, they persist and compound across deployments, unlike text-based adaptations that depend on the LLM's willingness to follow instructions.
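The observe, propose, validate, deploy loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `Patch`, `evolve`, `propose`, and `score` are hypothetical names, the proposer is a hard-coded stub standing in for an LLM, and `exec` in a fresh namespace stands in for a real sandbox (it provides no actual isolation).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Harness = Dict[str, str]  # hypothetical model of a harness: file path -> source text


@dataclass
class Patch:
    path: str        # file the edit targets
    new_source: str  # full replacement source for that file
    rationale: str   # why the proposer believes the edit helps


def apply_patch(harness: Harness, patch: Patch) -> Harness:
    """Return a patched copy; the running harness is never edited in place."""
    updated = dict(harness)
    updated[patch.path] = patch.new_source
    return updated


def evolve(
    harness: Harness,
    propose: Callable[[Harness], List[Patch]],
    score: Callable[[Harness], float],
    rounds: int = 3,
) -> Tuple[Harness, float]:
    """Keep a candidate edit only if it strictly improves the validation score."""
    best_score = score(harness)
    for _ in range(rounds):
        for patch in propose(harness):
            candidate = apply_patch(harness, patch)
            try:
                candidate_score = score(candidate)  # validation step
            except Exception:
                continue  # a patch that crashes during validation is discarded
            if candidate_score > best_score:
                harness, best_score = candidate, candidate_score
    return harness, best_score


# --- toy demo: a router that always answers 'search' ---
BUGGY = "def route(task):\n    return 'search'\n"
FIXED = (
    "def route(task):\n"
    "    return 'calculator' if task.startswith('calc') else 'search'\n"
)
CRASHING = "def route(task):\n    raise RuntimeError('boom')\n"
CASES = [("calc 2+2", "calculator"), ("find docs", "search")]


def score(harness: Harness) -> float:
    """Fraction of routing cases passed. exec() here is NOT a real sandbox."""
    namespace: dict = {}
    exec(harness["router.py"], namespace)
    route = namespace["route"]
    return sum(route(task) == expected for task, expected in CASES) / len(CASES)


def propose(harness: Harness) -> List[Patch]:
    """Stub proposer; a real system would generate edits from observed failures."""
    return [
        Patch("router.py", CRASHING, "deliberately bad edit"),
        Patch("router.py", FIXED, "route calc tasks to the calculator"),
    ]


final_harness, final_score = evolve({"router.py": BUGGY}, propose, score)
print(final_score)
```

In this toy run the crashing patch is rejected during validation and the routing fix is kept, because only strictly improving edits are deployed. The same gate is what makes accepted edits accumulate from one round to the next.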

The paper evaluates MOSS on a suite of agentic tasks involving tool use, planning, and error recovery. In one experiment, agents using MOSS fix routing bugs in their own harness that greedy text-based approaches miss; in another, they optimize dispatch order to reduce unnecessary tool calls. The authors also demonstrate that source-level changes are more robust to distribution shift than prompt-based fixes, since the underlying logic is explicit in code rather than implicit in the model's learned associations.

Limitations include the need for a safe sandbox environment (not always available in production), the risk of agent-authored code introducing subtle bugs that are harder to audit than natural language changes, and the computational cost of compile-test-deploy cycles. The paper acknowledges these trade-offs but argues that for long-lived production agents, the benefits outweigh the costs. The work is most immediately applicable to agents operating in controlled environments (robotics, codegen workflows, structured reasoning tasks) rather than open-ended user-facing systems.

MOSS represents a meaningful shift in the agent lifecycle: from static-after-deployment to continuously self-improving, bounded only by sandbox constraints and code quality assurance. For infrastructure builders, it surfaces the need for better agent sandboxing, version control, and automated code review for LLM-generated patches.

## Also noteworthy

- [LCGuard: Latent Communication Guard for Safe KV Sharing in Multi-Agent Systems](https://arxiv.org/abs/2605.22785v1) (Asif, Amiri, Abbas, et al.) tackles a blind spot: multi-agent systems that share transformer KV caches for efficiency also leak contextual and reasoning information through an opaque channel, bypassing explicit textual disclosure; LCGuard learns representation-level transformations to sanitize shared caches before transmission.

- [DeltaBox: Scaling Stateful AI Agents with Millisecond-Level Sandbox Checkpoint/Rollback](https://arxiv.org/abs/2605.22781v1) (Dong, He, Hou, et al.) solves the latency bottleneck in agent state exploration by observing that consecutive checkpoints are highly similar and storing only deltas; this cuts checkpoint/rollback time from hundreds of milliseconds to single-digit milliseconds, enabling deep tree search and large-scale fan-outs.

- [Tokenisation via Convex Relaxations](https://arxiv.org/abs/2605.22821v1) (Tempus, Whittington, Schmidt, et al.) reformulates tokenizer design as a linear program solved via convex optimization, replacing greedy algorithms (BPE, Unigram) with a solution that improves bits-per-byte and comes with a certificate of its distance from the global optimum (within 1% at common vocabulary sizes).

## Takeaways

Self-improvement in deployed agents is shifting from external (human-driven patching) to internal (source-level code rewriting). MOSS demonstrates that agent harnesses are as learnable and mutable as any other part of the system; the practical barrier is not conceptual but infrastructural (sandboxing, auditing, version control). Expect frameworks to expose agent code as a first-class optimization target in 2026.

Multi-agent coordination through latent channels (KV caches, embeddings) trades transparency for efficiency. LCGuard's defense mechanisms are necessary but incomplete; the broader lesson is that any shared representation in a multi-agent system is a potential information leak. Practitioners building sensitive multi-agent workflows should treat latent communication with the same rigor as cryptographic channels.

Efficiency gains are flowing to foundational components: tokenization (ConvexTok), checkpoint/rollback (DeltaBox), and policy training (VPO). None are flashy, but all lower the computational cost of either training or deploying agentic systems. The compound effect of these improvements will expand the feasible scale and complexity of production agent workflows.

## Further reading

- [MOSS arxiv abstract](https://arxiv.org/abs/2605.22794v1): Self-evolution through source-level rewriting in autonomous agent systems.
- [LCGuard arxiv abstract](https://arxiv.org/abs/2605.22785v1): Latent communication guard for safe KV sharing in multi-agent LLM systems.
- [DeltaBox arxiv abstract](https://arxiv.org/abs/2605.22781v1): Scaling stateful AI agents with millisecond-level sandbox checkpoint/rollback.
- [ConvexTok arxiv abstract](https://arxiv.org/abs/2605.22821v1): Tokenization via convex relaxations and linear programming.
- [VPO arxiv abstract](https://arxiv.org/abs/2605.22817v1): Vector policy optimization for diversity-aware test-time search.

## Frequently asked

### What is MOSS and how does it differ from text-based agent adaptation?

MOSS performs source-level code rewriting on the agent harness itself, not just on text artifacts like prompts or skill files. This enables structural fixes (routing, state invariants, dispatch) that are unreachable from the text layer. Because source code is Turing-complete and subsumes every text-mutable scope, MOSS is a strict superset of prior self-evolution approaches.

### How does ConvexTok improve over greedy tokenizers like BPE?

ConvexTok formulates tokenizer construction as a linear program solved via convex optimization, replacing greedy local decisions with a global optimization over the vocabulary. It yields measurable improvements in bits-per-byte and downstream task performance while certifying how far the solution is from optimal (within 1% at common vocabulary sizes).
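To make the "certified distance to optimality" idea concrete, here is one generic way a vocabulary-selection problem can be relaxed to a linear program. This is an illustrative sketch under assumptions and not necessarily the paper's formulation. The corpus is a string of $n$ bytes with positions $0, \dots, n$ as nodes, and $E$ is the set of edges $(i, j)$ whose byte span matches either a single byte or a candidate token. $E_T \subseteq E$ holds the edges labelled by a multi-byte candidate $\tau(e) \in T$, $x_t$ selects token $t$, $f_e$ marks edge $e$ as used in the segmentation, and $K$ is the vocabulary budget. Single-byte edges are assumed always available, so a segmentation always exists. The objective here is token count, a stand-in for the bits-per-byte objective mentioned above.

```latex
\begin{align*}
z_{\mathrm{LP}} \;=\; \min_{x,\,f}\quad & \sum_{e \in E} f_e \\
\text{s.t.}\quad
& \sum_{e \in \delta^{+}(0)} f_e = 1, \\
& \sum_{e \in \delta^{-}(v)} f_e = \sum_{e \in \delta^{+}(v)} f_e
  \quad \text{for } v = 1, \dots, n-1, \\
& f_e \le x_{\tau(e)} \quad \text{for all } e \in E_T, \\
& \sum_{t \in T} x_t \le K, \\
& 0 \le x_t \le 1, \qquad 0 \le f_e \le 1 .
\end{align*}

% Requiring x_t in {0,1} recovers the exact (integer) problem with optimum z^*.
% Rounding an LP solution to a feasible vocabulary gives a value z_round, so
\[
z_{\mathrm{LP}} \;\le\; z^{*} \;\le\; z_{\mathrm{round}}
\quad\Longrightarrow\quad
\frac{z_{\mathrm{round}} - z^{*}}{z^{*}}
\;\le\;
\frac{z_{\mathrm{round}} - z_{\mathrm{LP}}}{z_{\mathrm{LP}}} .
\]
```

The right-hand side is computable without knowing $z^{*}$, which is what makes a statement like "within 1% of optimal" checkable: the relaxation supplies a lower bound, the rounded vocabulary supplies an upper bound, and the true optimum is sandwiched between them.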

### Why is latent KV-cache communication risky in multi-agent systems?

KV caches encode contextual inputs, intermediate reasoning states, and agent-specific information. Sharing them across agents creates an opaque channel where sensitive content propagates without explicit disclosure, bypassing the transparency of natural-language communication.

### What does DeltaBox solve for LLM-based agents?

DeltaBox enables millisecond-level checkpoint and rollback of sandbox state for agents doing test-time tree search. It observes that consecutive checkpoints are highly similar and stores only deltas instead of full duplicates, cutting checkpoint/rollback latency from the hundreds-of-milliseconds-to-seconds range down to single-digit milliseconds.
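The delta idea can be illustrated with a toy model in which sandbox state is a flat dictionary. This is only a sketch of the general technique, assuming immutable values; `DeltaStore`, `diff`, and `patch` are hypothetical names, and nothing here reflects how DeltaBox actually captures sandbox state.

```python
from typing import Any, Dict, List

State = Dict[str, Any]
_DELETED = object()  # sentinel marking a key removed since the previous checkpoint


def diff(old: State, new: State) -> State:
    """Record only keys that were added, changed, or removed."""
    delta = {k: v for k, v in new.items() if k not in old or old[k] != v}
    delta.update({k: _DELETED for k in old if k not in new})
    return delta


def patch(state: State, delta: State) -> State:
    """Apply a delta to a copy of state."""
    out = dict(state)
    for key, value in delta.items():
        if value is _DELETED:
            out.pop(key, None)
        else:
            out[key] = value
    return out


class DeltaStore:
    """Stores one full base snapshot plus a chain of deltas."""

    def __init__(self, base: State) -> None:
        self._base = dict(base)
        self._head = dict(base)  # latest checkpointed state, used for diffing
        self._deltas: List[State] = []

    def checkpoint(self, state: State) -> int:
        """Save state as a delta against the previous checkpoint; return its id."""
        self._deltas.append(diff(self._head, state))
        self._head = dict(state)
        return len(self._deltas)  # id 0 refers to the base snapshot

    def restore(self, checkpoint_id: int) -> State:
        """Rebuild a checkpoint by replaying deltas on top of the base."""
        state = dict(self._base)
        for delta in self._deltas[:checkpoint_id]:
            state = patch(state, delta)
        return state

    def delta_sizes(self) -> List[int]:
        """Number of keys stored per checkpoint."""
        return [len(delta) for delta in self._deltas]


store = DeltaStore({"cwd": "/tmp", "files": 3})
c1 = store.checkpoint({"cwd": "/tmp", "files": 4})
c2 = store.checkpoint({"cwd": "/work", "files": 4, "pid": 17})
c3 = store.checkpoint({"cwd": "/work", "pid": 17})

print(store.restore(c1))    # state as of the first checkpoint
print(store.delta_sizes())  # keys stored per checkpoint, not full copies
```

The trade-off in this sketch is that restoring an old checkpoint replays the whole delta chain, so checkpointing is cheap while rollback cost grows with chain length. A production system would need some way to bound that cost, for example periodic full snapshots.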
