AIgentic

Agentic Systems & LLM Tooling Daily

arXiv digest: Agentic workflows, LLM fine-tuning

Five papers stand out, among them: a framework for translating research questions into executable workflows via LLM-guided agents; gradient-informed vector adaptation for efficient fine-tuning; and methods to reduce hallucinations in vision-language models through visual grounding.


This week’s arXiv activity spans agentic systems for scientific automation, parameter-efficient fine-tuning strategies, and efforts to ground vision-language models more firmly in visual input rather than textual priors.

Top 5

| Rank | Title | Authors | Score | Why it matters |
|---|---|---|---|---|
| 1 | From Research Question to Scientific Workflow: Leveraging Agentic AI for Science Automation | Balis, Orzechowski, et al. | 9/10 | Closes gap between natural language research intent and executable workflows; demonstrates practical agentic decomposition. |
| 2 | GiVA: Gradient-Informed Bases for Vector-Based Adaptation | Gangwar, Deshmukh, et al. | 8/10 | Achieves LoRA-competitive training time with extreme parameter efficiency; addresses practical fine-tuning bottleneck. |
| 3 | When Prompts Override Vision: Prompt-Induced Hallucinations in LVLMs | Khayatan, Parekh, et al. | 8/10 | Isolates textual priors as primary hallucination source; proposes DPO-based mitigation for vision-language grounding. |
| 4 | Nemobot Games: Crafting Strategic AI Gaming Agents for Interactive Learning with Large Language Models | Tan, Wang, Guo | 7/10 | Demonstrates LLM-based agent strategy synthesis across game classes; interactive environment for agentic reasoning. |
| 5 | Low-Rank Adaptation Redux for Large Models | Li, Zhang, Giannakis | 7/10 | Signal-processing perspective on LoRA variants; bridges classical low-rank modeling with modern adapter design. |

Flagship: From Research Question to Scientific Workflow: Leveraging Agentic AI for Science Automation

This paper by Balis, Orzechowski, Kica, Dygas, and Kuszewski addresses a structural bottleneck in computational science: the manual translation of research questions into executable scientific workflows. While workflow systems have matured in orchestration (scheduling, fault tolerance, resource management), the semantic layer remains manual, requiring domain experts to encode both scientific intent and infrastructure constraints into directed acyclic graphs (DAGs).

The authors propose a three-layer agentic architecture that decomposes this translation problem:

  1. Semantic layer (LLM-driven): An LLM interprets natural language research questions into structured intents, capturing what the scientist wants to compute without prescribing how.

  2. Deterministic layer (validated generators): Deterministic code generators consume intents and produce reproducible workflow DAGs. This layer ensures that identical intents always yield identical workflows, confining LLM non-determinism to intent extraction alone.

  3. Knowledge layer (domain expertise): Domain experts author “Skills”, markdown documents that encode vocabulary mappings, parameter constraints, optimization strategies, and tool invocations. Skills act as a knowledge base that grounds the LLM’s reasoning in domain reality.

The decomposition is elegant: it leverages LLM reasoning where it excels (natural language understanding) while isolating its non-determinism from the deterministic execution path. Skills provide a human-auditable interface between the LLM and the workflow engine.
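
To make the three layers concrete, here is a minimal sketch in Python. It is not the authors' implementation: the `SKILL` dictionary stands in for a markdown Skill, `extract_intent` replaces the LLM call with keyword matching so the example runs offline, and the task names (`extract_i`, `merge`) and population codes are hypothetical placeholders chosen for illustration.

```python
import json
from dataclasses import dataclass

# Knowledge layer stand-in (assumption): the paper's Skills are markdown
# documents; a dict models the vocabulary mappings and parameter constraints.
SKILL = {
    "vocabulary": {"allele frequency": "frequency", "variant overlap": "overlap"},
    "populations": ("EUR", "AFR", "EAS"),
    "constraints": {"max_chunks": 8},
}


@dataclass(frozen=True)
class Intent:
    """Structured intent: what to compute, not how to compute it."""
    analysis: str
    populations: tuple
    chunks: int


def extract_intent(question: str, skill: dict, requested_chunks: int = 4) -> Intent:
    """Semantic layer stand-in. A real system would call an LLM here;
    keyword matching is used only so the sketch is runnable."""
    lowered = question.lower()
    analysis = next(
        (task for phrase, task in skill["vocabulary"].items() if phrase in lowered),
        None,
    )
    if analysis is None:
        raise ValueError("question is not covered by the Skill vocabulary")
    tokens = set(question.replace(",", " ").split())
    populations = tuple(p for p in skill["populations"] if p in tokens)
    # The Skill's parameter constraint caps the requested parallelism.
    chunks = min(requested_chunks, skill["constraints"]["max_chunks"])
    return Intent(analysis, populations, chunks)


def generate_workflow(intent: Intent) -> dict:
    """Deterministic layer: the same intent always yields the same DAG."""
    nodes, edges = [], []
    for i in range(intent.chunks):
        nodes.append(f"extract_{i}")
        edges.append((f"extract_{i}", "merge"))
    nodes.append("merge")
    for pop in sorted(intent.populations):
        task = f"{intent.analysis}_{pop}"
        nodes.append(task)
        edges.append(("merge", task))
    return {"nodes": nodes, "edges": edges}


if __name__ == "__main__":
    intent = extract_intent("Compare allele frequency between EUR and AFR", SKILL)
    print(json.dumps(generate_workflow(intent), indent=2))
```

Only `extract_intent` would vary between runs in a real deployment. Once an `Intent` exists, it can be logged, reviewed, and replayed, and `generate_workflow` will rebuild the identical DAG from it.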

The team evaluated this architecture on the 1000 Genomes population genetics project, a domain with rich, complex workflows. The abstract does not report detailed quantitative results, but the design principle is sound: domain experts write Skills rather than workflows, which lowers the barrier for scientists while preserving reproducibility.

Limitations are implicit but worth noting: the approach assumes that natural language research questions can be reliably mapped to intents, and that Skills are comprehensive enough to cover the domain’s vocabulary. The paper does not discuss failure modes (e.g., ambiguous or out-of-domain questions) or the overhead of authoring and maintaining Skills. Additionally, the evaluation is limited to a single domain; generalization to other scientific fields remains open.

The work is significant because it demonstrates a practical decomposition pattern for agentic systems: separate the reasoning layer (LLM) from the deterministic execution layer (generators) and ground both in domain-specific knowledge (Skills). Nothing in this pattern is specific to science, so it could plausibly carry over to other automation contexts beyond scientific workflows.

Takeaways

Agentic decomposition is maturing: The scientific workflow paper exemplifies a design pattern where LLMs handle semantic reasoning while deterministic layers handle execution. This separation of concerns confines hallucination risk to a single, inspectable step and improves auditability.

Parameter efficiency remains a practical priority: Both GiVA and the LoRA survey highlight ongoing tension between fine-tuning quality and computational cost. Gradient-informed initialization and signal-processing perspectives suggest that efficiency gains are not exhausted; expect continued refinement of adapter methods.

Vision-language grounding is an active problem: The hallucination work shows that textual priors overwhelm visual input in current LVLMs, and that DPO-based fine-tuning can help. This indicates that production vision-language systems will likely require explicit visual grounding mechanisms, not just larger models.

Frequently asked

What is the three-layer agentic architecture proposed for scientific workflows?

The architecture separates semantic intent extraction (semantic layer, LLM-driven), deterministic workflow generation (deterministic layer, validated generators), and domain knowledge encoding (knowledge layer, expressed as Skills). This decomposition confines LLM non-determinism to intent extraction while ensuring reproducible workflow generation, with domain experts authoring Skills as markdown documents that ground reasoning in domain reality.

How does GiVA achieve LoRA-competitive performance with vector-based adaptation?

GiVA uses gradient-based initialization to seed vector-based fine-tuning parameters, reducing the rank required to match LoRA performance. This maintains extreme parameter efficiency while achieving comparable training times, as demonstrated across natural language understanding, generation, and image classification tasks.
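
For a sense of scale, the sketch below compares trainable-parameter counts for a single weight matrix under full fine-tuning, LoRA, and a generic vector-based adapter. The vector-based count assumes the VeRA-style formulation (frozen bases, with only one scaling vector of length `r` and one of length `d_out` being trained); GiVA's exact parameterization is not given in this digest and may differ. The layer size and ranks are illustrative values, not figures from the paper.

```python
def full_finetune_params(d_out: int, d_in: int) -> int:
    """Every entry of the d_out x d_in weight matrix is trainable."""
    return d_out * d_in


def lora_params(d_out: int, d_in: int, r: int) -> int:
    """LoRA trains B (d_out x r) and A (r x d_in)."""
    return r * (d_out + d_in)


def vector_adapter_params(d_out: int, r: int) -> int:
    """Assumed VeRA-style budget: frozen bases, trainable vectors of
    length r and d_out only."""
    return r + d_out


if __name__ == "__main__":
    d_out = d_in = 4096  # illustrative layer size
    print(f"full:   {full_finetune_params(d_out, d_in):,}")  # full:   16,777,216
    print(f"LoRA:   {lora_params(d_out, d_in, r=8):,}")      # LoRA:   65,536
    print(f"vector: {vector_adapter_params(d_out, r=256):,}")  # vector: 4,352
```

Under these assumptions the vector-based adapter trains roughly 15 times fewer parameters than rank-8 LoRA on the same layer, even with a much larger basis rank, because the bases themselves stay frozen.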

What is the primary source of hallucinations in vision-language models according to HalluScope?

HalluScope identifies textual priors and background knowledge introduced through textual instructions as the dominant source of hallucinations, rather than vision backbone limitations or language model dominance alone. HalluVL-DPO mitigates this through fine-tuning that prioritizes visual grounding.
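
For readers unfamiliar with DPO, the sketch below computes the standard Direct Preference Optimization loss for one preference pair. How HalluVL-DPO builds its pairs is not described in this digest; the example assumes the "chosen" response is visually grounded and the "rejected" one is hallucinated, and the log-probability values are made up for illustration.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,
) -> float:
    """Standard DPO loss for a single (chosen, rejected) pair.

    Each argument is the summed log-probability of a full response under
    the trainable policy or the frozen reference model.
    """
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(logits)) equals softplus(-logits)
    return softplus(-logits)


if __name__ == "__main__":
    # Policy identical to the reference: no preference learned yet.
    print(dpo_loss(-5.0, -8.0, -5.0, -8.0))
    # Policy has shifted probability toward the (assumed) grounded response.
    print(dpo_loss(-4.0, -9.0, -5.0, -8.0))
```

The loss falls as the policy raises the likelihood of the chosen response relative to the reference model and lowers that of the rejected one, which is the mechanism a grounding-oriented preference dataset would exploit.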

Why is decomposing LLM reasoning from deterministic execution important in agentic systems?

Separation limits how far non-deterministic LLM output (which may hallucinate or vary between runs) can propagate. The LLM produces only a structured intent; everything downstream of that intent is handled by deterministic generators. The system therefore maintains reproducibility and auditability while leveraging LLM strengths in natural language understanding.

What signal-processing insights does the LoRA survey provide for fine-tuning method selection?

The survey bridges classical low-rank modeling and inverse problems theory with modern adapter design, providing principled guidance on architectural choices, optimization techniques, and deployment constraints. This theoretical grounding helps practitioners select among LoRA variants rather than relying on empirical comparisons alone.
