Arxiv Digest
Arxiv digest: Agentic workflows, LLM fine-tuning
Three themes stand out among this week's top five papers: a framework for translating research questions into executable workflows via LLM-guided agents; gradient-informed vector adaptation for efficient fine-tuning; and methods to reduce hallucinations in vision-language models through visual grounding.
This week’s arxiv activity spans agentic systems for scientific automation, parameter-efficient fine-tuning strategies, and efforts to ground vision-language models more firmly in visual input rather than textual priors.
Top 5
| Rank | Title | Authors | Score | Why it matters |
|---|---|---|---|---|
| 1 | From Research Question to Scientific Workflow: Leveraging Agentic AI for Science Automation | Balis, Orzechowski, et al. | 9/10 | Closes gap between natural language research intent and executable workflows; demonstrates practical agentic decomposition. |
| 2 | GiVA: Gradient-Informed Bases for Vector-Based Adaptation | Gangwar, Deshmukh, et al. | 8/10 | Achieves LoRA-competitive training time with extreme parameter efficiency; addresses practical fine-tuning bottleneck. |
| 3 | When Prompts Override Vision: Prompt-Induced Hallucinations in LVLMs | Khayatan, Parekh, et al. | 8/10 | Isolates textual priors as primary hallucination source; proposes DPO-based mitigation for vision-language grounding. |
| 4 | Nemobot Games: Crafting Strategic AI Gaming Agents for Interactive Learning with Large Language Models | Tan, Wang, Guo | 7/10 | Demonstrates LLM-based agent strategy synthesis across game classes; interactive environment for agentic reasoning. |
| 5 | Low-Rank Adaptation Redux for Large Models | Li, Zhang, Giannakis | 7/10 | Signal-processing perspective on LoRA variants; bridges classical low-rank modeling with modern adapter design. |
Flagship: From Research Question to Scientific Workflow: Leveraging Agentic AI for Science Automation
This paper by Balis, Orzechowski, Kica, Dygas, and Kuszewski addresses a structural bottleneck in computational science: the manual translation of research questions into executable scientific workflows. While workflow systems have matured in orchestration (scheduling, fault tolerance, resource management), the semantic layer remains manual, requiring domain experts to encode both scientific intent and infrastructure constraints into directed acyclic graphs (DAGs).
The authors propose a three-layer agentic architecture that decomposes this translation problem:
- Semantic layer (LLM-driven): An LLM interprets natural language research questions into structured intents, capturing what the scientist wants to compute without prescribing how.
- Deterministic layer (validated generators): Deterministic code generators consume intents and produce reproducible workflow DAGs. This layer ensures that identical intents always yield identical workflows, confining LLM non-determinism to intent extraction alone.
- Knowledge layer (domain expertise): Domain experts author “Skills”, markdown documents that encode vocabulary mappings, parameter constraints, optimization strategies, and tool invocations. Skills act as a knowledge base that grounds the LLM’s reasoning in domain reality.
The decomposition is elegant: it leverages LLM reasoning where it excels (natural language understanding) while isolating its non-determinism from the deterministic execution path. Skills provide a human-auditable interface between the LLM and the workflow engine.
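The separation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Intent` schema, the `SKILL` dictionary, the task names, and the keyword matcher standing in for the LLM are all hypothetical. What it shows is the key property: once an intent exists, workflow generation is a pure function of it.

```python
import json
import re
from dataclasses import dataclass

# Hypothetical Skill. In the paper, Skills are markdown documents; a dict stands
# in here for the vocabulary mappings and parameter constraints they encode.
SKILL = {
    "vocabulary": {"allele frequency": "frequency"},
    "max_chromosome": 22,
}


@dataclass(frozen=True)
class Intent:
    """Structured intent: what to compute, not how (hypothetical schema)."""
    analysis: str
    chromosomes: tuple


def extract_intent(question: str) -> Intent:
    """Stand-in for the LLM-driven semantic layer (keyword matching, no LLM)."""
    text = question.lower()
    analysis = next(
        (task for phrase, task in SKILL["vocabulary"].items() if phrase in text),
        None,
    )
    if analysis is None:
        raise ValueError("question is outside the Skill's vocabulary")
    chromosomes = tuple(sorted({
        int(n) for n in re.findall(r"chromosome (\d+)", text)
        if 1 <= int(n) <= SKILL["max_chromosome"]
    }))
    return Intent(analysis=analysis, chromosomes=chromosomes)


def generate_workflow(intent: Intent) -> dict:
    """Deterministic layer: maps each task name to the tasks it depends on."""
    dag = {}
    merge_deps = []
    for c in intent.chromosomes:
        dag[f"extract_chr{c}"] = []
        dag[f"{intent.analysis}_chr{c}"] = [f"extract_chr{c}"]
        merge_deps.append(f"{intent.analysis}_chr{c}")
    dag["merge"] = merge_deps
    return dag


intent = extract_intent(
    "What is the allele frequency on chromosome 1 and chromosome 2?"
)
workflow = generate_workflow(intent)
print(json.dumps(workflow, indent=2))
```

Because `generate_workflow` takes only the intent as input, any variability in the language model can affect the result only through the intent object, which is small enough to inspect or log before execution.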
The team evaluated this architecture on the 1000 Genomes population genetics project, a domain with rich, complex workflows. The abstract does not report detailed quantitative results, but the architecture’s design principle is sound: by asking domain experts to write Skills rather than workflows directly, the system lowers the barrier for scientists while maintaining reproducibility.
Limitations are implicit but worth noting: the approach assumes that natural language research questions can be reliably mapped to intents, and that Skills are comprehensive enough to cover the domain’s vocabulary. The paper does not discuss failure modes (e.g., ambiguous or out-of-domain questions) or the overhead of authoring and maintaining Skills. Additionally, the evaluation is limited to a single domain; generalization to other scientific fields remains open.
The work is significant because it demonstrates a practical decomposition pattern for agentic systems: separate the reasoning layer (LLM) from the deterministic execution layer (generators) and ground both in domain-specific knowledge (Skills). This pattern is likely to be adopted in other automation contexts beyond scientific workflows.
Also noteworthy
- GiVA: Gradient-Informed Bases for Vector-Based Adaptation: Proposes gradient-based initialization for vector-based fine-tuning, achieving LoRA-competitive training time while maintaining extreme parameter efficiency across NLU, NLG, and image classification tasks.
- When Prompts Override Vision: Prompt-Induced Hallucinations in LVLMs: Introduces the HalluScope benchmark and the HalluVL-DPO framework, demonstrating that textual priors and instruction-induced knowledge dominate hallucinations in vision-language models, with DPO fine-tuning as a mitigation.
- Low-Rank Adaptation Redux for Large Models: Offers a signal-processing perspective on LoRA variants, bridging classical low-rank modeling theory with modern adapter designs to guide principled method selection across architectural and deployment constraints.
Takeaways
Agentic decomposition is maturing: The scientific workflow paper exemplifies a design pattern where LLMs handle semantic reasoning while deterministic layers handle execution. This separation of concerns appears to be gaining traction in agentic systems because it limits where hallucinations can enter the pipeline and makes the output easier to audit.
Parameter efficiency remains a practical priority: Both GiVA and the LoRA survey highlight ongoing tension between fine-tuning quality and computational cost. Gradient-informed initialization and signal-processing perspectives suggest that efficiency gains are not exhausted; expect continued refinement of adapter methods.
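The parameter-efficiency gap behind this takeaway can be made concrete with some arithmetic. The sketch below compares trainable parameters per adapted weight matrix for standard LoRA, which trains two low-rank factors, against a vector-based scheme in the style of VeRA, which freezes shared random matrices and trains only two scaling vectors. It assumes GiVA follows that general vector-based parametrization; the digest does not state GiVA's exact form, and the layer size and ranks are illustrative choices, not figures from the paper.

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """LoRA trains A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)


def vector_trainable_params(d_out: int, rank: int) -> int:
    """VeRA-style assumption: only a length-rank vector and a length-d_out
    vector are trained; the projection matrices stay frozen."""
    return rank + d_out


# Illustrative layer size and ranks (assumptions, not taken from the paper).
d_in = d_out = 4096
lora = lora_trainable_params(d_in, d_out, rank=8)
vector = vector_trainable_params(d_out, rank=1024)

print(f"LoRA (rank 8):          {lora} trainable parameters")
print(f"Vector-based (rank 1024): {vector} trainable parameters")
print(f"Ratio: {lora / vector:.1f}x")
```

Even at a much higher rank, the vector-based adapter trains far fewer parameters, but the larger frozen matrices still cost compute on every forward and backward pass. That is why lowering the rank needed to match LoRA matters for training time.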
Vision-language grounding is an active problem: The hallucination work shows that textual priors overwhelm visual input in current LVLMs, and that DPO-based fine-tuning can help. This indicates that production vision-language systems will likely require explicit visual grounding mechanisms, not just larger models.
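For readers unfamiliar with DPO, the standard objective for a single preference pair is short enough to sketch. The code below implements the usual DPO loss from log-probabilities under the trained policy and a frozen reference model. Treating the visually grounded answer as "chosen" and the hallucinated answer as "rejected" is an assumption made for illustration; the digest does not describe how HalluVL-DPO builds its preference pairs, and the numbers are made up.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,
) -> float:
    """Standard DPO loss for one (chosen, rejected) pair of responses."""
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(logits)) equals softplus(-logits)
    return softplus(-logits)


# Hypothetical log-probabilities: the policy favors the grounded answer in the
# first case and the hallucinated answer in the second.
grounded = dpo_loss(-1.0, -5.0, ref_chosen_logp=-3.0, ref_rejected_logp=-3.0)
hallucinated = dpo_loss(-5.0, -1.0, ref_chosen_logp=-3.0, ref_rejected_logp=-3.0)
print(f"policy prefers grounded answer:     loss = {grounded:.3f}")
print(f"policy prefers hallucinated answer: loss = {hallucinated:.3f}")
```

The loss falls as the policy raises the likelihood of the chosen response relative to the reference model and lowers that of the rejected one, which is the mechanism a grounding-oriented preference dataset would exploit.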
Further reading
- Balis et al. arxiv paper on agentic scientific workflows: Full proposal of the three-layer agentic architecture for research-to-workflow translation.
- GiVA paper on gradient-informed vector adaptation: Detailed evaluation of gradient-based initialization across diverse benchmarks.
- HalluScope benchmark and HalluVL-DPO framework: Analysis of hallucination sources and DPO-based mitigation in vision-language models.
- LoRA signal-processing survey: Theoretical foundations and design principles for low-rank adaptation variants.
Frequently asked
What is the three-layer agentic architecture proposed for scientific workflows?
The architecture separates semantic intent extraction (LLM layer), deterministic workflow generation (generator layer), and domain knowledge encoding (Skills layer). This decomposition confines LLM non-determinism to intent extraction while ensuring reproducible workflow generation, with domain experts authoring Skills as markdown documents that ground reasoning in domain reality.
How does GiVA achieve LoRA-competitive performance with vector-based adaptation?
GiVA uses gradient-based initialization to seed vector-based fine-tuning parameters, reducing the rank required to match LoRA performance. This maintains extreme parameter efficiency while achieving comparable training times, as demonstrated across natural language understanding, generation, and image classification tasks.
What is the primary source of hallucinations in vision-language models according to HalluScope?
HalluScope identifies textual priors and background knowledge introduced through textual instructions as the dominant source of hallucinations, rather than vision backbone limitations or language model dominance alone. HalluVL-DPO mitigates this through fine-tuning that prioritizes visual grounding.
Why is decomposing LLM reasoning from deterministic execution important in agentic systems?
Separation ensures that non-deterministic LLM outputs (which may hallucinate or vary) do not propagate into the execution layer. By isolating reasoning to intent extraction and using deterministic generators for workflow production, the system maintains reproducibility and auditability while leveraging LLM strengths in natural language understanding.
What signal-processing insights does the LoRA survey provide for fine-tuning method selection?
The survey bridges classical low-rank modeling and inverse problems theory with modern adapter design, providing principled guidance on architectural choices, optimization techniques, and deployment constraints. This theoretical grounding helps practitioners select among LoRA variants rather than relying on empirical comparisons alone.