AgentFlux: A Framework for Privacy-Preserving On-Device Agentic Systems

We introduce a framework for privacy-preserving, on-device AI agent workloads. By decoupling agentic tasks into function selection and argument generation, both handled by local LLM orchestration, our system delivers accuracy approaching that of cloud-based models while fully protecting user data from third-party exposure and enabling cost-efficient execution on consumer hardware.

1. Tool Selector Adapter: classification (which tool to invoke?)

2. Argument Generator Adapter: generation (tool-specific arguments)

3. On-Device Execution: privacy-preserving orchestration

AgentFlux: Decoupled Post-Training Pipeline & Inference Framework

System Demonstration

Watch AgentFlux in Action

See how AgentFlux orchestrates privacy-preserving tool calling across multiple applications, performing complex tasks while keeping all sensitive data on your device.

Demo: AgentFlux performing file-operation tasks entirely on-device, compared with the Llama-3.1-8B model

Why It Matters

AgentFlux bridges the performance gap between frontier orchestration models and locally deployable systems.

🔒

Privacy-Preserving AI

Agents that run fully offline, protecting sensitive user data from third-party exposure

⚡

Efficient Orchestration

No reasoning-latency overhead and fast execution on consumer hardware

🔧

Scalable & Modular

Adapters can be trained independently for tool ecosystems that evolve over time

By decoupling fine-tuning and introducing dynamic adapter loading, AgentFlux democratizes agentic AI, bringing practical autonomy to the edge.

Motivating Use Cases

💰

Blockchains & Financial Applications

AgentFlux enables local consolidation, analysis, and reporting across blockchain and traditional finance, keeping all private data on-device. Only anonymized outputs leave the user's machine.

🌐

AI Browsers

Local models handle tasks involving sensitive data while collaborating with cloud models on large-scale public data, such as web-search summarization.

💻

Developer Terminals & Coding Agents

AgentFlux addresses data leakage in coding assistants by executing parts of the workflow locally and cost-efficiently while maintaining access to entire codebases.

Background

AI systems are rapidly expanding from chatbots and media generation to robotics and financial applications. Leading AI platforms run in the cloud, sending all user queriesโ€”often including sensitive context like code, preferences, and past interactionsโ€”to third-party providers.

🔒 Privacy Challenge

User data, including medical and financial records, is routinely exposed to cloud providers.

⚡ Cost & Latency

Cloud APIs charge per token and throttle requests, with true steady-state costs still unknown.

The Solution: Edge Computing with AgentFlux

AgentFlux introduces a new framework for edge computing that partitions agentic workloads into two distinct tasks: selecting the functions that need to be called, and generating their arguments. This partitioning allows for a hierarchical architecture that achieves end-to-end accuracy comparable to state-of-the-art cloud models while maintaining the benefits of privacy and performance on consumer-grade GPUs.
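
To make the partition concrete, here is a minimal sketch of how a single tool call decomposes into the two sub-tasks; the query, tool names, and schema are illustrative assumptions, not AgentFlux's actual formats.

    # Minimal sketch of the two-task partition. The prompts, tool names,
    # and schemas below are illustrative assumptions.
    import json

    # Sub-task 1: tool selection, framed as classification over tool names.
    selection_input = {
        "query": "Move report.pdf into the archive folder",
        "tools": ["read_file", "write_file", "move_file", "list_directory"],
    }
    selected_tool = "move_file"  # what the tool-selector adapter would emit

    # Sub-task 2: argument generation, constrained to the selected tool's
    # schema only, so the model never juggles every tool's parameters at once.
    argument_schema = {"source": "string", "destination": "string"}
    arguments = {"source": "report.pdf",
                 "destination": "archive/report.pdf"}  # generator output

    # The final tool call is simply the composition of the two outputs.
    print(json.dumps({"tool": selected_tool, "arguments": arguments}, indent=2))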

Core Architecture

AgentFlux Architecture Design

AgentFlux architecture showing the decoupled post-training pipeline and inference framework with specialized LoRA adapters

Agentic systems autonomously solve complex tasks through iterative cycles: decomposing goals into discrete steps, executing each by invoking external tools, and dynamically adjusting based on tool outputs. Success hinges on LLM orchestrationโ€”the system's ability to accurately select the right tool and generate correct arguments at each decision point.

AgentFlux fundamentally reimagines this orchestration. Rather than relying on a monolithic LLM orchestrator, it employs multiple specialized LoRA adapters trained through a decoupled post-training pipeline and coordinated by a novel inference framework.

Post-Training Pipeline

1. Tool Selector Adapter

Functions as a classifier, identifying the optimal tool for each workflow step during inference.

2. Argument Generator Adapter

Produces precise, context-appropriate arguments for the selected tool at each step.
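
As a rough sketch of what this decoupled fine-tuning could look like with Hugging Face PEFT: the base model, LoRA rank, and target modules below are illustrative assumptions, not the paper's reported configuration.

    # A hedged sketch of the decoupled post-training step using PEFT.
    from transformers import AutoModelForCausalLM
    from peft import LoraConfig, get_peft_model

    BASE = "meta-llama/Llama-3.1-8B-Instruct"  # assumed base model

    def train_adapter(out_dir: str) -> None:
        # Each adapter starts from a fresh copy of the frozen base model.
        base = AutoModelForCausalLM.from_pretrained(BASE)
        cfg = LoraConfig(r=16, lora_alpha=32,
                         target_modules=["q_proj", "v_proj"],
                         task_type="CAUSAL_LM")
        model = get_peft_model(base, cfg)
        # ... standard supervised fine-tuning loop on task-specific
        # (prompt, target) pairs goes here ...
        model.save_pretrained(out_dir)  # saves only the small LoRA weights

    # Tool selector: trained on (context -> tool name) classification data.
    train_adapter("adapters/tool_selector")
    # Argument generator: trained on (context, tool -> arguments) data.
    train_adapter("adapters/argument_generator")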

Decoupled Inference Framework

Classification Sub-Step

Dynamically loads the tool selector adapter to determine which tool to invoke.

Argument Generation Sub-Step

Dynamically loads the corresponding argument generator adapter to construct the tool's input parameters.
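
A sketch of how the two sub-steps might be wired together with PEFT's adapter-switching API, assuming adapters saved as in the training sketch above; the paths and prompt formats are again illustrative.

    # Two inference sub-steps with dynamic adapter switching via PEFT.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    BASE = "meta-llama/Llama-3.1-8B-Instruct"  # assumed base model
    tok = AutoTokenizer.from_pretrained(BASE)
    base = AutoModelForCausalLM.from_pretrained(BASE)

    # Attach both adapters once; switching between them afterwards is cheap
    # compared to reloading a separate fine-tuned model per sub-step.
    model = PeftModel.from_pretrained(base, "adapters/tool_selector",
                                      adapter_name="selector")
    model.load_adapter("adapters/argument_generator",
                       adapter_name="generator")

    def run(prompt: str) -> str:
        ids = tok(prompt, return_tensors="pt").input_ids
        out = model.generate(ids, max_new_tokens=64)
        return tok.decode(out[0][ids.shape[1]:], skip_special_tokens=True)

    # Classification sub-step: the selector adapter names the tool.
    model.set_adapter("selector")
    tool = run("Task: move report.pdf to archive\nTool:")

    # Argument-generation sub-step: the generator adapter fills the schema.
    model.set_adapter("generator")
    args = run(f"Task: move report.pdf to archive\nTool: {tool}\nArguments:")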

Inference Pipeline Flow

Toolset Selection

Base model routes to relevant toolset (Filesystem, Notion, Monday.com)

Tool Selection (Classification)

Load Tool Selector LoRA adapter → Classify which specific tool to invoke

Argument Generation

Load Argument Generator LoRA adapter → Generate precise, structured arguments

Tool Execution

Execute in containerized sandbox → Return observation → Continue or summarize

Complete inference pipeline showing hierarchical orchestration with dynamic LoRA adapter loading
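
To make the loop structure concrete, a runnable control-flow sketch follows; every helper is a hypothetical stand-in that returns canned values, not AgentFlux's implementation.

    # Control-flow sketch of the full inference loop.

    def route_toolset(query: str) -> str:
        # Step 1: base model routes to a toolset (hardcoded here).
        return "filesystem"

    def select_tool(toolset: str, history: list) -> str:
        # Step 2: classification sub-step (tool-selector adapter).
        return "move_file" if len(history) == 1 else "finish"

    def generate_args(tool: str, history: list) -> dict:
        # Step 3: argument-generation sub-step (argument-generator adapter).
        return {"source": "report.pdf", "destination": "archive/report.pdf"}

    def execute_sandboxed(tool: str, args: dict) -> str:
        # Step 4: the tool runs in a containerized sandbox; the observation
        # flows back into the loop.
        return f"{tool} ok: {args}"

    def agent_loop(query: str, max_steps: int = 8) -> list:
        history = [query]
        toolset = route_toolset(query)
        for _ in range(max_steps):
            tool = select_tool(toolset, history)
            if tool == "finish":  # continue-or-summarize decision
                break
            history.append(execute_sandboxed(tool, generate_args(tool, history)))
        return history

    print(agent_loop("Move report.pdf into the archive folder"))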

Citation

BibTeX

@article{kadekodi2025dualtune,
  title   = {DualTune: Decoupled Fine-Tuning for On-Device Agentic Systems},
  author  = {Kadekodi, Rohan and Jin, Zhan and Kamahori, Keisuke and Gu, Yile and Khatiri, Sean and Bayindirli, Noah H and Gorbunov, Sergey and Kasikci, Baris},
  journal = {arXiv preprint arXiv:2510.00229},
  year    = {2025}
}