New: Native AI/LLM integration + DataFusion analytics for Apache Airflow 3 (AIP-99)

Kaxil Naik Thu, 05 Mar 2026 05:04:47 -0800

Hi everyone,

Pavan and I have been working on AIP-99, which brings native agentic AI to
Airflow 3. The first set of PRs has landed.

The core idea: Airflow already has 350+ provider hooks, each
pre-authenticated through connections. AIP-99 turns those hooks directly
into AI agent tools.

What's available now:

1. HookToolset: wraps any Airflow hook into AI-callable tools with
   explicit allowed_methods:

   from airflow.providers.amazon.aws.hooks.s3 import S3Hook
   from airflow.providers.common.ai.toolsets import HookToolset

   HookToolset(
       hook=S3Hook(aws_conn_id="my_aws"),
       allowed_methods=["list_keys"],
   )

2. SQLToolset: 4 curated database tools (list tables, describe schema,
   execute query, fetch results) scoped to specific tables.

3. DataFusionToolset: lets AI agents query files on object stores (S3,
   local filesystem, Iceberg) through Apache DataFusion. Agents get SQL
   access to Parquet, CSV, and Avro files without loading them into a
   database.

4. MCPToolset: connects to external MCP servers via Airflow connections.

5. Task decorators (Operators are also available :) ):
   - @task.llm : single LLM call with structured output
   - @task.agent : multi-step agent with tool access
   - @task.llm_sql : text-to-SQL pipelines
   - @task.llm_schema_compare : cross-database schema diffing
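
To make the allowed_methods idea from item 1 concrete, here is a minimal
plain-Python sketch of allow-listing methods on a hook-like object. It is
not the provider's implementation: FakeStorageHook and build_tools are
hypothetical names used only for illustration, and the real HookToolset
may work quite differently.

```python
from typing import Any, Callable


class FakeStorageHook:
    """Hypothetical stand-in for a pre-authenticated Airflow hook."""

    def list_keys(self, bucket: str) -> list[str]:
        return [f"{bucket}/a.parquet", f"{bucket}/b.parquet"]

    def delete_objects(self, bucket: str) -> None:
        raise RuntimeError("destructive call; should never be exposed")


def build_tools(hook: Any, allowed_methods: list[str]) -> dict[str, Callable[..., Any]]:
    """Return only the explicitly allowed public methods of the hook, keyed by name."""
    tools: dict[str, Callable[..., Any]] = {}
    for name in allowed_methods:
        if name.startswith("_"):
            raise ValueError(f"refusing to expose private method: {name}")
        method = getattr(hook, name, None)
        if not callable(method):
            raise ValueError(f"{type(hook).__name__} has no callable method {name!r}")
        tools[name] = method
    return tools


# Assumption: an agent would only ever see the entries of this dict.
tools = build_tools(FakeStorageHook(), allowed_methods=["list_keys"])
print(sorted(tools))                    # ['list_keys']
print(tools["list_keys"]("my-bucket"))  # ['my-bucket/a.parquet', 'my-bucket/b.parquet']
```

Because the allow-list is explicit, a method such as delete_objects stays
unreachable unless someone names it on purpose.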

LLM connections are configured through Airflow's standard connection model,
supporting OpenAI, Anthropic, Google, Ollama, etc.

HITL (Human-in-the-Loop) integration is also in progress as a draft PR.

Project board:
https://github.com/orgs/apache/projects/586

Summit talk where we previewed this:
https://www.youtube.com/watch?v=XSAzSDVUi2o

Separate from the AI work, AIP-99 also adds an AnalyticsOperator powered
by Apache DataFusion for high-performance SQL on object stores:

- AnalyticsOperator — run SQL queries directly against S3, GCS, local
  files, and Iceberg tables. Supports Parquet, CSV, Avro.
- @task.analytics decorator — TaskFlow API support for the above.
- Iceberg support via PyIceberg with Glue catalog integration.

Pavan and I would love it if folks could start testing this and open GitHub
issues for any bugs you run into. We intend to keep it at a 0.x version so
we can iterate faster. Looking forward to your feedback.

Thanks,
Kaxil
