Hi everyone,

Pavan and I have been working on AIP-99, native agentic AI for Airflow 3. The first set of PRs has landed.

The core idea: Airflow already has 350+ provider hooks, each pre-authenticated through connections. AIP-99 turns those hooks directly into AI agent tools.

What's available now:

1. HookToolset: wraps any Airflow hook into AI-callable tools with explicit allowed_methods:

    from airflow.providers.amazon.aws.hooks.s3 import S3Hook
    from airflow.providers.common.ai.toolsets import HookToolset

    HookToolset(hook=S3Hook(aws_conn_id="my_aws"), allowed_methods=["list_keys"])

2. SQLToolset: 4 curated database tools (list tables, describe schema, execute query, fetch results) scoped to specific tables.

3. DataFusionToolset: lets AI agents query files on object stores (S3, local filesystem, Iceberg) through Apache DataFusion. Agents get SQL access to Parquet, CSV, and Avro files without loading them into a database.

4. MCPToolset: connects to external MCP servers via Airflow connections.

5. Task decorators (Operators are also available :) ):
   - @task.llm : single LLM call with structured output
   - @task.agent : multi-step agent with tool access
   - @task.llm_sql : text-to-SQL pipelines
   - @task.llm_schema_compare : cross-database schema diffing

LLM connections are configured through Airflow's standard connection model, supporting OpenAI, Anthropic, Google, Ollama, etc.

HITL (Human-in-the-Loop) integration is also in progress as a draft PR.

Project board: https://github.com/orgs/apache/projects/586
Summit talk where we previewed this: https://www.youtube.com/watch?v=XSAzSDVUi2o

Separate from the AI work, AIP-99 also adds an AnalyticsOperator powered by Apache DataFusion for high-performance SQL on object stores:

- AnalyticsOperator: run SQL queries directly against S3, GCS, local files, and Iceberg tables. Supports Parquet, CSV, and Avro.
- @task.analytics decorator: TaskFlow API support for the above.
- Iceberg support via PyIceberg with Glue catalog integration.

Pavan and I would love it if folks could start testing this out and create GitHub issues for any bugs you run into. Our intention is to keep it at a 0.x version so we can iterate on it faster.

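For anyone wondering what the explicit allowed_methods list on HookToolset is for, here is a minimal, self-contained sketch of the allow-list idea. It does not import the provider and is not how HookToolset is implemented; FakeS3Hook and expose_methods are hypothetical names made up purely for illustration.

```python
# Conceptual sketch only: NOT the HookToolset implementation.
# FakeS3Hook and expose_methods are hypothetical, illustrative names.
from typing import Any, Callable


class FakeS3Hook:
    """Stand-in for a provider hook, backed by an in-memory key list."""

    def __init__(self, keys: list[str]) -> None:
        self._keys = list(keys)

    def list_keys(self, prefix: str = "") -> list[str]:
        # Read-only operation: safe to hand to an agent.
        return [k for k in self._keys if k.startswith(prefix)]

    def delete_objects(self, keys: list[str]) -> None:
        # Destructive operation: should stay hidden unless explicitly allowed.
        self._keys = [k for k in self._keys if k not in keys]


def expose_methods(hook: Any, allowed_methods: list[str]) -> dict[str, Callable[..., Any]]:
    """Return only the explicitly allowed public methods of `hook`, keyed by name."""
    tools: dict[str, Callable[..., Any]] = {}
    for name in allowed_methods:
        if name.startswith("_"):
            raise ValueError(f"private method not allowed: {name}")
        method = getattr(hook, name, None)
        if not callable(method):
            raise ValueError(f"{type(hook).__name__} has no callable method {name!r}")
        tools[name] = method
    return tools


hook = FakeS3Hook(["raw/a.parquet", "raw/b.parquet", "tmp/c.csv"])
tools = expose_methods(hook, allowed_methods=["list_keys"])
```

In this sketch the agent would only ever see list_keys; delete_objects exists on the hook but is never exposed, because it was not named in the allow-list.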
Looking forward to feedback.

Thanks,
Kaxil
