Hey everyone,

As a follow-up to my keynote talk, "Building and deploying LLM applications with Apache Airflow" <https://www.youtube.com/watch?v=mgA6m3ggKhs&t=4s>, I am formally proposing the addition of the following five providers to the Apache Airflow repo:

- PgVector <https://github.com/pgvector/pgvector>
- Weaviate <https://weaviate.io/>
- Pinecone <https://www.pinecone.io/>
- OpenAI <https://openai.com/>
- Cohere <https://cohere.com/>

Advancements in LLMs are moving at a rapid pace and transforming our industry and the way we work. Although LLMs are simple to use for prototyping, using them in enterprise applications and in production still presents a lot of challenges. These challenges <https://speakerdeck.com/kaxil/building-and-deploying-llm-applications-with-apache-airflow?slide=8> are some of the same problems that we tackle in Data Engineering, and Airflow is a natural fit for them.

We at Astronomer would like to add first-class support for popular LLM services (OpenAI and Cohere) and vector DBs (PgVector, Weaviate and Pinecone) so that data scientists and ML engineers can use them natively through easy-to-use Operator and Hook abstractions, with a production-ready approach to authentication, retries, logging, etc.

We also think this is vital for the Apache Airflow project as we, the project, embrace the LLM tide and continue to be a great example of balancing innovation with backward compatibility.

The first versions of these providers will support one of the most common LLM use cases: question answering and chatbots built on retrieval-augmented generation (RAG) with embeddings.

Everyone is welcome and encouraged to contribute once the PRs are merged. Astronomer is committed to maintaining these providers in the Airflow repo, including reviewing PRs, maintaining code quality, testing, and keeping the APIs up to date.

Note: PgVector <https://github.com/pgvector/pgvector> is an open-source project, so we don’t need a formal vote for it as per our guidelines <https://github.com/apache/airflow/blob/main/PROVIDERS.rst#accepting-new-community-providers>. Please consider this email as seeking Lazy Consensus for it.

I will open a VOTING thread after we have discussed this for a few days.

Thanks.

Regards,
Kaxil