Hey Everyone,

As a follow-up to my Keynote talk, Building and deploying LLM applications
with Apache Airflow <https://www.youtube.com/watch?v=mgA6m3ggKhs&t=4s>, I
am formally proposing the addition of these 5 providers to the Apache
Airflow repo:

   - PgVector <https://github.com/pgvector/pgvector>
   - Weaviate <https://weaviate.io/>
   - Pinecone <https://www.pinecone.io/>
   - OpenAI <https://openai.com/>
   - Cohere <https://cohere.com/>


LLMs are advancing rapidly and transforming both the way we work and our
industry. Although LLMs are simple to use for prototyping, using them in
enterprise applications and in production still presents many challenges.
These challenges
<https://speakerdeck.com/kaxil/building-and-deploying-llm-applications-with-apache-airflow?slide=8>
are some of the same problems that we tackle in Data Engineering, and
Airflow is a natural fit for them.

We at Astronomer would like to add first-class support for the popular LLMs
(OpenAI & Cohere) and vector DBs (PgVector, Weaviate & Pinecone) so that
Data Scientists and ML engineers can use them natively through easy-to-use
Operator & Hook abstractions, with a native (and production-ready)
approach to authentication, retries, logging, etc.

We also think this is vital for the Apache Airflow project as we, the
project, embrace the LLM tide and continue to be a great example of
balancing innovation with backward compatibility.

The first versions of these providers will enable building one of the most
common use cases of LLMs, i.e. question answering / chatbots using
retrieval-augmented generation (RAG) built on embeddings.
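
For anyone less familiar with RAG, here is a minimal sketch of the
retrieval step in plain Python. It is only an illustration and not the
API of any of the proposed providers: embed() is a toy bag-of-words
stand-in for a real embedding model (e.g. one served by OpenAI or Cohere),
the in-memory list stands in for a vector DB, and all function names are
hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, documents, k=1):
    """Return the k documents most similar to the question."""
    query = embed(question)
    ranked = sorted(documents, key=lambda d: cosine(query, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(question, documents, k=1):
    """Augment the question with retrieved context before calling an LLM."""
    context = "\n".join(retrieve(question, documents, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Hypothetical documents; a real pipeline would store embeddings in a vector DB.
DOCS = [
    "Airflow schedules and monitors workflows defined as DAGs.",
    "PgVector adds vector similarity search to Postgres.",
]

if __name__ == "__main__":
    print(build_prompt("What does PgVector add to Postgres?", DOCS))
```

In a real deployment, the embedding, storage, and generation steps would
each be a call to an external service, which is where retries, logging,
and authentication start to matter.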

Everyone is welcome and encouraged to contribute once the PRs are merged.
Astronomer is committed to maintaining these providers in the Airflow repo,
including reviewing PRs, maintaining code quality, testing and keeping the
APIs up-to-date.

Note: PgVector <https://github.com/pgvector/pgvector> is an open-source
project, so we don’t need a formal vote for it as per our guidelines
<https://github.com/apache/airflow/blob/main/PROVIDERS.rst#accepting-new-community-providers>.
So please consider this email as seeking a Lazy Consensus for it.

I will open up a VOTING thread after we have discussed this for a few days.

Thanks.

Regards,

Kaxil
