Thanks Martijn for driving this! I'm +1 for Martijn's proposal. It's important to avoid elevating some connectors above others, and all connectors should share the same quality standard. Keeping some basic connectors like FileSystem is reasonable, since they are crucial for new users to try out and explore Flink quickly.
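To make that concrete, here is a minimal sketch of what such a quick start could look like using only the built-in FileSystem source and the print sink. The class name and file path are made up for illustration, and it assumes a recent Flink version where readTextFile is still available:

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class LocalFileQuickstart {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Read a local text file -- no external system needed, which is exactly
        // why the FileSystem connector matters for first-time users.
        env.readTextFile("/tmp/words.txt")       // hypothetical local path
           .filter(line -> !line.isEmpty())      // trivial transformation, just to see data flowing
           .print();                             // write results to stdout

        env.execute("local-file-quickstart");
    }
}

Nothing in this job needs an external system or extra dependencies, which is why keeping these basic connectors in the main repo lowers the entry barrier.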
Another point I'd like to mention is that we need to add more E2E cases that use the basic connectors in the Flink main repo after we move the connectors out. Currently the E2E tests depend heavily on connectors, and it's essential to keep the coverage and quality of the Flink main repo even without those connectors' E2E tests.

Best regards,

Qingsheng Ren

> On Jan 5, 2022, at 9:59 PM, Martijn Visser <mart...@ververica.com> wrote:
>
> Hi everyone,
>
> As already mentioned in the previous discussion thread [1] I'm opening up a
> parallel discussion thread on moving connectors from Flink to external
> connector repositories. If you haven't read up on this discussion before, I
> recommend reading that one first.
>
> The goal with the external connector repositories is to make it easier to
> develop and release connectors by not being bound to the release cycle of
> Flink itself. It should result in faster connector releases, a more active
> connector community and a reduced build time for Flink.
>
> We currently have the following connectors available in Flink itself:
>
> * Kafka -> For DataStream & Table/SQL users
> * Upsert-Kafka -> For Table/SQL users
> * Cassandra -> For DataStream users
> * Elasticsearch -> For DataStream & Table/SQL users
> * Kinesis -> For DataStream & Table/SQL users
> * RabbitMQ -> For DataStream users
> * Google Cloud PubSub -> For DataStream users
> * Hybrid Source -> For DataStream users
> * NiFi -> For DataStream users
> * Pulsar -> For DataStream users
> * Twitter -> For DataStream users
> * JDBC -> For DataStream & Table/SQL users
> * FileSystem -> For DataStream & Table/SQL users
> * HBase -> For DataStream & Table/SQL users
> * DataGen -> For Table/SQL users
> * Print -> For Table/SQL users
> * BlackHole -> For Table/SQL users
> * Hive -> For Table/SQL users
>
> I would propose to move out all connectors except Hybrid Source,
> FileSystem, DataGen, Print and BlackHole because:
>
> * We should avoid at all costs that certain connectors are considered as
> 'Core' connectors. If that happens, it creates a perception that there are
> first-grade/high-quality connectors because they are in 'Core' Flink and
> second-grade/lesser-quality connectors because they are outside of the
> Flink codebase. It directly hurts the goal, because these connectors are
> still bound to the release cycle of Flink. Last but not least, it risks any
> success of external connector repositories since every connector
> contributor would still want to be in 'Core' Flink.
> * To continue on the quality of connectors, we should aim that all
> connectors are of high quality. That means that we shouldn't have a
> connector that's only available for either DataStream or Table/SQL users,
> but for both. It also means that (if applicable) the connector should
> support all options, like bounded and unbounded scan, lookup, batch and
> streaming sink capabilities. In the end the quality should depend on the
> maintainers of the connector, not on where the code is maintained.
> * The Hybrid Source connector is a special connector because of its
> purpose.
> * The FileSystem, DataGen, Print and BlackHole connectors are important for
> first time Flink users/testers. If you want to experiment with Flink, you
> will most likely start with a local file before moving to one of the other
> sources or sinks. These 4 connectors can help with either reading/writing
> local files or generating/displaying/ignoring data.
> * Some of the connectors haven't been maintained in a long time (for
> example, NiFi and Google Cloud PubSub). An argument could be made that we
> check if we actually want to move such a connector or make the decision to
> drop the connector entirely.
>
> I'm looking forward to your thoughts!
>
> Best regards,
>
> Martijn Visser | Product Manager
>
> mart...@ververica.com
>
> [1] https://lists.apache.org/thread/bywh947r2f5hfocxq598zhyh06zhksrm
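To illustrate the last point about DataGen, Print and BlackHole for Table/SQL users: here is a minimal sketch of the kind of zero-dependency pipeline these built-in connectors enable. The class, table and field names are made up, and it assumes a Flink version where the datagen and print connectors are available:

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class BuiltInConnectorsDemo {

    public static void main(String[] args) throws Exception {
        TableEnvironment tEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().inStreamingMode().build());

        // DataGen produces synthetic rows, Print writes them to stdout.
        // Neither needs an external system, which is the point of keeping them in core Flink.
        tEnv.executeSql(
                "CREATE TABLE orders (" +
                "  order_id BIGINT," +
                "  amount   DOUBLE" +
                ") WITH (" +
                "  'connector' = 'datagen'," +
                "  'rows-per-second' = '5'" +
                ")");

        tEnv.executeSql(
                "CREATE TABLE console (" +
                "  order_id BIGINT," +
                "  amount   DOUBLE" +
                ") WITH (" +
                "  'connector' = 'print'" +
                ")");

        // Stream the generated rows straight to the console; await() keeps the job running.
        tEnv.executeSql("INSERT INTO console SELECT * FROM orders").await();
    }
}

Swapping 'print' for 'blackhole' gives a sink that simply discards the rows, which is handy for quick experiments where only the source or the query itself is being exercised.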