Fat fingered too... "Connect source should be able to achieve the same centrality of offsets"
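Here is a minimal Python sketch of the "centrality of offsets" idea. It assumes a single shared, log-compacted offsets topic keyed by (connector name, source partition), much as `__consumer_offsets` keys committed offsets by consumer group. `CentralOffsetLog`, `commit`, `latest_offset` and `make_key` are hypothetical names. `__connect_offsets` and "ConnectOffsetCoordinator" are proposals from this thread, not existing Kafka APIs. The sketch models only the last-value-wins replay of a compacted log, not Connect's real storage format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical key: (connector name, source partition). The partition map is
# frozen into a sorted tuple of items so that it can be used as a dict key.
OffsetKey = Tuple[str, Tuple[Tuple[str, object], ...]]


def make_key(connector: str, partition: Dict[str, object]) -> OffsetKey:
    return (connector, tuple(sorted(partition.items())))


@dataclass
class CentralOffsetLog:
    """Toy model of ONE shared, log-compacted offsets topic for all connectors.

    Hypothetical stand-in for the proposed __connect_offsets topic; this is
    not a real Kafka or Kafka Connect API.
    """
    records: List[Tuple[OffsetKey, Optional[dict]]] = field(default_factory=list)

    def commit(self, connector: str, partition: dict, offset: Optional[dict]) -> None:
        # Append-only, like producing to a topic; offset=None acts as a tombstone.
        self.records.append((make_key(connector, partition), offset))

    def compacted(self) -> Dict[OffsetKey, dict]:
        # Replay the log: the last value per key wins and tombstones delete it,
        # which is the view a compacted topic converges to.
        state: Dict[OffsetKey, dict] = {}
        for key, offset in self.records:
            if offset is None:
                state.pop(key, None)
            else:
                state[key] = offset
        return state

    def latest_offset(self, connector: str, partition: dict) -> Optional[dict]:
        # The lookup a hypothetical "ConnectOffsetCoordinator" would serve.
        return self.compacted().get(make_key(connector, partition))
```

For example, after committing `{"id": 10}` and then `{"id": 42}` for connector `jdbc-orders` on partition `{"table": "orders"}`, `latest_offset` returns `{"id": 42}`. Putting the connector name in the key is what would let many isolated connectors share one topic without colliding.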
On Thu., 17 May 2018, 10:27 pm Stephane Maarek, <steph...@simplemachines.com.au> wrote:

> Say you have 50 connectors, all with different ACLs and service accounts. That's 50 Connect clusters to maintain, so 50*3 internal Connect topics to maintain (they can't share the same Connect topics because they're different clusters). At the default config we're talking 1500 partitions, which is a lot for a Kafka cluster.
>
> The parallel is with consumer groups. Are all consumer groups backing their offsets in their own topic or in a central topic? Connect should be able to achieve the same centrality of Configs.
>
> Finally, Configs should go along with the jar and not be stored in Kafka, especially for connectors that have secrets. There's no reason Kafka needs to have a database secret on its own disk.
>
> On Thu., 17 May 2018, 5:55 pm Rahul Singh, <rahul.xavier.si...@gmail.com> wrote:
>
>> First sentence fat fingered.
>>
>> “Just curious as to why there’s an issue with the backing topics for Kafka Connect.”
>>
>> --
>> Rahul Singh
>> rahul.si...@anant.us
>>
>> Anant Corporation
>>
>> On May 17, 2018, 6:17 AM -0400, Stephane Maarek <steph...@simplemachines.com.au>, wrote:
>> > Hi Saulius
>> >
>> > I think you're on the money, but you're not pushing things too far.
>> > This is something I've hoped for a long time.
>> > Let's talk Kafka Connect v2.
>> >
>> > Kafka Connect clusters, as you said, are not convenient to work with (the KIP details the drawbacks well). I'm all about containerisation, just like stream apps support (and boast!).
>> >
>> > Now, here's the problem with Kafka Connect. There are three backing topics. Here's an analysis of how they can evolve:
>> > - Config topic: this one is irrelevant if each Connect cluster comes with a config bundled with the corresponding JAR, as you mentioned in your KIP.
>> > - Status topic: this is something I wish was gone too.
>> >   The consumers have a coordinator, and I believe the Connect workers should have a coordinator too, for task rebalancing.
>> > - Source Offset topic: only relevant for sources. I wish there was a __connect_offsets global topic, just like for consumers, and a "ConnectOffsetCoordinator" to query for the latest committed offset.
>> >
>> > If we look above, with a few fundamental back-end transformations, we can probably make Connect "cluster-less".
>> >
>> > What the community would get out of it is huge:
>> > - Connect workers for a specific connector are independent and isolated, measurable (in CPU and Mem) and auto-scalable.
>> > - CI/CD is super easy to integrate, as it's just another container / jar.
>> > - You can roll restart a specific connector and upgrade a JAR without interrupting your other connectors and while keeping the current connector from running.
>> > - The topics backing Connect are removed except the global one, which allows you to scale easily in terms of number of connectors.
>> > - Running a connector in dev or prod (for people offering connectors) is as easy as doing a simple "docker run".
>> > - Each consumer / producer setting can be configured at the container level.
>> > - Each Connect process is immutable in configuration.
>> > - Each Connect process has its own security identity (right now, you need a Connect cluster per service role, which is a lot of overhead in terms of backing topics).
>> >
>> > Now, I don't have the Kafka expertise to know exactly which changes to make in the code, but I believe the final idea is achievable.
>> > The change would be breaking for how Kafka Connect is run, but I think there's a chance to make the change non-breaking for how Connect is programmed. I believe the same public API framework can be used.
>> >
>> > Finally, the REST API can be used for monitoring, or the JMX metrics as usual.
>> >
>> > I may be completely wrong, but I could see such a change greatly driving the utilisation of Connect and easing its management, while lowering the barrier to adoption.
>> >
>> > This change may be big to implement but is probably worthwhile. I'd be happy to provide more "user feedback" on a PR, but probably won't be able to implement a PR myself.
>> >
>> > More than happy to discuss this.
>> >
>> > Best,
>> > Stephane
>> >
>> >
>> > Kind regards,
>> > Stephane
>> >
>> > Stephane Maarek | Developer
>> >
>> > +61 416 575 980
>> > steph...@simplemachines.com.au
>> > simplemachines.com.au
>> > Level 2, 145 William Street, Sydney NSW 2010
>> >
>> > On 17 May 2018 at 14:42, Saulius Valatka <saulius...@gmail.com> wrote:
>> >
>> > > Hi,
>> > >
>> > > the only real use case for the REST interface I can see is providing health/liveness checks for Mesos/Kubernetes. It's also true that the API can be left as is and, e.g., not exposed publicly at the platform level, but this would still leave opportunities to accidentally mess something up internally, so it's mostly a safety concern.
>> > >
>> > > Regarding the option renaming: I agree that it's not necessary, as it's not clashing with anything. My reasoning is that if some other offset storage appears in the future, having all config properties at the root level of offset.storage.* _MIGHT_ introduce clashes, so this is just a suggestion for introducing a convention of offset.storage.<store>.<properties>, which the existing property offset.storage.file.filename already adheres to. But in general, yes -- this can be left as is.
>> > >
>> > > 2018-05-17 1:20 GMT+03:00 Jakub Scholz <ja...@scholz.cz>:
>> > >
>> > > > Hi,
>> > > >
>> > > > What do you plan to use the read-only REST interface for? Is there
Is there >> > > > something what you cannot get through metrics interface? Otherwise >> it >> > > might >> > > > be easier to just disable the REST interface (either in the code, >> or just >> > > > on the platform level - e.g. in Kubernetes). >> > > > >> > > > Also, I do not know what is the usual approach in Kafka ... but do >> we >> > > > really have to rename the offset.storage.* options? The current >> names do >> > > > not seem to have any collision with what you are adding and they >> would >> > > get >> > > > "out of sync" with the other options used in connect >> (status.storage.* >> > > and >> > > > config.storage.*). So it seems a bit unnecessary change to me. >> > > > >> > > > Jakub >> > > > >> > > > >> > > > >> > > > On Wed, May 16, 2018 at 10:10 PM Saulius Valatka < >> saulius...@gmail.com >> > > > wrote: >> > > > >> > > > > Hi, >> > > > > >> > > > > I'd like to start a discussion on the following KIP: >> > > > > >> > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP- >> > > > 304%3A+Connect+runtime+mode+improvements+for+container+platforms >> > > > > >> > > > > Basically the idea is to make it easier to run separate instances >> of >> > > > Kafka >> > > > > Connect hosting isolated connectors on container platforms such as >> > > Mesos >> > > > or >> > > > > Kubernetes. >> > > > > >> > > > > In particular it would be interesting to hear opinions about the >> > > proposed >> > > > > read-only REST API mode, more specifically I'm concerned about the >> > > > > possibility to implement it in distributed mode as it appears the >> > > > framework >> > > > > is using it internally ( >> > > > > >> > > > > https://github.com/apache/kafka/blob/trunk/connect/ >> > > > runtime/src/main/java/org/apache/kafka/connect/runtime/ >> > > > distributed/DistributedHerder.java#L1019 >> > > > > ), >> > > > > however this particular API method appears to be undocumented(?). >> > > > > >> > > > > Looking forward for your feedback. >> > > > > >> > > > >> > > >> >