Re: [DISCUSS] KIP-304: Connect runtime mode improvements for container platforms

Rahul Singh Thu, 17 May 2018 05:10:16 -0700

Just curious as to why there’s a pr been for the backing topics for Kafka 
Connect.


I’m a fan of decoupling and love the world that operates in containers.

I’m just wondering specifically what the issue is for having topics storing the 
connect states for source / sinks.

Wouldn’t it make sense to keep it “central” to have one state of truth even if 
a coordinator crashes or whatever. Otherwise if you move everything including 
config to separate workers — you essentially have to manage all your config for 
each worker.

I like how Kubernetes works with configmaps. You register configmaps in kubectl 
and then can use those values in different places. Otherwise we’d be back to 
environment variables as in Docker compose. Having central state in etcd allows 
kube to create and deploy stateless deployments, pods, services.


--
Rahul Singh
rahul.si...@anant.us

Anant Corporation

On May 17, 2018, 6:17 AM -0400, Stephane Maarek 
<steph...@simplemachines.com.au>, wrote:
> Hi Salius
>
> I think you're on the money, but you're not pushing things too far.
> This is something I've hoped for a long time.
> Let's talk Kafka Connect v2
>
> Kafka Connect Cluster, as you said, are not convenient to work with (the
> KIP details drawbacks well). I'm all about containerisation just like
> stream apps support (and boasts!).
>
> Now, here's the problem with Kafka Connect. There are three backing topics.
> Here's the analysis of how they can evolve:
> - Config topic: this one is irrelevant if each connect cluster comes with a
> config bundled with the corresponding JAR, as you mentioned in your KIP
> - Status topic: this is something I wish was gone too. The consumers have a
> coordinator, and I believe the connect workers should have a coordinator
> too, for task rebalancing.
> - Source Offset topic: only relevant for sources. I wish there was a
> __connect_offsets global topic just like for consumers and an
> "ConnectOffsetCoordinator" to talk to to retrieve latest committed offset.
>
> If we look above, with a few back-end fundamental transformations, we can
> probably make Connect "cluster-less".
>
> What the community would get out of it is huge:
> - Connect workers for a specific connector are independent and isolated,
> measurable (in CPU and Mem) and auto-scalable
> - CI/CD is super easy to integrate, as it's just another container / jar.
> - You can roll restart a specific connector and upgrade a JAR without
> interrupting your other connectors and while keeping the current connector
> from running.
> - The topics backing connect are removed except the global one, which
> allows you to scale easily in terms of number of connectors
> - Running a connector in dev or prod (for people offering connectors) is as
> easy as doing a simple "docker run".
> - Each consumer / producer settings can be configured at the container
> level.
> - Each connect process is immutable in configuration.
> - Each connect process has its own security identity (right now, you need a
> connect cluster per service role, which is a lot of overhead in terms of
> backing topic)
>
> Now, I don't have the Kafka expertise to know exactly which changes to make
> in the code, but I believe the final idea is achievable.
> The change would be breaking for how Kafka Connect is run, but I think
> there's a chance to make the change non breaking to how Connect is
> programmed. I believe the same public API framework can be used.
>
> Finally, the REST API can be used for monitoring, or the JMX metrics as
> usual.
>
> I may be completely wrong, but I would see such a change drive the
> utilisation, management of Connect by a lot while lowering the barrier to
> adoption.
>
> This change may be big to implement but probably worthwhile. I'd be happy
> to provide more "user feedback" on a PR, but probably won't be able to
> implement a PR myself.
>
> More than happy to discuss this
>
> Best,
> Stephane
>
>
> Kind regards,
> Stephane
>
> [image: Simple Machines]
>
> Stephane Maarek | Developer
>
> +61 416 575 980
> steph...@simplemachines.com.au
> simplemachines.com.au
> Level 2, 145 William Street, Sydney NSW 2010
>
> On 17 May 2018 at 14:42, Saulius Valatka <saulius...@gmail.com> wrote:
>
> > Hi,
> >
> > the only real usecase for the REST interface I can see is providing
> > health/liveness checks for mesos/kubernetes. It's also true that the API
> > can be left as is and e.g. not exposed publicly on the platform level, but
> > this would still leave opportunities to accidentally mess something up
> > internally, so it's mostly a safety concern.
> >
> > Regarding the option renaming: I agree that it's not necessary as it's not
> > clashing with anything, my reasoning is that assuming some other offset
> > storage appears in the future, having all config properties at the root
> > level of offset.storage.* _MIGHT_ introduce clashes in the future, so this
> > is just a suggestion for introducing a convention of
> > offset.storage.<store>.<properties>, which the existing
> > property offset.storage.file.filename already adheres to. But in general,
> > yes -- this can be left as is.
> >
> >
> >
> > 2018-05-17 1:20 GMT+03:00 Jakub Scholz <ja...@scholz.cz>:
> >
> > > Hi,
> > >
> > > What do you plan to use the read-only REST interface for? Is there
> > > something what you cannot get through metrics interface? Otherwise it
> > might
> > > be easier to just disable the REST interface (either in the code, or just
> > > on the platform level - e.g. in Kubernetes).
> > >
> > > Also, I do not know what is the usual approach in Kafka ... but do we
> > > really have to rename the offset.storage.* options? The current names do
> > > not seem to have any collision with what you are adding and they would
> > get
> > > "out of sync" with the other options used in connect (status.storage.*
> > and
> > > config.storage.*). So it seems a bit unnecessary change to me.
> > >
> > > Jakub
> > >
> > >
> > >
> > > On Wed, May 16, 2018 at 10:10 PM Saulius Valatka <saulius...@gmail.com
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > I'd like to start a discussion on the following KIP:
> > > >
> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > 304%3A+Connect+runtime+mode+improvements+for+container+platforms
> > > >
> > > > Basically the idea is to make it easier to run separate instances of
> > > Kafka
> > > > Connect hosting isolated connectors on container platforms such as
> > Mesos
> > > or
> > > > Kubernetes.
> > > >
> > > > In particular it would be interesting to hear opinions about the
> > proposed
> > > > read-only REST API mode, more specifically I'm concerned about the
> > > > possibility to implement it in distributed mode as it appears the
> > > framework
> > > > is using it internally (
> > > >
> > > > https://github.com/apache/kafka/blob/trunk/connect/
> > > runtime/src/main/java/org/apache/kafka/connect/runtime/
> > > distributed/DistributedHerder.java#L1019
> > > > ),
> > > > however this particular API method appears to be undocumented(?).
> > > >
> > > > Looking forward for your feedback.
> > > >
> > >
> >

Re: [DISCUSS] KIP-304: Connect runtime mode improvements for container platforms

Reply via email to