Fat-fingered too... "Connect source should be able to achieve the same
centrality of offsets."

On Thu., 17 May 2018, 10:27 pm Stephane Maarek, <
steph...@simplemachines.com.au> wrote:

> Say you have 50 connectors, each with different ACLs and service accounts.
> That's 50 Connect clusters to maintain, so 50*3 internal Connect topics to
> maintain (they can't share the same Connect topics because they're
> different clusters). At default config that's roughly 1500 partitions, which
> is a lot for a Kafka cluster.
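The arithmetic above can be sketched quickly; a back-of-the-envelope check, assuming the Connect default partition counts (25 for the offsets topic, 5 for status, 1 for the compacted config topic), which is roughly where the 1500 figure comes from:

```python
# Back-of-the-envelope: internal-topic partitions used by N isolated
# Connect clusters, assuming the default partition counts per cluster.
DEFAULT_PARTITIONS = {
    "config.storage.topic": 1,    # compacted, single partition
    "offset.storage.topic": 25,   # default offset.storage.partitions
    "status.storage.topic": 5,    # default status.storage.partitions
}

def total_partitions(num_clusters: int) -> int:
    """Total partitions across independent single-connector clusters."""
    per_cluster = sum(DEFAULT_PARTITIONS.values())  # 31 per cluster
    return num_clusters * per_cluster

print(total_partitions(50))  # 1550, roughly the 1500 cited above
```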
>
> The parallel is with consumer groups. Do all consumer groups back
> their offsets in their own topics, or in a central topic? Connect should be
> able to achieve the same centrality of configs.
>
> Finally, configs should go along with the JAR, and not be stored in
> Kafka, especially for connectors that have secrets. There's no reason Kafka
> needs to have a database secret on its own disk.
>
> On Thu., 17 May 2018, 5:55 pm Rahul Singh, <rahul.xavier.si...@gmail.com>
> wrote:
>
>> First sentence fat fingered.
>>
>> “Just curious as to why there’s an issue with the backing topics for
>> Kafka Connect.”
>>
>> --
>> Rahul Singh
>> rahul.si...@anant.us
>>
>> Anant Corporation
>>
>> On May 17, 2018, 6:17 AM -0400, Stephane Maarek <
>> steph...@simplemachines.com.au>, wrote:
>> > Hi Saulius,
>> >
>> > I think you're on the money, but you're not pushing things far enough.
>> > This is something I've hoped to see for a long time.
>> > Let's talk Kafka Connect v2.
>> >
>> > Kafka Connect clusters, as you said, are not convenient to work with
>> > (the KIP details the drawbacks well). I'm all for containerisation, just
>> > like Kafka Streams apps support (and boast!).
>> >
>> > Now, here's the problem with Kafka Connect. There are three backing
>> > topics. Here's an analysis of how they could evolve:
>> > - Config topic: this one becomes irrelevant if each Connect cluster
>> > comes with a config bundled with the corresponding JAR, as you mentioned
>> > in your KIP.
>> > - Status topic: this is something I wish was gone too. The consumers
>> > have a coordinator, and I believe the Connect workers should have a
>> > coordinator too, for task rebalancing.
>> > - Source offset topic: only relevant for sources. I wish there were a
>> > global __connect_offsets topic, just like for consumers, and a
>> > "ConnectOffsetCoordinator" to talk to in order to retrieve the latest
>> > committed offset.
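For reference, these are the three worker settings that name the backing topics in a distributed Connect cluster today; a minimal connect-distributed.properties fragment (the topic names are the conventional examples, not fixed values):

```properties
# The three internal topics every distributed Connect cluster needs today.
# Names are per-cluster; values below are the conventional examples.
config.storage.topic=connect-configs   # connector/task configs (compacted, 1 partition)
offset.storage.topic=connect-offsets   # source offsets (default 25 partitions)
status.storage.topic=connect-status    # connector/task status (default 5 partitions)
```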
>> >
>> > If we look above, with a few fundamental back-end transformations, we
>> > can probably make Connect "cluster-less".
>> >
>> > What the community would get out of it is huge:
>> > - Connect workers for a specific connector are independent, isolated,
>> > measurable (in CPU and memory) and auto-scalable.
>> > - CI/CD is super easy to integrate, as it's just another container /
>> > JAR.
>> > - You can rolling-restart a specific connector and upgrade a JAR without
>> > interrupting your other connectors, while keeping the current connector
>> > running.
>> > - The topics backing Connect are removed except the global one, which
>> > allows you to scale easily in terms of number of connectors.
>> > - Running a connector in dev or prod (for people offering connectors) is
>> > as easy as doing a simple "docker run".
>> > - Consumer / producer settings can be configured at the container level.
>> > - Each Connect process is immutable in configuration.
>> > - Each Connect process has its own security identity (right now, you
>> > need a Connect cluster per service role, which is a lot of overhead in
>> > terms of backing topics).
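The "docker run" point might look something like this in practice; the image name, tag, and environment variables here are purely illustrative assumptions, not an existing image or interface:

```shell
# Hypothetical: one isolated connector per container, configured at launch.
docker run -d \
  -e KAFKA_BOOTSTRAP_SERVERS=broker:9092 \
  -e CONNECTOR_CONFIG=/config/my-connector.properties \
  my-registry/my-source-connector:1.0
```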
>> >
>> > Now, I don't have the Kafka expertise to know exactly which changes to
>> > make in the code, but I believe the final idea is achievable.
>> > The change would be breaking for how Kafka Connect is run, but I think
>> > there's a chance to make it non-breaking for how Connect is programmed.
>> > I believe the same public API framework can be used.
>> >
>> > Finally, the REST API can be used for monitoring, or the JMX metrics as
>> > usual.
>> >
>> > I may be completely wrong, but I could see such a change improving the
>> > utilisation and management of Connect by a lot, while lowering the
>> > barrier to adoption.
>> >
>> > This change may be big to implement, but probably worthwhile. I'd be
>> > happy to provide more "user feedback" on a PR, but probably won't be
>> > able to implement one myself.
>> >
>> > More than happy to discuss this
>> >
>> > Best,
>> > Stephane
>> >
>> > Stephane Maarek | Developer
>> >
>> > +61 416 575 980
>> > steph...@simplemachines.com.au
>> > simplemachines.com.au
>> > Level 2, 145 William Street, Sydney NSW 2010
>> >
>> > On 17 May 2018 at 14:42, Saulius Valatka <saulius...@gmail.com> wrote:
>> >
>> > > Hi,
>> > >
>> > > the only real use case for the REST interface I can see is providing
>> > > health/liveness checks for Mesos/Kubernetes. It's also true that the
>> > > API can be left as is and e.g. not exposed publicly at the platform
>> > > level, but this would still leave opportunities to accidentally mess
>> > > something up internally, so it's mostly a safety concern.
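A health check along these lines could point at the worker's REST listener; an illustrative Kubernetes probe fragment (8083 is the default rest.port; the path and timings are assumptions to adapt, not from the KIP):

```yaml
# Probe the Connect worker's REST root, which returns basic worker info.
livenessProbe:
  httpGet:
    path: /
    port: 8083
  initialDelaySeconds: 30
  periodSeconds: 10
```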
>> > >
>> > > Regarding the option renaming: I agree that it's not necessary, as
>> > > it's not clashing with anything. My reasoning is that, assuming some
>> > > other offset storage appears in the future, having all config
>> > > properties at the root level of offset.storage.* _MIGHT_ introduce
>> > > clashes, so this is just a suggestion for introducing a convention of
>> > > offset.storage.<store>.<properties>, which the existing property
>> > > offset.storage.file.filename already adheres to. But in general,
>> > > yes -- this can be left as is.
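The proposed convention, sketched as worker properties; the existing file store already fits it, while the nested store name below is hypothetical, only there to show the shape of offset.storage.<store>.<properties>:

```properties
# Existing standalone property already follows offset.storage.<store>.<property>:
offset.storage.file.filename=/tmp/connect.offsets
# A future store would nest its settings the same way
# ("redis" and its property here are illustrative only, not an existing store):
offset.storage.redis.url=redis://localhost:6379
```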
>> > >
>> > >
>> > >
>> > > 2018-05-17 1:20 GMT+03:00 Jakub Scholz <ja...@scholz.cz>:
>> > >
>> > > > Hi,
>> > > >
>> > > > What do you plan to use the read-only REST interface for? Is there
>> > > > something you cannot get through the metrics interface? Otherwise it
>> > > > might be easier to just disable the REST interface (either in the
>> > > > code, or just at the platform level - e.g. in Kubernetes).
>> > > >
>> > > > Also, I do not know what the usual approach in Kafka is... but do
>> > > > we really have to rename the offset.storage.* options? The current
>> > > > names do not seem to have any collision with what you are adding,
>> > > > and they would get "out of sync" with the other options used in
>> > > > Connect (status.storage.* and config.storage.*). So it seems like an
>> > > > unnecessary change to me.
>> > > >
>> > > > Jakub
>> > > >
>> > > >
>> > > >
>> > > > On Wed, May 16, 2018 at 10:10 PM Saulius Valatka <
>> saulius...@gmail.com
>> > > > wrote:
>> > > >
>> > > > > Hi,
>> > > > >
>> > > > > I'd like to start a discussion on the following KIP:
>> > > > >
>> > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-304%3A+Connect+runtime+mode+improvements+for+container+platforms
>> > > > >
>> > > > > Basically the idea is to make it easier to run separate instances
>> > > > > of Kafka Connect hosting isolated connectors on container
>> > > > > platforms such as Mesos or Kubernetes.
>> > > > >
>> > > > > In particular, it would be interesting to hear opinions about the
>> > > > > proposed read-only REST API mode. More specifically, I'm concerned
>> > > > > about the possibility of implementing it in distributed mode, as
>> > > > > it appears the framework is using the API internally (
>> > > > > https://github.com/apache/kafka/blob/trunk/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/distributed/DistributedHerder.java#L1019
>> > > > > ), though this particular API method appears to be undocumented(?).
>> > > > > Looking forward to your feedback.
>> > > > >
>> > > >
>> > >
>>
>
