On Fri, Jun 24, 2016 at 11:16 AM, noah <iamn...@gmail.com> wrote:

> I'm having some trouble figuring out the right way to run Kafka Connect in
> production. We will have multiple sink connectors that we need to remain
> running indefinitely and provide at-least-once semantics (with as little
> duplication as possible) so it seems clear that we need to run in
> distributed mode so that our offsets are durable and we can scale up by
> adding new distributed mode instances of Connect.
>
> What isn't clear is the best way to run multiple, heterogeneous
> connectors in distributed mode. It looks like every instance of Connect
> will read the config/status topics and take on some number of tasks (and
> that tasks can't be assigned to specific running instances of Connect.) It
> also looks like it is only possible to configure 1 key and value converter
> per Connect instance. So if I need two different conversion strategies, I'd
> need to either write a custom converter that can figure it out, or run
> multiple Connect clusters, each with their own set of config+offset+status
> topics.
>
> Is that right? Worst case, I need another set of N distributed Connect
> instances per sink/source, which ends up being a lot of topics to manage.
> What does a real-world Connect topology look like?
>


Yeah, ideally you want to minimize the number of clusters just to keep
operational costs down -- it is easier to maintain one cluster than N.
However, you're right that at the moment we only support one converter
type per cluster. We want to make that configurable per connector (with a
cluster-wide default to keep configuration cheap when you know you want to
standardize on one type for most connectors), but we haven't gotten to
implementing that yet. Look for it in an upcoming release! But that does
mean that, for now, if you want to use different converters you'll need to
run multiple clusters.
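
For concreteness, running two clusters today means two separate worker
configs, each with its own group.id, its own set of internal topics, and
its own cluster-wide converters. A sketch (all hostnames and topic names
here are illustrative, not recommendations):

```properties
# Worker config for a cluster standardized on JSON
bootstrap.servers=kafka1:9092
group.id=connect-cluster-json
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
# Each cluster needs its own internal topics
config.storage.topic=connect-json-configs
offset.storage.topic=connect-json-offsets
status.storage.topic=connect-json-status
```

```properties
# Second worker config for a cluster standardized on Avro
bootstrap.servers=kafka1:9092
group.id=connect-cluster-avro
key.converter=io.confluent.connect.avro.AvroConverter
value.converter=io.confluent.connect.avro.AvroConverter
key.converter.schema.registry.url=http://schemaregistry:8081
value.converter.schema.registry.url=http://schemaregistry:8081
config.storage.topic=connect-avro-configs
offset.storage.topic=connect-avro-offsets
status.storage.topic=connect-avro-status
```

The critical parts are that group.id and the three *.storage.topic
settings differ between clusters -- if two clusters share a group.id or
internal topics they will interfere with each other's rebalancing and
offsets. Scaling a cluster is then just starting more workers with the
identical config.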

-Ewen
