Re: kafka connect architecture

Ewen Cheslack-Postava Mon, 30 Jan 2017 22:57:04 -0800

On Mon, Jan 30, 2017 at 8:24 AM, Koert Kuipers <ko...@tresata.com> wrote:


> i have been playing with kafka connect in standalone and distributed mode.
>
> i like standalone because:
> * i get to configure it using a file. this is easy for automated deployment
> (chef, puppet, etc.). configuration using a rest api i find inconvenient.
>

What exactly is inconvenient? The orchestration tools you mention all have
built-in tooling to make REST requests. In fact, you could pretty easily
take a config file you could use with standalone mode and convert it into
the JSON payload for the REST API and simply make that request. If the
connector already exists with the same config, it shouldn't have any effect
on the cluster -- it's just a noop re-registration.
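
As a rough illustration of that conversion, here is a minimal Python sketch.
It assumes a Connect worker whose REST API is reachable at
http://localhost:8083, and the connector settings shown are placeholder
values, not taken from this thread. The parser handles only plain key=value
lines, not the full Java properties syntax. It targets
PUT /connectors/{name}/config, which creates the connector if it is missing
and updates it otherwise.

```python
import json
import urllib.parse
import urllib.request


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments.

    Simplification: no support for ':' separators or line continuations.
    """
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only, so values may contain '='.
        key, _, value = line.partition("=")
        config[key.strip()] = value.strip()
    return config


def build_request(config, base_url="http://localhost:8083"):
    """Build (but do not send) a PUT /connectors/{name}/config request."""
    name = urllib.parse.quote(config["name"], safe="")
    body = json.dumps(config).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/connectors/{name}/config",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


if __name__ == "__main__":
    # Placeholder standalone-style config; all values are assumptions.
    example = """
    name=local-file-source
    connector.class=FileStreamSource
    tasks.max=1
    file=/tmp/test.txt
    topic=connect-test
    """
    req = build_request(parse_properties(example))
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To submit it, a worker must be running at base_url:
    # urllib.request.urlopen(req)
```

Because the request is built separately from being sent, a deployment tool
could render the payload from the same file it would otherwise hand to
standalone mode and submit it on every run.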


> * errors show up in log files instead of having to retrieve them using a
> rest api. same argument as previous bullet point really. i know how to
> automate log monitoring. rest api isn't great for this.
>

If you run in distributed mode, you probably also want to collect log files
somehow. The errors still show up in log files; they are just spread across
multiple nodes, so you may need to collect them in a central location.
(Hint: Connect can do this :))


> * isolation of connector classes. every connector has its own jvm. no jar
> dependency hell.
>

Yup, this is definitely a pain point. We're looking into classpath
isolation in a subsequent release (won't be in AK 0.10.2.0/CP 3.2.0, but I
am hoping it will be in AK 0.10.3.0/CP 3.3.0).


>
> i like distributed because:
> * well it's fault tolerant and can distribute workload
>
> so this makes me wonder... how hard would it be to get the
> "connect-standalone" setup where each connector has its own service(s),
> configuration is done using files, and errors are written to logs, yet at
> the same time i can spin up multiple services for a connector and they form
> a group? and while we are at it also remove the rest api entirely, since i
> dont need it, it poses a security risk, and it makes it hard to spin up
> multiple connectors on same box. with such a setup i could simply deploy as
> many services as i need for a connector, using either chef, or perhaps
> slider on yarn, or whatever framework i need.
>

A distributed mode driven by config files is possible and something that's
been brought up before, although it does have some complicating factors. Doing
a rolling bounce of such a service gets tricky in the face of failures as
you might have old & new versions of the app starting simultaneously (i.e.
it becomes difficult to figure out which config to trust).

As to removing the REST API in some cases, I guess I could imagine doing
it, but in practice you should probably just lock it down by never exposing
that port. If you're worried about security, you should have all ports
blocked by default; if you don't want to provide access to the REST API,
simply don't open its port.

-Ewen


>
> this is related to KAFKA-3815
> <https://issues.apache.org/jira/browse/KAFKA-3815> which makes similar
> arguments for container deployments
>
