see inline.
best, koert

On Tue, Jan 31, 2017 at 1:56 AM, Ewen Cheslack-Postava <e...@confluent.io> wrote:
> On Mon, Jan 30, 2017 at 8:24 AM, Koert Kuipers <ko...@tresata.com> wrote:
>
> > i have been playing with kafka connect in standalone and distributed mode.
> >
> > i like standalone because:
> > * i get to configure it using a file. this is easy for automated deployment
> >   (chef, puppet, etc.). configuration using a rest api i find inconvenient.
>
> What exactly is inconvenient? The orchestration tools you mention all have
> built-in tooling to make REST requests. In fact, you could pretty easily
> take a config file you could use with standalone mode and convert it into
> the JSON payload for the REST API and simply make that request. If the
> connector already exists with the same config, it shouldn't have any effect
> on the cluster -- it's just a noop re-registration.

it is true that for example chef has some built-in support for REST, but its not nearly as well developed as their config file (template) framework. i expect the same for other tools (but dont know).
also with files security has been solved: a tool like chef runs as the user with admin privileges to modify these files, and permissions can trivially be set so that a limited set of users can read these files. all this is much harder with a REST API.

> > * errors show up in log files instead of having to retrieve them using a
> >   rest api. same argument as previous bullet point really. i know how to
> >   automate log monitoring. rest api isnt great for this.
>
> If you run in distributed mode, you probably also want to collect log files
> somehow. The errors still show up in log files, they are just spread across
> multiple nodes so you may need to collect them to put them in a central
> location. (Hint: connect can do this :))

my experience so far with errors in connectors was that they did not show up in the log of the distributed connect service. only by going to the rest api endpoint for the status of the connector (GET /connectors/<name>/status) could i get the error.
perhaps i have to adjust my logging settings.

> > * isolation of connector classes. every connector has its own jvm. no jar
> >   dependency hell.
>
> Yup, this is definitely a pain point. We're looking into classpath
> isolation in a subsequent release (won't be in AK 0.10.2.0/CP 3.2.0, but I
> am hoping it will be in AK 0.10.3.0/CP3.3.0).
>
> > i like distributed because:
> > * well its fault tolerant and can distribute workload
> >
> > so this makes me wonder... how hard would it be to get the
> > "connect-standalone" setup where each connector has its own service(s),
> > configuration is done using files, and errors are written to logs, yet at
> > the same time i can spin up multiple services for a connector and they form
> > a group? and while we are at it also remove the rest api entirely, since i
> > dont need it, it poses a security risk, and it makes it hard to spin up
> > multiple connectors on same box. with such a setup i could simply deploy as
> > many services as i need for a connector, using either chef, or perhaps
> > slider on yarn, or whatever framework i need.
>
> A distributed mode driven by config files is possible and something that's
> been brought up before, although does have some complicating factors. Doing
> a rolling bounce of such a service gets tricky in the face of failures as
> you might have old & new versions of the app starting simultaneously (i.e.
> it becomes difficult to figure out which config to trust).

i didnt think about this too much. indeed my plan was to simply bounce all the services for a particular connector at the same time, and accept downtime for the given connector. i could do a rolling restart if i am ok with a mix of old and new running at same time, which might be acceptable for minor fixes. how does kafka streams handle this?
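
One way to get the errors that only surface through GET /connectors/<name>/status into ordinary log monitoring is to turn a status response into plain log lines. The sketch below assumes the response has the shape described in the Connect REST documentation (a "connector" object and a "tasks" list, each with "state", "worker_id" and, on failure, "trace"). The connector name, hosts and the `failure_lines` helper are hypothetical, and fetching the document over HTTP is left out.

```python
# Illustrative status document; names and hosts are made up for this sketch.
SAMPLE_STATUS = {
    "name": "my-connector",
    "connector": {"state": "RUNNING", "worker_id": "host1:8083"},
    "tasks": [
        {"id": 0, "state": "RUNNING", "worker_id": "host1:8083"},
        {
            "id": 1,
            "state": "FAILED",
            "worker_id": "host2:8083",
            "trace": "org.apache.kafka.common.errors.RecordTooLargeException\n\tat (stack elided)",
        },
    ],
}


def failure_lines(status):
    """Return one log-friendly line per connector or task in the FAILED state."""
    name = status.get("name", "<unknown>")
    # Treat the connector itself and each task as a unit to inspect.
    units = [("connector", status.get("connector", {}))]
    units += [(f"task {t.get('id')}", t) for t in status.get("tasks", [])]

    lines = []
    for label, unit in units:
        if unit.get("state") == "FAILED":
            trace = unit.get("trace", "")
            # Keep only the first line of the stack trace as a summary.
            summary = trace.splitlines()[0] if trace else "no trace reported"
            lines.append(f"{name} {label} FAILED on {unit.get('worker_id')}: {summary}")
    return lines


for line in failure_lines(SAMPLE_STATUS):
    print(line)
```

A cron job or monitoring agent could write these lines to the same log file the existing log monitoring already watches.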
> As to removing the REST API in some cases, I guess I could imagine doing
> it, but in practice you should probably just lock down access by never
> allowing access to that port. If you're worried about security, you should
> have all ports disabled by default; if you don't want to provide access to
> the REST API, simply don't enable access to it.
>
> -Ewen
>
> > this is related to KAFKA-3815
> > <https://issues.apache.org/jira/browse/KAFKA-3815> which makes similar
> > arguments for container deployments
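
The conversion suggested earlier in the thread, from a standalone-style config file to the JSON payload for the REST API, could look roughly like the sketch below. It assumes the payload shape used by POST /connectors ({"name": ..., "config": {...}}) and handles only simple key=value lines, not the full .properties syntax (escapes, line continuations, ':' separators). The `properties_to_payload` helper and the example values are illustrative, not taken from anyone's actual deployment.

```python
import json


def properties_to_payload(text):
    """Turn standalone-style connector properties into a REST payload dict."""
    config = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments ('#' or '!' start a comment).
        if not line or line[0] in "#!":
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # this sketch ignores lines without '='
        config[key.strip()] = value.strip()
    # The connector name is taken from the "name" property in the file.
    return {"name": config["name"], "config": config}


# Hypothetical connector config, for illustration only.
EXAMPLE = """\
# file source connector
name=local-file-source
connector.class=FileStreamSource
tasks.max=1
file=/tmp/test.txt
topic=connect-test
"""

payload = properties_to_payload(EXAMPLE)
print(json.dumps(payload, indent=2))
```

The printed JSON is what a deployment tool would send as the request body, so the file can stay the managed artifact (templated and permissioned as usual) while the REST call becomes a thin last step.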