Hi Clay,

I agree with Gwen that you might want to take a second look at streaming your protobuf data to Kafka and then having connectors read it from Kafka. To address the issues in order:
1. You say you have hundreds of machines sending data. If you run a connector that is not tied to a single IP address, you would basically need to update all of these senders with the new address whenever the connector moves around in the cluster. The only way around that I can come up with is a load balancer with a single IP pointed at all machines in the cluster, performing regular health checks to find out where the job is currently running (similar to what Mesos does), but that would not be interruption-free.

2. Kafka itself does run in cluster mode, is HA and very scalable; it was arguably built for exactly the job you are describing. The downside is that you would need to change your data senders, which might not be possible, I don't know your setup. Perhaps you could implement a tiny tool that reads from TCP and forwards the messages to Kafka (Logstash might be an option, not sure). To make this HA and scalable, just deploy more than one of these jobs and put a load balancer in front of them to distribute requests across all instances. This is a very similar architecture to what you wanted to do with Connect, but without the jobs moving around in the cluster, which would create unnecessary complexity.

Just my 2 cents, but I hope it helps :)

On Wed, Jul 5, 2017 at 6:09 AM, Clay Teahouse <clayteaho...@gmail.com> wrote:
> Hello Gwen,
>
> Thanks for the reply. My comments/answers inline.
>
> 1. Connectors that listen on sockets typically run in stand-alone mode, so
> they can be tied to a specific machine (in distributed mode, connectors can
> move around).
> [Clay:] Even if the connectors move around, they can still listen on a
> specific port on the node in the cluster, right? The data will be sent to
> the cluster of connectors from hundreds of data sources.
> 2. Why do you need a connector? Why not just use a Kafka producer to send
> protobuf directly to Kafka?
>
> [Clay:] I have hundreds of data sources which push the data to the
> connectors.
> I do need the connectors to run in a cluster mode, for HA and
> scalability.
>
> On Tue, Jul 4, 2017 at 10:45 PM, Gwen Shapira <g...@confluent.io> wrote:
>
> > I don't remember seeing one. There is no reason not to write one (let us
> > know if you do, so we can put it on the connector hub!).
> >
> > Few things:
> > 1. Connectors that listen on sockets typically run in stand-alone mode, so
> > they can be tied to a specific machine (in distributed mode, connectors can
> > move around).
> > 2. Why do you need a connector? Why not just use a Kafka producer to send
> > protobuf directly to Kafka?
> >
> > Gwen
> >
> > On Tue, Jul 4, 2017 at 9:02 AM Clay Teahouse <clayteaho...@gmail.com>
> > wrote:
> >
> > > Hello All,
> > >
> > > I'd appreciate your help with the following questions.
> > >
> > > 1) Is there a Kafka connector for listening on TCP sockets?
> > >
> > > 2) If so, can the messages be in protobuf, with each message prefixed
> > > with the length of the message?
> > >
> > > thanks
> > > Clay

-- 
Sönke Liebau
Partner
Tel. +49 179 7940878
OpenCore GmbH & Co. KG - Thomas-Mann-Straße 8 - 22880 Wedel - Germany
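
PS: Since the original question mentioned length-prefixed protobuf over TCP, here is a minimal sketch of what the framing part of such a "tiny forwarder tool" could look like. Everything in it is illustrative, not a definitive implementation: the 4-byte big-endian length prefix is an assumption (protobuf itself does not define a wire framing), and the Kafka hand-off is only indicated in a comment.

```python
# Illustrative sketch: split a byte stream into length-prefixed messages.
# Assumes each payload is preceded by a 4-byte big-endian length header;
# adjust the header format to whatever your senders actually use.
import io
import struct

def read_frames(stream):
    """Yield each length-prefixed payload from a binary stream."""
    while True:
        header = stream.read(4)
        if len(header) < 4:
            return  # clean end of stream (or truncated header)
        (length,) = struct.unpack(">I", header)
        payload = stream.read(length)
        if len(payload) < length:
            return  # truncated message, drop it
        # In the forwarder, each payload would be handed to a Kafka
        # producer here, e.g. producer.send(topic, value=payload)
        # with the kafka-python client.
        yield payload

# Example: two framed messages in one buffer
buf = io.BytesIO(
    struct.pack(">I", 5) + b"hello" + struct.pack(">I", 3) + b"abc"
)
frames = list(read_frames(buf))
# frames == [b"hello", b"abc"]
```

Wrapping this around a socket (one such loop per accepted connection) and running several instances behind the load balancer would give you the HA setup described above without needing Connect at all.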