Marton referred to the KafkaSink in (4). For sources, the job will keep running by reading from a different broker.
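The sink-side behavior proposed in (4) — retry against another broker instead of cancelling the job on the first failure — could look roughly like the sketch below. This is purely illustrative: the `Broker` interface and `sendWithFailover` method are hypothetical names, not the actual Flink KafkaSink API.

```java
import java.util.List;

// Illustrative sketch only: names and structure are hypothetical,
// not the real Flink KafkaSink implementation.
public class RetryingSink {
    interface Broker { boolean send(String record); }

    private final List<Broker> brokers;
    private int current = 0;

    RetryingSink(List<Broker> brokers) { this.brokers = brokers; }

    /** Try each known broker in turn instead of cancelling the job
     *  on the first failure; give up only when all brokers fail. */
    boolean sendWithFailover(String record) {
        for (int attempt = 0; attempt < brokers.size(); attempt++) {
            if (brokers.get(current).send(record)) {
                return true;
            }
            // Broker failed: rotate to the next one and retry.
            current = (current + 1) % brokers.size();
        }
        return false; // all brokers down -> surface the error
    }
}
```

With execution retries configured, only the all-brokers-down case would then cancel the job.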
On Mon, 22 Jun 2015 at 18:45 Stephan Ewen <se...@apache.org> wrote:

> I would like to consolidate those as well.
>
> The biggest blocker is, however, that the PersistentKafkaSource never
> commits to ZooKeeper when checkpointing is not enabled. It should at
> least group commit periodically in those cases.
>
> Concerning (4), I thought the high-level consumer (that we build the
> PersistentKafkaSource on) handles broker failures. Apparently it does
> not?
>
> On Mon, Jun 22, 2015 at 2:08 PM, Márton Balassi <balassi.mar...@gmail.com>
> wrote:
>
> > Hey,
> >
> > Due to the effort invested in the Kafka connector, mainly by Robert
> > and Gabor Hermann, we are going to ship a fairly nice solution for
> > reading from and writing to Kafka with 0.9.0. This is the most
> > prominent streaming connector currently, and rightfully so, as
> > pipeline-level end-to-end exactly-once processing for streaming
> > depends on it in this release.
> >
> > To make it even more user-friendly and efficient I find the following
> > tasks important:
> >
> > 1. Currently we ship two Kafka sources, and for historic reasons the
> > non-persistent one is called KafkaSource and the persistent one is
> > PersistentKafkaSource; the latter is also more difficult to find.
> > Although the former is easier to read and a bit faster, let us steer
> > people toward the fault-tolerant version by default. The behavior is
> > already documented both in the javadoc and on the Flink website.
> > Reported by Ufuk.
> >
> > Proposed solution: Deprecate the non-persistent KafkaSource, and maybe
> > move the PersistentKafkaSource to org.apache.flink.streaming.kafka.
> > (Currently it is in its subpackage, persistent.) Eventually we could
> > even rename the current PersistentKafkaSource to KafkaSource.
> >
> > 2. The documentation of the streaming connectors is a bit hidden on
> > the website. These are not included in the connectors section [1],
> > but are at the very end of the streaming guide. [2]
> >
> > Proposed solution: Move them to the connectors section and link it
> > from the streaming guide.
> >
> > 3. Collocate the KafkaSource and the KafkaSink with the corresponding
> > Brokers if possible, for improved performance. There is a ticket for
> > this. [3]
> >
> > 4. Handling Broker failures on the KafkaSink side.
> >
> > Currently, instead of looking for a new Broker, the sink throws an
> > exception and thus cancels the job if the Broker fails. Assuming that
> > the job has execution retries left and an available Broker to write
> > to, the job comes back, finds the Broker and continues. Added a
> > ticket for it just now. [4] Reported by Aljoscha.
> >
> > [1]
> > http://ci.apache.org/projects/flink/flink-docs-master/apis/example_connectors.html
> > [2]
> > http://ci.apache.org/projects/flink/flink-docs-master/apis/streaming_guide.html#stream-connectors
> > [3] https://issues.apache.org/jira/browse/FLINK-1673
> > [4] https://issues.apache.org/jira/browse/FLINK-2256
> >
> > Best,
> >
> > Marton
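Stephan's blocker — periodically group-committing offsets to ZooKeeper when checkpointing is disabled — could be sketched as below. This is a minimal sketch under assumptions: the `commitToZookeeper` callback is a placeholder for whatever commit mechanism the connector uses, not an actual Flink method.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongConsumer;

// Sketch of periodic offset commits when checkpointing is disabled.
// commitToZookeeper is a placeholder callback, not a real Flink API.
public class PeriodicCommitter implements AutoCloseable {
    private final AtomicLong lastReadOffset = new AtomicLong(-1);
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    PeriodicCommitter(LongConsumer commitToZookeeper, long intervalMillis) {
        // Commit the most recently read offset on a fixed schedule, so a
        // restart without checkpoints resumes near where the source left off.
        scheduler.scheduleAtFixedRate(() -> {
            long offset = lastReadOffset.get();
            if (offset >= 0) {
                commitToZookeeper.accept(offset);
            }
        }, intervalMillis, intervalMillis, TimeUnit.MILLISECONDS);
    }

    /** Called by the source for every record it emits. */
    void recordRead(long offset) {
        lastReadOffset.set(offset);
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

The trade-off is the usual at-least-once window: records read after the last periodic commit may be re-read after a restart.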