My bad, (4) is clear now.

On Mon, Jun 22, 2015 at 8:59 PM, Aljoscha Krettek <aljos...@apache.org> wrote:
> Marton referred to the KafkaSink in 4). For sources the job will keep
> running by reading from a different broker.
>
> On Mon, 22 Jun 2015 at 18:45 Stephan Ewen <se...@apache.org> wrote:
>
> > I would like to consolidate those as well.
> >
> > The biggest blocker is, however, that the PersistentKafkaSource never
> > commits to ZooKeeper when checkpointing is not enabled. It should at
> > least group commit periodically in those cases.
> >
> > Concerning (4), I thought the high-level consumer (that we build the
> > PersistentKafkaSource on) handles broker failures. Apparently it does
> > not?
> >
> > On Mon, Jun 22, 2015 at 2:08 PM, Márton Balassi
> > <balassi.mar...@gmail.com> wrote:
> >
> > > Hey,
> > >
> > > Due to the effort invested in the Kafka connector, mainly by Robert
> > > and Gabor Hermann, we are going to ship a fairly nice solution for
> > > reading from and writing to Kafka with 0.9.0. This is the most
> > > prominent streaming connector currently, and rightfully so, as
> > > pipeline-level end-to-end exactly-once processing for streaming
> > > depends on it in this release.
> > >
> > > To make it even more user-friendly and efficient I find the
> > > following tasks important:
> > >
> > > 1. Currently we ship two Kafka sources, and for historic reasons the
> > > non-persistent one is called KafkaSource while the persistent one is
> > > called PersistentKafkaSource; the latter is also more difficult to
> > > find. Although the former is easier to read and a bit faster, let us
> > > make sure that people use the fault-tolerant version by default. The
> > > behavior is already documented both in the javadoc and on the Flink
> > > website. Reported by Ufuk.
> > >
> > > Proposed solution: Deprecate the non-persistent KafkaSource and
> > > maybe move the PersistentKafkaSource to
> > > org.apache.flink.streaming.kafka. (Currently it is in its
> > > subpackage, persistent.) Eventually we could even rename the
> > > current PersistentKafkaSource to KafkaSource.
> > > 2. The documentation of the streaming connectors is a bit hidden on
> > > the website. These are not included in the connectors section [1],
> > > but are at the very end of the streaming guide. [2]
> > >
> > > Proposed solution: Move them to the connectors section and link it
> > > from the streaming guide.
> > >
> > > 3. Collocate the KafkaSource and the KafkaSink with the
> > > corresponding brokers, if possible, for improved performance. There
> > > is a ticket for this. [3]
> > >
> > > 4. Handle broker failures on the KafkaSink side.
> > >
> > > Currently, instead of looking for a new broker, the sink throws an
> > > exception and thus cancels the job if the broker fails. Assuming
> > > that the job has execution retries left and an available broker to
> > > write to, the job comes back, finds the broker and continues. Added
> > > a ticket for this just now. [4] Reported by Aljoscha.
> > >
> > > [1] http://ci.apache.org/projects/flink/flink-docs-master/apis/example_connectors.html
> > > [2] http://ci.apache.org/projects/flink/flink-docs-master/apis/streaming_guide.html#stream-connectors
> > > [3] https://issues.apache.org/jira/browse/FLINK-1673
> > > [4] https://issues.apache.org/jira/browse/FLINK-2256
> > >
> > > Best,
> > >
> > > Marton
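For illustration, Stephan's "group commit periodically" idea could be sketched roughly as follows. This is a hypothetical, self-contained sketch, not the actual PersistentKafkaSource code: the class name, the commit callback, and the record-count trigger are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical sketch of periodic offset commits when checkpointing is
// disabled: remember the latest offset per partition and flush the whole
// batch through a commit callback (e.g. a ZooKeeper write) every N records.
public class PeriodicOffsetCommitter {
    private final int commitInterval;                    // records between commits
    private final Consumer<Map<Integer, Long>> commitFn; // e.g. persists offsets to ZooKeeper
    private final Map<Integer, Long> pendingOffsets = new HashMap<>();
    private long recordsSinceCommit = 0;

    public PeriodicOffsetCommitter(int commitInterval,
                                   Consumer<Map<Integer, Long>> commitFn) {
        this.commitInterval = commitInterval;
        this.commitFn = commitFn;
    }

    // Called once per record read; commits the batch when the interval is hit.
    public void recordRead(int partition, long offset) {
        pendingOffsets.put(partition, offset);
        if (++recordsSinceCommit >= commitInterval) {
            commitFn.accept(new HashMap<>(pendingOffsets));
            recordsSinceCommit = 0;
        }
    }
}
```

A time-based trigger would work the same way; the point is only that offsets are still persisted occasionally even without checkpointing, so a restart does not have to re-read the topic from the beginning.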