Just found a nice (but old) blog post that explains Flink's integration with Kafka: https://data-artisans.com/blog/kafka-flink-a-practical-how-to
I guess the basics are still valid.

Nico

On 16/01/18 11:17, Nico Kruber wrote:
> Hi Jason,
> I'd suggest starting with [1] and [2] to get the basics of a Flink
> program.
> The DataStream API basically wires operators together with streams so
> that whatever stream comes out of one operator is the input of the next.
> By connecting both functions to the same Kafka stream source, your
> program results in this:
>
> Kafka --> Function1
>   |
>    --> Function2
>
> where both functions receive all elements the previous stream offers
> (elements are broadcast). If you want the two functions to work on
> different elements, you could add a filter before each function:
>
> DataStream<SomeObject> inputStream = ... got a kafka stream
> inputStream.filter(...).process( Function1 );
> inputStream.filter(...).process( Function2 );
>
> or split the stream (see [3] for available operators).
>
> I'm no expert on Kafka though, so I can't give you any advice on the
> performance - I'd suggest creating some small benchmarks for your setup,
> since this probably depends on the cluster architecture, the
> parallelism of the operators and the number of Kafka partitions.
> Maybe Gordon (cc'd) can give some more insights.
>
>
> Regards
> Nico
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-release-1.4/concepts/programming-model.html
> [2]
> https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/api_concepts.html
> [3]
> https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/stream/operators/index.html
>
> On 12/01/18 21:57, Jason Kania wrote:
>> Hi,
>>
>> I have a question that I have not resolved via the documentation,
>> looking in the "Parallel Execution", "Streaming" and the "Connectors"
>> sections. If I retrieve a kafka stream and then call the process
>> function against it in parallel, as follows, does it consume in some
>> round robin fashion between the two process calls or is each element
>> coming out of the kafka connector consumed by both processors in parallel?
>>
>> DataStream<SomeObject> inputStream = ... got a kafka stream
>> inputStream.process( Function1 );
>> inputStream.process( Function2 );
>>
>> If it is possible to consume in parallel by pointing at the single stream,
>> is it typically slower or faster than having two kafka streams with
>> different group ids?
>>
>> If not documented elsewhere, this would be good to cover since it is
>> fundamental.
>>
>> Thanks,
>>
>> Jason
>
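
To make the behaviour discussed in this thread concrete, here is a minimal plain-Java sketch of the semantics only. It deliberately does not use the Flink or Kafka APIs: the class FanOutSketch and the helpers fanOut and filtered are hypothetical names made up for illustration, and a List of integers stands in for the Kafka stream. It only models the point that every function attached to the same stream sees every element, unless a filter is placed in front of it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Plain-Java model of one stream feeding two functions.
// Assumption: this is NOT the Flink API, only an illustration of the semantics.
class FanOutSketch {

    // Hands every element of the source to every consumer, mirroring
    //   inputStream.process(Function1); inputStream.process(Function2);
    static <T> void fanOut(List<T> source, List<Consumer<T>> consumers) {
        for (T element : source) {
            for (Consumer<T> consumer : consumers) {
                consumer.accept(element);
            }
        }
    }

    // Wraps a consumer so it only receives elements matching the predicate,
    // mirroring inputStream.filter(...).process(...).
    static <T> Consumer<T> filtered(Predicate<T> keep, Consumer<T> downstream) {
        return element -> {
            if (keep.test(element)) {
                downstream.accept(element);
            }
        };
    }

    public static void main(String[] args) {
        List<Integer> source = Arrays.asList(1, 2, 3, 4); // stand-in for the Kafka stream

        // Case 1: no filters, so both functions receive all elements.
        List<Integer> seenBy1 = new ArrayList<>();
        List<Integer> seenBy2 = new ArrayList<>();
        Consumer<Integer> function1 = seenBy1::add;
        Consumer<Integer> function2 = seenBy2::add;
        fanOut(source, Arrays.asList(function1, function2));
        System.out.println("unfiltered: " + seenBy1 + " " + seenBy2);

        // Case 2: a filter before each function, so they work on different elements.
        List<Integer> evens = new ArrayList<>();
        List<Integer> odds = new ArrayList<>();
        Consumer<Integer> evenSink = evens::add;
        Consumer<Integer> oddSink = odds::add;
        fanOut(source, Arrays.asList(
                filtered(n -> n % 2 == 0, evenSink),
                filtered(n -> n % 2 != 0, oddSink)));
        System.out.println("filtered: " + evens + " " + odds);
    }
}
```

In the unfiltered case both lists end up holding all four elements; with the filters in place the elements are divided between the two functions. In a real Flink job the same idea would be expressed with filter(...) and process(...) on the DataStream, as in the quoted snippet above.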