Re: Parallel stream consumption

Nico Kruber Tue, 16 Jan 2018 02:28:18 -0800

Just found a nice (but old) blog post that explains Flink's integration
with Kafka:
https://data-artisans.com/blog/kafka-flink-a-practical-how-to


I guess, the basics are still valid


Nico

On 16/01/18 11:17, Nico Kruber wrote:
> Hi Jason,
> I'd suggest to start with [1] and [2] for getting the basics of a Flink
> program.
> The DataStream API basically wires operators together with streams so
> that whatever stream gets out of one operator is the input of the next.
> By connecting both functions to the same Kafka stream source, your
> program results in this:
> 
> Kafka --> Function1
>       |
>       --> Function2
> 
> where both functions receive all elements the previous stream offers
> (elements are broadcasted). If you want the two functions to work on
> different elements, you could add a filter before each function:
> 
> DataStream<SomeObject> inputStream = ... got a kafka stream
> inputStream.filter(...).process( Function1 );
> inputStream.filter(...).process( Function2 );
> 
> or split the stream (see [3] for available operators).
> 
> I'm no expert on Kafka though, so I can't give you an advise on the
> performance - I'd suggest to create some small benchmarks for your setup
> since this probably depends on the cluster architecture and the
> parallelism of the operators and the number of Kafka partitions.
> Maybe Gordon (cc'd) can give some more insights.
> 
> 
> Regards
> Nico
> 
> [1]
> https://ci.apache.org/projects/flink/flink-docs-release-1.4/concepts/programming-model.html
> [2]
> https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/api_concepts.html
> [3]
> https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/stream/operators/index.html
> 
> On 12/01/18 21:57, Jason Kania wrote:
>> Hi,
>>
>> I have a question that I have not resolved via the documentation,
>> looking in the "Parallel Execution", "Streaming"  and the "Connectors"
>> sections. If I retrieve a kafka stream and then call the process
>> function against it in parallel, as follows, does it consume in some
>> round robin fashion between the two process calls or is each element
>> coming out of the kafka connector consumed by both processors in parallel?
>>
>> DataStream<SomeObject> inputStream = ... got a kafka stream
>> inputStream.process( Function1 );
>> inputStream.process( Function2 );
>>
>> If it possible to consume in parallel by pointing at the single stream,
>> is it typically slower or faster than having two kafka streams with
>> different group ids?
>>
>> If not documented elsewhere, this would be good to cover since it is
>> fundamental.
>>
>> Thanks,
>>
>> Jason
>

signature.asc
Description: OpenPGP digital signature

Re: Parallel stream consumption

Reply via email to