Hello all,

I was wondering if someone would be kind enough to enlighten me on a few
topics. We are trying to join two streams of data on a key. We were
thinking of partitioning the topics in Kafka by that key; however, I also
saw that Flink can partition data on its own, and I was wondering whether
Flink can take advantage of Kafka's partitioning, and which partitioning
scheme I should go for.
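To make the question concrete, my rough understanding is that both Kafka and Flink assign records to partitions by hashing the key, along the lines of the sketch below (plain Python for illustration only; the function name, partition count, and hash are made up and are not Kafka's or Flink's actual partitioners, which use different hash functions from each other):

```python
# Illustration of hash partitioning: every record with the same key
# lands in the same partition, so per-key processing only needs to
# look at one partition.
def partition_for(key: str, num_partitions: int) -> int:
    # Stand-in hash. Kafka and Flink each use their own hash functions,
    # so the same key can map to different partition numbers in each
    # system -- which is part of what I am asking about.
    return hash(key) % num_partitions

records = [("user-1", "a"), ("user-2", "b"), ("user-1", "c")]
by_partition = {}
for key, value in records:
    by_partition.setdefault(partition_for(key, 4), []).append((key, value))
```

So the question is whether a keyed Kafka topic layout like this carries through to Flink, or whether Flink redistributes by key again regardless.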

As far as joins go, our two datasets are very large (millions of records in
each window) and we need to perform the joins very quickly (in less than 1
second). Would the built-in join mechanism be sufficient for this? From
what I understand, it would need to shuffle data and would thus be slow on
large datasets. Is there a way to join via keyed-state lookups to avoid
the shuffling?
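For context on what I mean by a state-lookup join: something like the sketch below, where one side is buffered in a per-key state store and the other side probes that state on arrival, instead of shuffling both sides into windows. (Plain Python for illustration; the class and method names are mine, not a Flink API. In Flink I imagine this would be a connected stream over keyed state, but I have not tried it.)

```python
from collections import defaultdict

class StateLookupJoin:
    """Buffer stream A per key; when a record from stream B arrives,
    join it against the buffered A-side records for the same key."""

    def __init__(self):
        self.state = defaultdict(list)  # key -> buffered A-side values

    def on_a(self, key, value):
        # Stream A: just remember the value under its key.
        self.state[key].append(value)

    def on_b(self, key, value):
        # Stream B: look up the buffered A-side values for this key and
        # emit one joined triple per match -- a lookup, not a shuffle.
        return [(key, a_value, value) for a_value in self.state[key]]

join = StateLookupJoin()
join.on_a("user-1", "clicked")
join.on_a("user-1", "scrolled")
results = join.on_b("user-1", "purchased")
```

Is this pattern achievable in Flink, and would it actually avoid the shuffle, given that both streams still need to be brought together by key somehow?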

I have read the docs and the blogs so far, so I have some limited
understanding of how Flink works, though no practical experience yet.

Thanks


Alex Rovner
Director, Data Engineering
o: 646.759.0052

<http://www.magnetic.com/>
