Hello all, I was wondering if someone would be kind enough to enlighten me on a few topics. We are trying to join two streams of data on a key. We were thinking of partitioning the topics in Kafka by that key, but I also saw that Flink can partition data on its own. Can Flink take advantage of Kafka's partitioning, and which partitioning scheme should we go for?
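To make the first question concrete, here is roughly what I picture if we let Flink key the stream itself. This is only a sketch under my current understanding: the topic name, the String payloads, the consumer properties, and the key extraction are placeholders, not our real setup.

import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

public class PartitioningSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("group.id", "join-question");

        // One of the two input topics; the name and the String payload are placeholders.
        DataStream<String> streamA = env.addSource(
                new FlinkKafkaConsumer<>("topicA", new SimpleStringSchema(), props));

        // keyBy() groups records by the join key inside Flink,
        // independently of how the Kafka topic itself is partitioned.
        streamA
            .keyBy(new KeySelector<String, String>() {
                @Override
                public String getKey(String record) {
                    return record.split(",")[0];  // placeholder key extraction
                }
            })
            .print();

        env.execute("partitioning sketch");
    }
}

Is the keyBy here redundant if the Kafka topic is already partitioned by the same key, or does Flink always re-partition regardless?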
As for the joins: our two datasets are very large (millions of records in each window) and we need the joins to complete very quickly (under one second). Would the built-in join mechanism be sufficient for this? From what I understand it needs to shuffle the data, which could be slow on datasets this large. Is there a way to join via keyed state lookups instead, to avoid the shuffling? (There is a rough sketch of what I had in mind in the P.S. below.) I have read the docs and the blogs so far, so I have some limited understanding of how Flink works, but no practical experience yet.

Thanks,

Alex Rovner
Director, Data Engineering
o: 646.759.0052
http://www.magnetic.com/
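P.S. This is the kind of keyed-state lookup join I was imagining, just to make the second question concrete. It is a sketch based on my limited understanding: the String record types, the state names, and the joined output format are placeholders, and I am not claiming this is the recommended pattern.

import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.co.CoProcessFunction;
import org.apache.flink.util.Collector;

// Placeholder record types; our real records would be POJOs.
public class KeyedLookupJoin extends CoProcessFunction<String, String, String> {

    // Keyed state: the latest record seen from each side for the current key.
    private transient ValueState<String> leftState;
    private transient ValueState<String> rightState;

    @Override
    public void open(Configuration parameters) {
        leftState = getRuntimeContext().getState(
                new ValueStateDescriptor<>("left", Types.STRING));
        rightState = getRuntimeContext().getState(
                new ValueStateDescriptor<>("right", Types.STRING));
    }

    @Override
    public void processElement1(String left, Context ctx, Collector<String> out) throws Exception {
        leftState.update(left);
        String right = rightState.value();
        if (right != null) {
            out.collect(left + "|" + right);  // emit a pair as soon as both sides are present
        }
    }

    @Override
    public void processElement2(String right, Context ctx, Collector<String> out) throws Exception {
        rightState.update(right);
        String left = leftState.value();
        if (left != null) {
            out.collect(left + "|" + right);
        }
    }
}

I would wire it up roughly as streamA.connect(streamB).keyBy(keySelectorA, keySelectorB).process(new KeyedLookupJoin()). I am not sure whether this actually avoids the shuffling I was worried about, or whether it ends up costing about the same as the built-in join, so any guidance is appreciated.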