> This will mean 2 shuffles, and 1 node might bottleneck if 1 topic has too
much data?
Yes
> Is there a way to avoid shuffle at all (or do only 1) and avoid a
situation when 1 node will become a hotspot?
Do you know the amount of data per Kafka topic beforehand, or does this
have to be dynamic?
Alex
On Thu, Jun 25, 2020 at 8:05 AM Kostas Kloudas wrote:
Hi Alexander,
Routing of input data in Flink can be done through keying and this can
guarantee collocation constraints. This means that you can send two
records to the same node by giving them the same key, e.g. the topic
name. Keep in mind that elements with different keys do not
necessarily go to different nodes.
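That routing behavior can be sketched with a simplified version of Flink's key-group assignment (the real logic lives in KeyGroupRangeAssignment and applies a murmur hash to key.hashCode(); the plain hashCode below is a simplification so the sketch stays self-contained):

```java
// Simplified sketch of how Flink maps a key to a subtask.
// Real Flink: murmurHash(key.hashCode()) % maxParallelism, then
// keyGroup * parallelism / maxParallelism.
public class KeyRouting {

    // Which key group a key falls into (floorMod handles negative hashCodes).
    static int assignToKeyGroup(Object key, int maxParallelism) {
        return Math.floorMod(key.hashCode(), maxParallelism);
    }

    // Which parallel subtask owns that key group.
    static int subtaskFor(Object key, int maxParallelism, int parallelism) {
        int keyGroup = assignToKeyGroup(key, maxParallelism);
        return keyGroup * parallelism / maxParallelism;
    }

    public static void main(String[] args) {
        int maxParallelism = 128, parallelism = 4;
        // Two records keyed by the same topic name always land on the
        // same subtask, which is what gives you the collocation guarantee.
        int a = subtaskFor("topic-A", maxParallelism, parallelism);
        int b = subtaskFor("topic-A", maxParallelism, parallelism);
        System.out.println(a == b); // true: same key, same subtask
        // But different keys are NOT guaranteed to land on different
        // subtasks, so two topics can still hash to the same node.
        System.out.println(subtaskFor("topic-B", maxParallelism, parallelism));
    }
}
```

So keying by topic name collocates each topic's records, but it does not spread distinct topics evenly, which is why a single heavy topic can still make one subtask a hotspot.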
Maybe I'm misreading the documentation, but:
"Data within the partition directories are split into part files. Each
partition will contain at least one part file for each subtask of the sink
that has received data for that partition."
So, it is 1 partition per subtask. I'm trying to figure out how t
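Concretely, the quoted sentence describes a layout where one partition directory holds part files from every subtask that wrote to it, not one partition per subtask. Assuming partitioning on the topic name and a sink parallelism of two (paths and the default part-<subtask>-<counter> naming here are illustrative), that looks roughly like:

```
/output/topic=topic-A/part-0-0   <- written by subtask 0
/output/topic=topic-A/part-1-0   <- written by subtask 1
/output/topic=topic-B/part-1-0   <- only subtask 1 received topic-B data
```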
You can achieve this in Flink 1.10 using the StreamingFileSink.
I’d also like to note that Flink 1.11 (which is currently going through
release testing and should be available imminently) has support for exactly
this functionality in the table API.
https://ci.apache.org/projects/flink/flink-docs-