Hi!
> If Flink is not happy with a large Hive table data ...
Currently it is. The Hive lookup table (currently implemented just like a
filesystem lookup table) cannot look up values by a specific key, so it
has to load all data into memory.
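
To make that concrete, here is roughly what such a lookup join looks like
(a sketch only; the transactions table with a processing-time attribute
proc_time and the Hive table account_dim are assumed names). The
'lookup.join.cache.ttl' option only controls how often the cache is
reloaded; every parallel instance still caches the entire table:

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class HiveLookupJoinSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv =
            TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // Allow /*+ OPTIONS(...) */ hints (needed on some Flink versions).
        tEnv.getConfig().getConfiguration()
            .setString("table.dynamic-table-options.enabled", "true");

        // Processing-time temporal (lookup) join. The Hive side is loaded
        // and cached in full by every parallel instance; the TTL only
        // controls how often that full reload happens.
        tEnv.executeSql(
            "SELECT t.tx_id, d.account_name "
                + "FROM transactions AS t "
                + "JOIN account_dim /*+ OPTIONS('lookup.join.cache.ttl'='12 h') */ "
                + "FOR SYSTEM_TIME AS OF t.proc_time AS d "
                + "ON t.account_id = d.account_id").print();
    }
}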
Hi Caizhi,
Could you give me more details about the streaming joins you suggested?
Did you mean putting the Hive table data into Kafka/Kinesis and joining
the main stream with the Hive table stream using a very long watermark?
In my use case, the Hive table is an account dimension table.
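
A rough sketch of the pattern being asked about, assuming a transactions
table with an event-time attribute tx_time is already registered (the
topic name and broker address are placeholders): the dimension data goes
into an upsert-kafka table with a primary key and a watermark, and the
join becomes an event-time temporal join:

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class KafkaTemporalJoinSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv =
            TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // Versioned dimension table: primary key + watermark make it
        // eligible for an event-time temporal join.
        tEnv.executeSql(
            "CREATE TABLE account_dim_kafka ("
                + "  account_id STRING,"
                + "  account_name STRING,"
                + "  update_time TIMESTAMP(3),"
                + "  WATERMARK FOR update_time AS update_time - INTERVAL '5' SECOND,"
                + "  PRIMARY KEY (account_id) NOT ENFORCED"
                + ") WITH ("
                + "  'connector' = 'upsert-kafka',"
                + "  'topic' = 'account-dim',"
                + "  'properties.bootstrap.servers' = 'broker:9092',"
                + "  'key.format' = 'json',"
                + "  'value.format' = 'json')");

        // Each transaction joins the dimension version valid at tx_time.
        tEnv.executeSql(
            "SELECT t.tx_id, d.account_name "
                + "FROM transactions AS t "
                + "JOIN account_dim_kafka FOR SYSTEM_TIME AS OF t.tx_time AS d "
                + "ON t.account_id = d.account_id").print();
    }
}

Note that this join only emits a row once the dimension side's watermark
has passed the transaction time, so an idle dimension topic can hold back
results.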
Hi!
Each parallel instance of the lookup operation will load all data from the
lookup table source, so you're loading 10GB of data into each parallel
instance and storing it in JVM memory. That is not only slow but also very
memory-consuming.
Have you tried joining your main stream with the Hive table directly, as a
regular streaming join instead of a lookup join?
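
In case it helps, here is a sketch of what that direct join could look
like (table names are assumptions). Reading the Hive table as a streaming
source and using a regular join means both inputs are shuffled by the join
key, so each parallel instance keeps only its share of the dimension data
in state instead of a full copy:

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class DirectHiveJoinSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv =
            TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        tEnv.getConfig().getConfiguration()
            .setString("table.dynamic-table-options.enabled", "true");
        // A regular join keeps rows in state indefinitely; bound it.
        tEnv.getConfig().getConfiguration()
            .setString("table.exec.state.ttl", "24 h");

        // Regular streaming join: both inputs are hash-partitioned by
        // account_id, so the Hive data is spread across parallel instances
        // rather than duplicated into each one.
        tEnv.executeSql(
            "SELECT t.tx_id, d.account_name "
                + "FROM transactions AS t "
                + "JOIN account_dim /*+ OPTIONS("
                + "'streaming-source.enable'='true',"
                + "'streaming-source.monitor-interval'='12 h') */ AS d "
                + "ON t.account_id = d.account_id").print();
    }
}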
Hello,
I created external tables on Hive with data in S3 and wanted to use those
tables as lookup tables in Flink.
When I used an external table containing a small amount of data as a
lookup table, Flink quickly loaded the data into TM memory and did a
temporal join to an event stream. But when I used a larger table of about
10GB, loading the data took a very long time and consumed a large amount
of TM memory.
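
For reference, the setup described above is roughly the following (catalog
name, Hive conf dir, and table names are placeholders; the events table is
assumed to be registered in the default catalog with a processing-time
attribute proc_time):

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.catalog.hive.HiveCatalog;

public class HiveS3LookupSetupSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv =
            TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // Register the Hive metastore that holds the S3-backed external
        // tables.
        tEnv.registerCatalog(
            "hive", new HiveCatalog("hive", "default", "/opt/hive-conf"));
        tEnv.useCatalog("hive");

        // Processing-time temporal join: Flink loads the Hive table into
        // TM memory and probes the cache for every incoming event.
        tEnv.executeSql(
            "SELECT e.event_id, d.account_name "
                + "FROM default_catalog.default_database.events AS e "
                + "JOIN account_dim FOR SYSTEM_TIME AS OF e.proc_time AS d "
                + "ON e.account_id = d.account_id").print();
    }
}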