subject:"Re\: Loading a Hive lookup table data into TM memory takes so long."

Re: Loading a Hive lookup table data into TM memory takes so long.

2022-02-07 Thread Caizhi Weng

Hi! If Flink is not happy with a large Hive table data Currently it is. Hive lookup table (currently implemented just like a filesystem lookup table) cannot look up values with a specific key, so it has to load all data into memory. Did you mean putting the Hive table data into a Kafka/Kinesis a

Re: Loading a Hive lookup table data into TM memory takes so long.

2022-02-07 Thread Jason Yi

Hi Caizhi, Could you tell me more details about streaming joins that you suggested? Did you mean putting the Hive table data into a Kafka/Kinesis and joining the main stream with the hive table data streaming with a very long watermark? In my use case, the hive table is an account dimension table

Re: Loading a Hive lookup table data into TM memory takes so long.

2022-02-06 Thread Caizhi Weng

Hi! Each parallelism of the lookup operation will load all data from the lookup table source, so you're loading 10GB of data to each parallelism and storing them in JVM memory. That is not only slow but also very memory-consuming. Have you tried joining your main stream with the hive table direct