Hi, Bright. 

Thanks for reaching out. That's a really good question. 
Briefly, the reason is that neither Hudi nor Iceberg is efficient for
updates.
The FLIP for flink-table-store also explains why Hudi was not used [1]:

" 
Why doesn't FileStore use Hudi directly? 

1: Hudi aims to support the update of upsert, so it needs to forcibly define 
the primary key and time column. It is not easy to support all changelog types 
2: The update of Hudi is based on the index (currently there are BloomFilter 
and HBase). The data in the bucket is out of order. Every merge needs to be 
reread and rewritten, which is expensive. We need fast update storage, LSM is 
more suitable. 
" 
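
As a rough illustration of the second point, here is a toy sketch in Python.
It is not code from Hudi or flink-table-store; the names
rewrite_unsorted_file and ToyLsm are hypothetical, and real systems add
indexes, compaction, and file formats that are ignored here. It only
contrasts rewriting an unordered file on every merge with appending a small
sorted run and resolving duplicates at read time, as an LSM tree does.

```python
import heapq
from typing import Dict, Iterator, List, Tuple

Record = Tuple[int, str]  # (key, value)


def rewrite_unsorted_file(base: List[Record], updates: Dict[int, str]) -> List[Record]:
    """Merge into an unordered file: every existing record is read and rewritten."""
    seen = set()
    out: List[Record] = []
    for key, value in base:
        # Each record is touched, even if only one key changed.
        out.append((key, updates.get(key, value)))
        seen.add(key)
    # Keys that were not in the base file are appended as inserts.
    out.extend((k, v) for k, v in updates.items() if k not in seen)
    return out


class ToyLsm:
    """Toy LSM: writes append a sorted run, reads merge the runs."""

    def __init__(self) -> None:
        # Each run is sorted by key; later entries in the list are newer.
        self.runs: List[List[Record]] = []

    def write(self, updates: Dict[int, str]) -> None:
        # An update only writes a new sorted run; older runs are untouched.
        self.runs.append(sorted(updates.items()))

    def scan(self) -> Iterator[Record]:
        # Tag records with the negated run index so that, for equal keys,
        # the record from the newest run is produced first by the merge.
        tagged = (
            [(key, -idx, value) for key, value in run]
            for idx, run in enumerate(self.runs)
        )
        last_key = None
        for key, _, value in heapq.merge(*tagged):
            if key != last_key:  # keep only the newest version of each key
                yield key, value
                last_key = key
```

In this sketch the cost of a write to ToyLsm depends only on the size of the
update, while rewrite_unsorted_file always scales with the size of the base
file. The price is paid at read time (and, in a real LSM tree, during
background compaction).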

I have also added JingSong Li to this thread. He is the creator/maintainer of
flink-table-store and may be able to provide more detail.

[1]
https://cwiki.apache.org/confluence/display/Flink/FLIP-188%3A+Introduce+Built-in+Dynamic+Table+Storage#FLIP188:IntroduceBuiltinDynamicTableStorage-UsingHudi


Best regards, 
Yuxia 


From: "graceking lau" <graceking...@gmail.com>
To: "User" <user@flink.apache.org>
Sent: Monday, February 6, 2023, 9:24:31 AM
Subject: Design decisions around flink table store

Hi there, 

Recently I had a chance to get to know the flink-table-store project. I was 
attracted by the idea behind it at first glance. 

After reading the docs, I've had a question on my mind for a while. It's about
the design of the file storage.

It looks like it could be implemented on top of existing popular open-source
libraries rather than as a totally new component (LSM-tree based). Hudi or
Iceberg looks like a good choice, since they both support saving and querying
changelogs.
That way, there would be no need to build connectors for the other related
compute engines (Spark, Hive, or Trino), since they are already supported by
Hudi and Iceberg. To me, that looks like a better solution than reinventing
the wheel.

So, here are my questions. Is there any issue that prevents writing the data
as Hudi or Iceberg? Why weren't they chosen in the initial design?

Looking forward to your answer! 

(I'm not sure whether this is the right place to ask questions, but I haven't
found another way yet. If it's not OK to ask on the mailing list, could
someone please point me in the right direction?)

Best regards, 
Bright. 
