xushiyan commented on issue #3975:
URL: https://github.com/apache/hudi/issues/3975#issuecomment-967290309


   @dmenin A few things
   
   > GLOBAL_INDEX, which prevents data duplication, but is not scalable: as the 
amount of data grows, the load time also increases.
   
   Do you have any numbers? It'd be valuable to see how slow it is for you.
   
   > In other words, if I have key 123 on partition 10 and I receive key 123 
again on partition 11, I delete the record from 10 and insert the one from 11.
   
   I think you're replicating the same logic that the global index already 
implements when this flag is turned on:
https://hudi.apache.org/docs/configurations/#hoodiesimpleindexupdatepartitionpath
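
A minimal sketch of what that could look like as PySpark write options. The helper name `global_index_options` is hypothetical, and picking `GLOBAL_SIMPLE` as the index type is an assumption on my side, since the linked setting belongs to the simple index:

```python
def global_index_options(update_partition_path: bool = True) -> dict:
    """Build Hudi write options for a global index (hypothetical helper)."""
    return {
        # Global index: record keys are unique across all partitions.
        "hoodie.index.type": "GLOBAL_SIMPLE",
        # When true, a key arriving under a new partition is deleted from
        # the old partition and inserted into the new one.
        "hoodie.simple.index.update.partition.path": str(update_partition_path).lower(),
    }


opts = global_index_options()
```

These would be merged with the usual table options (table name, record key, precombine field) and passed to `df.write.format("hudi").options(**opts)`.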
   
   If you want to improve lookup times, can you try the HBase index?
   
   > HBASE index can be employed, if the operational overhead is acceptable and 
would provide much better lookup times for these tables.
   
   https://hudi.apache.org/blog/2020/11/11/hudi-indexing-mechanisms
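
For reference, a rough sketch of the write options for switching to the HBase index. The helper name `hbase_index_options` and all the values passed to it (ZooKeeper hosts, port, HBase table name) are placeholders, not something taken from your setup:

```python
def hbase_index_options(zk_quorum: str, zk_port: int, table: str) -> dict:
    """Build Hudi write options for the HBase index (hypothetical helper)."""
    return {
        "hoodie.index.type": "HBASE",
        # ZooKeeper quorum and client port of the HBase cluster.
        "hoodie.index.hbase.zkquorum": zk_quorum,
        "hoodie.index.hbase.zkport": str(zk_port),
        # HBase table that stores the record-key-to-location mapping.
        "hoodie.index.hbase.table": table,
    }


# Placeholder values for illustration only.
hbase_opts = hbase_index_options("zk1,zk2,zk3", 2181, "hudi_index")
```

The dict would be passed the same way as any other Hudi write options, e.g. via `.options(**hbase_opts)` on the DataFrame writer.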
   



