wqwl611 commented on PR #6636:
URL: https://github.com/apache/hudi/pull/6636#issuecomment-1256939849

   > Hey, thanks for the contribution. It is a great enhancement for bucket
   > index.
   > 
   > At a high level, could we use the current BucketIndex abstraction to
   > unify the implementation of different BucketIndexEngines? Also, the
   > dedicated Partitioner (i.e., SparkRangeBucketIndexPartitioner) may not be
   > necessary, as long as we tag the file id during indexing (check out
   > consistent hashing, which uses the default Partitioner).
   
   Right now, rangeBucketIndex generates files like
   "00000009-0_2-12-29_20220924180225595.parquet",
   which do not contain any UUID element. I think that's OK, am I right?
   
   By the same reasoning, if simpleBucketIndex also behaved like this,
   SparkBucketIndexPartitioner may not be necessary either?
   And using the default partitioner would avoid a lot of empty Spark tasks.
   @YuweiXiao 
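
To make the file-name point concrete, here is a minimal sketch of how a name such as the one above splits into its parts. It assumes the base file name has the shape `<fileId>_<writeToken>_<instantTime>.<ext>` and that the first 8 characters of the file ID are the zero-padded bucket number. The class `BaseFileNameSketch`, its methods, and the UUID-style pattern are hypothetical helpers written for illustration; they are not Hudi APIs.

```java
import java.util.regex.Pattern;

public class BaseFileNameSketch {

    /** Parts of a base file name, assuming "<fileId>_<writeToken>_<instantTime>.<ext>". */
    static final class Parts {
        final String fileId;
        final String writeToken;
        final String instantTime;
        final String extension;

        Parts(String fileId, String writeToken, String instantTime, String extension) {
            this.fileId = fileId;
            this.writeToken = writeToken;
            this.instantTime = instantTime;
            this.extension = extension;
        }
    }

    // Assumed shape of a bucket file ID that still carries UUID groups:
    // 8-digit bucket prefix, the remaining 4-4-4-12 hex groups, and an optional "-N" suffix.
    private static final Pattern UUID_STYLE_FILE_ID = Pattern.compile(
        "\\d{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}(-\\d+)?");

    /** Splits a file name on its last '.' and on '_' into the three assumed fields. */
    static Parts parse(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0) {
            throw new IllegalArgumentException("No extension in: " + fileName);
        }
        String[] fields = fileName.substring(0, dot).split("_");
        if (fields.length != 3) {
            throw new IllegalArgumentException("Expected 3 '_'-separated fields in: " + fileName);
        }
        return new Parts(fields[0], fields[1], fields[2], fileName.substring(dot + 1));
    }

    /** Reads the bucket number from the first 8 characters of the file ID (an assumption). */
    static int bucketId(String fileId) {
        return Integer.parseInt(fileId.substring(0, 8));
    }

    /** True if the file ID matches the assumed UUID-style shape above. */
    static boolean hasUuidGroups(String fileId) {
        return UUID_STYLE_FILE_ID.matcher(fileId).matches();
    }

    public static void main(String[] args) {
        Parts p = parse("00000009-0_2-12-29_20220924180225595.parquet");
        System.out.println("fileId=" + p.fileId + " writeToken=" + p.writeToken
            + " instantTime=" + p.instantTime + " ext=" + p.extension);
        System.out.println("bucketId=" + bucketId(p.fileId)
            + " hasUuidGroups=" + hasUuidGroups(p.fileId));
    }
}
```

For the sample name, the split yields file ID `00000009-0`, write token `2-12-29`, and instant time `20220924180225595`. Under these assumptions the bucket number is still recoverable from the prefix alone, which is why the missing UUID groups look harmless for locating a bucket's file.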
   

