[ https://issues.apache.org/jira/browse/HUDI-9164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Y Ethan Guo updated HUDI-9164:
------------------------------
Issue Type: Improvement (was: Bug)

Partition strategy of MDT secondary index does not match the query pattern it serves
------------------------------------------------------------------------------------

Key: HUDI-9164
URL: https://issues.apache.org/jira/browse/HUDI-9164
Project: Apache Hudi
Issue Type: Improvement
Reporter: Davis Zhang
Priority: Blocker
Fix For: 1.1.0

h3. MDT secondary index file layout does not support efficient lookups/joins

When the MDT is joined with an incoming pruning set ({{RDD[InternalRow]}}), the existing secondary index data layout does not support efficient batch prefix lookups.

h4. MDT index layout

The MDT secondary index stores key-value pairs. The key uses the scheme {{<data column value><separator><record key value>}}, and the value is the file group ID.

Every index key is therefore prefixed with the data column value.

The index uses {*}hash-based partitioning{*}: it hashes the full key ({{<data column value><separator><record key value>}}) to decide which file group the record belongs to.

h3. The query pattern it serves

In a nutshell, the data layout uses hash partitioning while the query pattern is prefix lookup, and the two do not match.

There are two query patterns against the index:
* Point lookup given only a secondary index column value: only {{<data column value>}} is known, and we need to find all associated file group IDs.
* Join with a large number of column values: this is how a secondary index join works. When joining tableGeneratingPruningSet and tableWithIdxTobePruned, tableGeneratingPruningSet produces an RDD of values for data column C1, and we join this RDD with the C1 secondary index in the MDT to find the file group IDs of interest. This is a large-scale join between the RDD and the MDT.
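The layout and lookup problem described in this issue can be sketched as follows. This is a minimal Python model, not Hudi's implementation: the {{$}} separator, {{zlib.crc32}} as the hash, the eight file groups, and every function name ({{make_key}}, {{file_group_for}}, {{build_index}}, {{prefix_lookup}}) are hypothetical. Each index entry is keyed by {{<data column value><separator><record key value>}} and routed to a file group by hashing the full key. A lookup knows only the data column value, so it cannot compute that hash and has to scan every file group, whether it is a point lookup (a batch of one value) or a join (a batch of many). The {{route_by_prefix=True}} mode, which hashes only the data column value, is included purely for contrast as one hypothetical prefix-aware scheme; the issue itself does not prescribe a fix.

```python
import zlib
from collections import defaultdict

SEPARATOR = "$"        # hypothetical separator; Hudi's real key encoding may differ
NUM_FILE_GROUPS = 8    # hypothetical number of secondary index file groups


def make_key(data_col_value, record_key):
    """Index key: <data column value><separator><record key value>."""
    return f"{data_col_value}{SEPARATOR}{record_key}"


def file_group_for(routing_key):
    """Hash-partition a routing key onto one of NUM_FILE_GROUPS file groups."""
    return zlib.crc32(routing_key.encode("utf-8")) % NUM_FILE_GROUPS


def build_index(entries, route_by_prefix=False):
    """Return {index file group: {index key: data file group ID}}.

    entries: iterable of (data_col_value, record_key, data_file_group_id).
    route_by_prefix=False hashes the full key (the layout described in the issue);
    route_by_prefix=True hashes only the data column value (hypothetical contrast).
    """
    groups = defaultdict(dict)
    for value, record_key, data_fg in entries:
        key = make_key(value, record_key)
        routing_key = value if route_by_prefix else key
        groups[file_group_for(routing_key)][key] = data_fg
    return groups


def prefix_lookup(groups, values, route_by_prefix=False):
    """Look up data file group IDs for a batch of data column values.

    A point lookup is a batch of one; a join passes many values.
    Returns ({value: set of data file group IDs}, number of index file groups scanned).
    Assumes data column values never contain SEPARATOR.
    """
    wanted = set(values)
    if route_by_prefix:
        # The prefix is the routing key, so each value maps to exactly one file group.
        to_scan = {file_group_for(v) for v in wanted}
    else:
        # The full-key hash needs the record key, which the query does not have,
        # so every file group must be scanned: O(n) in the number of index records.
        to_scan = set(range(NUM_FILE_GROUPS))
    result = defaultdict(set)
    for fg in to_scan:
        for key, data_fg in groups.get(fg, {}).items():
            value = key.split(SEPARATOR, 1)[0]
            if value in wanted:
                result[value].add(data_fg)
    return dict(result), len(to_scan)


if __name__ == "__main__":
    # 100 records with C1 = "NY" and one record with C1 = "SF".
    entries = [("NY", f"rk{i}", f"fg-{i % 3}") for i in range(100)]
    entries.append(("SF", "rk-sf", "fg-9"))

    full_key_index = build_index(entries)
    print(prefix_lookup(full_key_index, ["SF"]))  # ({'SF': {'fg-9'}}, 8)

    prefix_index = build_index(entries, route_by_prefix=True)
    print(prefix_lookup(prefix_index, ["SF"], route_by_prefix=True))  # ({'SF': {'fg-9'}}, 1)
```

The contrast mode also shows the trade-off: routing by the data column value alone lets a lookup touch one file group per value, but it sends every entry for a very frequent value to the same file group, so any prefix-aware scheme would still have to handle that skew.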
Because the input provides only the {{<data column value>}}, which is just the prefix of the secondary index key, we cannot tell which bucket (file group) the matching MDT records belong to. As a result, even a point lookup has to load the full MDT secondary index, so the cost is O(n) in the number of index records.

This does not scale.

The partition scheme needs to be improved to handle prefix-based search at large scale.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)