[ https://issues.apache.org/jira/browse/KUDU-1439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Grant Henke updated KUDU-1439:
------------------------------
    Labels: performance  (was: )

> Optimization for batch inserts into empty key ranges
> ----------------------------------------------------
>
>                 Key: KUDU-1439
>                 URL: https://issues.apache.org/jira/browse/KUDU-1439
>             Project: Kudu
>          Issue Type: Improvement
>          Components: perf, tablet
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Major
>              Labels: performance
>
> Got this idea from a CockroachDB optimization:
> https://github.com/cockroachdb/cockroach/pull/6375
> The short version is that if we have a moderately large batch of inserts
> which are sorted, we can do the following pseudocode:
> - sort the inserts by primary key
> - instead of using bloom filter checks, use SeekAtOrAfter on the first
> primary key in the batch. This yields the next higher primary key that might
> exist in the table (_nextKey_).
> - for each of the keys in the sorted batch, if it's less than _nextKey_, we
> don't need to do an existence check for it.
> In the common case where clients are writing non-overlapping batches of rows
> (eg importing from parquet) this should reduce the number of seeks and bloom
> checks dramatically (order of batch size). Plus, it doesn't require much new
> code to be written, so worth a shot.
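
The steps described in the issue can be sketched roughly as follows. This is only an illustration under stated assumptions, not Kudu code: a sorted Python list stands in for the tablet's primary key index, seek_at_or_after is a hypothetical helper modelling the SeekAtOrAfter call named in the issue, and split_batch is a made-up name. Keys in the batch are assumed to be unique.

```python
import bisect


def seek_at_or_after(existing_keys, key):
    """Hypothetical stand-in for SeekAtOrAfter: return the smallest
    existing key >= key, or None if no such key exists.
    existing_keys must be sorted ascending."""
    i = bisect.bisect_left(existing_keys, key)
    return existing_keys[i] if i < len(existing_keys) else None


def split_batch(existing_keys, batch):
    """Split a batch of insert keys into (skip_check, needs_check).

    skip_check:  keys that cannot already exist, so no bloom filter
                 lookup or seek is needed for them.
    needs_check: keys that still need the usual existence check.
    """
    ordered = sorted(batch)  # step 1: sort the inserts by primary key
    if not ordered:
        return [], []

    # Step 2: a single seek on the first key of the batch yields nextKey.
    next_key = seek_at_or_after(existing_keys, ordered[0])
    if next_key is None:
        # Nothing at or after the first key: the whole batch is new.
        return ordered, []

    # Step 3: every batch key below nextKey falls into an empty key range.
    cut = bisect.bisect_left(ordered, next_key)
    return ordered[:cut], ordered[cut:]


if __name__ == "__main__":
    table = [10, 20, 30]
    skip, check = split_batch(table, [14, 12, 11, 25, 20])
    print("no existence check needed:", skip)
    print("still need a check:", check)
```

In this sketch, keys at or above nextKey simply fall back to the normal existence check. Seeking again from such a key to find the next empty range would be a possible refinement, but that goes beyond what the issue describes.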