[ https://issues.apache.org/jira/browse/KUDU-1439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Henke updated KUDU-1439:
------------------------------
    Labels: performance  (was: )

> Optimization for batch inserts into empty key ranges
> ----------------------------------------------------
>
>                 Key: KUDU-1439
>                 URL: https://issues.apache.org/jira/browse/KUDU-1439
>             Project: Kudu
>          Issue Type: Improvement
>          Components: perf, tablet
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Major
>              Labels: performance
>
> Got this idea from a CockroachDB optimization:
> https://github.com/cockroachdb/cockroach/pull/6375
> The short version is that if we have a moderately large batch of inserts, 
> we can do the following (pseudocode):
> - sort the inserts by primary key
> - instead of doing a bloom filter check for each row, call SeekAtOrAfter with 
> the first (smallest) primary key in the batch. This yields the next primary 
> key at or after it that might exist in the table (_nextKey_).
> - for each key in the sorted batch, if it's less than _nextKey_, we don't 
> need to do an existence check for it; keys at or beyond _nextKey_ still go 
> through the normal check.
> In the common case where clients are writing non-overlapping batches of rows 
> (e.g. importing from Parquet), this should reduce the number of seeks and 
> bloom checks dramatically (by roughly a factor of the batch size). Plus, it 
> doesn't require much new code, so it's worth a shot.
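The pseudocode above can be sketched in Python as follows. This is a minimal illustration under stated assumptions and is not Kudu's implementation. The sketch makes four simplifications:

- The table's existing keys are modeled as an in-memory sorted list.
- `seek_at_or_after` is a hypothetical stand-in for Kudu's SeekAtOrAfter, built on `bisect_left`.
- The set lookup stands in for the usual per-row bloom filter plus key lookup.
- Keys within one batch are assumed to be distinct.

```python
from bisect import bisect_left
from typing import List, Optional, Sequence, Tuple


def seek_at_or_after(existing_keys: Sequence[int], key: int) -> Optional[int]:
    """Hypothetical stand-in for SeekAtOrAfter: the smallest existing key
    that is >= key, or None if no such key exists."""
    i = bisect_left(existing_keys, key)
    return existing_keys[i] if i < len(existing_keys) else None


def insert_batch(existing_keys: Sequence[int],
                 batch: Sequence[int]) -> Tuple[List[int], List[int], int]:
    """Sketch of the batch-insert optimization.

    existing_keys: sorted keys already in the table (assumed in-memory model).
    batch: keys to insert, assumed distinct within the batch.
    Returns (inserted, duplicates, full_checks), where full_checks counts
    the keys that still needed the normal existence check.
    """
    batch = sorted(batch)  # step 1: sort the inserts by primary key
    if not batch:
        return [], [], 0

    # Step 2: a single seek for the first key replaces per-row bloom checks.
    next_key = seek_at_or_after(existing_keys, batch[0])

    # Stand-in for the normal bloom filter + key lookup path.
    existing = set(existing_keys)

    inserted: List[int] = []
    duplicates: List[int] = []
    full_checks = 0
    for key in batch:
        # Step 3: no existing key lies in [batch[0], next_key), so any batch
        # key below next_key cannot already exist. If next_key is None,
        # nothing in the table is >= batch[0], so no key can exist.
        if next_key is None or key < next_key:
            inserted.append(key)
            continue
        # Keys at or beyond next_key fall back to the normal check.
        full_checks += 1
        if key in existing:
            duplicates.append(key)
        else:
            inserted.append(key)
    return inserted, duplicates, full_checks
```

Consider a batch that does not overlap the existing keys, such as `insert_batch([10, 20, 30], [11, 12, 13])`. Every key falls below `next_key` (20), so no per-row checks are needed. When the batch straddles an existing key, only the keys from `next_key` onward pay for the normal check.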



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
