Michael McCandless created LUCENE-7390:
------------------------------------------

             Summary: Let BKDWriter use temp heap for sorting points in proportion to IndexWriter's indexing buffer
                 Key: LUCENE-7390
                 URL: https://issues.apache.org/jira/browse/LUCENE-7390
             Project: Lucene - Core
          Issue Type: Improvement
            Reporter: Michael McCandless
             Fix For: master (7.0), 6.2


With Lucene's default codec, when writing dimensional points, we only give 
{{BKDWriter}} 16 MB of heap to use for sorting, regardless of how large IW's 
indexing buffer is.  A custom codec can change this, but requiring one is a little steep.
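
To make the mismatch concrete, here is a small sketch that computes how many points fit in a given sort-heap budget. The 16 bytes per point is an assumed illustrative figure (two 4-byte dimensions plus a 4-byte docID and a 4-byte ordinal), not Lucene's actual on-heap layout, and the class and method names are hypothetical.

```java
// Hypothetical sketch: how many points fit in a given sort-heap budget.
// BYTES_PER_POINT is an assumed illustrative figure, not Lucene's actual layout.
class SortHeapBudget {
    // Assumption: 2 dims x 4 bytes packed value + 4-byte docID + 4-byte ordinal.
    static final int BYTES_PER_POINT = 16;

    // Number of points that can be sorted entirely on heap within maxMBSortInHeap.
    static long maxPointsSortInHeap(double maxMBSortInHeap, int bytesPerPoint) {
        return (long) (maxMBSortInHeap * 1024 * 1024) / bytesPerPoint;
    }

    public static void main(String[] args) {
        long fixed = maxPointsSortInHeap(16, BYTES_PER_POINT);    // the fixed 16 MB default
        long scaled = maxPointsSortInHeap(1024, BYTES_PER_POINT); // a budget matching a 1 GB IW buffer
        System.out.println("16 MB budget:   " + fixed + " points");
        System.out.println("1024 MB budget: " + scaled + " points");
    }
}
```

With these assumed sizes, the fixed 16 MB budget covers 1,048,576 points, while a budget sized to a 1 GB buffer would cover 67,108,864.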

I've been testing indexing performance on a points-heavy dataset, 1.2 billion 
taxi rides from http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml , 
indexing with a 1 GB IW buffer, and the small 16 MB heap limit causes clear 
performance problems: flushing the large segments forces {{BKDWriter}} 
to switch to offline sorting, which makes the DWPTs take too long to flush.  
They then fall behind, and Lucene hard-stalls incoming indexing 
threads until they catch up.
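
The two sort paths can be sketched roughly as follows: an in-heap sort when the points fit the budget, otherwise an external ("offline") merge sort that sorts budget-sized runs and then k-way merges them. This is a simplified 1-D illustration under stated assumptions, not BKDWriter's or OfflineSorter's actual code; the sorted runs are kept in memory here as a stand-in for the temporary files a real offline sorter would write and read back, which is where the extra flush time comes from.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch of in-heap vs offline sorting of 1-D long "points".
// Runs stay in memory here; a real offline sorter writes them to temp files.
class TwoPathSort {
    static int runsWritten = 0; // how many sorted runs the offline path produced

    static long[] sort(long[] points, int maxPointsInHeap) {
        if (maxPointsInHeap < 1) throw new IllegalArgumentException("budget must be >= 1 point");
        if (points.length <= maxPointsInHeap) {
            long[] copy = points.clone();
            Arrays.sort(copy); // fits the budget: a single in-heap sort
            return copy;
        }
        // Offline path, step 1: sort budget-sized chunks into runs ("temp files").
        List<long[]> runs = new ArrayList<>();
        for (int start = 0; start < points.length; start += maxPointsInHeap) {
            long[] run = Arrays.copyOfRange(points, start, Math.min(points.length, start + maxPointsInHeap));
            Arrays.sort(run);
            runs.add(run);
        }
        runsWritten += runs.size();
        // Step 2: k-way merge of the runs; each queue entry is {value, runIndex, offset}.
        PriorityQueue<long[]> pq = new PriorityQueue<>((a, b) -> Long.compare(a[0], b[0]));
        for (int r = 0; r < runs.size(); r++) {
            pq.add(new long[] {runs.get(r)[0], r, 0});
        }
        long[] out = new long[points.length];
        int i = 0;
        while (!pq.isEmpty()) {
            long[] top = pq.poll();
            out[i++] = top[0];
            int r = (int) top[1];
            int next = (int) top[2] + 1;
            if (next < runs.get(r).length) {
                pq.add(new long[] {runs.get(r)[next], r, next});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        long[] pts = {42, 7, 19, 3, 88, 61, 5, 27, 14, 70};
        System.out.println(Arrays.toString(sort(pts, 16))); // fits: in-heap path
        System.out.println(Arrays.toString(sort(pts, 4)));  // exceeds budget: offline path
        System.out.println("runs written: " + runsWritten); // prints "runs written: 3"
    }
}
```

Both calls return the same sorted order; the only difference is the extra run-writing and merging work the offline path does, which a larger heap budget avoids.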

[~rcmuir] had a simple idea to let IW pass the allowed temp heap usage to 
{{PointsWriter.writeField}}.
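
A rough sketch of the shape of that idea follows: the caller, standing in for IW, passes its allowed temp-heap budget into {{writeField}} instead of the writer using a fixed 16 MB. The interface, class, and field names, the point count, and the bytes-per-point figure are all hypothetical; here the budget is simply the full IW buffer size, which is an assumption, since the issue only says the budget should be proportional to it.

```java
// Hypothetical sketch: the caller passes its temp-heap budget down to writeField.
// These names are illustrative only, not Lucene's actual PointsWriter API.
class FlushSketch {
    public static void main(String[] args) {
        BudgetedPointsWriter writer = new SketchPointsWriter();
        double ramBufferSizeMB = 1024; // the 1 GB IW buffer from the description
        long points = 50_000_000L;     // hypothetical point count in one flushed segment
        System.out.println("fixed 16 MB: in heap = " + writer.writeField("pickup", points, 16));
        System.out.println("buffer-sized: in heap = " + writer.writeField("pickup", points, ramBufferSizeMB));
    }
}

interface BudgetedPointsWriter {
    // Returns true if the field's points could be sorted entirely on heap.
    boolean writeField(String fieldName, long pointCount, double maxMBSortInHeap);
}

class SketchPointsWriter implements BudgetedPointsWriter {
    static final int BYTES_PER_POINT = 16; // assumed, as an illustration only

    @Override
    public boolean writeField(String fieldName, long pointCount, double maxMBSortInHeap) {
        long maxPointsInHeap = (long) (maxMBSortInHeap * 1024 * 1024) / BYTES_PER_POINT;
        return pointCount <= maxPointsInHeap; // false means falling back to offline sorting
    }
}
```

Under these assumptions, 50 million points overflow the fixed 16 MB budget (about 1 million points) but fit a budget sized to the 1 GB buffer (about 67 million points).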




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)