[ 
https://issues.apache.org/jira/browse/LUCENE-5705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14008341#comment-14008341
 ] 

Shawn Heisey edited comment on LUCENE-5705 at 5/25/14 2:01 PM:
---------------------------------------------------------------

The javadoc changes that I made will need to change again if we don't also make 
the code changes.

I think the new javadoc needs to be the following:

{code}
  /**
   * Sets the maximum number of merge threads and simultaneous merges allowed.
   * 
   * @param maxMergeCount the max # simultaneous merges that are allowed.
   *       If a merge is necessary yet we already have this many
   *       threads running, the incoming thread (that is calling
   *       add/updateDocument) will block until a merge thread
   *       has completed.  If index data is coming from a source that is
   *       sensitive to inactivity timeouts (like JDBC), it is advisable to
   *       set this value higher than the default so that the incoming thread
   *       never stalls.  Note that we will only run the smallest
   *       <code>maxThreadCount</code> merges at a time.
   * @param maxThreadCount the max # simultaneous merge threads that should
   *       be running at once.  This must be &lt;= <code>maxMergeCount</code>.
   *       Most setups should use the default value of 1 here.
   *       If the index is on a solid state disk and there are
   *       plenty of CPU cores available, it is usually safe to
   *       run more threads simultaneously.
   */
{code}
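
For reference, in Solr these two values are set on the merge scheduler in 
solrconfig.xml.  A minimal fragment using the Solr 4.x indexConfig syntax 
(the values shown are the ones suggested in this issue, not the shipped 
defaults):

{code:xml}
<indexConfig>
  <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
    <!-- allow up to 6 queued/simultaneous merges before the indexing thread blocks -->
    <int name="maxMergeCount">6</int>
    <!-- but only run one merge thread at a time -->
    <int name="maxThreadCount">1</int>
  </mergeScheduler>
</indexConfig>
{code}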

I did notice the following comment in the 4x branch, but this has not been my 
experience with Solr: older versions seemed to prefer running the largest 
merge to completion before doing the smaller ones.  The behavior described here 
would be preferable.  If the comment is accurate, does anyone know when it 
changed?  I originally ran into my problem back on Solr 1.4.1 (Lucene 2.9), and 
I am pretty sure that some of the people I've helped on the mailing list and 
IRC were running a 4.x version, so I am not sure this comment is accurate 
even for 4.x:

{code}
  // Max number of merge threads allowed to be running at
  // once.  When there are more merges then this, we
  // forcefully pause the larger ones, letting the smaller
  // ones run, up until maxMergeCount merges at which point
  // we forcefully pause incoming threads (that presumably
  // are the ones causing so much merging).
{code}
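
The gating those two settings describe can be sketched with a JDK-only toy 
model (this is not Lucene code; MergeGate and its methods are hypothetical 
names used purely for illustration):

{code:java}
import java.util.concurrent.Semaphore;

// Toy model of the gating described above: up to maxMergeCount merges
// may be pending at once, and an indexing thread that triggers one more
// merge than that blocks until a slot frees -- the stall that breaks
// inactivity-sensitive sources like JDBC.
public class MergeGate {
    private final Semaphore slots;

    public MergeGate(int maxMergeCount) {
        this.slots = new Semaphore(maxMergeCount);
    }

    /** Called by the indexing thread; blocks when all merge slots are taken. */
    public void mergeRequested() throws InterruptedException {
        slots.acquire();
    }

    /** Called when a merge finishes, freeing a slot. */
    public void mergeFinished() {
        slots.release();
    }

    /** True when one more merge request would block the indexing thread. */
    public boolean wouldBlock() {
        return slots.availablePermits() == 0;
    }

    public static void main(String[] args) throws InterruptedException {
        MergeGate gate = new MergeGate(2); // 2 is the current default
        gate.mergeRequested();
        gate.mergeRequested();
        System.out.println(gate.wouldBlock()); // prints true: a third merge stalls indexing
        gate.mergeFinished();
        System.out.println(gate.wouldBlock()); // prints false
    }
}
{code}

With the default of 2, two concurrent merge tiers are enough to stall the 
indexing thread; raising maxMergeCount to 6 makes that stall vanishingly 
unlikely in practice.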




> ConcurrentMergeScheduler/maxMergeCount default is too low
> ---------------------------------------------------------
>
>                 Key: LUCENE-5705
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5705
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/other
>    Affects Versions: 4.8
>            Reporter: Shawn Heisey
>            Assignee: Shawn Heisey
>            Priority: Minor
>             Fix For: 4.9
>
>         Attachments: LUCENE-5705.patch, LUCENE-5705.patch, dih-example.patch
>
>
> The default value for maxMergeCount in ConcurrentMergeScheduler is 2.  This 
> causes problems for Solr's dataimport handler when very large imports are 
> done from a JDBC source.
> What happens is that when three merge tiers are scheduled at the same time, 
> the add/update thread will stop for several minutes while the largest merge 
> finishes.  In the meantime, the dataimporter JDBC connection to the database 
> will time out, and when the add/update thread resumes, the import will fail 
> because the ResultSet throws an exception.  Setting maxMergeCount to 6 
> eliminates this issue for virtually any size import -- although it is 
> theoretically possible to have that many simultaneous merge tiers, I've never 
> seen it.
> As long as maxThreads is properly set (the default value of 1 is appropriate 
> for most installations), I cannot think of a really good reason that the 
> default for maxMergeCount should be so low.  If someone does need to strictly 
> control the number of threads that get created, they can reduce the number.  
> Perhaps someone with more experience knows of a really good reason to make 
> this default low?
> I'm not sure what the new default number should be, but I'd like to avoid 
> bikeshedding.  I don't think it should be Integer.MAX_VALUE.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
