[jira] [Comment Edited] (LUCENE-5705) ConcurrentMergeScheduler/maxMergeCount default is too low

Shawn Heisey (JIRA) Sun, 25 May 2014 10:15:16 -0700

    [ https://issues.apache.org/jira/browse/LUCENE-5705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14008382#comment-14008382 ]


Shawn Heisey edited comment on LUCENE-5705 at 5/25/14 5:14 PM:
---------------------------------------------------------------

The infostream I am currently capturing does show merges completing out of 
order, with preference given to small merges.

{noformat}
IW 4 [Sun May 25 09:43:57 MDT 2014; Lucene Merge Thread #11]: merge time 47224 msec for 563274 docs
IW 4 [Sun May 25 09:52:39 MDT 2014; Lucene Merge Thread #13]: merge time 8761 msec for 68640 docs
IW 4 [Sun May 25 09:53:44 MDT 2014; Lucene Merge Thread #12]: merge time 266527 msec for 4227876 docs
{noformat}
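
The ordering in those three lines can be checked mechanically. What follows is a minimal sketch, assuming that the bracketed timestamp marks the moment a merge finished and that "merge time" is wall-clock duration; both are assumptions about the infostream format, not something the log states. The helper name parse_merges is hypothetical.

```python
import re
from datetime import datetime, timedelta

# The three infostream lines quoted above, one merge per line.
LOG = """\
IW 4 [Sun May 25 09:43:57 MDT 2014; Lucene Merge Thread #11]: merge time 47224 msec for 563274 docs
IW 4 [Sun May 25 09:52:39 MDT 2014; Lucene Merge Thread #13]: merge time 8761 msec for 68640 docs
IW 4 [Sun May 25 09:53:44 MDT 2014; Lucene Merge Thread #12]: merge time 266527 msec for 4227876 docs
"""

# Captures: timestamp without zone, year, thread number, duration, doc count.
LINE = re.compile(
    r"\[(\w{3} \w{3} +\d+ \d\d:\d\d:\d\d) \w+ (\d{4}); "
    r"Lucene Merge Thread #(\d+)\]: merge time (\d+) msec for (\d+) docs"
)

def parse_merges(text):
    """Return one dict per merge, in the order the lines were logged."""
    merges = []
    for stamp, year, thread, msec, docs in LINE.findall(text):
        # Assumption: the logged timestamp is the finish time of the merge.
        finished = datetime.strptime(f"{stamp} {year}", "%a %b %d %H:%M:%S %Y")
        started = finished - timedelta(milliseconds=int(msec))
        merges.append({
            "thread": int(thread),
            "docs": int(docs),
            "started": started,
            "finished": finished,
        })
    return merges

for m in parse_merges(LOG):
    print(f"thread #{m['thread']}: {m['docs']} docs, "
          f"{m['started']:%H:%M:%S} -> {m['finished']:%H:%M:%S}")
```

Under those assumptions, thread #13 starts after thread #12 has already begun its large merge and finishes about a minute before it, which is consistent with small merges being favoured.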

When I had the problem I described (admittedly a long time ago, most likely 
on Solr 1.4.0), I was using the old default merge policy, 
LogByteSizeMergePolicy.  Would that setup have used CMS 
(ConcurrentMergeScheduler), or a different scheduler?  When no scheduler is 
configured in Solr 4.x, does it choose CMS?  I would think that it does.

I have seen others hit this problem very recently on the mailing list and IRC.  
I'm reasonably sure that at least one of them was on a 4.x release.  Raising 
maxMergeCount fixed it for those people, just as it did for me.




> ConcurrentMergeScheduler/maxMergeCount default is too low
> ---------------------------------------------------------
>
>                 Key: LUCENE-5705
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5705
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/other
>    Affects Versions: 4.8
>            Reporter: Shawn Heisey
>            Assignee: Shawn Heisey
>            Priority: Minor
>             Fix For: 4.9
>
>         Attachments: LUCENE-5705.patch, LUCENE-5705.patch, dih-example.patch
>
>
> The default value for maxMergeCount in ConcurrentMergeScheduler is 2.  This 
> causes problems for Solr's dataimport handler when very large imports are 
> done from a JDBC source.
>
> When three merge tiers are scheduled at the same time, the add/update thread 
> stops for several minutes while the largest merge finishes.  In the 
> meantime, the dataimporter's JDBC connection to the database times out, and 
> when the add/update thread resumes, the import fails because the ResultSet 
> throws an exception.  Setting maxMergeCount to 6 eliminates this issue for 
> virtually any size import.  Although it is theoretically possible to have 
> that many simultaneous merge tiers, I've never seen it.
>
> As long as maxThreadCount is properly set (the default value of 1 is 
> appropriate for most installations), I cannot think of a really good reason 
> that the default for maxMergeCount should be so low.  If someone does need 
> to strictly control the number of threads that get created, they can reduce 
> the number.  Perhaps someone with more experience knows of a really good 
> reason to make this default low?
>
> I'm not sure what the new default number should be, but I'd like to avoid 
> bikeshedding.  I don't think it should be Integer.MAX_VALUE.
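
The arithmetic behind the description can be sketched as follows. This is a deliberately simplified model and not Lucene's code: it assumes only what the description says, namely that the add/update thread stalls once more merges are scheduled at the same time than maxMergeCount allows. The function names indexing_stalls and first_stall are hypothetical.

```python
DEFAULT_MAX_MERGE_COUNT = 2   # default named in the description above
PROPOSED_MAX_MERGE_COUNT = 6  # value the description reports as sufficient

def indexing_stalls(scheduled_merges, max_merge_count):
    """Simplified model, not Lucene's code: the add/update thread stalls
    when more merges are scheduled at once than max_merge_count allows."""
    return scheduled_merges > max_merge_count

def first_stall(max_merge_count):
    """Smallest number of simultaneous merges that stalls indexing in this model."""
    merges = 1
    while not indexing_stalls(merges, max_merge_count):
        merges += 1
    return merges

for limit in (DEFAULT_MAX_MERGE_COUNT, PROPOSED_MAX_MERGE_COUNT):
    print(f"maxMergeCount={limit}: indexing stalls at "
          f"{first_stall(limit)} simultaneous merges")
```

In this model the default of 2 stalls as soon as a third merge is scheduled, matching the three-tier case in the description, while a value of 6 would need seven simultaneous merges before indexing stops.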



--
This message was sent by Atlassian JIRA
(v6.2#6252)
