Re: Static index, fastest way to do forceMerge

2018-12-18 Thread Dawid Weiss
That's fine -- distraction is pretty much what defines my work... ;)

D.

On Tue, Dec 18, 2018 at 11:19 AM Jerven Tjalling Bolleman wrote:
>
> Hi Dawid,
>
> Thanks for looking into this! I have been distracted with other work and
> did not get the time I expected to work on it.
>
> Regards,
> Jerv

Re: Static index, fastest way to do forceMerge

2018-12-18 Thread Jerven Tjalling Bolleman
Hi Dawid,

Thanks for looking into this! I have been distracted with other work and did not get the time I expected to work on it.

Regards,
Jerven

On 11/30/18 12:01 PM, Dawid Weiss wrote:
> Just FYI: I implemented a quick and dirty PoC to see how it would work. Not much of a difference on my mac

Re: Static index, fastest way to do forceMerge

2018-11-30 Thread Dawid Weiss
Just FYI: I implemented a quick and dirty PoC to see how it would work. Not much of a difference on my machine (since postings merging dominates everything else). It's an interesting problem, though, how to split the work up to saturate all of the available resources (CPU and I/O). https://issues.apache.org/jira/br
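Why "postings merging dominates everything else" limits the gain can be shown with a simple bound. Assuming the PoC merges the different parts of a segment (postings, stored fields, doc values, and so on) concurrently and independently (an assumption; the linked issue is truncated here), wall time cannot drop below the slowest part, so the best-case speedup is the sum of the part durations divided by the largest one. The class name and the durations in `main` are hypothetical, chosen for illustration only, not measurements from the thread.

```java
import java.util.Map;

/** Upper bound on the speedup from merging independent index parts concurrently. */
public final class PartMergeSpeedup {

    private PartMergeSpeedup() {}

    /**
     * Sequential time is the sum of all part durations. With enough threads,
     * concurrent time is bounded below by the slowest part, so sum / max is the
     * best achievable speedup.
     */
    public static double maxSpeedup(Map<String, Double> partDurations) {
        double total = 0.0;
        double slowest = 0.0;
        for (double d : partDurations.values()) {
            total += d;
            slowest = Math.max(slowest, d);
        }
        if (slowest <= 0.0) {
            throw new IllegalArgumentException("need at least one positive duration");
        }
        return total / slowest;
    }

    public static void main(String[] args) {
        // Hypothetical durations in arbitrary time units, where postings dominate.
        Map<String, Double> parts = Map.of(
                "postings", 80.0,
                "storedFields", 10.0,
                "docValues", 6.0,
                "norms", 2.0,
                "points", 2.0);
        System.out.printf("best-case speedup: %.2fx%n", maxSpeedup(parts));
    }
}
```

With postings at 80 of 100 time units, even perfect parallelism across parts yields at most 1.25x, which is consistent with seeing little difference on a single machine.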

Re: Static index, fastest way to do forceMerge

2018-11-03 Thread Erick Erickson
Do you really need exactly one segment? Or would, say, 5 be good enough? You see where this is going: set maxsegments to 5 and maybe you'll be able to get some parallelization...

On Fri, Nov 2, 2018, 14:17 Dawid Weiss wrote:
> Thanks for chipping in, Toke. A ~1TB index is impressive.
>
> > Back of the envelope say
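The reason a target of 5 segments can help is that each output segment is the result of an independent merge, and independent merges can run on separate threads. In Lucene the corresponding call is `IndexWriter.forceMerge(5)`; whether the merges then actually run concurrently depends on the merge policy and the `ConcurrentMergeScheduler` settings. The toy model below is not Lucene code: it treats each "segment" as a sorted `int[]`, assigns the inputs round-robin to `targetSegments` groups, and runs one sequential k-way merge per group on its own thread. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Toy model (not Lucene code): N sorted inputs merged into 1 vs. several outputs. */
public final class GroupedMerge {

    private GroupedMerge() {}

    /** Sequential k-way merge of sorted arrays using a min-heap. */
    static int[] mergeGroup(List<int[]> group) {
        int total = 0;
        for (int[] seg : group) {
            total += seg.length;
        }
        int[] out = new int[total];
        // Heap entries are {value, segmentIndex, offsetInSegment}, ordered by value.
        PriorityQueue<int[]> heap = new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
        for (int s = 0; s < group.size(); s++) {
            if (group.get(s).length > 0) {
                heap.add(new int[] {group.get(s)[0], s, 0});
            }
        }
        int pos = 0;
        while (!heap.isEmpty()) {
            int[] top = heap.poll();
            out[pos++] = top[0];
            int[] seg = group.get(top[1]);
            int next = top[2] + 1;
            if (next < seg.length) {
                heap.add(new int[] {seg[next], top[1], next});
            }
        }
        return out;
    }

    /** Assigns inputs round-robin to at most targetSegments groups; each group is merged on its own thread. */
    public static List<int[]> forceMergeTo(List<int[]> segments, int targetSegments) {
        if (targetSegments < 1) {
            throw new IllegalArgumentException("targetSegments must be >= 1");
        }
        if (segments.isEmpty()) {
            return new ArrayList<>();
        }
        int groups = Math.min(targetSegments, segments.size());
        List<List<int[]>> buckets = new ArrayList<>();
        for (int g = 0; g < groups; g++) {
            buckets.add(new ArrayList<>());
        }
        for (int i = 0; i < segments.size(); i++) {
            buckets.get(i % groups).add(segments.get(i));
        }
        ExecutorService pool = Executors.newFixedThreadPool(groups);
        try {
            List<Future<int[]>> futures = new ArrayList<>();
            for (List<int[]> bucket : buckets) {
                futures.add(pool.submit(() -> mergeGroup(bucket)));
            }
            List<int[]> merged = new ArrayList<>();
            for (Future<int[]> f : futures) {
                merged.add(f.get());
            }
            return merged;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while merging", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("merge failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Ten already-sorted input segments of 1,000 values each.
        List<int[]> segments = new ArrayList<>();
        for (int s = 0; s < 10; s++) {
            int[] seg = new int[1000];
            for (int i = 0; i < seg.length; i++) {
                seg[i] = s + 10 * i;
            }
            segments.add(seg);
        }
        // One target: a single sequential merge. Five targets: five merges on five threads.
        System.out.println("1 target -> " + forceMergeTo(segments, 1).size() + " segment(s)");
        System.out.println("5 targets -> " + forceMergeTo(segments, 5).size() + " segment(s)");
    }
}
```

With one target, all inputs go through a single heap on a single thread no matter how many threads the pool could offer; with five targets, the work splits into five independent merges.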

Re: Static index, fastest way to do forceMerge

2018-11-02 Thread Dawid Weiss
Thanks for chipping in, Toke. A ~1TB index is impressive.

> Back of the envelope says reading & writing 900GB in 8 hours is
> 2*900GB/(8*60*60s) = 64MB/s. I don't remember the interface for our SSD
> machine, but even with SATA II this is only ~1/5th of the possible fairly
> sequential IO throughput. So f
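The back-of-the-envelope figure can be reproduced directly: a full merge reads the index once and writes it once, so the bytes moved are twice the index size, divided by the elapsed seconds. This sketch assumes binary units (1 GB = 1024 MB), which is what makes 900 GB in 8 hours come out at exactly 64 MB/s; the ~300 MB/s SATA II figure in `main` is an assumed nominal usable bandwidth, and the class name is hypothetical.

```java
/** Back-of-the-envelope I/O rate for a merge that reads and rewrites the whole index once. */
public final class MergeThroughput {

    private MergeThroughput() {}

    /**
     * Bytes moved = index size read + index size written = 2 x index size.
     * Uses binary units: 1 GB = 1024 MB.
     */
    public static double requiredMBPerSecond(double indexSizeGB, double hours) {
        double megabytesMoved = 2.0 * indexSizeGB * 1024.0;
        double seconds = hours * 60.0 * 60.0;
        return megabytesMoved / seconds;
    }

    public static void main(String[] args) {
        double rate = requiredMBPerSecond(900, 8);
        // Assumption: ~300 MB/s as the nominal usable SATA II (3 Gbit/s) bandwidth.
        double sataII = 300.0;
        System.out.printf("%.1f MB/s, %.2f of SATA II%n", rate, rate / sataII);
    }
}
```

A ratio of roughly 0.2 matches the "~1/5th" estimate in the quoted message, which is the basis for the argument that such a merge is not I/O bound on that hardware.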

Re: Static index, fastest way to do forceMerge

2018-11-02 Thread Jerven Tjalling Bolleman
On 2018-11-02 20:52, Dawid Weiss wrote:
> > int processors = Runtime.getRuntime().availableProcessors();
> > ConcurrentMergeScheduler cms = new ConcurrentMergeScheduler();
> > cms.setMaxMergesAndThreads(processors, processors);
> See, the number of threads in the CMS only matters if you have concurrent mer

Re: Static index, fastest way to do forceMerge

2018-11-02 Thread Toke Eskildsen
Dawid Weiss wrote:
> Merging segments as large as this one requires not just CPU, but also
> serious I/O throughput efficiency. I assume you have fast NVMe drives
> on that machine, otherwise it'll be slow, no matter what. It's just a
> lot of bytes going back and forth.

We have quite a lot of ex

Re: Static index, fastest way to do forceMerge

2018-11-02 Thread Dawid Weiss
> int processors = Runtime.getRuntime().availableProcessors();
> ConcurrentMergeScheduler cms = new ConcurrentMergeScheduler();
> cms.setMaxMergesAndThreads(processors, processors);

See, the number of threads in the CMS only matters if you have concurrent merges of independent segments. What you

Re: Static index, fastest way to do forceMerge

2018-11-02 Thread Jerven Tjalling Bolleman
Hi Dawid, Erick,

Thanks for the reply. We are using pure Lucene, and currently this is what I am doing:

int processors = Runtime.getRuntime().availableProcessors();
ConcurrentMergeScheduler cms = new ConcurrentMergeScheduler();
cms.setMaxMergesAndThreads(processors, processors);
cms.disableAu

Re: Static index, fastest way to do forceMerge

2018-11-02 Thread Dawid Weiss
We are faced with a similar situation. Yes, the merge process can take a long time and is mostly single-threaded (if you're merging from N segments into a single segment, only one thread does the job). As Erick pointed out, the merge process takes a backseat compared to indexing and searches (in mo

Re: Static index, fastest way to do forceMerge

2018-11-02 Thread Erick Erickson
The merge process is rather tricky, and there's nothing that I know of that will use all available resources. In fact, the merge code is written to _not_ use up all the possible resources, on the theory that there should be some left over to handle queries, etc. Yeah, the situation you describe is in

Static index, fastest way to do forceMerge

2018-11-02 Thread Jerven Bolleman
Dear Lucene Devs and Users,

First of all, thank you for this wonderful library and API. forceMerges are normally not recommended, but we fall into one of the few use cases where it makes sense. In our use case we have a large index (3, actually) and we never update them after indexing. i.e.