That's fine -- distraction is pretty much what defines my work... ;)
D.
On Tue, Dec 18, 2018 at 11:19 AM Jerven Tjalling Bolleman wrote:
>
> Hi Dawid,
>
> Thanks for looking into this! I have been distracted with other work and
> did not get the time I expected to work on it.
>
> Regards,
> Jerven
Hi Dawid,
Thanks for looking into this! I have been distracted with other work and
did not get the time I expected to work on it.
Regards,
Jerven
On 11/30/18 12:01 PM, Dawid Weiss wrote:
Just FYI: I implemented a quick and dirty PoC to see how it'd work.
Not much of a difference on my machine (since postings merging
dominates everything else). It's an interesting problem how to split
it up to saturate all available resources (CPU and I/O), though.
https://issues.apache.org/jira/br
Do you really need exactly one segment? Or would, say, 5 be good enough?
You see where this is going: set maxSegments to 5 and you may be able to get
some parallelization...
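The suggestion above can be sketched with Lucene's IndexWriter API. This is only an illustration, not code from the thread; the index path is a placeholder:

```java
import java.nio.file.Paths;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.FSDirectory;

public class MergeDownToFive {
    public static void main(String[] args) throws Exception {
        try (FSDirectory dir = FSDirectory.open(Paths.get("/path/to/index"));
             IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig())) {
            // Asking for at most 5 segments instead of exactly 1 leaves
            // several independent merges that can run concurrently,
            // instead of one big single-threaded N-into-1 merge.
            writer.forceMerge(5);
        }
    }
}
```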
On Fri, Nov 2, 2018, 14:17 Dawid Weiss wrote:
> Thanks for chipping in, Toke. A ~1TB index is impressive.
>
> Back of the envelope says
Thanks for chipping in, Toke. A ~1TB index is impressive.
Back of the envelope says reading & writing 900GB in 8 hours is
2*900GB/(8*60*60s) = 64MB/s. I don't remember the interface for our
SSD machine, but even with SATA II this is only ~1/5th of the possible
fairly sequential IO throughput. So f
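The estimate above is easy to double-check with pure arithmetic (900GB read plus 900GB written over 8 hours):

```java
public class EnvelopeMath {
    public static void main(String[] args) {
        double totalMB = 2 * 900.0 * 1024; // 900GB read + 900GB written, GB -> MB
        double seconds = 8 * 60 * 60;      // 8 hours
        double mbPerSec = totalMB / seconds;
        // ~64 MB/s, well under the ~300 MB/s a SATA II link can sustain
        // for fairly sequential I/O.
        System.out.printf("%.1f MB/s%n", mbPerSec);
    }
}
```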
On 2018-11-02 20:52, Dawid Weiss wrote:
int processors = Runtime.getRuntime().availableProcessors();
ConcurrentMergeScheduler cms = new ConcurrentMergeScheduler();
cms.setMaxMergesAndThreads(processors,processors);
See, the number of threads in the CMS only matters if you have
concurrent mer
Dawid Weiss wrote:
> Merging segments as large as this one requires not just CPU, but also
> serious I/O throughput efficiency. I assume you have fast NVMe drives
> on that machine, otherwise it'll be slow, no matter what. It's just a
> lot of bytes going back and forth.
We have quite a lot of ex
> int processors = Runtime.getRuntime().availableProcessors();
> ConcurrentMergeScheduler cms = new ConcurrentMergeScheduler();
> cms.setMaxMergesAndThreads(processors,processors);
See, the number of threads in the CMS only matters if you have
concurrent merges of independent segments. What you
Hi Dawid, Erick,
Thanks for the reply. We are using pure lucene and currently this is
what I am doing
int processors = Runtime.getRuntime().availableProcessors();
ConcurrentMergeScheduler cms = new ConcurrentMergeScheduler();
cms.setMaxMergesAndThreads(processors,processors);
cms.disableAu
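For reference, a self-contained version of that setup might look like the following. The IndexWriterConfig wiring and the disableAutoIOThrottle() call are my assumptions about where the truncated snippet was heading, not something confirmed by the thread:

```java
import java.nio.file.Paths;
import org.apache.lucene.index.ConcurrentMergeScheduler;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.FSDirectory;

public class CmsSetup {
    public static void main(String[] args) throws Exception {
        int processors = Runtime.getRuntime().availableProcessors();
        ConcurrentMergeScheduler cms = new ConcurrentMergeScheduler();
        cms.setMaxMergesAndThreads(processors, processors);
        cms.disableAutoIOThrottle(); // assumption: let merges use full I/O bandwidth
        IndexWriterConfig iwc = new IndexWriterConfig();
        iwc.setMergeScheduler(cms);
        try (FSDirectory dir = FSDirectory.open(Paths.get("/path/to/index"));
             IndexWriter writer = new IndexWriter(dir, iwc)) {
            writer.forceMerge(1); // the forceMerge this thread is about
        }
    }
}
```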
We are faced with a similar situation. Yes, the merge process can take
a long time and is mostly single-threaded (if you're merging from N
segments into a single segment, only one thread does the job). As
Erick pointed out, the merge process takes a backseat compared to
indexing and searches (in mo
The merge process is rather tricky, and there's nothing that I know of
that will use all resources available. In fact the merge code is
written to _not_ use up all the possible resources on the theory that
there should be some left over to handle queries etc.
Yeah, the situation you describe is in
Dear Lucene Devs and Users,
First of all thank you for this wonderful library and API.
forceMerges are normally not recommended, but we fall into one of the few
use cases where it makes sense.
In our use case we have a large index (3 actually) and we never update
them after indexing, i.e.