Re: Solr indexing performance tips

2022-06-16 Thread Marius Grigaitis
I think there are or were technical reasons behind it, and that's something to figure out. It's also more complicated than that; I just simplified it. E.g. uniqueKey is actually a composition of two ids, and the relationship between them is important for grouping purposes. I agree with you on switching to

Re: Solr indexing performance tips

2022-06-16 Thread Vincenzo D'Amore
May I ask why you haven't used the sku as the primary key? Do you need to keep multiple versions of the same sku? As I understand it, if you can use the sku as the primary key, almost all of the deleteByQuery calls become unnecessary.

On Thu, Jun 16, 2022 at 4:38 PM Shawn Heisey wrote:
> On 6/16/22 02:59, Marius Grigaitis

Re: Solr indexing performance tips

2022-06-16 Thread Shawn Heisey
On 6/16/22 02:59, Marius Grigaitis wrote:
> In the end what caught our eye is a few deleteByQuery lines in stacks of running threads while Solr is overloaded. We temporarily removed deleteByQuery and it had around 10x performance improvement on indexing speed.

I do not understand all the low-leve

Re: Solr indexing performance tips

2022-06-16 Thread Marius Grigaitis
Hi Vincenzo,

Yes.

On Thu, Jun 16, 2022 at 12:39 PM Vincenzo D'Amore wrote:
> Hi Marius, if I have understood correctly you have a deleteByQuery for each
> document, am I right?
>
> On Thu, 16 Jun 2022 at 11:04, Marius Grigaitis wrote:
> > Just a followup on the topic.
> >
> > * We checked

Re: Solr indexing performance tips

2022-06-16 Thread Jan Høydahl
Interesting find. I have seen other reports of very slow deleteByQuery before. So it should be used sparingly, and under no circumstances should you bombard Solr with multiple deleteByQuery requests on each update. It sounds like a better plan to switch to a truly unique ID like SKU. Or if you know the previ
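
To illustrate the "truly unique ID" suggestion: when a document is added with the same uniqueKey value as an existing one, Solr replaces the old document, so no separate delete is needed. Below is a minimal sketch of building such an update body. It assumes, hypothetically, that the schema's uniqueKey field is named `id` and that each product dict carries a unique `sku`; the function name is made up for illustration and the thread does not show the actual schema.

```python
import json


def build_update_payload(products):
    """Build a JSON body (array of documents) for a Solr /update request.

    Assumption: the uniqueKey field is called "id" and SKUs are unique,
    so the SKU can serve directly as the document id.
    """
    docs = []
    for product in products:
        doc = dict(product)          # copy so the caller's dict is not modified
        doc["id"] = product["sku"]   # same id on re-index => old doc is replaced
        docs.append(doc)
    return json.dumps(docs)


if __name__ == "__main__":
    print(build_update_payload([{"sku": "A1", "price": 10}]))
```

Re-sending a product with the same SKU then overwrites the earlier version instead of requiring a deleteByQuery per document. Whether this fits depends on the composite-key and grouping requirements mentioned elsewhere in the thread.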

Re: Solr indexing performance tips

2022-06-16 Thread Vincenzo D'Amore
Hi Marius, if I have understood correctly you have a deleteByQuery for each document, am I right?

On Thu, 16 Jun 2022 at 11:04, Marius Grigaitis wrote:
> Just a followup on the topic.
>
> * We checked settings on solr, seem quite default (especially on merge,
> commit strategies, etc)
> * We com

Re: Solr indexing performance tips

2022-06-16 Thread Marius Grigaitis
Just a followup on the topic.

* We checked settings on solr, seem quite default (especially on merge, commit strategies, etc)
* We commit every 10 minutes
* Added NewRelic to the Solr instance to gather more data and graphs

In the end what caught our eye is a few deleteByQuery lines in stacks of

Re: Solr indexing performance tips

2022-06-08 Thread David Hastings
> * Do NOT commit after each batch of 1000 docs. Instead, commit as seldom as your requirements allow, e.g. try commitWithin=60000 to commit every minute

This is the big one. Commit after the entire process is done or on a timer, if you don't need NRT searching, rarely does anyone ever need that
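
A small sketch of the two commit styles discussed here: letting Solr commit on its own schedule via the `commitWithin` request parameter (given in milliseconds, so one minute is 60000), or sending one explicit commit when the whole run is finished. The base URL and core name below are placeholders, and the helper name is hypothetical.

```python
from urllib.parse import urlencode


def update_url(base_url, core, commit_within_ms=None, commit=False):
    """Return the /update URL for a core, optionally with commit parameters.

    commit_within_ms: ask Solr to commit within this many milliseconds.
    commit: request an explicit hard commit with this request.
    """
    params = {}
    if commit_within_ms is not None:
        params["commitWithin"] = str(commit_within_ms)
    if commit:
        params["commit"] = "true"
    url = f"{base_url}/{core}/update"
    return f"{url}?{urlencode(params)}" if params else url


if __name__ == "__main__":
    base = "http://localhost:8983/solr"   # placeholder host
    # Used for every batch: no per-batch commit, Solr commits within a minute.
    print(update_url(base, "products", commit_within_ms=60000))
    # Used once after the last batch, if a commit at the end is preferred.
    print(update_url(base, "products", commit=True))
```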

Re: Solr indexing performance tips

2022-06-08 Thread Jan Høydahl
* Go multi threaded for each core as Shawn says. Try e.g. 2, 3 and 4 threads
* Experiment with different batch sizes, e.g. try 500 and 2000 - depends on your docs what is optimal
* Do NOT commit after each batch of 1000 docs. Instead, commit as seldom as your requirements allow, e.g. try commitW
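
The threading and batch-size advice above could be tried with something like the following sketch. It is only an illustration under assumptions: documents are plain dicts posted as a JSON array to a core's /update URL, and the function names, default thread count and default batch size are made up so that they can be varied while measuring. The `send` parameter exists so the posting step can be swapped out, e.g. for a dry run.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def batches(docs, size):
    """Yield consecutive slices of docs, each with at most `size` items."""
    for start in range(0, len(docs), size):
        yield docs[start:start + size]


def post_batch(update_url, batch):
    """POST one batch as a JSON array and return the HTTP status code."""
    request = urllib.request.Request(
        update_url,
        data=json.dumps(batch).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


def index_core(update_url, docs, batch_size=1000, threads=3, send=post_batch):
    """Send all batches for one core using a small pool of worker threads.

    Results are returned in batch order; an exception raised by any batch
    propagates to the caller.
    """
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(lambda b: send(update_url, b),
                             batches(docs, batch_size)))
```

Running the same data set with, say, 2, 3 and 4 threads and batch sizes of 500, 1000 and 2000 while timing each run is one way to find the combination that suits the documents.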

Re: Solr indexing performance tips

2022-06-08 Thread Shawn Heisey
On 6/8/2022 3:35 AM, Marius Grigaitis wrote:
> * 9 different cores. Each weighs around ~100 MB on disk and has approximately 90k documents inside each.
> * Updating is performed using update method in batches of 1000, around 9 processes in parallel (split by core)

This means that indexing within ea

Solr indexing performance tips

2022-06-08 Thread Marius Grigaitis
Hi All,

Our Solr is bottlenecking on write performance (uses lots of cpu, writes queue up). Looking for some tips on what to look into to figure out if we can squeeze more write performance out of it without changing the setup too drastically.

Here's the setup:

* Solr 8.2 (I know, could be upgrad