I think there are or were technical reasons behind it, and that's something
to figure out. It's also more complicated than that; I simplified it a bit.
E.g. the uniqueKey is actually a composition of two ids, and the relationship
between them is important for grouping purposes.
I agree with you on switching to the SKU as the unique key.
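As a hedged illustration of such a composite uniqueKey (a sketch only; the field
names groupId/variantId, the core URL, and the separator are assumptions, not
something stated in this thread), indexing with SolrJ might look like this:

    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class CompositeKeySketch {
        public static void main(String[] args) throws Exception {
            try (SolrClient client = new HttpSolrClient.Builder(
                    "http://localhost:8983/solr/products").build()) {
                // Hypothetical ids: the uniqueKey is built from both, while the
                // first id is kept as its own field so docs can still be grouped.
                String groupId = "G123";
                String variantId = "V456";
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", groupId + ":" + variantId); // composite uniqueKey
                doc.addField("groupId", groupId);              // used for grouping
                doc.addField("sku", "SKU-0001");
                client.add(doc, 60_000);                       // commitWithin 60 s
            }
        }
    }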
May I ask why you haven't used the SKU as the primary key (uniqueKey)? Do you need
to keep more than one version of the same SKU?
From my understanding, if you can use the SKU as the primary key, almost all of the
deleteByQuery calls become unnecessary.
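As a hedged illustration of this point (a sketch with an assumed core URL and field
names): if the SKU is the uniqueKey, re-adding a document with the same key simply
overwrites the previous version, so no per-document deleteByQuery is needed:

    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class OverwriteBySkuSketch {
        public static void main(String[] args) throws Exception {
            try (SolrClient client = new HttpSolrClient.Builder(
                    "http://localhost:8983/solr/products").build()) {
                // Costly pattern discussed in this thread: a delete per document.
                // client.deleteByQuery("sku:SKU-0001");

                // With "sku" as the uniqueKey, the add below replaces any existing
                // document with the same key; no explicit delete is required.
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("sku", "SKU-0001");
                doc.addField("price", 19.99);
                client.add(doc, 60_000); // commitWithin 60 s, no per-request commit
            }
        }
    }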
On Thu, Jun 16, 2022 at 4:38 PM Shawn Heisey wrote:
> On 6/16/22 02:59, Marius Grigaitis wrote:
On 6/16/22 02:59, Marius Grigaitis wrote:
In the end, what caught our eye was a few deleteByQuery lines in the stack
traces of running threads while Solr was overloaded. We temporarily removed
the deleteByQuery calls and saw around a 10x improvement in indexing
speed.
I do not understand all the low-level ...
Hi Vincenzo,
Yes.
On Thu, Jun 16, 2022 at 12:39 PM Vincenzo D'Amore
wrote:
> Hi Marius, if I have understood correctly, you have a deleteByQuery for each
> document, am I right?
>
> On Thu, 16 Jun 2022 at 11:04, Marius Grigaitis
> wrote:
>
> > Just a followup on the topic.
> >
> > * We checked the Solr settings; they seem to be mostly defaults
Interesting find. I have seen other reports of very slow deleteByQuery before.
So it should be used sparingly, and under no circumstance should you bombard Solr
with multiple deleteByQuery requests on each update.
Sounds like a better plan to switch to a truly unique ID like SKU. Or if you
know the previous ...
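Assuming the truncated suggestion above refers to deleting by a known ID, here is a
hedged SolrJ sketch of delete-by-id as the cheaper alternative to delete-by-query
(the URL and the ids are placeholders):

    import java.util.Arrays;
    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;

    public class DeleteByIdSketch {
        public static void main(String[] args) throws Exception {
            try (SolrClient client = new HttpSolrClient.Builder(
                    "http://localhost:8983/solr/products").build()) {
                // Expensive when issued once per document:
                // client.deleteByQuery("sku:SKU-0001");

                // Cheaper if the previous uniqueKey values are known:
                client.deleteById(Arrays.asList("G123:V456", "G123:V457"), 60_000);
            }
        }
    }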
Hi Marius, if I have understood correctly, you have a deleteByQuery for each
document, am I right?
On Thu, 16 Jun 2022 at 11:04, Marius Grigaitis
wrote:
> Just a followup on the topic.
>
> * We checked the Solr settings; they seem to be mostly defaults (especially
> merge and commit strategies, etc.)
> * We commit every 10 minutes
Just a followup on the topic.
* We checked the Solr settings; they seem to be mostly defaults (especially
merge and commit strategies, etc.)
* We commit every 10 minutes
* Added NewRelic to the Solr instance to gather more data and graphs
In the end, what caught our eye was a few deleteByQuery lines in the stack
traces of running threads while Solr was overloaded.
> * Do NOT commit after each batch of 1000 docs. Instead, commit as seldom
> as your requirements allow, e.g. try commitWithin=60000 to commit every
> minute
This is the big one. Commit after the entire process is done, or on a timer
if you don't need NRT searching; rarely does anyone actually need that.
* Go multi-threaded for each core, as Shawn says. Try e.g. 2, 3 and 4 threads
* Experiment with different batch sizes, e.g. try 500 and 2000 - what is optimal
depends on your docs
* Do NOT commit after each batch of 1000 docs. Instead, commit as seldom as
your requirements allow, e.g. try commitWithin=60000 to commit at most once a
minute (see the sketch below)
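A minimal SolrJ sketch putting the advice above together: batched adds with
commitWithin instead of an explicit commit per batch, and a single hard commit at
the end. The core URL, batch size and field names are assumptions:

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class BatchedIndexingSketch {
        public static void main(String[] args) throws Exception {
            try (SolrClient client = new HttpSolrClient.Builder(
                    "http://localhost:8983/solr/core1").build()) {
                List<SolrInputDocument> batch = new ArrayList<>();
                for (int i = 0; i < 90_000; i++) {
                    SolrInputDocument doc = new SolrInputDocument();
                    doc.addField("id", "doc-" + i);
                    doc.addField("title", "Product " + i);
                    batch.add(doc);
                    if (batch.size() == 1000) {    // also try 500 and 2000
                        client.add(batch, 60_000); // commitWithin 60 s, no commit here
                        batch.clear();
                    }
                }
                if (!batch.isEmpty()) {
                    client.add(batch, 60_000);
                }
                client.commit(); // one hard commit when the whole run is done
            }
        }
    }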
On 6/8/2022 3:35 AM, Marius Grigaitis wrote:
* 9 different cores. Each weighs around ~100 MB on disk and contains
approximately 90k documents.
* Updating is performed using the update method in batches of 1000, with
around 9 processes in parallel (split by core)
This means that indexing within each core is single-threaded.
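A hedged sketch of one way to get more than one indexing thread per core:
SolrJ's ConcurrentUpdateSolrClient buffers documents and sends them from several
background threads. The core URL, queue size and thread count below are only
examples, not settings from this thread:

    import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class MultiThreadedPerCoreSketch {
        public static void main(String[] args) throws Exception {
            try (ConcurrentUpdateSolrClient client =
                     new ConcurrentUpdateSolrClient.Builder("http://localhost:8983/solr/core1")
                         .withQueueSize(10_000)
                         .withThreadCount(3)      // e.g. try 2, 3 and 4 threads per core
                         .build()) {
                for (int i = 0; i < 90_000; i++) {
                    SolrInputDocument doc = new SolrInputDocument();
                    doc.addField("id", "doc-" + i);
                    client.add(doc, 60_000);      // commitWithin 60 s
                }
                client.blockUntilFinished();      // drain the background queue
                client.commit();
            }
        }
    }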
Hi All,
Our Solr is bottlenecking on write performance (it uses a lot of CPU, and
writes queue up). I'm looking for some tips on what to look into, to see whether
we can squeeze more write performance out of it without changing the setup too
drastically.
Here's the setup:
* Solr 8.2 (I know, it could be upgraded)