Re: Big difference in response time on solr 8.7; Optimized vs un Optimized core

2021-08-11 Thread Satya Nand
Thanks, Deepak. We will try doing this.

But I am still wondering what caused such a large increase in response time
from Solr 6.5 to Solr 8.7, keeping everything the same.
We are seeing an increase of 100-150 ms.


On Wed, Aug 11, 2021 at 11:55 AM Deepak Goel  wrote:

> If I were you, then I would stick to the 128 GB machine. And then look at
> other parameters to tune...
>
>
>
>
> On Tue, Aug 3, 2021 at 3:25 PM Satya Nand  .invalid>
> wrote:
>
> > Hi Deepak,
> >
> > We actually tried with a 128 GB machine; it didn't help the response time,
> > so we moved back to 96 GB.
> >
> > On Tue, Aug 3, 2021 at 2:11 PM Deepak Goel  wrote:
> >
> > > I am confused a bit about the maths:
> > >
> > > Heap-30 GB & Index Size-95 GB is equal to 125GB. And the RAM is 96GB.
> > >
> > >
> > > On Tue, Aug 3, 2021 at 12:10 PM Satya Nand  > > .invalid>
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > We have recently upgraded Solr from version 6.5 to version 8.7. But
> > > > contrary to our expectations, the response time increased by 40%.
> > > >
> > > > On Solr 8.7 the difference between the optimized and unoptimized index
> > > > is also very large: 350 ms on the optimized core and 650 ms on the
> > > > unoptimized one. The difference in size between the optimized and
> > > > unoptimized cores is only 5 GB. The segment count is 1 in the optimized
> > > > index and 20 in the unoptimized index.
> > > >
> > > > I wanted to ask: is this normal behavior on Solr 8.7, or is there some
> > > > setting that we forgot to add? Please also tell us how we can reduce
> > > > the response time on the unoptimized core.
> > > >
> > > > *Specifications*
> > > > We are using master-slave architecture; the polling interval is 3 hours.
> > > > RAM: 96 GB
> > > > CPU: 14
> > > > Heap: 30 GB
> > > > Index size: 95 GB
> > > > Segment count: 20
> > > > Merge policy:
> > > >
> > > > <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
> > > >   <int name="maxMergeAtOnce">5</int>
> > > >   <int name="segmentsPerTier">3</int>
> > > > </mergePolicyFactory>


Re: Big difference in response time on solr 8.7; Optimized vs un Optimized core

2021-08-11 Thread Deepak Goel
You will have to elaborate a bit on: "keeping everything the same"


Deepak
"The greatness of a nation can be judged by the way its animals are treated
- Mahatma Gandhi"

+91 73500 12833
deic...@gmail.com

Facebook: https://www.facebook.com/deicool
LinkedIn: www.linkedin.com/in/deicool

"Plant a Tree, Go Green"

Make In India : http://www.makeinindia.com/home




Re: Big difference in response time on solr 8.7; Optimized vs un Optimized core

2021-08-11 Thread Satya Nand
Hi Deepak,

By "keeping everything the same" I mean the following are unchanged:
Heap size
Index size (almost the same)
Schema
Solr config

We have made only one change: earlier we were using the synonym_edismax parser.
As that parser is not available for Solr 8.7, we replaced it with edismax plus
SynonymGraphFilterFactory to handle multi-word synonyms.
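
For reference, a minimal sketch of the kind of query-time synonym analysis we
mean (the field type, tokenizer and file names are placeholders, not our actual
schema). SynonymGraphFilterFactory is applied at query time only, and the
edismax sow parameter should stay at its default of false so the parser does
not split on whitespace before analysis, which multi-word synonyms rely on:

<fieldType name="text_syn" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>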

Also, on Solr 8.7 the difference between the optimized and unoptimized index
is very large: 180+ ms on the optimized core and 350+ ms on the unoptimized
one. The difference in size between the optimized and unoptimized cores is
only 5 GB. The segment count is 1 in the optimized index and 20 in the
unoptimized index.
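
If the segment count is the main driver, one middle ground (rather than a full
optimize down to a single segment) is to ask the update handler on the master
to merge down to a small number of segments after indexing. This is only a
sketch; the host, core name and segment target are placeholders:

curl "http://localhost:8983/solr/<core>/update?optimize=true&maxSegments=5&waitSearcher=true"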




Duplicate sample errors using prometheus-exporter in Solr 8.9.0

2021-08-11 Thread Joshua Hendrickson
Hello,

Our organization has implemented Solr 8.9.0 for a production use case. We have 
standardized on Prometheus for metrics collection and storage. We export 
metrics from our Solr cluster by deploying the public Solr image for version 
8.9.0 to an EC2 instance and using Docker to run the exporter binary against 
Solr (which is running in a container on the same host). Our Prometheus scraper 
(hosted in Kubernetes and configured via a Helm chart) reports errors like the 
following on every scrape:

ts=2021-08-10T16:44:13.929Z caller=dedupe.go:112 component=remote level=error 
remote_name=11d3d0 url=https://our.endpoint/push msg="non-recoverable error" 
count=500 err="server returned HTTP status 400 Bad Request: user=n: err: 
duplicate sample for timestamp. timestamp=2021-08-10T16:44:13.317Z, 
series={__name__=\"solr_metrics_core_time_seconds_total\", 
aws_account=\"our-account\", 
base_url=\"http://fqdn.for.solr.server:32080/solr\";, category=\"QUERY\", 
cluster=\"our-cluster\", collection=\"a-collection\", 
core=\"a_collection_shard1_replica_t13\", dc=\"aws\", handler=\"/select\", 
instance=\" fqdn.for.solr.server:8984\", job=\"solr\", replica=\"replica_t13\", 
shard=\"shard1\"}"

We have confirmed that there are indeed duplicate time series when we query our 
prometheus exporter. Here is a sample that shows the duplicate time series:

solr_metrics_core_time_seconds_total{category="QUERY",handler="/select",core="a_collection_shard1_replica_t1",collection="a_collection",shard="shard1",replica="replica_t1",base_url="http://fqdn3.for.solr.server:32080/solr",}
 1.533471301599E9
solr_metrics_core_time_seconds_total{category="QUERY",handler="/select",core="a_collection_shard1_replica_t1",collection="a_collection",shard="shard1",replica="replica_t1",base_url="http://fqdn3.for.solr.server:32080/solr",}
 8.89078653472891E11
solr_metrics_core_time_seconds_total{category="QUERY",handler="/select",core="a_collection_shard1_replica_t1",collection="a_collection",shard="shard1",replica="replica_t1",base_url="http://fqdn3.for.solr.server:32080/solr",}
 8.9061212477449E11
solr_metrics_core_time_seconds_total{category="QUERY",handler="/select",core="a_collection_shard1_replica_t3",collection="a_collection",shard="shard1",replica="replica_t3",base_url="http://fqdn2.for.solr.server:32080/solr",}
 1.63796914645E9
solr_metrics_core_time_seconds_total{category="QUERY",handler="/select",core="a_collection_shard1_replica_t3",collection="a_collection",shard="shard1",replica="replica_t3",base_url="http://fqdn2.for.solr.server:32080/solr",}
 9.05314998357273E11
solr_metrics_core_time_seconds_total{category="QUERY",handler="/select",core="a_collection_shard1_replica_t3",collection="a_collection",shard="shard1",replica="replica_t3",base_url="http://fqdn2.for.solr.server:32080/solr",}
 9.06952967503723E11
solr_metrics_core_time_seconds_total{category="QUERY",handler="/select",core="a_collection_shard1_replica_t5",collection="a_collection",shard="shard1",replica="replica_t5",base_url="http://fqdn1.for.solr.server:32080/solr",}
 1.667842814432E9
solr_metrics_core_time_seconds_total{category="QUERY",handler="/select",core="a_collection_shard1_replica_t5",collection="a_collection",shard="shard1",replica="replica_t5",base_url="http://fqdn1.for.solr.server:32080/solr",}
 9.1289401347629E11
solr_metrics_core_time_seconds_total{category="QUERY",handler="/select",core="a_collection_shard1_replica_t5",collection="a_collection",shard="shard1",replica="replica_t5",base_url="http://fqdn1.for.solr.server:32080/solr",}
 9.14561856290722E11

This is the systemd unit file that runs the exporter container:

[Unit]
Description=Solr Exporter Docker
After=network.target
Wants=network.target
Requires=docker.service
After=docker.service

[Service]
Type=simple
ExecStart=/usr/bin/docker run --rm \
--name=solr-exporter \
--net=host \
--user=solr \
solr:8.9.0 \
/opt/solr/contrib/prometheus-exporter/bin/solr-exporter \
-p 8984 -z the-various-zookeeper-endpoints -f 
/opt/solr/contrib/prometheus-exporter/conf/solr-exporter-config.xml -n 4

ExecStop=/usr/bin/docker stop -t 2 solr-exporter
Restart=on-failure

[Install]
WantedBy=multi-user.target

I looked into the XML configurations for prometheus-exporter between 8.6.2 (the 
previous version we used) and latest, and it looks like at some point recently 
there was a major refactoring in how this works. Is there something we are 
missing? Can anyone reproduce this issue on 8.9?

Thanks in advance,
Joshua Hendrickson 



Handling tokenization of Chinese + Latin words

2021-08-11 Thread Ricardo Soto Estévez
Good day,
and first of all, it's a pleasure to join you all. In my workplace an
interesting dilemma has come up that we have been researching and discussing
for the last few days, and I thought this would be a good place to extend our
research on the topic.

Let's get to the heart of it. Some days ago a Chinese customer opened a ticket
with us about not getting results for Chinese phrases like the following:
短袖V領上衣, 條紋印花口袋T恤.
As you can see, hidden between the ideograms we have V and T, and that is not
incorrect, as V領 stands for V-neck and T恤 stands for T-shirt. The problem is
that our current pipeline of tokenizers and filters splits these into T and 恤,
so we do not generate the correct matches and those constructs are
unsearchable.

I have been looking at the source code and documentation of all the classes
related to the tokenizer and found out about the rulefiles option. A quick
glimpse at its specification looks exactly like what we need, but I would like
to know what you think. Also, has anyone else run into this problem before? I
wouldn't be surprised if that were the case; if so, can you share the approach
you followed, or the rule file you used if you went that route? Or any rule
file you know of that matches our needs?
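
For concreteness, the rulefiles attribute we found is on the ICU tokenizer;
below is only a sketch of the syntax, assuming that is the tokenizer in play.
The .rbbi file names are placeholders for rule files we would still have to
author, and we have not yet verified that per-script rules alone can keep a
Latin letter attached to a following Han character:

<fieldType name="text_cjk_latin" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.ICUTokenizerFactory"
               rulefiles="Latn:custom-latin.rbbi,Hani:custom-han.rbbi"/>
    <filter class="solr.CJKWidthFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>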

Thank you all

-- 
*Ricardo Soto Estévez* 
BACKEND ENGINEER


Re: Duplicate sample errors using prometheus-exporter in Solr 8.9.0

2021-08-11 Thread Mathieu Marie
It happens because you use *-z zk-url* to connect to Solr.
When you do that, the prometheus-exporter assumes that it is connecting to a
SolrCloud environment and will collect the metrics from all nodes.
Given you have started 3 prometheus-exporters, each one of them will collect
all metrics from the whole cluster.

You can fix this in two different ways:
1- point the exporter at a single node with *-h* instead of using *-z* (see
the sketch below)
2- have only one instance of the prometheus-exporter in the cluster

Note that solution 1 will not retrieve the metrics configured in the
collection-level (SolrCloud) section of your configuration, as *-h* assumes a
non-SolrCloud instance.
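
For option 1, the only change in the unit file would be the exporter
arguments, roughly like this (the Solr base URL is a placeholder, and the
exact flag spelling is worth double-checking against the exporter's usage
output for your Solr version):

/opt/solr/contrib/prometheus-exporter/bin/solr-exporter \
  -p 8984 -h http://localhost:8983/solr \
  -f /opt/solr/contrib/prometheus-exporter/conf/solr-exporter-config.xml -n 4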

Regards,
Mathieu


Re: OutofMemory Error in solr 6.5

2021-08-11 Thread Shawn Heisey

On 8/10/2021 11:17 PM, Satya Nand wrote:
> Thanks for explaining it so well. We will work on reducing the filter
> cache size and autowarm count.
>
> Though I have one question.
>
>> If your configured 4000 entry filterCache were to actually fill up, it
>> would require nearly 51 billion bytes, and that's just for the one core
>> with 101 million documents.  This is much larger than the 30GB heap you
>> have specified ... I am betting that the filterCache is the reason
>> you're hitting OOME.
>
> As you can see from the screenshots below, the filter cache is almost
> full and the heap is approx 18-20 GB. I think this means the heap is not
> actually using 51 GB of space; otherwise the issue would have been very
> frequent if the full cache had been taking ~50 GB. I also believed that
> Solr uses some compressed data structures for its caches; that's how it
> is able to store the cache in less memory. Isn't it?
>
> Also, the issue is not very frequent. It comes once or twice a month,
> when all follower servers stop working at the same time due to an
> OutOfMemory error.

We can't see any of the images.  The mailing list software stripped 
them.  Most attachments do not come through -- you'll need to find a 
file sharing website and give us links.  Dropbox is a good choice, and 
there are others.


The cache may not be getting full, but each entry is over 12 megabytes 
in size, so it will not need to be full to cause problems.  It does not 
get compressed.  Solr (actually Lucene) does use compression in the 
index file formats.  It would be possible to compress the bitmap for a 
filterCache entry, but that would slow things down when there is a cache 
hit.  I have no idea how much it would slow things down.
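
To make the arithmetic behind those numbers concrete (maxDoc is the document 
count of the core):

  bitmap entry size  =  maxDoc / 8 bytes  =  101,000,000 / 8  =  ~12.6 MB
  worst-case cache   =  4000 entries x ~12.6 MB  =  ~50 GB

which is where the "nearly 51 billion bytes" figure comes from.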


The cache warming probably isn't the problem.  That's only going to 
(temporarily) add 100 new entries to a new cache, then the old cache 
will be gone.  If the filterCache is indeed the major memory usage, it's 
probably queries that cause it to get large.


I'm curious whether the 101 million document count is for one shard 
replica or for the whole collection.  How many documents are in all the 
shard replicas handled by one Solr instance?


Thanks,
Shawn




Re: OutofMemory Error in solr 6.5

2021-08-11 Thread Satya Nand
Hi Shawn,

Please find the images.
*Filter cache stats:*
https://drive.google.com/file/d/19MHEzi9m3KS4s-M86BKFiwmnGkMh3DGx/view?usp=sharing
*Heap stats*
https://drive.google.com/file/d/1Q62ea-nFh9UjbcVcBJ39AECWym6nk2Yg/view?usp=sharing

> I'm curious whether the 101 million document count is for one shard
> replica or for the whole collection.  How many documents are in all the
> shard replicas handled by one Solr instance?

We are not using solr cloud. We are using standalone solr with
Master-slave architecture. 101 million documents are in one core.




Re: OutofMemory Error in solr 6.5

2021-08-11 Thread Shawn Heisey

On 8/11/2021 6:04 AM, Satya Nand wrote:
> *Filter cache stats:*
> https://drive.google.com/file/d/19MHEzi9m3KS4s-M86BKFiwmnGkMh3DGx/view?usp=sharing



This shows the current size as 3912, almost full.

There is an alternate format for filterCache entries, that just lists 
the IDs of the matching documents.  This only gets used when the 
hitcount for the filter is low.  I do not know what threshold it uses to 
decide that the hitcount is low enough to use the alternate format, and 
I do not know where in the code to look for the answer.  This is 
probably why you can have 3912 entries in the cache without blowing the 
heap.


I bet that when the heap gets blown, the filter queries Solr receives 
are such that they cannot use the alternate format, and thus require the 
full 12.7 million bytes.  Get enough of those, and you're going to need 
more heap than 30GB.  I bet that if you set the heap to 31G, the OOMEs 
would occur a little less frequently.  Note that if you set the heap to 
32G, you actually have less memory available than if you set it to 31G 
-- At 32GB, Java must switch from 32 bit pointers to 64 bit pointers.  
Solr creates a LOT of objects on the heap, so that difference adds up.
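
If you do try 31G, this is only a sketch of where the heap is usually set with
the stock startup scripts (the file location depends on how Solr was
installed):

  # in solr.in.sh (e.g. /etc/default/solr.in.sh on a service install)
  SOLR_HEAP="31g"

  # or when starting manually
  bin/solr start -m 31g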


Discussion item for those with an interest in the low-level code:  What 
kind of performance impact would it cause to use a filter bitmap 
compressed with run-length encoding?  Would that happen at the Lucene 
level rather than the Solr level?


To fully solve this issue, you may need to re-engineer your queries so 
that fq values are highly reusable, and non-reusable filters are added 
to the main query.  Then you would not need a very large cache to obtain 
a good hit ratio.
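
As a sketch of the two knobs involved (the numbers and the field below are
illustrative placeholders, not recommendations): the filterCache is sized in
solrconfig.xml, and a one-off filter that is not worth caching can opt out
with the cache local param so it never occupies an entry:

  <filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="32"/>

  fq={!cache=false}category:one_off_value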


Thanks,
Shawn



Re: Duplicate sample errors using prometheus-exporter in Solr 8.9.0

2021-08-11 Thread Joshua Hendrickson
Mathieu,

We have changed our Prometheus configuration to scrape only from one pod in
the cluster, but we still see the error I described in my original message.
Is there anything else we can try?


Re: Time Routed Alias

2021-08-11 Thread David Smiley
I hope you have success with TRAs!

You can delete some number of collections from the rear of the chain, but
you must first update the TRA to exclude these collections.  This is
tested:
https://github.com/apache/solr/blob/f6c4f8a755603c3049e48eaf9511041252f2dbad/solr/core/src/test/org/apache/solr/update/processor/TimeRoutedAliasUpdateProcessorTest.java#L184
It'd be nice if it would remove itself from the alias.
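
For reference, as Matt describes below, the alias can be redefined by
re-issuing CREATEALIAS with the router parameters; a rough sketch of such a
call, where every name, date and size is a placeholder:

curl "http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=timedata&router.name=time&router.field=evt_dt&router.start=2021-01-01T00:00:00Z&router.interval=%2B1MONTH&create-collection.collection.configName=myConfig&create-collection.numShards=2"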

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Tue, Aug 10, 2021 at 9:26 PM Matt Kuiper  wrote:

> I found some helpful information while testing TRAs:
>
> For our use-case I am hesitant to set up an autoDeleteAge (unless it can be
> modified - still need to test).  So I wondered about a little more manual
> delete management approach.
>
> I confirmed that I cannot simply delete a collection that is registered as
> part of a TRA.  The delete collection api call will fail with a message
> that the collection is a part of the alias.
>
> I did learn that I could use the same create-TRA API call I used to create
> the TRA, but modify the router.start to a date more recent than one or more
> of the older collections associated with the TRA. Then when I queried the
> TRA, I only received documents from the collections after the new
> router.start date. Also, I was then able to successfully delete the older
> collections with a standard collection delete command.
>
> I think this satisfies my initial use-case requirements to be able to
> modify an existing TRA and delete older collections.
>
> Matt
>
> On Mon, Aug 9, 2021 at 11:27 AM Matt Kuiper  wrote:
>
> > Hi Gus, Jan,
> >
> > I am considering implementing TRA for a large-scale Solr deployment.
> Your
> > Q&A is helpful!
> >
> > I am curious if you have experience/ideas regarding modifying the TR
> Alias
> > when one desires to manually delete old collections or modify the
> > router.autoDeleteAge to shorten or extend the delete age.  Here's a few
> > specific questions?
> >
> > 1) Can you manually delete an old collection (via collection api) and
> then
> > edit the start date (to a more recent date) of the TRA so that it no
> longer
> > sees/processes the deleted collection?
> > 2) Is the only way to manage the deletion of collections within a TRA
> > using the automatic deletion configuration? The router.autoDeleteAge
> > parameter.
> > 3) If you can only manage deletes using the router.autoDeleteAge
> > parameter, are you able to update this parameter to either:
> >
> >- Set the delete age earlier so that older collections are triggered
> >for automatic deletion sooner?
> >- Set the delete age to a larger value to extend the life of a
> >collection?  Say you originally  would like the collections to stay
> around
> >for 5 years, but then change your mind to 7 years.
> >
> > I will likely do some experimentation, but am interested to learn if you
> > have covered these use-cases with TRA.
> >
> > Thanks,
> > Matt
> >
> >
> > On Fri, Aug 6, 2021 at 8:08 AM Gus Heck  wrote:
> >
> >> Hi Jan,
> >>
> >> The key thing to remember about TRA's (or any Routed Alias) is that it
> >> only
> >> actively does two things:
> >> 1) Routes document updates to the correct collection by inspecting the
> >> routed field in the document
> >> 2) Detects when a new collection is required and creates it.
> >>
> >> If you don't send it data *nothing* happens. The collections are not
> >> created until data requires them (with an async create possible when it
> >> sees an update that has a timestamp "near" the next interval, see docs
> for
> >> router.preemptiveCreateMath )
> >>
> >> A) Dave's half of our talk at 2018 activate talks about it:
> >> https://youtu.be/RB1-7Y5NQeI?t=839
> >> B) Time Routed Aliases are a means by which to automate creation of
> >> collections and route documents to the created collections. Sizing, and
> >> performance of the individual collections is not otherwise special, and
> >> you
> >> can interact with the collections individually after they are created,
> >> with
> >> the obvious caveats that you probably don't want to be doing things that
> >> get them out of sync schema wise unless your client programs know how to
> >> handle documents of both types etc. A less obvious consequence of the
> >> routing is that your data must not ever republish the same document
> with a
> >> different route key (date for TRA), since that can lead to duplicate
> id's
> >> across collections. The "normal" use case is event data, things that
> >> happened and are done, and are correctly recorded (or at least their
> time
> >> is correctly recorded) the first time
> >> C) Configure the higher number of replicas, remove old ones manually if
> >> not
> >> needed. At query time it's "just an alias". Managing collections based
> on
> >> recency could be automated here, before autoscaling was deprecated I was
> >> thinking that adding a couple of hooks into autoscaling such that it
> could
> >> react

Re: OutofMemory Error in solr 6.5

2021-08-11 Thread Satya Nand
Thanks, Shawn.
This makes sense. Filter queries with high hit counts could be the trigger
for the out-of-memory errors; that's why it is so infrequent.
We will take another look at our filter queries and try further reducing the
filter cache size.

One question though:

> There is an alternate format for filterCache entries, that just lists
> the IDs of the matching documents.  This only gets used when the
> hitcount for the filter is low.

Does this alternate format use a different data structure (other than the
bitmap) to store the document IDs for filters with a low hit count?

And does the size constraint (the filter cache size) apply only to the bitmap
entries, to this alternate structure as well, or to their sum?

