Re: [dev help wanted] admin UI: make commandline args sorting optional

2024-02-06 Thread Christine Poerschke (BLOOMBERG/ LONDON)
Hello,

Thank you very much for this question about workflow!

https://github.com/apache/solr/blob/main/CONTRIBUTING.md or 
https://solr.apache.org/community.html#how-to-contribute would be two ways to 
learn more.

And I'd like to also specifically mention that you can contribute to code 
without writing code e.g. by reviewing a proposed change or by trying it out 
locally.

Best wishes,
Christine

From: users@solr.apache.org At: 02/05/24 18:52:35 UTC To: users@solr.apache.org
Subject: Re: [dev help wanted] admin UI: make commandline args sorting optional

Hi,

I never contributed code to Solr before. Is there a general guideline to learn 
about the workflow? This might be a large question for this specific topic so 
don’t mind me much. 

Sincerely
ufuk yilmaz

—

> On 5 Feb 2024, at 21:21, Christine Poerschke (BLOOMBERG/ LONDON) 
 wrote:
> 
> Hi Everyone,
> 
> Are you curious about the code behind the Solr Admin UI, generally or the 
dashboard specifically? 
https://solr.apache.org/guide/solr/latest/getting-started/solr-admin-ui.html#dashboard
> 
> And/Or have you ever found yourself staring at a Java commandline for quite a 
while?
> 
> If so then you may be interested in contributing to this issue: 
https://issues.apache.org/jira/browse/SOLR-16466
> 
> Thanks,
> Christine




Re: [dev help wanted] admin UI: make commandline args sorting optional

2024-02-06 Thread ufuk yılmaz
Thank you Christine!!

—

> On 6 Feb 2024, at 16:47, Christine Poerschke (BLOOMBERG/ LONDON) 
>  wrote:
> 
> Hello,
> 
> Thank you very much for this question about workflow!
> 
> https://github.com/apache/solr/blob/main/CONTRIBUTING.md or 
> https://solr.apache.org/community.html#how-to-contribute would be two ways to 
> learn more.
> 
> And I'd like to also specifically mention that you can contribute to code 
> without writing code e.g. by reviewing a proposed change or by trying it out 
> locally.
> 
> Best wishes,
> Christine
> 
> From: users@solr.apache.org At: 02/05/24 18:52:35 UTC To: users@solr.apache.org
> Subject: Re: [dev help wanted] admin UI: make commandline args sorting 
> optional
> 
> Hi,
> 
> I never contributed code to Solr before. Is there a general guideline to 
> learn 
> about the workflow? This might be a large question for this specific topic so 
> don’t mind me much. 
> 
> Sincerely
> ufuk yilmaz
> 
> —
> 
>> On 5 Feb 2024, at 21:21, Christine Poerschke (BLOOMBERG/ LONDON) 
>  wrote:
>> 
>> Hi Everyone,
>> 
>> Are you curious about the code behind the Solr Admin UI, generally or the 
> dashboard specifically? 
> https://solr.apache.org/guide/solr/latest/getting-started/solr-admin-ui.html#dashboard
>> 
>> And/Or have you ever found yourself staring at a Java commandline for quite 
>> a 
> while?
>> 
>> If so then you may be interested in contributing to this issue: 
> https://issues.apache.org/jira/browse/SOLR-16466
>> 
>> Thanks,
>> Christine
> 
> 



Re: Block MAX WAND feature use

2024-02-06 Thread rajani m
> With a 400M index it's worth experimenting with skipping about a million
> of docs.
Is there a param that allows setting how many docs to skip?

There is "minExactCount", which decides how many docs it should care to score; I
tested that with 100, 1000 and 2000, and latency only increased.
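
For anyone following along, here is a rough SolrJ sketch (the base URL, collection
name and field names are made up, and it is untested) of passing minExactCount and
then checking numFoundExact in the response to see whether the hit count became
approximate:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.Http2SolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocumentList;

public class MinExactCountSketch {
  public static void main(String[] args) throws Exception {
    // Placeholder base URL and collection name.
    try (Http2SolrClient client =
             new Http2SolrClient.Builder("http://localhost:8983/solr").build()) {
      SolrQuery q = new SolrQuery("white flowers card");
      q.set("defType", "edismax");
      q.set("qf", "keywords description title");
      q.setRows(10);
      // The hit count is only guaranteed to be exact up to this many matches.
      q.set("minExactCount", 10);
      QueryResponse rsp = client.query("mycollection", q);
      SolrDocumentList results = rsp.getResults();
      // numFoundExact=false means numFound is a lower bound, i.e. skipping kicked in.
      System.out.println("numFound=" + results.getNumFound()
          + " numFoundExact=" + results.getNumFoundExact());
    }
  }
}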

Alessandro,
Assuming the segment count is roughly the total number of files under
/solr/replica_name/data/index, it is 222. The largest files by size:

-rw-r--r-- 1 solr solr  766M Feb  4 04:16 _chg1.cfs
-rw-r--r-- 1 solr solr 1020M Jan 29 18:37 _ca21.cfs
-rw-r--r-- 1 solr solr  3.7G Nov  5 23:49 _95vt.cfs
-rw-r--r-- 1 solr solr  3.8G Jan 15 08:59 _boyy.cfs
-rw-r--r-- 1 solr solr  3.8G Nov 29 16:01 _9ynt.cfs
-rw-r--r-- 1 solr solr  3.8G Jan 25 00:47 _c3t7.cfs
-rw-r--r-- 1 solr solr  4.1G Oct 26 14:37 _8pyh.cfs
-rw-r--r-- 1 solr solr  4.1G Oct 26 14:38 _7cwt.cfs
-rw-r--r-- 1 solr solr  4.3G Oct 27 06:04 _7s6c.cfs
-rw-r--r-- 1 solr solr  4.3G Oct 26 14:37 _7n8z.cfs
-rw-r--r-- 1 solr solr  4.5G Jan 18 00:30 _dteg.cfs
-rw-r--r-- 1 solr solr  4.5G Jan 19 17:44 _cwcc.cfs
-rw-r--r-- 1 solr solr  4.6G Jan 13 07:35 _blix.cfs
-rw-r--r-- 1 solr solr  4.9G Oct 26 14:39 _8gu9.cfs
-rw-r--r-- 1 solr solr  4.9G Oct 26 14:38 _3kj9.cfs



On Tue, Feb 6, 2024 at 2:45 AM Alessandro Benedetti 
wrote:

> It would be interesting to see the level of fragmentation of each index
> indeed...
> I.e. How many segments per core, in a collection
>
> On Tue, 6 Feb 2024, 06:59 Mikhail Khludnev,  wrote:
>
> > 200-300 docs might be too few to get significant gain. With a 400M index
> > it's worth experimenting with skipping about a million of docs.
> > In simplified params I mean defType=lucene&df=description. debugQuery
> might
> > expose some details as well.
> > As far as I understand this feature works with large segments since it
> > skips a block of a segment, not a segment (?).
> >
> > On Mon, Feb 5, 2024 at 8:04 PM rajani m  wrote:
> >
> > > The "numFound" value is 200-300 docs difference when compared to the
> > query
> > > without "minExactFound" param.  The collection has over 400m records so
> > > testing the feature on a large collection.  The numFoundExact param in
> > the
> > > response is consistently false which tells me the feature is
> functioning
> > > but the results(qtime) are just off, not as expected.
> > >
> > > Would a type of query parser matter? I tested without the secondary
> sort,
> > > even without it there is no improvement in the query time latency and
> is
> > > still more than the query without this param.
> > >
> > >
> > >
> > > On Mon, Feb 5, 2024 at 10:34 AM Mikhail Khludnev 
> > wrote:
> > >
> > > > Hello,
> > > > How many matches do you have in both cases?
> > > > I see there's a second sorting expression, it might not comply with
> the
> > > > requirements.
> > > > I'd rather start from the simple single query parser, just for the
> > > > experiments.
> > > > Note: I never tried it myself.
> > > >
> > > > On Mon, Feb 5, 2024 at 6:20 PM rajani m 
> wrote:
> > > >
> > > > > I ran performance tests with different query sets and the results
> > look
> > > no
> > > > > good, it is adding to the latency around ~15% instead of reducing
> or
> > > even
> > > > > matching.  Not sure if I am missing something in the config or it
> is
> > an
> > > > > issue.
> > > > >
> > > > > Here is an example query *without* WAND query parameter
> > > > > select?&fl=id,ext_id&start=0&q.op=OR&sort=score desc,ext_id
> > > > > asc&rows=10&q=white flowers card&defType=edismax&qf=keywords
> > > description
> > > > > title
> > > > > vs
> > > > > *With* WAND query parameter
> > > > > select?&fl=id,ext_id&start=0&q.op=OR&sort=score desc,ext_id
> > > > > asc&rows=10&q=white flowers card&defType=edismax&qf=keywords
> > > description
> > > > > title*&minExactCount=10*
> > > > >
> > > > > On Thu, Feb 1, 2024 at 8:36 AM rajani m 
> > wrote:
> > > > >
> > > > > > Hi Ishan,
> > > > > >I have looked into that doc, and it looks like the solr
> version
> > > has
> > > > to
> > > > > > be >8.8 and the config needed is to add the query parameter
> > > > > "&minExactCount=k"
> > > > > > where k is 10 or 100 depending on the accuracy of the first k
> docs.
> > > > > >
> > > > > > I ran a query performance test using an internal tool, with k set
> > to
> > > 10
> > > > > > and 100, which barely showed any difference in query time
> latency,
> > I
> > > > > > didn't expect that so I was wondering if there is any
> > configuration I
> > > > > > missed.
> > > > > >
> > > > > > I will run a couple more tests with different query sets
> meanwhile
> > > and
> > > > > dig
> > > > > > further into implementation of the feature to see if I am missing
> > any
> > > > > > config here. Appreciate any suggestions.
> > > > > >
> > > > > > Thanks,
> > > > > > Rajani
> > > > > >
> > > > > > On Thu, Feb 1, 2024 at 12:53 AM Ishan Chattopadhyaya <
> > > > > > ichattopadhy...@gmail.com> wrote:
> > > > > >
> > > > > >> Is it possible to benchmark the query performance across a
> larger
> > > 

Does documentCache still make sense in modern Solr?

2024-02-06 Thread Chris Hostetter



TL;DR: Some limited testing suggests that documentCache adds overhead w/o 
benefit.  Are there any folks who can report that their use cases perform 
significantly better with documentCache enabled?
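
(For context, a typical documentCache entry in solrconfig.xml looks roughly like 
the sketch below -- the sizes here are made up -- and as far as I know simply 
removing or commenting out the element disables the cache, which is the easiest 
way to A/B test this question.)

<documentCache class="solr.CaffeineCache"
               size="512"
               initialSize="512"
               autowarmCount="0"/>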



Background...

I was asked to investigate a Solr 9.1 Kubernetes deployment where a subset 
of the Solr pods (hosting only PULL replicas) were configured to scale 
up/down as needed using a Kubernetes HPA (based on CPU load).  The idea 
being that as traffic was largely cyclical over the course of the day, they 
could save some compute cost by letting the system scale down pods when 
load was lower, and scale back up (still using the same Persistent 
Volumes) when load resumed the next day.


For the most part this works fine, but in some cases -- not all -- they 
were seeing that as traffic ramped up and the HPA (re)started a pod that 
had existing replicas on an existing Persistent Volume, the pod might 
experience several minutes of p99 query response times that were abysmal: 
5+ seconds, compared to the typical 10-100ms.   Even the p50 response 
times would be in the 5+ second range for several minutes!


When I started reproducing this behavior under controlled load, I found 
that while it was happening, thread dumps showed a lot of threads blocked 
trying to "read through" the documentCache via 
CaffeineCache.computeIfAbsent.



100s of threads:

"qtp392918519-19" #19 prio=5 os_prio=0 cpu=2853.50ms elapsed=129.15s 
tid=0x7f04316f6cf0 nid=0x5e waiting on condition  [0x7f03f8e2f000]
   java.lang.Thread.State: TIMED_WAITING (parking)
at jdk.internal.misc.Unsafe.park(java.base@17.0.7/Native Method)
- parking to wait for  <0x0004b1c6a6a8> (a 
java.util.concurrent.locks.ReentrantLock$NonfairSync)
at 
java.util.concurrent.locks.LockSupport.parkNanos(java.base@17.0.7/Unknown 
Source)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(java.base@17.0.7/Unknown
 Source)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireNanos(java.base@17.0.7/Unknown
 Source)
at 
java.util.concurrent.locks.ReentrantLock$Sync.tryLockNanos(java.base@17.0.7/Unknown
 Source)
at 
java.util.concurrent.locks.ReentrantLock.tryLock(java.base@17.0.7/Unknown 
Source)
at 
com.github.benmanes.caffeine.cache.BoundedLocalCache.lock(BoundedLocalCache.java:1510)
at 
com.github.benmanes.caffeine.cache.BoundedLocalCache.afterWrite(BoundedLocalCache.java:1492)
at 
com.github.benmanes.caffeine.cache.BoundedLocalCache.put(BoundedLocalCache.java:2212)
at 
com.github.benmanes.caffeine.cache.BoundedLocalCache.putIfAbsent(BoundedLocalCache.java:2182)
at 
com.github.benmanes.caffeine.cache.LocalAsyncCache$AsyncAsMapView.putIfAbsent(LocalAsyncCache.java:316)
at 
com.github.benmanes.caffeine.cache.LocalAsyncCache$AsyncAsMapView.putIfAbsent(LocalAsyncCache.java:291)
at 
org.apache.solr.search.CaffeineCache.computeAsync(CaffeineCache.java:209)
at 
org.apache.solr.search.CaffeineCache.computeIfAbsent(CaffeineCache.java:250)
at 
org.apache.solr.search.SolrDocumentFetcher.doc(SolrDocumentFetcher.java:259)

...seemingly blocked on one thread doing cleanup / eviction:

"qtp392918519-960" #960 prio=5 os_prio=0 cpu=141.89ms elapsed=32.24s 
tid=0x7f03f013ec30 nid=0x41f runnable  [0x7f01a946e000]
   java.lang.Thread.State: RUNNABLE
at 
com.github.benmanes.caffeine.cache.AccessOrderDeque.setPrevious(AccessOrderDeque.java:66)
at 
com.github.benmanes.caffeine.cache.AccessOrderDeque.setPrevious(AccessOrderDeque.java:30)
at 
com.github.benmanes.caffeine.cache.AbstractLinkedDeque.unlink(AbstractLinkedDeque.java:139)
at 
com.github.benmanes.caffeine.cache.AccessOrderDeque.remove(AccessOrderDeque.java:53)
at 
com.github.benmanes.caffeine.cache.BoundedLocalCache.evictFromWindow(BoundedLocalCache.java:694)
at 
com.github.benmanes.caffeine.cache.BoundedLocalCache.evictEntries(BoundedLocalCache.java:671)
at 
com.github.benmanes.caffeine.cache.BoundedLocalCache.maintenance(BoundedLocalCache.java:1634)
at 
com.github.benmanes.caffeine.cache.BoundedLocalCache.performCleanUp(BoundedLocalCache.java:1602)
at 
com.github.benmanes.caffeine.cache.BoundedLocalCache$PerformCleanupTask.run(BoundedLocalCache.java:3620)
at 
org.apache.solr.search.CaffeineCache$$Lambda$656/0x0008015b2870.execute(Unknown
 Source)
at 
com.github.benmanes.caffeine.cache.BoundedLocalCache.scheduleDrainBuffers(BoundedLocalCache.java:1575)
at 
com.github.benmanes.caffeine.cache.BoundedLocalCache.scheduleAfterWrite(BoundedLocalCache.java:1545)
at 
com.github.benmanes.caffeine.cache.BoundedLocalCache.afterWrite(BoundedLocalCache.java:1477)
at 
com.github.benmanes.caffeine.cache.BoundedLocalCache.replace(BoundedLocalCache.java:2485)
at 
com.github.benmanes.caffeine.cache.LocalAsyncCache.lambda$handleCompletion$7

Group by query reports null pointer when unique key is not stored

2024-02-06 Thread rajani m
Hi Solr Users,

  Group by query is failing with the following error message. It looks like the
retrieveDocument method in TopGroupsResultTransformer.java fetches the doc by
id using a stored-field visitor. I tried having the "id" field rely on doc
values (it has "useDocValuesAsStored:true" enabled), but this didn't help. Any
alternative?

Would you call this a bug or improvement?
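
For reference, the serializer seems to expect the uniqueKey to be an actual stored
field, i.e. something along these lines in the schema (illustrative names; changing
this would of course require a full reindex):

<field name="id" type="string" indexed="true" stored="true"
       docValues="true" required="true"/>
<uniqueKey>id</uniqueKey>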

Error message -

java.lang.NullPointerException: Cannot invoke
"org.apache.lucene.index.IndexableField.stringValue()" because "f" is
null => java.lang.NullPointerException: Cannot invoke
"org.apache.lucene.index.IndexableField.stringValue()" because "f" is
null
at org.apache.solr.schema.FieldType.toExternal(FieldType.java:378)
java.lang.NullPointerException: Cannot invoke
"org.apache.lucene.index.IndexableField.stringValue()" because "f" is
null
at org.apache.solr.schema.FieldType.toExternal(FieldType.java:378) 
~[?:?]
at 
org.apache.solr.search.grouping.distributed.shardresultserializer.TopGroupsResultTransformer.serializeTopGroups(TopGroupsResultTransformer.java:238)
~[?:?]


Re: Group by query reports null pointer when unique key is not stored

2024-02-06 Thread uyil...@vivaldi.net.INVALID
I also got this exception before and, in order to avoid reindexing TBs of data, 
had to resort to grouping via streaming expressions, which has its ups and 
downs. If it's technically infeasible to substitute docValues for this purpose 
(when useDocValuesAsStored:true), it would be nice if that were documented under 
the "grouping" feature, or if the general schema design pages advised making the 
unique ID always stored:true.
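
For anyone curious, a minimal sketch of the kind of streaming expression I mean
(collection and field names are made up; the fields need docValues, the inner
stream is usually read via the /export handler, and it must be sorted by the
group-by field):

rollup(
  search(mycollection,
         q="*:*",
         qt="/export",
         fl="category_s",
         sort="category_s asc"),
  over="category_s",
  count(*))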

-ufuk yilmaz

From: rajani m 
Sent: Wednesday, February 7, 2024 3:18 AM
To: solr-user 
Subject: Group by query reports null pointer when unique key is not stored

Hi Solr Users,

  Group by query is failing with the following error message. It looks like the
retrieveDocument method in TopGroupsResultTransformer.java fetches the doc by
id using a stored-field visitor. I tried having the "id" field rely on doc
values (it has "useDocValuesAsStored:true" enabled), but this didn't help. Any
alternative?

Would you call this a bug or improvement?

Error message -

java.lang.NullPointerException: Cannot invoke
"org.apache.lucene.index.IndexableField.stringValue()" because "f" is
null => java.lang.NullPointerException: Cannot invoke
"org.apache.lucene.index.IndexableField.stringValue()" because "f" is
null
at org.apache.solr.schema.FieldType.toExternal(FieldType.java:378)
java.lang.NullPointerException: Cannot invoke
"org.apache.lucene.index.IndexableField.stringValue()" because "f" is
null
at org.apache.solr.schema.FieldType.toExternal(FieldType.java:378) 
~[?:?]
at 
org.apache.solr.search.grouping.distributed.shardresultserializer.TopGroupsResultTransformer.serializeTopGroups(TopGroupsResultTransformer.java:238)
~[?:?]