Re: Increased memory usage after upgrading to Solr 9.6.0 - DocValuesIteratorCache

2024-10-02 Thread Michael Gibney
Thanks for raising this topic; I suspect it does warrant a Jira issue to address this, but I'll ask a couple of questions first to make sure I'm not missing something: Do you have a very large number of dynamic fields configured as `useDocValuesAsStored=true`, and are you retrieving field values by

Re: 150x+ performance hit when number of rows <= 50 in a simple query

2024-07-09 Thread Michael Gibney
://github.com/apache/solr/blob/aec6e8f750037fea5f8d01dc49dabf28bf512d68/solr/core/src/java/org/apache/solr/handler/component/QueryComponent.java#L568-L569 > Just curious, has anyone ever tested the effectiveness of this thing? > Does it give at least one percent increase in performance? > > On Mon, Jul 8, 2024 at 10:

Re: 150x+ performance hit when number of rows <= 50 in a simple query

2024-07-08 Thread Michael Gibney
FYI: https://github.com/apache/solr/pull/2551 On Mon, Jul 8, 2024 at 9:55 AM Michael Gibney wrote: > > Thanks for reporting back. Found the issue at last, including the > magic number! Will post a fix for this shortly. > > https://github.com/a

Re: 150x+ performance hit when number of rows <= 50 in a simple query

2024-07-08 Thread Michael Gibney
toLoad != null && enableLazyFieldLoading ? new > LazyDocument(reader, docId) : null; > this.addLargeFieldsLazily = !largeFields.isEmpty(); > } > > > On Wed, Jun 26, 2024 at 5:10 AM Michael Gibney > wrote: > > > > FYI: > > https://issu

Re: GCP-Solr indexing performance is slow in 9.6.0 compare to 7.2.1

2024-07-02 Thread Michael Gibney
Hi Sathish: Did you find a resolution to this issue? What jdk version are you running in each case? Michael On Mon, Jun 17, 2024 at 2:39 PM Oleksandr Tkachuk wrote: > > Try to disable security manager. It can affect all requests, including > update requests. > I updated solr version from 8.4.0 to

Re: 150x+ performance hit when number of rows <= 50 in a simple query

2024-06-25 Thread Michael Gibney
/col/select?fl=fld2&wt=json&q=fld2:"v1"&start=0&rows=50' > Slowest: 0.0059 secs > Fastest: 0.0008 secs > Average: 0.0010 secs > Requests/sec: 4795.5539 > > ./hey_linux_amd64 -n 1 -c 5 -T "application/json" >

Re: 150x+ performance hit when number of rows <= 50 in a simple query

2024-06-21 Thread Michael Gibney
ng off enableLazyFieldLoading. I am > very surprised that this functionality continues to work when document > cache is disabled and I thought that this parameter was intended only > for it. In addition, we received an improvement in avg and 95% on many > other types of queries, as well as so

Re: 150x+ performance hit when number of rows <= 50 in a simple query

2024-06-20 Thread Michael Gibney
I've been unable to reproduce anything like this behavior. If you're really getting queryResultCache hits for these, then the field type/etc of the field you're querying on shouldn't make a difference. type/etc of the return field (product_id) would be more likely to matter. I wonder what would hap

Re: Solr 9.5.0 returns no history with v2 logging API

2024-02-22 Thread Michael Gibney
Ok, I figured out what's going on here. I don't mind creating a Jira issue for this (unless you'd prefer to); I'll have a PR up shortly. On Thu, Feb 22, 2024 at 11:41 AM Michael Gibney wrote: > > It looks like this is related to > https://issues.apache.org/j

Re: Solr 9.5.0 returns no history with v2 logging API

2024-02-22 Thread Michael Gibney
It looks like this is related to https://issues.apache.org/jira/browse/SOLR-17063 I'm investigating, but it looks like it would be appropriate to open a Jira issue for this. Thanks for reporting! Michael On Wed, Feb 21, 2024 at 4:52 PM Thomas Corthals wrote: > > Hi > > I've been using api/node

Re: Partial update slowness with a stored="false" dynamic field and lots of distinct field names

2024-02-21 Thread Michael Gibney
t 12:43 PM Michael Gibney wrote: > > It might be worth looking at this issue: > https://issues.apache.org/jira/browse/SOLR-16989 > > The irony is that this issue was supposed to help with slowness in > cases similar to what you describe. Can you send a full stack trace

Re: Partial update slowness with a stored="false" dynamic field and lots of distinct field names

2024-02-21 Thread Michael Gibney
It might be worth looking at this issue: https://issues.apache.org/jira/browse/SOLR-16989 The irony is that this issue was supposed to help with slowness in cases similar to what you describe. Can you send a full stack trace for a representative call to `DocValuesIteratorCache.newEntry(String)`?

Re: Performance and number of fields per document

2023-09-05 Thread Michael Gibney
> Note: for a test search that retrieves only 10 documents, qtime is very low > (2 msec) but the full request time to get javabin or json data is very slow > (several seconds). Reading between the lines here: does "full request" return a larger number of documents? How many? Are you attempting t

Re: Suggestions to improve Star queries latencies

2023-04-20 Thread Michael Gibney
> It is a query with popularity and recency boosts, requesting the first 100 > docs with 3 fields per doc. It sounds like you are scoring/sorting, so the optimization that Mikhail mentioned would not apply (your use-case is not "sort-irrelevant"). Can you share more about specifically how your imp

Re: Solr 8.11.2 / Oracle JDK 1.8.0_361

2023-03-01 Thread Michael Gibney
This may not be the issue for you, but I've seen this kind of error before when the jdk is swapped out on the filesystem under a running process on an old jdk. Make sure you restart the solr process to pick up the new jdk once it's in place. (The old process will continue to run with the deleted fi

Re: Filtering facets

2023-02-28 Thread Michael Gibney
This is one of the few remaining feature gaps (afaik) between legacy facets and JSON facets. There's a relevant Jira issue (https://issues.apache.org/jira/browse/SOLR-14921) that summarizes the state of things pretty well, including what I think would be a workaround for your case (if a bit verbose

Re: multi-term synonym prevents single-term match -- known issue?

2023-02-10 Thread Michael Gibney
Rudi, I agree, this does not seem like how it should behave. Probably something that could be fixed in edismax, not something lower-level (Lucene)? Michael On Fri, Feb 10, 2023 at 9:38 AM Mikhail Khludnev wrote: > > Hello, Rudi. > Well, it doesn't seem perfect. Probably it's can be fixed > via

[ANNOUNCE] Apache Solr 9.1.1 released

2023-01-25 Thread Michael Gibney
The Solr PMC is pleased to announce the release of Apache Solr 9.1.1. Solr is the popular, blazing fast, open source NoSQL search platform from the Apache Solr project. Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration,

Re: Solrcloud strange CPU behaviour

2023-01-23 Thread Michael Gibney
Based on the behavior you describe and the version you're running, it might be worth taking a look at https://issues.apache.org/jira/browse/SOLR-13336 On Mon, Jan 23, 2023 at 9:39 AM Dominique Bejean wrote: > > Hi, > > On a SolrCloud 7.7 environment with 14 servers, we have one collection with >

Re: FastLRUCache/ConcurrentLRUCache computeIfAbsent is BLOCKED

2023-01-04 Thread Michael Gibney
It looks like you are running version >= 8.10? iirc, replacing legacy cache implementations with the up-to-date default impl (solr.CaffeineCache) has fixed similar problems in the past (though I can't at the moment find the thread/issue to reference). Michael On Wed, Jan 4, 2023 at 6:59 AM slly

Re: Heap Size Space and Span Queries

2022-12-19 Thread Michael Gibney
ly > >>>> (see documentation). If you need in order, an interval is required. > >>>> > >>>> Phrases are only in order for "slop=0". Compare to "slop=1" which means > >>>> "next to each other" and is no long

Re: Near Real Time not working as expected

2022-12-09 Thread Michael Gibney
> now it's at to 16 and I don't see that ( I went from 6 to 16), but the issue > still persists Just to clarify, the "overlapping ondeck searchers" went away at 16? Assuming that the issue that still persists is docs not being visible? It's tempting to interpret "Registered new searcher autowarm

Re: PDF version of the reference manual?

2022-11-10 Thread Michael Gibney
It's also worth noting that within-site online search of the refguide is vastly improved in the latest version, so one of the reasons to have preferred PDF in the past should be less of an issue now. Michael On Thu, Nov 10, 2022 at 5:29 AM Jan Høydahl wrote: > > We stopped shipping PDFs of RefGui

Re: [suspected SPAM] Re: Semantic Knowledge Graph theoric question

2022-06-28 Thread Michael Gibney
It's hard to give a concrete answer without knowing the actual counts involved, but iiuc significantTerms and relatedness are basically equivalent (happy to be corrected here if I'm wrong). > the relatedness function is iffy at best ? -- not sure what is meant by this. It's a function, and afaict

Re: Upgrade SOLR 7.3 to 8.9,json.facet query performance drops a lot

2022-05-31 Thread Michael Gibney
`json.facet` covers a lot of ground and can do a lot of different things under the hood. Would you be able to share more specific information about the kinds of `json.facet` request you're making, configuration of the fields in question, etc.? On Tue, May 31, 2022 at 11:54 AM slly wrote: > > Hell

Re: Solr OOM issues

2022-05-27 Thread Michael Gibney
Based on the version you're running, I suggest you investigate whether this is possibly related to SOLR-13336 [1]. Particularly vulnerable configs would have query-time WordDelimiter[Graph]Filter configured to split and catenate, or Synonym[Graph]Filter with multi-term synonyms. [1] https://issues.apache.

Re: Reg Solr field cache

2022-05-26 Thread Michael Gibney
Taking a look at this, and going back to your initial question, it's unclear to me whether you're encountering a problem that you're trying to solve -- unless the problem is that you're _not_ hitting OOM? ;-) -- or are just asking out of general academic curiosity? If the former, could you be more

Re: Schema field type property - uninvertible

2022-05-20 Thread Michael Gibney
(echoing Shawn because I was about to hit send anyway): The process of "uninverting" a field involves running through the dictionary of indexed terms for a given field, and building an on-heap data structure that provides "doc => term" lookup (analogous to docValues), as opposed to "term => doc" l
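The "uninverting" step described above can be illustrated with a toy sketch. This is not Solr or Lucene code: the `inverted` dictionary, the `uninvert` helper, and the sample terms are all hypothetical, and a plain in-memory dict stands in for the on-heap structure. It only shows the idea of walking the term dictionary to turn a "term => doc" mapping into a "doc => term" lookup.

```python
from collections import defaultdict

# Hypothetical toy inverted index for a single field: term -> list of doc ids
# ("term => doc"). Real indexes are far more compact than this.
inverted = {
    "apple": [0, 2],
    "banana": [1],
    "cherry": [2, 3],
}


def uninvert(term_to_docs):
    """Walk every term in the dictionary and build a doc -> terms lookup."""
    doc_to_terms = defaultdict(list)
    for term in sorted(term_to_docs):
        for doc_id in term_to_docs[term]:
            doc_to_terms[doc_id].append(term)
    return dict(doc_to_terms)


# The result plays the role of a per-document value lookup ("doc => term").
doc_to_terms = uninvert(inverted)
print(doc_to_terms[2])
```

The cost in this sketch is proportional to the total number of (term, doc) postings, and the whole result lives in memory, which is the same general trade-off the message describes for uninverted fields.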

Re: High CPU utilisation on Solr-8.11.0

2022-05-05 Thread Michael Gibney
> 3 Solr Nodes: 5 CPU, 42 GB Ram (Each) > 3 Zookeeper Nodes: 1 CPU, 2 GB Ram (Each) > 3 Shards: 42m Documents, 42 GB (Each) > Heap: 8 GB > > > There are no deleted documents in the cluster and no updates going on. We > are trying to match the performance first. > > >

Re: Possible issue with nesting and the pf parameter

2022-05-02 Thread Michael Gibney
I know I've noticed this as well -- that the `pf` parsing is naive with respect to more complex query syntax. I'm curious what others might have to say about this; if nobody else weighs in perhaps it might be a question for the dev@solr list. Regardless of the above, I'd advise against the kind of

Re: Nested Facets and SortableTextField

2022-04-27 Thread Michael Gibney
o its Sorting functions > I have read your previous comments (Mar, 2019) in > https://issues.apache.org/jira/browse/SOLR-13056 > Could your previous patch solve or partially solve the problem? > Kind regards, > Zhiqing > > On Tue, 26 Apr 2022 at 01:03, Michael Gibney > wr

Re: Nested Facets and SortableTextField

2022-04-25 Thread Michael Gibney
> "facet": { > "categories": { > "method":"uif", > "type": "terms", > "field": "name_txt_sort", > "limit": -1, > "facet": { > "sex

Re: Nested Facets and SortableTextField

2022-04-25 Thread Michael Gibney
This is related to https://issues.apache.org/jira/browse/SOLR-13056 I'm curious: if you set `method:uif` on the top-level facet, are you able to achieve the desired results? (Note that `method:uif` incurs the same heap memory overhead -- uninverting the indexed values -- as faceting over a regular

Re: Verifying the replica.type parameter behavior

2022-04-08 Thread Michael Gibney
`shards.preference` only affects the backend routing of requests to individual cores/shards. These backend requests should have an additional `distrib=false` param, and are the requests that are generally the most resource-intensive, in that they do the initial per-shard domain-narrowing. I'm fair

Re: solr relatedness weirdness on json facet function

2022-04-06 Thread Michael Gibney
umberParts="1" catenateWords="1" > catenateNumbers="1" catenateAll="0" splitOnCaseChange="0" > splitOnNumerics="0"/> > > > protected="protwords.txt" /> >

Re: solr relatedness weirdness on json facet function

2022-04-05 Thread Michael Gibney
Both `qf` and `relatedness` should be orthogonal to your question, iiuc. Understanding that your question is mainly about which terms are included (i.e. included at all -- nevermind ranking), then the only thing that should determine that is the field and fieldType config for the terms facet "field

Re: High CPU utilisation on Solr-8.11.0

2022-03-27 Thread Michael Gibney
I agree with Shawn about ideally wanting more memory for the OS. That said, the WordDelimiterFilter config you sent aligns with my suspicion that "graph phrase" issues are likely to explain the difference between 6.5 and 8.11. At query-time, WordDelimiterFilter (and also equally WordDelimiterGraph

Re: High CPU utilisation on Solr-8.11.0

2022-03-26 Thread Michael Gibney
Are you using query-time multi-term synonyms or WordDelimiter[Graph]Filter? -- these can trigger "graph phrase" queries, which are handled _quite_ differently in Solr 8.11 vs 6.5 (and although unlikely to directly cause the performance issues you're observing, might well explain the performance dis

Re: Representative filtering of very large result sets

2022-03-24 Thread Michael Gibney
Are you determining your "top doc" for each collapsed group based on score? If your use case is such that you determine the "top doc" based on a static field with a manageable number of values, you may have other options available to you. (For some use cases it can be acceptable to "pre-filter" the

Re: Solr: User Defined Caches

2022-02-22 Thread Michael Gibney
ating how > to implement the basics of the cache lifecycle and then the use of that > cache within a query. > > Thanks > > > > -Original Message- > From: Michael Gibney [mailto:mich...@michaelgibney.net] > Sent: Thursday, February 17, 2022 9:02 AM > To: u

Re: Solr: User Defined Caches

2022-02-17 Thread Michael Gibney
I just happened to be looking into this as well. I assume you've seen the refguide documentation: https://solr.apache.org/guide/8_11/query-settings-in-solrconfig.html#user-defined-caches Normally one would not configure a user-defined cache except in support of other custom plugins/components. e.g

Re: Stream Query result Cacheable ?

2021-12-06 Thread Michael Gibney
Could you share the specific requests/times that you're comparing? Other potentially-relevant details like index size, field cardinalities, version, etc. might also be helpful. Michael On Mon, Dec 6, 2021 at 10:49 AM sambasivarao giddaluri < sambasiva.giddal...@gmail.com> wrote: > Any suggestions?

Re: Solr limit in words search - take 2

2021-11-17 Thread Michael Gibney
nother like the Word Delimiter > Filter, because the indexer can’t directly consume a graph. To get fully > correct positional queries when tokens are split, you should instead use > this filter at query time." > > > > -Original Message- > From: Michael Gibney

Re: Solr limit in words search - take 2

2021-11-17 Thread Michael Gibney
This is not the most thorough answer, but hopefully gets you headed in the right direction: Very strange things can happen when your index-time analysis chain generates "graph" token-streams (as yours does). A couple of things you could try: 1. experiment with setting `enableGraphQueries=false` on

Re: SOLR Performance on RHEL 7

2021-10-26 Thread Michael Gibney
In my experience, running Solr on CentOS 7 (comparable to RHEL 7) -- on VMWare, but "ballooning" was _not_ the issue -- I found that setting vm.swappiness=0 or 1 did not actually prevent swapping. Notwithstanding Shawn's excellent suggestions above, if you still suspect that swapping is the issue a

Re: Cpu 100%

2021-08-09 Thread Michael Gibney
Jeff, you mention indexing, but I'm curious: is this also a live system supporting queries at the same time? On Mon, Aug 9, 2021 at 8:53 AM Dominique Bejean wrote: > Hi Jeff, > > How many CPU ? > What is the CPU load average (information provided by Linux top command in > the first line) ? > > I

Re: sortabletextfield and docvalue

2021-07-23 Thread Michael Gibney
It does, but it's trappy. It facets on concatenated sort string values, but any refinement is done on tokenized values. See: https://issues.apache.org/jira/browse/SOLR-13056 https://issues.apache.org/jira/browse/SOLR-8362 I would not personally recommend faceting on SortableTextField (and separat

Re: Solr nodes crashing

2021-07-23 Thread Michael Gibney
.7% > 377.39 GB > 368.77 GB > > Swap Space 4.7% > 4.00 GB > 193.25 MB > > File Descriptor Count 0.2% > 128000 > 226 > > JVM-Memory 22.7% > 15.33 GB > 15.33 GB > > Thanks for looking, > Jon > > > -Original Message- > From: Sha

Re: Solr nodes crashing

2021-07-22 Thread Michael Gibney
ps- wrt requesting a "literal, complete search url" to aid troubleshooting: facets, `sort`, `offset`, and `rows` params would all be of particular interest. On Thu, Jul 22, 2021 at 12:25 PM Michael Gibney wrote: > SortableTextField uses docValues in a very specific way, and is no

Re: Solr nodes crashing

2021-07-22 Thread Michael Gibney
SortableTextField uses docValues in a very specific way, and is not a general-purpose workaround for enabling docValues on TextFields. Possibly of interest: https://issues.apache.org/jira/browse/SOLR-8362 That said, DocValues are relevant mainly (only?) wrt full-domain per-doc value-access (e.g.,

Re: Result set order when searching on "*" (asterisk character)

2021-07-22 Thread Michael Gibney
No sort option configured generally defaults to score (and currently does so even in cases such as the "*:*" case (MatchAllDocsQuery) where sort is guaranteed to be irrelevant; see: https://issues.apache.org/jira/browse/SOLR-14765). But functionally speaking that doesn't really matter: in the even

Re: Excessive query expansion when using WordDelimiterGraphFilter

2021-06-23 Thread Michael Gibney
a.gov.au/> > > The National Library of Australia (NLA) acknowledges Australia’s First > Nations Peoples – the First Australians – as the Traditional Owners and > Custodians of this land and gives respect to the Elders – past and present – > and through them to all Australian

Re: Excessive query expansion when using WordDelimiterGraphFilter

2021-06-21 Thread Michael Gibney
Hi Francis, I have indeed encountered this problem -- though as you've discovered it's dependent on specific types of analysis chain config. You should be able to avoid this behavior by constructing your analysis chains such that they don't in practice create graph TokenStreams. This advice gener

Re: Approaches to indexing indigenous languages?

2021-06-11 Thread Michael Gibney
+1 to ICU, and I'd also be interested in follow-up. In case transliteration might also be helpful for your case, I took a cursory glance at the out-of-the-box transliteration ids (https://github.com/unicode-org/icu/tree/main/icu4c/source/data/translit) and I don't think there's anything for the scr

Re: Unique function not working for Solr (Ver: 6.0.1) nested facets

2021-05-24 Thread Michael Gibney
I think your facet request syntax is wrong (you have duplicate "facet" keys for all but the "leaf" (poi) facet, which is why you see the "leaf"/poi facet working, but not the others). I wonder whether this should throw a 400 error? In any case could you see whether the following works as expected?:
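As a rough illustration of why duplicate keys in a JSON object can silently drop sub-facets, here is a small sketch using Python's standard `json` module. It is an analogy only: how Solr's own JSON parser treats duplicate keys is not shown here and is an assumption, and the `region` facet name and the `find_duplicate_keys` helper are hypothetical (only `poi` comes from the thread).

```python
import json

# Hypothetical request body with two sibling "facet" keys in the same object.
raw = (
    '{"facet": {"region": {"type": "terms", "field": "region"}},'
    ' "facet": {"poi": {"type": "terms", "field": "poi"}}}'
)

# Python's json module keeps only the last value for a repeated key,
# so the first "facet" object disappears without any error.
parsed = json.loads(raw)
print(list(parsed["facet"]))


def find_duplicate_keys(text):
    """Return keys that appear more than once within any single JSON object."""
    dupes = []

    def hook(pairs):
        seen = set()
        for key, _ in pairs:
            if key in seen:
                dupes.append(key)
            seen.add(key)
        return dict(pairs)

    json.loads(text, object_pairs_hook=hook)
    return dupes


print(find_duplicate_keys(raw))
```

Running a check like `find_duplicate_keys` over a request body before sending it is one way to catch this class of mistake on the client side.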

Re: Issue with expand feature in collapse/expand result set workflow

2021-05-20 Thread Michael Gibney
Depending on what your goals/expectations are, it could also be worth noting the `expand.q` and `expand.fq` params, which are applied when fetching "expanded docs" (intersecting with the union of `expand.field` result values). In cases where you want to "pivot" to an unrestricted set of related clu

Re: Solr JVM Heap becomes full and stops when we try to restart

2021-05-10 Thread Michael Gibney
In addition to the other advice so far, if applicable I'd strongly recommend disabling swap (especially considering that you've already tried varying heap size without the desired effect): https://solr.apache.org/guide/8_8/taking-solr-to-production.html#disabling-swap Some good background reading o

Re: WordDelimiter does not generate expected token

2021-04-22 Thread Michael Gibney
WDGF with both "generate*Parts"/"splitOn" _and_ "catenate*"/"preserveOriginal" generates a graph TokenStream structure that relies on PositionLengthAttribute to accurately reflect the graph structure. Because Lucene does not index PositionLengthAttribute, this information is lost when WDGF is used

Re: Distributed IDF for Solr using ExactStatsCache issue

2021-03-24 Thread Michael Gibney
times. Most of the time, the scores do reflect a > distributed IDF, but sometimes scores that reflect the IDF of only one of > the shards (even though documents from both shards are returned). > > Thanks! > Cameron VandenBerg > > -Original Message- > From: Michael Gi

Re: Distributed IDF for Solr using ExactStatsCache issue

2021-03-22 Thread Michael Gibney
Cameron, What is your cluster configuration? i.e., how many nodes, how many replicas per node, how many replicas in each collection, etc.? Do you observe consistent behavior for the same query if you always route that query via the same "entry node" (i.e., not load balanced over the cluster)? Micha

Re: Solr not distributing search requests among replicas

2021-03-10 Thread Michael Gibney
> Guess next step is to setup a small local test cluster and see what > happens. > > Jan Høydahl > > > 10. mar. 2021 kl. 15:46 skrev Michael Gibney >: > > > > You say not "anything fancy" -- depending on how you define "fancy", if > you >

Re: Solr not distributing search requests among replicas

2021-03-10 Thread Michael Gibney
You say not "anything fancy" -- depending on how you define "fancy", if you have an explicit `shards.preference` param, based on the version you're running (8.4) you might also take a look at https://issues.apache.org/jira/browse/SOLR-14471. (If SOLR-14471 is the problem, removing the explicit `sha