Thanks for raising this topic; I suspect it does warrant a Jira issue to
address this, but I'll ask a couple of questions first to make sure I'm not
missing something:
Do you have a very large number of dynamic fields configured as
`useDocValuesAsStored=true`, and are you retrieving field values by
://github.com/apache/solr/blob/aec6e8f750037fea5f8d01dc49dabf28bf512d68/solr/core/src/java/org/apache/solr/handler/component/QueryComponent.java#L568-L569
> Just curious, has anyone ever tested the effectiveness of this thing?
> Does it give at least one percent increase in performance?
>
> On Mon, Jul 8, 2024 at 10:
FYI: https://github.com/apache/solr/pull/2551
On Mon, Jul 8, 2024 at 9:55 AM Michael Gibney wrote:
>
> Thanks for reporting back. Found the issue at last, including the
> magic number! Will post a fix for this shortly.
>
> https://github.com/a
toLoad != null && enableLazyFieldLoading ? new
> LazyDocument(reader, docId) : null;
> this.addLargeFieldsLazily = !largeFields.isEmpty();
> }
>
>
> On Wed, Jun 26, 2024 at 5:10 AM Michael Gibney
> wrote:
> >
> > FYI:
> > https://issu
Hi Sathish:
Did you find a resolution to this issue? What jdk version are you
running in each case?
Michael
On Mon, Jun 17, 2024 at 2:39 PM Oleksandr Tkachuk wrote:
>
> Try to disable security manager. It can affect all requests, including
> update requests.
> I updated solr version from 8.4.0 to
/col/select?fl=fld2&wt=json&q=fld2:"v1"&start=0&rows=50'
> Slowest: 0.0059 secs
> Fastest: 0.0008 secs
> Average: 0.0010 secs
> Requests/sec: 4795.5539
>
> ./hey_linux_amd64 -n 1 -c 5 -T "application/json"
>
ng off enableLazyFieldLoading. I am
> very surprised that this functionality continues to work when document
> cache is disabled and I thought that this parameter was intended only
> for it. In addition, we received an improvement in avg and 95% on many
> other types of queries, as well as so
I've been unable to reproduce anything like this behavior. If you're
really getting queryResultCache hits for these, then the field
type/etc of the field you're querying on shouldn't make a difference.
The type/etc of the return field (product_id) would be more likely to
matter. I wonder what would hap
Ok, I figured out what's going on here. I don't mind creating a Jira
issue for this (unless you'd prefer to); I'll have a PR up shortly.
On Thu, Feb 22, 2024 at 11:41 AM Michael Gibney
wrote:
>
> It looks like this is related to
> https://issues.apache.org/j
It looks like this is related to
https://issues.apache.org/jira/browse/SOLR-17063
I'm investigating, but it looks like it would be appropriate to open a
Jira issue for this.
Thanks for reporting!
Michael
On Wed, Feb 21, 2024 at 4:52 PM Thomas Corthals wrote:
>
> Hi
>
> I've been using api/node
t 12:43 PM Michael Gibney
wrote:
>
> It might be worth looking at this issue:
> https://issues.apache.org/jira/browse/SOLR-16989
>
> The irony is that this issue was supposed to help with slowness in
> cases similar to what you describe. Can you send a full stack trace
>
It might be worth looking at this issue:
https://issues.apache.org/jira/browse/SOLR-16989
The irony is that this issue was supposed to help with slowness in
cases similar to what you describe. Can you send a full stack trace
for a representative call to
`DocValuesIteratorCache.newEntry(String)`?
> Note: for a test search that retrieves only 10 documents, qtime is very low
> (2 msec) but the full request time to get javabin or json data is very slow
> (several seconds).
Reading between the lines here: does "full request" return a larger
number of documents? How many? Are you attempting t
> It is a query with popularity and recency boosts, requesting the first 100
> docs with 3 fields per doc.
It sounds like you are scoring/sorting, so the optimization that
Mikhail mentioned would not apply (your use-case is not
"sort-irrelevant"). Can you share more about specifically how your
imp
This may not be the issue for you, but I've seen this kind of error
before when the jdk is swapped out on the filesystem under a running
process on an old jdk. Make sure you restart the solr process to pick
up the new jdk once it's in place. (The old process will continue to
run with the deleted fi
This is one of the few remaining feature gaps (afaik) between legacy
facets and JSON facets. There's a relevant Jira issue
(https://issues.apache.org/jira/browse/SOLR-14921) that summarizes the
state of things pretty well, including what I think would be a
workaround for your case (if a bit verbose
Rudi,
I agree, this does not seem like how it should behave. Probably
something that could be fixed in edismax, not something lower-level
(Lucene)?
Michael
On Fri, Feb 10, 2023 at 9:38 AM Mikhail Khludnev wrote:
>
> Hello, Rudi.
> Well, it doesn't seem perfect. Probably it's can be fixed
> via
The Solr PMC is pleased to announce the release of Apache Solr 9.1.1.
Solr is the popular, blazing fast, open source NoSQL search platform
from the Apache Solr project. Its major features include powerful
full-text search, hit highlighting, faceted search, dynamic
clustering, database integration,
Based on the behavior you describe and the version you're running, it
might be worth taking a look at
https://issues.apache.org/jira/browse/SOLR-13336
On Mon, Jan 23, 2023 at 9:39 AM Dominique Bejean
wrote:
>
> Hi,
>
> On a SolrCloud 7.7 environment with 14 servers, we have one collection with
>
It looks like you are running version >= 8.10? iirc, replacing legacy
cache implementations with the up-to-date default impl
(solr.CaffeineCache) has fixed similar problems in the past (though I
can't at the moment find the thread/issue to reference).
Michael
On Wed, Jan 4, 2023 at 6:59 AM slly
ly
> >>>> (see documentation). If you need in order, an interval is required.
> >>>>
> >>>> Phrases are only in order for "slop=0". Compare to "slop=1" which means
> >>>> "next to each other" and is no long
> now it's at to 16 and I don't see that ( I went from 6 to 16), but the issue
> still persists
Just to clarify, the "overlapping ondeck searchers" went away at 16?
Assuming that the issue that still persists is docs not being visible?
It's tempting to interpret "Registered new searcher autowarm
It's also worth noting that within-site online search of the refguide
is vastly improved in the latest version, so one of the reasons to
have preferred PDF in the past should be less of an issue now.
Michael
On Thu, Nov 10, 2022 at 5:29 AM Jan Høydahl wrote:
>
> We stopped shipping PDFs of RefGui
It's hard to give a concrete answer without knowing the actual counts
involved, but iiuc significantTerms and relatedness are basically
equivalent (happy to be corrected here if I'm wrong).
> the relatedness function is iffy at best
? -- not sure what is meant by this. It's a function, and afaict
`json.facet` covers a lot of ground and can do a lot of different
things under the hood. Would you be able to share more specific
information about the kinds of `json.facet` request you're making,
configuration of the fields in question, etc.?
On Tue, May 31, 2022 at 11:54 AM slly wrote:
>
> Hell
Based on the version you're running, I suggest you investigate whether
this is related to SOLR-13336 [1].
Particularly vulnerable configs would have query-time
WordDelimiter[Graph]Filter configured to split and catenate, or
Synonym[Graph]Filter with multi-term synonyms.
[1] https://issues.apache.
Taking a look at this, and going back to your initial question, it's
unclear to me whether you're encountering a problem that you're trying
to solve -- unless the problem is that you're _not_ hitting OOM? ;-)
-- or are just asking out of general academic curiosity? If the
former, could you be more
(echoing Shawn because I was about to hit send anyway):
The process of "uninverting" a field involves running through the
dictionary of indexed terms for a given field, and building an on-heap data
structure that provides "doc => term" lookup (analogous to docValues), as
opposed to "term => doc" l
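
As a rough illustration of the idea (a toy sketch, not Solr's or Lucene's actual implementation; the function name and the sample data are made up for this example), uninverting amounts to walking a "term => docs" structure and building the opposite "doc => terms" mapping in memory:

```python
from collections import defaultdict


def uninvert(inverted_index):
    """Build a doc => terms map from a term => docs map (toy model)."""
    doc_to_terms = defaultdict(list)
    # Walk the term dictionary in sorted order and record, for every
    # document a term points at, that the document contains the term.
    for term in sorted(inverted_index):
        for doc_id in inverted_index[term]:
            doc_to_terms[doc_id].append(term)
    return dict(doc_to_terms)


# Hypothetical inverted index: term => list of doc ids.
inverted = {
    "apple": [0, 2],
    "banana": [1],
    "cherry": [0, 1],
}

forward = uninvert(inverted)
for doc_id in sorted(forward):
    print(doc_id, forward[doc_id])
```

The resulting structure lives entirely on the heap in this sketch, which is the cost the explanation above is pointing at.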
> 3 Solr Nodes: 5 CPU, 42 GB Ram (Each)
> 3 Zookeeper Nodes: 1 CPU, 2 GB Ram (Each)
> 3 Shards: 42m Documents, 42 GB (Each)
> Heap: 8 GB
>
>
> There are no deleted documents in the cluster and no updates going on. We
> are trying to match the performance first.
>
>
>
I know I've noticed this as well -- that the `pf` parsing is naive with
respect to more complex query syntax. I'm curious what others might have to
say about this; if nobody else weighs in perhaps it might be a question for
the dev@solr list.
Regardless of the above, I'd advise against the kind of
o its Sorting functions
> I have read your previous comments (Mar, 2019) in
> https://issues.apache.org/jira/browse/SOLR-13056
> Could your previous patch solve or partially solve the problem?
> Kind regards,
> Zhiqing
>
> On Tue, 26 Apr 2022 at 01:03, Michael Gibney
> wr
> "facet": {
> "categories": {
> "method":"uif",
> "type": "terms",
> "field": "name_txt_sort",
> "limit": -1,
> "facet": {
> "sex
This is related to https://issues.apache.org/jira/browse/SOLR-13056
I'm curious: if you set `method:uif` on the top-level facet, are you able
to achieve the desired results? (Note that `method:uif` incurs the same
heap memory overhead -- uninverting the indexed values -- as faceting over
a regular
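
For illustration only, a `json.facet` body with `method:uif` set on the top-level terms facet could be assembled along these lines. The facet names and field names below are hypothetical placeholders, not taken from the original request:

```python
import json

# Hypothetical facet and field names; only "type", "field", "method",
# "limit" and the nested "facet" key are JSON Facet API parameters.
json_facet = {
    "categories": {
        "type": "terms",
        "field": "title_sort",      # assumed SortableTextField
        "method": "uif",            # set on the top-level facet
        "limit": -1,
        "facet": {
            "by_type": {"type": "terms", "field": "type_s"},
        },
    }
}

# The serialized string would be sent as the value of the json.facet param.
payload = json.dumps(json_facet)
print(payload)
```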
`shards.preference` only affects the backend routing of requests to
individual cores/shards. These backend requests should have an additional
`distrib=false` param, and are the requests that are generally the most
resource-intensive, in that they do the initial per-shard domain-narrowing.
I'm fair
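
A minimal sketch of passing `shards.preference` on the top-level request follows. The host, collection name, and the `replica.type:PULL` value are assumptions chosen for the example; the per-shard `distrib=false` requests are issued by Solr itself and are not built by the client:

```python
from urllib.parse import urlencode


def build_select_url(base_url, collection, params):
    """Return a /select URL with URL-encoded query parameters."""
    return f"{base_url}/{collection}/select?{urlencode(params)}"


url = build_select_url(
    "http://localhost:8983/solr",   # hypothetical host
    "mycollection",                 # hypothetical collection
    {
        "q": "*:*",
        # Only influences which replicas serve the backend requests.
        "shards.preference": "replica.type:PULL",
    },
)
print(url)
```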
umberParts="1" catenateWords="1"
> catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"
> splitOnNumerics="0"/>
>
>
> protected="protwords.txt" />
>
Both `qf` and `relatedness` should be orthogonal to your question, iiuc.
Understanding that your question is mainly about which terms are included
(i.e. included at all -- never mind ranking), then the only thing that
should determine that is the field and fieldType config for the terms facet
"field
I agree with Shawn about ideally wanting more memory for the OS.
That said, the WordDelimiterFilter config you sent aligns with my suspicion
that "graph phrase" issues are likely to explain the difference between 6.5
and 8.11. At query-time, WordDelimiterFilter (and also equally
WordDelimiterGraph
Are you using query-time multi-term synonyms or WordDelimiter[Graph]Filter?
-- these can trigger "graph phrase" queries, which are handled _quite_
differently in Solr 8.11 vs 6.5 (and although unlikely to directly cause
the performance issues you're observing, might well explain the performance
dis
Are you determining your "top doc" for each collapsed group based on score?
If your use case is such that you determine the "top doc" based on a static
field with a manageable number of values, you may have other options
available to you. (For some use cases it can be acceptable to "pre-filter"
the
ating how
> to implement the basics of the cache lifecycle and then the use of that
> cache within a query.
>
> Thanks
>
>
>
> -Original Message-
> From: Michael Gibney [mailto:mich...@michaelgibney.net]
> Sent: Thursday, February 17, 2022 9:02 AM
> To: u
I just happened to be looking into this as well. I assume you've seen the
refguide documentation:
https://solr.apache.org/guide/8_11/query-settings-in-solrconfig.html#user-defined-caches
Normally one would not configure a user-defined cache except in support of
other custom plugins/components. e.g
Could you share the specific requests/times that you're comparing? Other
potentially-relevant details like index size, field cardinalities, version,
etc. might also be helpful.
Michael
On Mon, Dec 6, 2021 at 10:49 AM sambasivarao giddaluri <
sambasiva.giddal...@gmail.com> wrote:
> Any suggestions?
nother like the Word Delimiter
> Filter, because the indexer can’t directly consume a graph. To get fully
> correct positional queries when tokens are split, you should instead use
> this filter at query time."
>
>
>
> -Original Message-
> From: Michael Gibney
This is not the most thorough answer, but hopefully gets you headed in the
right direction:
Very strange things can happen when your index-time analysis chain
generates "graph" token-streams (as yours does). A couple of things you
could try:
1. experiment with setting `enableGraphQueries=false` on
In my experience, running Solr on CentOS 7 (comparable to RHEL 7) -- on
VMWare, but "ballooning" was _not_ the issue -- I found that setting
vm.swappiness=0 or 1 did not actually prevent swapping. Notwithstanding
Shawn's excellent suggestions above, if you still suspect that swapping is
the issue a
Jeff, you mention indexing, but I'm curious: is this also a live system
supporting queries at the same time?
On Mon, Aug 9, 2021 at 8:53 AM Dominique Bejean
wrote:
> Hi Jeff,
>
> How many CPU ?
> What is the CPU load average (information provided by Linux top command in
> the first line) ?
>
> I
It does, but it's trappy. It facets on concatenated sort string values, but
any refinement is done on tokenized values. See:
https://issues.apache.org/jira/browse/SOLR-13056
https://issues.apache.org/jira/browse/SOLR-8362
I would not personally recommend faceting on SortableTextField (and
separat
.7%
> 377.39 GB
> 368.77 GB
>
> Swap Space 4.7%
> 4.00 GB
> 193.25 MB
>
> File Descriptor Count 0.2%
> 128000
> 226
>
> JVM-Memory 22.7%
> 15.33 GB
> 15.33 GB
>
> Thanks for looking,
> Jon
>
>
> -Original Message-
> From: Sha
ps- wrt requesting a "literal, complete search url" to aid troubleshooting:
facets, `sort`, `offset`, and `rows` params would all be of particular
interest.
On Thu, Jul 22, 2021 at 12:25 PM Michael Gibney
wrote:
> SortableTextField uses docValues in a very specific way, and is no
SortableTextField uses docValues in a very specific way, and is not a
general-purpose workaround for enabling docValues on TextFields. Possibly
of interest: https://issues.apache.org/jira/browse/SOLR-8362
That said, DocValues are relevant mainly (only?) wrt full-domain per-doc
value-access (e.g.,
No sort option configured generally defaults to score (and currently does
so even in cases such as the "*:*" case (MatchAllDocsQuery) where sort is
guaranteed to be irrelevant; see:
https://issues.apache.org/jira/browse/SOLR-14765).
But functionally speaking that doesn't really matter: in the even
a.gov.au/>
>
> The National Library of Australia (NLA) acknowledges Australia’s First
> Nations Peoples – the First Australians – as the Traditional Owners and
> Custodians of this land and gives respect to the Elders – past and present –
> and through them to all Australian
Hi Francis,
I have indeed encountered this problem -- though as you've discovered
it's dependent on specific types of analysis chain config.
You should be able to avoid this behavior by constructing your
analysis chains such that they don't in practice create graph
TokenStreams. This advice gener
+1 to ICU, and I'd also be interested in follow-up. In case
transliteration might also be helpful for your case, I took a cursory
glance at the out-of-the-box transliteration ids
(https://github.com/unicode-org/icu/tree/main/icu4c/source/data/translit)
and I don't think there's anything for the scr
I think your facet request syntax is wrong (you have duplicate "facet"
keys for all but the "leaf" (poi) facet, which is why you see the
"leaf"/poi facet working, but not the others). I wonder whether this
should throw a 400 error? In any case could you see whether the
following works as expected?:
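
The request that followed is not preserved here. As a separate illustration of the duplicate-key problem described above (a sketch with hypothetical facet and field names, using Python's `json` module rather than Solr's own parser), duplicate keys in a JSON object collapse silently by default, and a small hook can surface them:

```python
import json

# Hypothetical request: the "a" facet has two "facet" keys by mistake.
broken = """
{"a": {"type": "terms", "field": "f1",
       "facet": {"x": "unique(id)"},
       "facet": {"b": {"type": "terms", "field": "f2"}}}}
"""

# Python's json module keeps only the last value for a repeated key.
parsed = json.loads(broken)
print(list(parsed["a"]["facet"]))


def reject_duplicates(pairs):
    """object_pairs_hook that raises if an object repeats a key."""
    keys = [key for key, _ in pairs]
    dupes = sorted({key for key in keys if keys.count(key) > 1})
    if dupes:
        raise ValueError(f"duplicate keys: {dupes}")
    return dict(pairs)


try:
    json.loads(broken, object_pairs_hook=reject_duplicates)
except ValueError as err:
    print(err)
```

Merging the sub-facets under a single `facet` key per level avoids the ambiguity regardless of how a given parser treats duplicates.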
Depending on what your goals/expectations are, it could also be worth
noting the `expand.q` and `expand.fq` params, which are applied when
fetching "expanded docs" (intersecting with the union of
`expand.field` result values). In cases where you want to "pivot" to
an unrestricted set of related clu
In addition to the other advice so far, if applicable I'd strongly
recommend disabling swap (especially considering that you've already
tried varying heap size without the desired effect):
https://solr.apache.org/guide/8_8/taking-solr-to-production.html#disabling-swap
Some good background reading o
WDGF with both "generate*Parts"/"splitOn" _and_
"catenate*"/"preserveOriginal" generates a graph TokenStream structure that
relies on PositionLengthAttribute to accurately reflect the graph
structure. Because Lucene does not index PositionLengthAttribute, this
information is lost when WDGF is used
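
A toy model of the effect (not Lucene code; the token layout for "wi-fi router" and all names here are assumptions made for the illustration): once position length is dropped, a catenated token looks as if it occupies a single position, so a phrase that should follow it no longer lines up.

```python
from collections import namedtuple

Token = namedtuple("Token", "term position position_length")

# Assumed analysis of "wi-fi router" with split + catenate enabled.
graph_tokens = [
    Token("wifi", 0, 2),    # catenated token spans both parts
    Token("wi", 0, 1),
    Token("fi", 1, 1),
    Token("router", 2, 1),
]


def index_positions(tokens):
    """Keep only term => positions, discarding position length."""
    postings = {}
    for tok in tokens:
        postings.setdefault(tok.term, []).append(tok.position)
    return postings


def exact_phrase_match(postings, terms):
    """True if the terms occur at consecutive positions."""
    return any(
        all(start + i in postings.get(term, []) for i, term in enumerate(terms))
        for start in postings.get(terms[0], [])
    )


postings = index_positions(graph_tokens)
split_matches = exact_phrase_match(postings, ["wi", "fi", "router"])
catenated_matches = exact_phrase_match(postings, ["wifi", "router"])
print(split_matches, catenated_matches)
```

In this model the split form matches as a phrase while the catenated form does not, because "router" sits two positions after "wifi" and nothing in the postings records that "wifi" was two positions wide.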
times. Most of the time, the scores do reflect a
> distributed IDF, but sometimes scores that reflect the IDF of only one of
> the shards (even though documents from both shards are returned).
>
> Thanks!
> Cameron VandenBerg
>
> -Original Message-
> From: Michael Gi
Cameron,
What is your cluster configuration? i.e., how many nodes, how many replicas
per node, how many replicas in each collection, etc.? Do you observe
consistent behavior for the same query if you always route that query via
the same "entry node" (i.e., not load balanced over the cluster)?
Micha
> Guess next step is to setup a small local test cluster and see what
> happens.
>
> Jan Høydahl
>
> > On 10 Mar 2021, at 15:46, Michael Gibney wrote:
> >
> > You say not "anything fancy" -- depending on how you define "fancy", if
> you
>
You say not "anything fancy" -- depending on how you define "fancy", if you
have an explicit `shards.preference` param, based on the version you're
running (8.4) you might also take a look at
https://issues.apache.org/jira/browse/SOLR-14471. (If SOLR-14471 is the
problem, removing the explicit `sha
62 matches