On 5/2/23 15:30, Bill Tantzen wrote:
This works as I expected:
ab00c.tif -- tokenizes as it should with a value of ab00c.tif
This doesn't work as I expected:
ab003.tif -- tokenizes with a result of ab003 and tif
I got the same behavior with ICUTokenizer, which uses ICU4J for Unicode
handling.
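
For readers trying to see why the two file names differ: both tokenizers follow the Unicode word-break rules (UAX #29), where "." keeps a token together only when it sits between two letters (WB6/WB7) or between two digits (WB11/WB12). In ab003.tif the "." sits between a digit and a letter, so neither rule applies and the token splits. The sketch below is a rough Python approximation of just that one rule, not the Lucene implementation; the function name sketch_tokenize is made up for illustration, and every other character is handled far more crudely than the real tokenizers handle it.

```python
def sketch_tokenize(text):
    """Approximate how '.' is treated under UAX #29 (illustration only).

    Assumption: only '.' is modelled as a joining character; every other
    non-alphanumeric character simply ends the current token.
    """
    tokens = []
    current = ""
    for i, ch in enumerate(text):
        if ch.isalnum():
            current += ch
            continue
        joins = False
        if ch == "." and current and i + 1 < len(text):
            prev, nxt = text[i - 1], text[i + 1]
            # letter . letter (WB6/WB7) or digit . digit (WB11/WB12)
            joins = (prev.isalpha() and nxt.isalpha()) or (
                prev.isdigit() and nxt.isdigit()
            )
        if joins:
            current += ch
        elif current:
            tokens.append(current)
            current = ""
    if current:
        tokens.append(current)
    return tokens


for name in ("ab00c.tif", "ab003.tif", "test.com", "test7.com"):
    print(name, "->", sketch_tokenize(name))
```

Under this approximation ab00c.tif and test.com stay whole, while ab003.tif and test7.com split at the ".", which matches the behavior reported in this thread.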
Shawn,
No, email addresses are not preserved -- from the docs:
-
The "@" character is among the set of token-splitting punctuation, so
email addresses are not preserved as single tokens.
But the non-split on "test.com" versus the split on "test7.com" is unexpected!
~~Bill
On Wed, May 3,
Hi all,
I'd like to ask whether there is a correlation between the amount of memory
allocated by a Solr query and the number of *hits* reported in the Solr logs.
I haven't found anything in the Solr documentation.
Do you know of any guidance on the hits value?
Thanks,
Vincenzo
--
Vincenzo D'
Hello Vincenzo,
Yes. Last time I checked, an array of ScoreDoc objects is created for each
query with the size of the numFound for the local core/replica. This should
be clearly visible in VisualVM. This happens in SolrIndexSearcher.
Regards,
Markus
On Wed, 3 May 2023 at 17:20, Vincenzo D'Amor wrote
Hi Markus,
thanks for your explanation.
What if I submit a query q=*:*&rows=0 and there are 200M documents in
the Solr core? Will an array of ScoreDoc objects that large be allocated?
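
To put a rough number on that worry, here is a back-of-the-envelope sketch in Python. It takes the description above at face value (one ScoreDoc per hit) and does not confirm that Solr actually does this for rows=0. The sizes are assumptions: 24 bytes per ScoreDoc object (a float and two ints plus a 12-byte header), 4-byte compressed references, and a 16-byte array header; the function name is made up for illustration.

```python
def estimate_scoredoc_array_bytes(
    num_hits,
    object_bytes=24,        # assumed size of one ScoreDoc on a 64-bit JVM
    reference_bytes=4,      # assumed compressed object reference
    array_header_bytes=16,  # assumed array header
):
    """Rough heap estimate IF one ScoreDoc were allocated per hit."""
    return array_header_bytes + num_hits * (reference_bytes + object_bytes)


hits = 200_000_000
total = estimate_scoredoc_array_bytes(hits)
print(f"{hits} hits -> {total} bytes (~{total / 2**30:.1f} GiB)")
```

Under those assumptions 200M hits would need several GiB of heap, so it is worth checking in a profiler whether such an array is really allocated for this query.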
On Wed, May 3, 2023 at 5:32 PM Markus Jelsma
wrote:
> Hello Vincenzo,
>
> Yes. Last time i checked, an array of
Here is an example calculation converting bytes to the number of entries
held in the bitset.
(2864256-12-12)/24 = 119343 long objects = 22913856 entries
The above is from a cluster where each query is generating a bitset of size
2864256 bytes - ~2.8 MB on heap. This is for 22 million results in the
results
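
The arithmetic above can be reproduced with a small Python sketch. The two 12-byte overheads and the 24-byte unit are taken directly from the calculation in this message; the function and parameter names are made up for illustration, and the only thing added is the observation that the entry count equals payload bytes times 8 bits.

```python
def bitset_entries(total_bytes, overhead_bytes=12 + 12, unit_bytes=24):
    """Reproduce the bytes -> entries calculation from this message.

    overhead_bytes and unit_bytes mirror the figures quoted above;
    each payload byte is assumed to hold 8 one-bit entries.
    """
    payload = total_bytes - overhead_bytes
    units = payload // unit_bytes
    entries = payload * 8
    return units, entries


units, entries = bitset_entries(2864256)
print(units, entries)  # 119343 22913856
```

In other words, a bitset costs roughly one bit per document it can address, so its size tracks the size of the index rather than the rows requested.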
1) timeAllowed does limit spellcheck (at least in all the code paths I can
think of that may be "slow") ... have you tried it?
2) what is your configuration for the dictionaries you are using?
3) be wary of https://github.com/apache/lucene/issues/12077
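
For point 1, a quick way to try it is to send the same request with and without timeAllowed and compare the response times. The Python sketch below only builds such a request URL. timeAllowed (in milliseconds) and spellcheck=true are standard Solr request parameters, but the host, core name, handler path, and query term are placeholders, and it assumes the handler already has a spellcheck component configured.

```python
from urllib.parse import urlencode


def build_spellcheck_query(base_url, q, time_allowed_ms):
    """Build a /select URL with spellcheck enabled and a time budget.

    base_url is a placeholder such as http://localhost:8983/solr/mycore;
    the /select handler is assumed to include a spellcheck component.
    """
    params = {
        "q": q,
        "spellcheck": "true",
        "timeAllowed": str(time_allowed_ms),  # milliseconds
    }
    return f"{base_url}/select?{urlencode(params)}"


print(build_spellcheck_query("http://localhost:8983/solr/mycore", "helo", 500))
```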
: Date: Tue, 2 May 2023 00:04:27 +0530