Re: Advice on ways forward with or without Data Import Handler

2025-05-29 Thread Walter Underwood
failure. Back with Solr 1.3, before DIH, I wrote a Java program to fetch from the database, then load. That did some transformation, mostly making queue adds comparable with views (this was at Netflix). wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog

Re: Solr error...

2025-04-06 Thread Walter Underwood
Multi-threaded indexing can speed things up. Use two threads per CPU to get maximum throughput. I wrote a simple Python program to do that. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Apr 6, 2025, at 5:11 PM, Robi Petersen wrote: > >
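
The threading advice above can be sketched in Python. This is a minimal illustration, not the program mentioned in the message: the function names, the batch size of 500, and the `solr_cpus` argument (the CPU count of the Solr server, which the client has to be told) are all assumptions. `post_batch` posts a JSON array of documents to a collection's `/update` handler.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def batches(docs, size):
    """Yield consecutive slices of docs, each at most `size` long."""
    for start in range(0, len(docs), size):
        yield docs[start:start + size]


def post_batch(update_url, batch):
    """POST one batch of documents to Solr as a JSON array."""
    request = urllib.request.Request(
        update_url,
        data=json.dumps(batch).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status


def index_all(docs, send, solr_cpus, batch_size=500):
    """Send every batch through `send`, using two client threads per Solr CPU."""
    threads = 2 * solr_cpus
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # pool.map returns results in the same order as the batches.
        return list(pool.map(send, batches(docs, batch_size)))
```

A call might look like `index_all(docs, lambda b: post_batch("http://localhost:8983/solr/mycollection/update", b), solr_cpus=8)`, where the URL and collection name are placeholders. No explicit commit is sent; the sketch assumes auto commit is configured on the server.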

Re: More information about copyField?

2025-02-18 Thread Walter Underwood
change from IUPUI? I went to North Central High School. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Feb 18, 2025, at 9:19 AM, mw...@iu.edu wrote: > > So *how* does copyField work? Do I wind up with two identical copies > of the data s

Re: More information about copyField?

2025-02-18 Thread Walter Underwood
s field. The only attributes documented > there are source, dest, and maxChars. copyField is not a field. It is an instruction to duplicate the text heading to one field and also send it to another field. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog)
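
Since copyField is an instruction rather than a field, it can be added as a rule through the Schema API. The sketch below only builds the JSON body for an `add-copy-field` command, to be POSTed to a collection's `/schema` endpoint; the field names `title` and `text_all` are hypothetical and both fields would have to exist already.

```python
import json


def add_copy_field_command(source, dest, max_chars=None):
    """Build the JSON body for a Schema API add-copy-field command."""
    rule = {"source": source, "dest": dest}
    if max_chars is not None:
        # maxChars caps how many characters are copied to the destination.
        rule["maxChars"] = max_chars
    return json.dumps({"add-copy-field": rule})


# Incoming text for "title" is also sent to "text_all" (hypothetical names).
print(add_copy_field_command("title", "text_all"))
```

The destination receives the original incoming text, not the analyzed tokens of the source field, so each field applies its own analysis chain.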

Re: SOLR Sorting Order Issue

2025-01-15 Thread Walter Underwood
. Something like: score desc, id desc wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Jan 15, 2025, at 4:30 AM, Binal Panchal wrote: > > Hello Team, > > I have two Solr indexes having the same documents. If I do sorting, the
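
A small sketch of how the suggested sort could be passed from a client. It assumes `id` is the collection's unique key, so that documents with tied scores always come back in the same order; the function name and the `rows` default are illustrative.

```python
from urllib.parse import urlencode


def search_params(query, rows=10):
    """Build select parameters with a deterministic tiebreaker on id."""
    return urlencode({"q": query, "rows": rows, "sort": "score desc, id desc"})


print(search_params("title:solr"))
```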

Re: Multiply connected data search

2025-01-04 Thread Walter Underwood
. Your book data will not change that frequently. I ran search for Netflix, which is not that different from searching books. I also ran search for Chegg, searching textbooks. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Jan 4, 2025, at 1:46 PM, Nik

Re: Multiply connected data search

2024-12-24 Thread Walter Underwood
authors title^8 authors^2 wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Dec 23, 2024, at 10:07 PM, Nikola Smolenski wrote: > > Thank you for the suggestion, but that wouldn't work because there could be > multiple authors with t

Re: Solr in write-only mode?

2024-11-26 Thread Walter Underwood
26, 2024, at 8:06 AM, Walter Underwood wrote: > > Use multiple threads to send batches. I use two moderate sized batches and > two threads per CPU. You can tune it until you see near 100% CPU utilization. > > Why two client threads per CPU? Roughly, one batch being processed by

Re: Solr in write-only mode?

2024-11-26 Thread Walter Underwood
processed. Indexing is CPU-intensive, so once it approaches 100% utilization, it is maxed out. Add more CPUs to go faster. I doubt that messing with commits will make a meaningful difference. Use auto commit so the indexing threads aren’t waiting. wunder Walter Underwood wun...@wunderwood.org

Re: Solr 6 - org.apache.lucene.index.CorruptIndexException

2024-06-18 Thread Walter Underwood
a new user registered). I actually can’t remember any index corruption in Solr and I’ve run versions from 1.3 to 9.1 with both high query load (Netflix) and massive content (LexisNexis). I would look at system-level causes, not Solr. wunder Walter Underwood wun...@wunderwood.org http

Re: Ignore unknown fields when indexing PDFs

2024-06-04 Thread Walter Underwood
. I dealt with PDF documents in search for over twenty years. You are lucky to get searchable text out of them. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Jun 4, 2024, at 8:28 AM, Uwe Amberger wrote: > > Hallo! > > Problem descri

Re: solr query sanitizer?

2024-05-29 Thread Walter Underwood
Honestly, there is a missing feature here. Solr should have a free text query parser. Run the query through standard tokenizer, ignore all the syntax, and make a bunch of word/phrase queries. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On May

Re: solr query sanitizer?

2024-05-29 Thread Walter Underwood
word. A more conservative approach is to remove “*” and “?”, so you prevent script kiddie queries like “a* b* c* d* e* f* …” wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On May 29, 2024, at 7:11 AM, Dmitri Maziuk wrote: > > Hi all,
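
The conservative approach above, sketched as a client-side helper. The function name is made up, and this only removes the two wildcard characters; it is not a complete query sanitizer.

```python
def strip_wildcards(user_query):
    """Remove '*' and '?' so user input cannot trigger wildcard expansion."""
    cleaned = user_query.replace("*", "").replace("?", "")
    # Collapse whitespace runs left behind by the removals.
    return " ".join(cleaned.split())


print(strip_wildcards("a* b* c* d*"))
```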

Re: SolrCloud behavior when Zookeeper has lost a quorum.

2024-05-21 Thread Walter Underwood
, like a replica going down. Pretty easy to test: shut down all the ZooKeeper nodes in the middle of a load test. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On May 21, 2024, at 8:30 AM, Matt Kuiper wrote: > > Thanks for the responses!

Re: Block resource intensive queries at solr to avoid production outages

2024-04-23 Thread Walter Underwood
You can send the timeAllowed parameter. It is only checked at certain points in request processing, but it will stop requests that run too long. https://solr.apache.org/guide/solr/latest/query-guide/common-query-parameters.html#timeallowed-parameter wunder Walter Underwood wun
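
A sketch of sending timeAllowed from a client. The parameter is in milliseconds; the 2000 ms default, the base URL, and the function names are assumptions. The second helper checks the `partialResults` flag that Solr sets in the response header when the limit was hit.

```python
from urllib.parse import urlencode


def select_url(base_url, query, time_allowed_ms=2000):
    """Build a /select URL that asks Solr to give up after the time limit."""
    params = urlencode({"q": query, "timeAllowed": time_allowed_ms})
    return base_url + "/select?" + params


def timed_out(response):
    """True when a parsed JSON response is flagged as partial."""
    header = response.get("responseHeader", {})
    return bool(header.get("partialResults", False))


print(select_url("http://localhost:8983/solr/books", "title:solr", 500))
```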

Re: Max value for maxBooleanClauses?

2024-04-19 Thread Walter Underwood
/search/IndexSearcher.java at 3024e66e4aba942b039fcad7daf958aa4c90b8bf · apache/lucene github.com wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Apr 17, 2024, at 3:29 PM, Walter Underwood wrote: > > I know about both of those user-specifi

Re: Max value for maxBooleanClauses?

2024-04-17 Thread Walter Underwood
I know about both of those user-specified limits. They are documented, as is the change in counting clauses in 9.0. I’ll ask again, is there a hard upper limit on the value of maxBooleanClauses? wunder > On Apr 17, 2024, at 2:33 PM, Chris Hostetter wrote: > > > > : Is there a hard upper lim

Max value for maxBooleanClauses?

2024-04-17 Thread Walter Underwood
Is there a hard upper limit for maxBooleanClauses? We have someone hitting a limit at 64k clauses after upgrading to 9.x. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog)

Re: Facing Error : "Task queue processing has stalled for 20121 ms with 0 remaining elements to process"

2024-04-10 Thread Walter Underwood
are stored in the database. Having Solr generate the IDs makes it impossible to update the documents. I’ve used Solr in production for over 15 years and I’ve never had Solr generate the IDs. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Apr 8, 2

Re: Date syntax : Filter by month independent of the year

2024-04-05 Thread Walter Underwood
This is a great example of a general technique to make Solr fast. Do the parsing and selection at index time to make the query as simple as possible. —wunder > On Apr 5, 2024, at 10:55 AM, rajani m wrote: > > yeah, makes sense, thank you. > > On Fri, Apr 5, 2024 at 1:24 PM W

Re: Date syntax : Filter by month independent of the year

2024-04-05 Thread Walter Underwood
That is what I was going to suggest. Make a month field. —wunder > On Apr 5, 2024, at 8:22 AM, Alexandre Rafalovitch wrote: > > If you know you are going to search by it, clone the field without storage > and preprocess to just leave the months behind. That's like 12 possible > values - super e
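
A sketch of the month field idea: do the date parsing once at index time so the query is a plain term match. The field names `published` and `published_month` are hypothetical, and the date is assumed to be in Solr's usual `YYYY-MM-DDThh:mm:ssZ` form.

```python
from datetime import datetime


def with_month(doc, date_field="published", month_field="published_month"):
    """Return a copy of doc with an integer month derived from its date."""
    stamp = datetime.strptime(doc[date_field], "%Y-%m-%dT%H:%M:%SZ")
    enriched = dict(doc)
    enriched[month_field] = stamp.month
    return enriched


doc = with_month({"id": "1", "published": "2024-04-05T10:55:00Z"})
print(doc["published_month"])
```

With that field indexed, filtering for April in any year becomes `fq=published_month:4`.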

Re: Keep empty fields in 9.5

2024-04-04 Thread Walter Underwood
term (empty), then there is nothing to have a position. It might be possible to do that with a string field, but this is TextField. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Apr 4, 2024, at 6:38 AM, Carsten Klement > wrote: >

Re: [EXTERNAL] Is this list alive? I need help

2024-02-29 Thread Walter Underwood
://repost.aws/questions/QUqyZD98d0TbiluqPBW_zALw/how-to-get-comparable-performance-to-gp2-gp3-on-efs How to get comparable performance to gp2/gp3 on EFS? repost.aws wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Feb 28, 2024, at 11:18 PM, Gus H

Re: [EXTERNAL] Is this list alive? I need help

2024-02-28 Thread Walter Underwood
node should have its own EBS volume, preferably GP3. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Feb 28, 2024, at 7:51 PM, Beale, Jim (US-KOP) > wrote: > > I did send the query. Here it is: > > http://samisolrcld.aws01.hibu.i

Re: Is this list alive? I need help

2024-02-23 Thread Walter Underwood
First, a shared disk is not a good idea. Each node should have its own local disk. Solr makes heavy use of the disk. If the indexes are shared, I’m surprised it works at all. Solr is not designed to share indexes. Please share the full query string. wunder Walter Underwood wun

Re: Does documentCache still make sense in modern Solr?

2024-02-10 Thread Walter Underwood
was virtual, all bare metal. If you are in a mass market hit-oriented business, document cache might pay off. Where I work now, every client has a different need (legal support), so our cache hit rates are very small. It all comes back to the users. wunder Walter Underwood wun

Re: Keeping certain stored fields uncompressed

2024-01-26 Thread Walter Underwood
You seem to be jumping to conclusions about causes. Might want to step back and do some measurements. Try eliminating parts of the query one at a time, including returning fields. You might need to do this with a query set of a few thousand queries to avoid cache effects. wunder Walter

Re: Solr query using full heap and triggers stop the world pause

2024-01-04 Thread Walter Underwood
include an aggregate popularity, for example. Maybe add overall recency. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Jan 4, 2024, at 10:28 AM, rajani m wrote: > > Hi Wunder, > > The base ranker takes care of matching and rankin

Re: Solr query using full heap and triggers stop the world pause

2024-01-04 Thread Walter Underwood
reRankDocs is set to 1000. I would try with a lower number, like 100. If the best match is not in the top 100 documents, something is wrong with the base relevance algorithm. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Jan 4, 2024, at 9:28
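
A sketch of lowering reRankDocs from the client side. It assumes the standard `rerank` query parser; the original thread may be using a different re-ranker, and the weight of 3 and the parameter name `rqq` are only illustrative.

```python
def rerank_params(query, rerank_query, rerank_docs=100, weight=3):
    """Build request parameters that re-rank only the top `rerank_docs` hits."""
    rq = "{!rerank reRankQuery=$rqq reRankDocs=%d reRankWeight=%d}" % (
        rerank_docs,
        weight,
    )
    return {"q": query, "rq": rq, "rqq": rerank_query}


print(rerank_params("greetings", "(hi hello hey hallo)")["rq"])
```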

Re: List of modules?

2023-12-22 Thread Walter Underwood
documentation, such as it is, is in solr/modules/*/README.md. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Dec 22, 2023, at 2:05 AM, Jan Høydahl wrote: > > Really, if you have a pointing to ../../contrib// that would > translate in

Re: List of modules?

2023-12-20 Thread Walter Underwood
The use for this is migrating from 8.x to 9.x and replacing with modules. Folks need to know which modules replace the directives they are removing. wunder > On Dec 20, 2023, at 8:30 AM, Walter Underwood wrote: > > Is there a list of modules and what they include? It seems scatter

List of modules?

2023-12-20 Thread Walter Underwood
Is there a list of modules and what they include? It seems scattered around the docs. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog)

Re: Java Versions and GC Tuning

2023-12-14 Thread Walter Underwood
product} instead of {product}. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Dec 14, 2023, at 12:07 PM, Shawn Heisey > wrote: > > On 12/14/23 13:04, Shawn Heisey wrote: >> On 12/14/23 12:58, Shawn Heisey wrote: >>> On

Java Versions and GC Tuning

2023-12-14 Thread Walter Underwood
Thanks for the recommendation. Are you running this on Intel or ARM64? We’ve mostly moved to ARM64. —wunder > On Dec 12, 2023, at 9:55 AM, Shawn Heisey wrote: > > Java 11 is a good solid choice. Java 17 seems to perform a little better > than 11 on Solr 9.x, but I haven't actually measured it

Re: browse?

2023-11-21 Thread Walter Underwood
I think the Velocity support was moved to contrib. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Nov 21, 2023, at 4:42 AM, Vince McMahon > wrote: > > Oh, browse endpoint is deprecated... Thanks! > > On Tue, Nov 21, 2023 at 5

Re: TruncateFieldUpdateProcessorFactor isn't being applied

2023-10-24 Thread Walter Underwood
Thanks for confirming. Yes, we’ll use the CloneFieldUpdateProcessorFactory. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Oct 23, 2023, at 11:36 PM, Mikhail Khludnev wrote: > > Hello Walter. > I'm afraid the copyField directive

TruncateFieldUpdateProcessorFactor isn't being applied

2023-10-23 Thread Walter Underwood
Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog)

Re: Zk big files issues and model store

2023-10-17 Thread Walter Underwood
. This page has some size comparisons for one data set. https://www.adaltas.com/en/2021/03/22/performance-comparison-of-file-formats/ wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Oct 17, 2023, at 11:09 AM, Christine Poerschke (BLOOMBERG/ LONDON

Re: Some nodes show high query latency

2023-09-29 Thread Walter Underwood
was because they got allocated on a different EC2 instance type. Oops. We do see some persistent cohorts with different performance, but nothing like 30 ms vs 1000 ms. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Sep 29, 2023, at 9:24 PM, Sh

Re: A general question about update ordering

2023-09-25 Thread Walter Underwood
a real-time get before the update to check whether the document is really there? That should be pretty fast. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Sep 25, 2023, at 9:53 AM, Dmitri Maziuk wrote: > > On 9/25/23 08:24, Shawn Hei
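
A sketch of the real-time get check. It assumes the default `/get` handler and a single `id` parameter, in which case the response carries a `doc` entry that is null when the document is absent. The function names and base URL are placeholders.

```python
from urllib.parse import urlencode


def realtime_get_url(collection_url, doc_id):
    """Build a real-time get URL for one document id."""
    return collection_url + "/get?" + urlencode({"id": doc_id})


def document_exists(get_response):
    """True when a parsed /get response contains a document."""
    return get_response.get("doc") is not None


print(realtime_get_url("http://localhost:8983/solr/books", "abc-1"))
```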

Re: Optimal Sharding Strategy for Solr Cloud v8.10

2023-09-14 Thread Walter Underwood
=nmjyrl9z0n92lgidfei45vq4q&dl=0 The collection currently has about 2.5 billion documents. When I worked at Infoseek, our index of the entire web was 12 million documents. This is at LexisNexis. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Sep 13, 2

Re: Join and Distributed Search

2023-09-14 Thread Walter Underwood
a or fail. I >> don't >>> think join query (unless crossCollection) bothers about shard preference. >>> >>> On Wed, Sep 13, 2023 at 7:21 PM Walter Underwood >>> wrote: >>> >>>> We have a sharded collection that joins with a non-s

Re: Join and Distributed Search

2023-09-13 Thread Walter Underwood
ference. > > On Wed, Sep 13, 2023 at 7:21 PM Walter Underwood > wrote: > >> We have a sharded collection that joins with a non-sharded collection. The >> non-sharded collection has a replica on every node. Does the join >> automatically choose the local replica or

Join and Distributed Search

2023-09-13 Thread Walter Underwood
We have a sharded collection that joins with a non-sharded collection. The non-sharded collection has a replica on every node. Does the join automatically choose the local replica or do we need to pass in a shard preference param? wunder Walter Underwood wun...@wunderwood.org http

Re: Optimal Sharding Strategy for Solr Cloud v8.10

2023-09-13 Thread Walter Underwood
sharding. Double the shards, halve the response time. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Sep 13, 2023, at 4:48 AM, Jan Høydahl wrote: > > Hi, > > There are no hard rules wrt sharding, it often comes down to measuring and

Re: Get Circuit Breaker Status

2023-09-08 Thread Walter Underwood
Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Sep 8, 2023, at 7:43 AM, Lyn Evans > wrote: > > We will need to see the status of the Circuit Breaker ( enabled | disabled ) > after this endpoint is invoked. > > The ~/config API will

Re: Compound words in English

2023-08-16 Thread Walter Underwood
ack when Solr was new (version 1.3). The synonyms covered “superman”, “babysitter”, “manhunt”, “fullmetal”, etc. The last was for “Full Metal Jacket” and “Fullmetal Alchemist”. There were about 300 synonyms. You might also need to consider hyphenated versions, like “Spider-man”. wunder Wal

Re: Solr Error| cluster state says we are the leader but locally we don't think so

2023-06-05 Thread Walter Underwood
I’ve seen this kind of thing happen when the overseer is stuck for some reason. Look for a long queue of work for the overseer in zookeeper. I’ve fixed that by restarting the node which is the overseer. The new one wakes up and clears the queue. I’ve only seen that twice. Wunder > On Jun 5, 20

Re: Deleting document on wrong shard?

2023-05-26 Thread Walter Underwood
I wouldn’t call it semantic sugar, more like a different compact format. The compact format also avoids duplicate keys, which are legal in JSON but hard to create in some systems. The XML command format is working fine. wunder Walter Underwood wun...@wunderwood.org http

Re: Deleting document on wrong shard?

2023-05-25 Thread Walter Underwood
.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14234208 wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On May 25, 2023, at 12:19 AM, Thomas Corthals wrote: > > Hi Walter > > Deleting multiple IDs at once with JSON is mentioned here

Re: Deleting document on wrong shard?

2023-05-24 Thread Walter Underwood
back into a consistent state while we wait for the next full reindex. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On May 24, 2023, at 1:44 PM, Shawn Heisey wrote: > > On 5/24/23 10:48, Walter Underwood wrote: >> I think I know how w

Re: Deleting document on wrong shard?

2023-05-24 Thread Walter Underwood
-006H-40F0-0-00 wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On May 24, 2023, at 1:13 PM, Ishan Chattopadhyaya > wrote: > > Ah, now I remember this comment: > https://issues.apache.org/jira/browse/SOLR-5890?focusedCommentI

Re: Deleting document on wrong shard?

2023-05-24 Thread Walter Underwood
Nice catch. This issue looks exactly like what I’m seeing, it returns success but does not delete the document. SOLR-5890 Delete silently fails if not sent to shard where document was added wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On May

Re: Deleting document on wrong shard?

2023-05-24 Thread Walter Underwood
to avoid changing the number of shards without a reindex. One of the other clusters has 320 shards. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On May 24, 2023, at 10:12 AM, Gus Heck wrote: > > Understood, of course I've seen your na

Re: Deleting document on wrong shard?

2023-05-24 Thread Walter Underwood
volumes were mounted for the matching shards. New shards got empty volumes. Then the content was reloaded without a delete-all. Would it work to send the deletes directly to the leader for the shard? That might bypass the hash-based routing. wunder Walter Underwood wun...@wunderwood.org http

Re: Deleting document on wrong shard?

2023-05-24 Thread Walter Underwood
not an everyday occurrence. I’m trying to clean up the minor problem of 675k documents with dupes. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On May 24, 2023, at 8:06 AM, Jan Høydahl wrote: > > I thought deletes were "broadcast"

Deleting document on wrong shard?

2023-05-24 Thread Walter Underwood
. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog)

Re: deleteById for multiple ids with route parameter

2023-05-11 Thread Walter Underwood
No, Solr Cloud automatically routes it to the correct shard. wunder > On May 11, 2023, at 6:41 PM, Anjali Maurya > wrote: > > But it needs a route parameter to find the right shard from where we need > to delete the document. > > On Tue, May 9, 2023 at 11:24 PM Walt

Re: deleteById for multiple ids with route parameter

2023-05-09 Thread Walter Underwood
Leave off the routing and send multiple IDs. Solr Cloud will route them to the correct shards for you. This is just as fast as Solr Cloud reading the route parameter and sending it to the right shard. The whole point of Solr Cloud is that it manages shards and replicas for you. wunder Walter
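
A sketch of a multi-ID delete with no route parameter, using the JSON update format where `delete` takes a list of ids. The helper name is made up, and the sketch assumes the collection uses default routing so Solr Cloud can find each shard from the id alone.

```python
import json


def delete_by_ids_body(ids):
    """Build a JSON update body that deletes several documents by id."""
    return json.dumps({"delete": list(ids)})


print(delete_by_ids_body(["doc-1", "doc-2", "doc-3"]))
```

The body is POSTed to the collection's `/update` handler with `Content-Type: application/json`.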

Is a reindex required when changing a field to large=true?

2023-04-27 Thread Walter Underwood
We are looking at changing a field property to be large=true. Can we do that without reindexing? Also, I’d appreciate pointers to discussions about the performance implications. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog)

Live shard with no leader

2023-03-22 Thread Walter Underwood
a leader to replicate from. Any ideas on how to unwedge this? wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog)

shards.tolerant and transient failures

2023-03-16 Thread Walter Underwood
://solr.apache.org/guide/8_11/solrcloud-query-routing-and-read-tolerance.html#shards-tolerant-parameter I looked at the original Jira for that, but it is for 4.0 and things have changed just a little bit (https://issues.apache.org/jira/browse/SOLR-3134). wunder Walter Underwood wun

Re: Solr Heap Memory Settings

2023-03-14 Thread Walter Underwood
the sawtooth, then add some headroom, maybe a gigabyte. Test with that value. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Mar 14, 2023, at 7:01 AM, HariBabu kuruva > wrote: > > Hi , > > Till now it was running with 45GB heap m

Re: Failed cores problem

2023-03-10 Thread Walter Underwood
How frequent are your commits? wunder Walter Underwood wun...@wunderwood.org https://observer.wunderwood.org/ (my blog) > On Mar 10, 2023, at 12:27 AM, Hakan Özler wrote: > > Regarding the problem, we're able to mitigate it by increasing the time > between > commits to the

Re: Solr Heap Memory Settings

2023-03-09 Thread Walter Underwood
Use a heap analysis tool. You’ll see a sawtooth pattern in the heap size. The bottom of that sawtooth is the actual amount of memory that Solr is using. Pick the highest point of the bottom of the sawtooth, then add some headroom, maybe a gigabyte. Test with that value. wunder Walter Underwood
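
The sizing rule above as a small calculation. The input is assumed to be a list of heap-used-after-collection readings in megabytes, taken from GC logs or a heap analysis tool; the one gigabyte of headroom follows the suggestion in the message.

```python
def suggested_heap_mb(heap_after_gc_mb, headroom_mb=1024):
    """Highest point of the sawtooth's bottom, plus headroom."""
    if not heap_after_gc_mb:
        raise ValueError("need at least one post-GC sample")
    return max(heap_after_gc_mb) + headroom_mb


# Hypothetical readings after three collections.
print(suggested_heap_mb([3100, 3400, 3250]))
```

The result is a starting value to test with, not a final setting.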

Re: Delete silently failing.

2023-03-07 Thread Walter Underwood
Is it supposed to be: {"delete": {"id": "1E089335-892C-41F6-B767-632EB5361775"}} wunder Walter Underwood wun...@wunderwood.org https://observer.wunderwood.org/ (my blog) > On Mar 7, 2023, at 1:20 PM, Thomas Corthals wrote: > > Got blindsided by the quotes and didn't noti

Re: Suggester index replication

2023-03-02 Thread Walter Underwood
up an extra downstream machine to play with until you get it right. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Mar 2, 2023, at 10:42 AM, gnandre wrote: > > Thanks! I am using non-cloud mode at the moment. So, there is no way to > j

Re: Suggester index replication

2023-03-02 Thread Walter Underwood
You need to send a build request to each node. I used to have some code to dig out the nodes from a cluster status, then send a build to each one, but I think that is marooned at my previous company. It isn’t super hard, just dig it out of the JSON. wunder Walter Underwood wun
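
A sketch of digging the nodes out of a cluster status response and forming one build request per core. This is not the lost code mentioned above. It assumes the parsed JSON of a Collections API CLUSTERSTATUS call in which each replica carries `base_url` and `core`, a suggester handler at `/suggest`, and a hypothetical dictionary name; check all three against your own setup.

```python
from urllib.parse import urlencode


def suggester_build_urls(cluster_status, collection, dictionary, handler="/suggest"):
    """Return one suggester build URL per replica core in the collection."""
    params = urlencode({
        "suggest": "true",
        "suggest.build": "true",
        "suggest.dictionary": dictionary,
    })
    shards = cluster_status["cluster"]["collections"][collection]["shards"]
    urls = []
    for shard in shards.values():
        for replica in shard["replicas"].values():
            urls.append("%s/%s%s?%s" % (
                replica["base_url"], replica["core"], handler, params))
    return sorted(urls)
```

Each returned URL would then be requested in turn, for example with `urllib.request.urlopen`.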

Re: Solr Query not always returning correct results

2023-02-17 Thread Walter Underwood
 wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Feb 17, 2023, at 11:52 AM, Mark Hieber wrote: > > We have a cluster of hosts running Solr 8.4 Each host has an application > which listens to an external source for updated documents. Wh

Re: Solr Master Reboot

2023-02-03 Thread Walter Underwood
Just reboot it. Solr will shut down all connections, interrupting any in-progress replication. The replication will be retried after it starts back up. Failure of the master during replication has been safe for many years. wunder Walter Underwood wun...@wunderwood.org http

Re: Inconsistent ordering of results

2023-01-11 Thread Walter Underwood
consistent ordering. Exact score ties are common with one-word queries and short documents, like book or movie titles. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Jan 11, 2023, at 4:10 AM, Peter Lancaster > wrote: > > Hi Mikhail,

Re: Maximum number of shards and nodes in solr cloud

2023-01-09 Thread Walter Underwood
single cloud. Any suggestions? It was challenging to manage with 8 shards and a replication factor of 8. At that point, we scaled vertically to bigger AWS instances. It scaled smoothly up to 72 CPU instances. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog)

Re: Slowness when searching in child documents.

2023-01-05 Thread Walter Underwood
update, but those shouldn’t be frequent. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog)

Re: Multiple cores

2022-12-28 Thread Walter Underwood
invented by Infoseek. That patent expired several years ago, so we should implement it. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Dec 28, 2022, at 5:35 AM, Eric Pugh > wrote: > > For a very long time, that was what folks always

Re: spatial search by zipcode

2022-12-14 Thread Walter Underwood
neighboring zip codes would be to find what DMA (direct marketing area) the address is in, then find the zip codes that are in that DMA. This thread has some relevant discussion: https://www.reddit.com/r/adops/comments/oxdthy/zip_code_to_dma_converter/ wunder Walter Underwood wun...@wunderwood.org

Re: spatial search by zipcode

2022-12-14 Thread Walter Underwood
/data=!4m5!3m4!1s0x80ba30d165da8f09:0xaf8f27eb9fd93664!8m2!3d38.3675335!4d-115.9467997 What does the current API do, exactly? wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Dec 14, 2022, at 9:37 AM, dmitri maziuk wrote: > > On 2022-12-14

Re: Slowness in Solr Optimize

2022-12-13 Thread Walter Underwood
than the caching in a single Solr server. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Dec 13, 2022, at 11:27 AM, David Hastings > wrote: > > Ah, that makes sense. If you can do sticky sessions and such with your > balancers, plus

Re: Slowness in Solr Optimize

2022-12-13 Thread Walter Underwood
could send the same query back to the same host, but AWS load balancers aren’t very smart. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Dec 13, 2022, at 3:50 AM, Dave wrote: > > Ha I meant qtimes not atone. Also in general you should

Re: Using the fq parameter to filter for a value that is multivalued field.

2022-12-09 Thread Walter Underwood
If you want apple OR pear, use: myField:apple myField:pear If you want apple AND pear, use: +myField:apple +myField:pear wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Dec 9, 2022, at 9:22 AM, Matthew Castrigno wrote: > > I am havin
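
The two filter forms above, generated from a list of values. The helper names are made up, values are assumed to be single terms that need no escaping, and the OR form relies on the default operator being OR.

```python
def any_of(field, values):
    """Filter matching documents that contain at least one of the values."""
    return " ".join("%s:%s" % (field, value) for value in values)


def all_of(field, values):
    """Filter matching documents that contain every one of the values."""
    return " ".join("+%s:%s" % (field, value) for value in values)


print(any_of("myField", ["apple", "pear"]))
print(all_of("myField", ["apple", "pear"]))
```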

Re: Is there a way to run the entire payload of a request through a charFilter and not just the fields?

2022-11-28 Thread Walter Underwood
format. https://solr.apache.org/guide/solr/latest/indexing-guide/indexing-with-update-handlers.html#json-formatted-index-updates wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Nov 28, 2022, at 2:59 PM, Matthew Castrigno wrote: > > Thank you

Re: Is there a way to run the entire payload of a request through a charFilter and not just the fields?

2022-11-28 Thread Walter Underwood
,\"Date\":\"2022-10-03T12:30:17.3388537\",\"ContentType\":\"Blog\",\"Body\":{\"Fields\":[{\"Name\":\"Heading Background Image\",\"Type\":\"Image\",\"Value\":\"\”},... would add fields like

Re: Is there a way to run the entire payload of a request through a charFilter and not just the fields?

2022-11-28 Thread Walter Underwood
That is invalid JSON. The client needs to fix it. I’m surprised it indexes at all. This should not be your problem. Paste that string into this: https://jsonlint.com wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Nov 28, 2022, at 12:57 PM, Matt
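
The same check can be run locally before anything is sent to Solr. A minimal sketch using Python's standard `json` module; the function name is made up.

```python
import json


def json_error(text):
    """Return a description of the first JSON syntax error, or None if valid."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return "line %d column %d: %s" % (err.lineno, err.colno, err.msg)
    return None


print(json_error('{"Title": "ok"}'))
print(json_error('{"Title": }'))
```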

Re: Doubt about number of shards

2022-11-16 Thread Walter Underwood
. That is more predictable. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Nov 16, 2022, at 3:55 AM, Jan Høydahl wrote: > > Also see the Ref Guide about Request

Re: Doubt about number of shards

2022-11-15 Thread Walter Underwood
. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Nov 15, 2022, at 3:49 AM, DAVID MARTIN NIETO wrote: > > hello solr users > > We have a production cluster

Re: 8.11 docs "Sending JSON Update Commands" bug?

2022-10-31 Thread Walter Underwood
ng of name/value pairs.” https://www.ecma-international.org/publications-and-standards/standards/ecma-404/ wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Oct 31, 2022, at 1:42 PM, Adam Constabaris > wrote: > > I don't know if there

Re: Advice in order to optimise resource usage of a huge server

2022-10-06 Thread Walter Underwood
Run a GC analyzer on that JVM. I cannot imagine that they need 80 GB of heap. I’ve never run with more than 16 GB, even for a collection with 70 million documents. Look at the amount of heap used after full collections. Add a safety factor to that, then use that heap size. wunder Walter

Re: Advice in order to optimise resource usage of a huge server

2022-10-06 Thread Walter Underwood
speed and capacity of the disk system. If the index does fit in RAM, then you should be fine. You may want to spend some effort on reducing index size if it is near the limit. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Oct 6, 2022, at 8:18

Re: Solr Search - Mixed Case Issue

2022-09-27 Thread Walter Underwood
was a movie titled “+/-“, but that is a different problem. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Sep 27, 2022, at 12:49 PM, Miguel Joy > wrote: > > Hi Walter, > > Thanks very much for your honest feedback. As I ment

Re: Solr Search - Mixed Case Issue

2022-09-27 Thread Walter Underwood
Honestly, this analysis chain is a mess. * StandardTokenizer has parsing support for email addresses, so that is a better choice. * Never mix phonetic transformation and stemming, use different chains. Phonetic tokens aren’t stemmable. * Don’t stem email addresses. * Don’t do phonetic transforms

Re: The logging of Solr queries

2022-09-22 Thread Walter Underwood
I’ve always used the HTTP (access) log. In that, queries to shards are POST requests, so if the external requests are all GET, they are easy to sort out. wunder Walter Underwood https://observer.wunderwood.org/ > On Sep 22, 2022, at 7:02 PM, Shawn Heisey wrote: > > On 9/22/22 09:1
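
A sketch of sorting the external queries out of an access log. It assumes an NCSA-style log where the request line sits between the first pair of double quotes, that external clients only use GET, and that queries go to a `/select` handler; adjust for your own log format.

```python
def external_queries(log_lines, handler="/select"):
    """Yield request paths of GET queries, skipping POSTs sent between shards."""
    for line in log_lines:
        parts = line.split('"')
        if len(parts) < 2:
            continue  # no quoted request line on this entry
        request = parts[1].split()
        if len(request) >= 2 and request[0] == "GET" and handler in request[1]:
            yield request[1]
```

Usage would be along the lines of `for path in external_queries(open("access.log")): print(path)`, with the file name as a placeholder.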

Re: Identifying SOLR [query] performance issue (or "how to scale up")

2022-09-21 Thread Walter Underwood
In the real world, many queries are repeated, so it is best to replay logged queries keeping all the dupes. wunder Walter Underwood https://observer.wunderwood.org/ > On Sep 21, 2022, at 4:31 PM, Derek C wrote: > > Thanks Deepak, > > I'm going to do more testing and c

Re: MoreLikeThis with externally supplied text, and facets?

2022-09-09 Thread Walter Underwood
I made this work with 6.x but don’t remember the details, sorry. I think it wanted application/something, maybe the POST format. wunder > On Sep 9, 2022, at 1:38 PM, Mikhail Khludnev wrote: > > Hold on. JSON query DSL lets you pass quite long content via body. It > should support {!mlt}. At

Re: Autoscaling

2022-07-17 Thread Walter Underwood
just aren’t designed for persistent data. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Jul 17, 2022, at 7:58 AM, Dave wrote: > > Three nodes with nginx in front will handle well over 50k searches a day on a > half terabyte index,

Re: using childFilter to restrict "child" docs by "grandchild" information

2022-06-29 Thread Walter Underwood
change? Not very often, I bet. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Jun 29, 2022, at 12:50 AM, Noah Torp-Smith wrote: > > Interestingly, I found that > > [child childFilter=$pidfilter limit=-1]&pidfilter=+instance.agen

Re: Using / searching fields with "structure"

2022-06-01 Thread Walter Underwood
200k documents, but response times were well under 100 ms, as I remember. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Jun 1, 2022, at 3:00 PM, Christopher Schultz > wrote: > > All, > > Since Solr / Lucene can't def

Re: SolrJ compatibility

2022-05-31 Thread Walter Underwood
We had one 4.x cluster that was difficult to migrate, so until recently we were using SolrJ 4.x with our new Solr 8.7 clusters and with a Solr 4.10.4 cluster. We were not doing anything fancy like faceting. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog
