Re: Solr with smallest possible amount of RAM

2024-12-08 Thread Dave
Using an ssd would be kind of like replacing the disk with ram and run relatively fast, > On Dec 8, 2024, at 1:21 AM, ufuk yılmaz wrote: > > There is a demo/trial environment where I’m trying to run many very small > Solrs, each with a single core and only a few documents. > > I limited the

Re: Sharing a post: "How to fork: Best practices and guide"

2024-11-26 Thread Dave
I used to fork my solr indexer across 64 cpu cores, memory consumption was my major issue so just threw ssds and ram at the issue, 500 gb index post a commit followed by an optimize later it worked fine. Obviously the last commit was heavy but it didn’t need to be real time so I had that advan

Re: Copy-field doesn't seem to be working as expected

2023-05-20 Thread Dave
done) Hope it works, look forward to the follow up Dave > On May 20, 2023, at 1:53 PM, Shawn Heisey wrote: > > On 5/19/23 15:39, Christopher Schultz wrote: >> Please confirm the following: >> 1. Solr index is created with Solr 7.something >> 2. Solr 8.x is deployed

Re: becoming a solr specialist

2023-05-04 Thread Dave
Send me a personal email > On May 4, 2023, at 11:23 AM, ufuk yılmaz wrote: > > Hi all, > > First of all forgive me if asking this here is inappropriate, but I couldn’t > think of a better place where all Solr experts gather. > > I have been working as the main “solr person” at a project sinc

Re: standard tokenizer seemingly splitting on dot

2023-05-02 Thread Dave
to Make sure you are getting what you want. -Dave > On May 2, 2023, at 2:22 PM, Bill Tantzen wrote: > > I'm using the solrconfig.xml from the distribution, > ./server/solr/configsets/_default/conf/solrconfig.xml > > But this problem extends to the index as well; us

Re: Suggestions to improve Star queries latencies

2023-04-18 Thread Dave
I think there are more important questions here. What do you want with a *:* query? Do you want all the results in one return? Or do you just want the count of total documents? Or to put the results in facets? *:* should never take long unless you are requesting every single document not just

Re: How to use MorelikeThis with duplicates

2023-04-12 Thread Dave
The recent flag is super clever, and you can use it on other applications/situations as well. I would do that in a heartbeat assuming you can reindex your data set quickly > On Apr 12, 2023, at 10:49 AM, Alessandro Benedetti > wrote: > > Following up on Mikhail good insights, > I would prob

Re: Solr Heap Memory Settings

2023-03-13 Thread Dave
te. Test with that value. >> >> wunder >> Walter Underwood >> wun...@wunderwood.org >> https://observer.wunderwood.org/ (my blog) >> >>>> On Mar 9, 2023, at 5:23 AM, Dave wrote: >>> >>> Agreed, but often times as a developer you a

Re: Apache Solr Neural Search Training

2023-03-10 Thread Dave
I am definitely interested in this > On Mar 10, 2023, at 7:48 AM, Alessandro Benedetti > wrote: > > Given it's almost time for an upcoming live training of ours, I take the > occasion for a bit of self-promotion :) > > The 16th of March we host the Neural Search training for Apache Solr: > >

Re: Solr Heap Memory Settings

2023-03-09 Thread Dave
f > Solr to lower the footprint than to add >30g. > > Jan > >> On 9 Mar 2023 at 12:52, Dave wrote: >> >> Again, set to less than 32, I liked 30 >> >>>> On Mar 9, 2023, at 1:04 AM, Deepak Goel wrote: >>> >>> The max

Re: Solr Heap Memory Settings

2023-03-09 Thread Dave
B, and only increase it if there is enough memory pressure. One thing >> that I don't know is whether Java will use the 32 bit pointers with the >> Xmx at 40g. It probably won't, so I expect that memory usage would be >> more efficient if you set the max heap to 31g.

Re: Solr Heap Memory Settings

2023-03-08 Thread Dave
-Xms3M -Xmx3M Keep them the same, no spaces, I preferred to use M , never go above 32g cause reasons (jvm gets weird after 32) and make sure your machine still has the memory to hold your index. > On Mar 8, 2023, at 11:27 AM, HariBabu kuruva > wrote: > > Hi All, > > I have set the

Re: When to index data into Solr?

2023-01-29 Thread Dave
And make sure you can always reindex the entire data set at any given moment. Solr/search isn’t meant to be a data store nor reliable. It should be able to be destroyed and recreated whenever needed. > On Jan 29, 2023, at 1:53 PM, marc nicole wrote: > > so to sum up, it's indexation at data

Re: Solr Query time performance

2023-01-29 Thread Dave
You can have 40+ million documents and half a terabyte index size and still not need spark or solr cloud or sharding and get sub second results. Don’t over think it until it becomes a real issue > On Jan 29, 2023, at 1:53 PM, marc nicole wrote: > > Much appreciated. > >> Le dim. 29 janv. 20

Re: General question about high availability.

2023-01-17 Thread Dave
Put an nginx front for about three solr servers that does a drop down failover. You always want one to be the primary for caching and that few searches, then drop down to the other couple on failure > On Jan 17, 2023, at 12:07 PM, Matthew Castrigno wrote: > > > What is the best approach for

Re: Importing Data from MySql

2023-01-13 Thread Dave
n batches. It'll be faster than > sending each document in a separate request. > > On Fri 13 Jan 2023 at 16:41, Dave wrote: > >> Yeah, it’s trivial building your own indexer in any language that can read >> a db. Also I wouldn’t trust the dih on its own even when suppor

Re: Importing Data from MySql

2023-01-13 Thread Dave
Yeah, it’s trivial building your own indexer in any language that can read a db. Also I wouldn’t trust the dih on its own even when supported > On Jan 13, 2023, at 10:17 AM, Jan Høydahl wrote: > > I don't think the 3rd party DIH is maintained. > > Other options are using other 3rd party fram

Re: Quoted phrase doesn't match when stemming and synonyms combined.

2023-01-12 Thread Dave
ucene/issues/12080 > I found a small change in code that seem to fix the problem. > Thank you Dave for the feedback! > > On 11.01.2023 at 15:17, Dave wrote: >> On one hand that’s great news, on the other it probably deserves a ticket >> but you need to have a very sp

Re: Quoted phrase doesn't match when stemming and synonyms combined.

2023-01-11 Thread Dave
document matches, as expected. > > Still, it looks like SGF was designed to work well when used only in query, > and it's just a bug revealed by an edge case. Shall I submit an issue to > https://github.com/apache/lucene ? > > On 11.01.2023 at 13:09, Dave wrote: >

Re: Quoted phrase doesn't match when stemming and synonyms combined.

2023-01-11 Thread Dave
ote: > > On 11.01.2023 at 12:04, Dave wrote: >> Hmm. As an experiment what happens when you use a range of three or four >> with the quotes using the tilde in the query? > > You mean query like "test polskie"~1 ? Yes, it does match. > > Unfortunately it's

Re: Quoted phrase doesn't match when stemming and synonyms combined.

2023-01-11 Thread Dave
Hmm. As an experiment what happens when you use a range of three or four with the quotes using the tilde in the query? Also generally I find it best to use the same filters for both indexing and query, just a personal preference, I know it’s not always possible however. > On Jan 11, 2023, at 5

Re: Multiple cores

2022-12-28 Thread Dave
Eric, that is super clever. But how does it affect ranking if you do a general search? Since each collection has its own idf etc? -Dave > On Dec 28, 2022, at 7:03 AM, Eric Pugh > wrote: > > You may find it an easier path forward to just move to SolrCloud. You can > ru

Re: Compiling and running solr locally on mac

2022-12-24 Thread Dave
That’s awesome. Also Perl should be on every Unix system. Personally I used homebrew and it was super fast and easy to get it up and going > On Dec 24, 2022, at 3:28 PM, Somnath Kumar wrote: > > Thank you Shawn. Just tried this and it worked! > > Som > >> On Mon, Dec 19, 2022 at 9:28 PM Shaw

Re: Slowness in Solr Optimize

2022-12-13 Thread Dave
Sounds like you should contact aws about it since it’s not a solr issue if the qtimes haven’t increased in the solr logs. And again, don’t load balance but that’s my personal opinion > On Dec 13, 2022, at 6:50 AM, Pradeep wrote: > > Hi, > > I cant change it to NLB at this moment, firstly w

Re: Slowness in Solr Optimize

2022-12-13 Thread Dave
Ha I meant qtimes not atone. Also in general you shouldn’t use a load balancer with solr, since you won’t be able to keep the index hot and in memory for each subsequent query if you are paging through results. The best way in my experience is to have failovers for your nodes, instead of load ba

Re: Slowness in Solr Optimize

2022-12-12 Thread Dave
You can check the atones to see if solr itself actually slowed down. As solr has nothing to do with a load balancer I doubt it has. Also you used a sentence that concerns me, clearing out the deleted documents, which sounds like an optimize command. You as a user should never use that, let sol

Re: Using the fq parameter to filter for a value that is multivalued field.

2022-12-09 Thread Dave
Try adding each value separately. Not joined in code, let solr do the multivalue work, > On Dec 9, 2022, at 1:11 PM, Matthew Castrigno wrote: > > > Thank you for your comments that appears to be the root of the problem. > > Fixing it raises another question. > > The incorrect multivalued f
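
A minimal sketch of the advice above, in Python: split the joined string in the indexer and send a list, so each value arrives as its own entry of the multivalued field. The field name "fruit" and the helper name are hypothetical, used only for illustration.

```python
import json


def to_multivalued(joined, sep=","):
    """Split a joined string such as "apple, pear" into separate values."""
    return [part.strip() for part in joined.split(sep) if part.strip()]


# "fruit" is a hypothetical multivalued field; "id" is the usual unique key.
doc = {"id": "1", "fruit": to_multivalued("apple, pear")}

# JSON body in which "fruit" is an array of two values, not one string.
payload = json.dumps([doc])
```

With the list form, a filter on a single value such as fruit:apple can match, since "apple" is indexed as its own value.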

Re: Using the fq parameter to filter for a value that is multivalued field.

2022-12-09 Thread Dave
"apple, pear" That looks like a string not a multi valued field to me. Maybe I’m wrong but you should have quotes around each element of the array > On Dec 9, 2022, at 12:23 PM, Matthew Castrigno wrote: > > "apple, pear"

Re: Duplicate docs with same unique id on update

2022-12-09 Thread Dave
So it was a decision to remove the unique field id and replace it with root? This seems, bad. You can’t have two documents with the same id/unique field. > On Dec 9, 2022, at 7:57 AM, Jan Høydahl wrote: > > Hi, > > So to be clear - you have a working fix by adding the _root_ field to your

Re: Near Real Time not working as expected

2022-12-07 Thread Dave
Just out of curiosity are you using metal? And if so ran any disk io tests to see if you may have a hardware problem on any of the nodes? A document won’t be available until all the nodes have it so it just takes one to get slow to slow you down > On Dec 7, 2022, at 9:45 AM, Matias Laino > w

Re: I cannot get nested objects to index - with image links

2022-10-27 Thread Dave
Well honestly it’s more or less implied that if a field is declared required, it’s required in all documents, parent or children. Perhaps an inherit field would have been applicable if such exists(I don’t think so) and it’s documented quite clearly here: https://solr.apache.org/guide/solr/late

Re: Script Update Processor

2022-10-14 Thread Dave
No one should ever actually use a .0 version > On Oct 14, 2022, at 8:41 AM, Matthew Castrigno wrote: > > This issue is easily reproduced in 9.0 using the example script and logging > cmd.solrDoc in the processAdd function. > > From: Eric Pugh > Sent: Friday, O

Re: Solr 6 Replication question

2022-10-11 Thread Dave
backup, as a single server can cache the fields way faster than round robin or whatever other metric uses to determine who serves. -Dave > On Oct 11, 2022, at 1:32 PM, mtn search wrote: > > Thanks Dave! Yes, we ran into this issue yesterday and do need to review > the disk s

Re: solr 9 standalone crashed after few hours - PhaseIdealLoop::build_loop_late_post_work

2022-10-10 Thread Dave
I won’t say for certain as I have never seen this but this seems like a garbage collection situation. Look there first to see if you can cancel that out as the cause > On Oct 10, 2022, at 5:59 PM, Jen-Ya Ku wrote: > > > Hi all, > > We've deployed solr9 on OpenJDK 17 and it crashed after f

Re: Solr 6 Replication question

2022-10-10 Thread Dave
needs to be ready for triple the size. If you don’t have the disk space ready to handle this you’re going to eventually run into some serious issues, or just not be able to replicate -dave > On Oct 10, 2022, at 2:56 PM, mtn search wrote: > > As I go back through > https://sol

Re: Node backup using replication

2022-10-10 Thread Dave
Exactly. In linux I would just do a 777 for such a directory anyways since no one outside of the machine can get to it since no solr servers should have public ip. > On Oct 10, 2022, at 12:51 PM, Shawn Heisey wrote: > > On 10/10/22 09:23, Joe Jones (DHCW - Software Development) wrote: >> ja

Re: Advice in order to optimise resource usage of a huge server

2022-10-06 Thread Dave
You should never index directly into your query servers by the way. Index to the indexing server and replicate out to your query servers and tune each as needed > On Oct 6, 2022, at 6:52 PM, Dominique Bejean > wrote: > > Thank you Dima, > > Updates are highly multi-threaded batch processes a

Re: Advice in order to optimise resource usage of a huge server

2022-10-06 Thread Dave
I know these machines. Sharding is kind of useless. Set the ssd tb drives up in fastest raid read available, 31 xms xmx, one solr instance. Buy back up ssd drives when you burn one out and it fails over to the master server. Multiple solr instances on one machine makes little sense unless they h

Re: Fastest way to index data to solr

2022-09-30 Thread Dave
I don’t have any tests but I know anything is faster than xml. You may as well stick to text files. Xml is garbage that’s why they made yaml which is the parent of json > On Sep 30, 2022, at 3:47 AM, Thomas Corthals wrote: > > Hi Gus, > > I have a followup question. Is JSON parsed faster tha

Re: Fastest way to index data to solr

2022-09-29 Thread Dave
Another way to handle this is have your indexing code fork out to as many cores as the solr indexing server has. It’s way less work to force the code to run itself that many times in parallel, and as long as your sql queries and said tables are properly indexed the database shouldn’t be a bottle
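
One way the fork-per-core idea above might be set up, as a sketch only: divide the row ids into one contiguous slice per worker, then hand each slice to its own process. The function name and the choice of contiguous slices are assumptions, not something the post specifies.

```python
def partition(ids, workers):
    """Split ids into `workers` near-equal contiguous slices."""
    size, extra = divmod(len(ids), workers)
    slices, start = [], 0
    for i in range(workers):
        # The first `extra` slices take one additional id each.
        end = start + size + (1 if i < extra else 0)
        slices.append(ids[start:end])
        start = end
    return slices
```

Each slice could then be passed to a separate indexer process, for example through multiprocessing.Pool, with every process opening its own database connection.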

Re: Loading solr.xml from zookeeper

2022-09-21 Thread Dave
Is there a trusted guide for running solr in docker out there? I’ve seen a few but just wondering if you got one you like the most > On Sep 21, 2022, at 1:32 PM, David Smiley wrote: > > ANNAMANENI: can you clarify what you mean by "multiple repositories"; maybe > "repositories" is a word wit

Re: Solr : how to get the frequency of an expression

2022-09-16 Thread Dave
How fast can you rebuild your index? If it’s trivial make a new field for that field and utilize shingles with a two term specification and you *should be able to get what you want but I can’t test it right now, but in theory it would work > On Sep 16, 2022, at 1:39 PM, Audrey Tesrin wrote: >

Re: Using substring functionality to reach field value in solt

2022-09-15 Thread Dave
You would need to do that in the code end of reading the document from the index. Search indexes assume you want the complete value they don’t give substrings, > On Sep 15, 2022, at 9:59 AM, Shankar R wrote: > > Hi All, > My solr field is defined like this > > ="true" multiValued="false" /

Re: Allow anonymous search on otherwise Basic Auth-protected Solr instance?

2022-09-02 Thread Dave
It’s more or less an understood paradigm. User->app->vpn/internal network->solr and back. > On Sep 2, 2022, at 2:10 PM, Victoria Stuart (VictoriasJourney.com) > wrote: > > Good points, re: Solr security. Solutions, references?

Re: Allow anonymous search on otherwise Basic Auth-protected Solr instance?

2022-09-02 Thread Dave
Exactly. This is a serious security loophole you would be opening up. What if I just ask for *:* and 5 rows to just, take all of your data, while crashing your server, and just keep doing it in 20 simultaneous calls until it dies, and even if you wake it up I’ll just turn it back on and

Re: Ranking based on number of OR clauses matched

2022-08-26 Thread Dave
Why is your qf set to only those two fields and not the subject? Also in the qf you can boost them. The filter query has no effect on the score, it just eliminates documents that don’t meet your query > On Aug 26, 2022, at 7:55 AM, Noah Torp-Smith wrote: > > OK, I've narrowed it down a bit.

Re: Terms with hyphens and fuzzy search

2022-08-23 Thread Dave
an using the qf query parameter or > setting up separate "parallel" fields of some sort? > > Best, > > Morten > >> On Tue, 23 Aug 2022 at 17:29, Dave wrote: >> >> Ok so from what I’m looking at you have a proximity search so the terms >> hav

Re: Terms with hyphens and fuzzy search

2022-08-23 Thread Dave
Ok so from what I’m looking at you have a proximity search so the terms have to be within the distance value of each other. In my example, 2, which obviously won’t work since there are three terms. A fuzzy search is based on a single term/token. So you need to add ~2 to each term if that’s what
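
The distinction above can be sketched in Python: a proximity search puts ~N after the quoted phrase, while a fuzzy search puts ~N after each individual term. The helper names are hypothetical; the ~ syntax is the standard Lucene/Solr query syntax.

```python
def proximity(phrase, slop=2):
    """Phrase query whose terms must sit within `slop` positions."""
    return f'"{phrase}"~{slop}'


def fuzzy_each_term(phrase, distance=2):
    """Apply a fuzzy edit distance to every term separately."""
    return " ".join(f"{term}~{distance}" for term in phrase.split())
```

For the input "foo bar baz" the first helper gives "foo bar baz"~2 (with the quotes) and the second gives foo~2 bar~2 baz~2.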

Re: solr backup is not working in 8.9 version.

2022-08-09 Thread Dave
Yeah you can’t post images, just the actual error itself from the interface or the cli text > On Aug 9, 2022, at 9:03 AM, Deepak Goel wrote: > > Can't see the error. > >> On Tue, 9 Aug 2022, 18:22 Naresh Lunavath, >> wrote: >> >> Hello Team, >> >> Recently, we have upgraded solr from 7.7 t

Re: Nested objects storing

2022-08-09 Thread Dave
searching for and what you want to facet against (same thing) to come up with the other fields. -Dave > On Aug 9, 2022, at 7:18 AM, Eric Pugh wrote: > > I feel like the Solr Ref Guide ought to weigh in on this ;-).I’ll be > curious what other folks say? > > I know

Re: solr backup location 8.11.1

2022-08-05 Thread Dave
ote: > > Actually, soft links won't work either, because the snapshots aren't in a > subdirectory of data, and each one has a different name. > > Cron on ec2 is a bit of a pain, but yes, that does seem like the > best solution available. > >> On Fri, Aug 5, 2022 at 11:

Re: solr backup location 8.11.1

2022-08-05 Thread Dave
Can’t you just make a cron job that runs an sh file that does a cp-rf on the data folder with a time stamp? The indexes are drop in when needed > On Aug 5, 2022, at 12:07 PM, Thomas Woodard wrote: > > That is exactly what I was afraid of. Not being able to configure where > automated backups
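
A rough Python equivalent of the cron-plus-cp idea above, as a sketch: copy the data folder to a new directory named with a timestamp. The function name and paths are placeholders, and it assumes nothing is being committed to the index while the copy runs.

```python
import shutil
import time
from pathlib import Path


def snapshot(data_dir, backup_root):
    """Copy data_dir into backup_root under a timestamped folder name."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = Path(backup_root) / f"{Path(data_dir).name}-{stamp}"
    shutil.copytree(data_dir, target)  # recursive copy, like cp -r
    return target
```

A cron entry would then simply run a small script that calls snapshot() with the real data and backup paths.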

Re: Solr update only if field differs

2022-08-04 Thread Dave
—— At this point it would be interesting to see how this Processor would increase the indexing performance when you have many duplicates - when it comes to indexing performance with duplicates, there isn’t any difference than a new document. It’s mark as original destroyed, and new one replaces

Re: OR clause in json post request

2022-07-26 Thread Dave
Once you introduce an AND with an or condition logic starts getting funky. But hard to tell without the actual queries > On Jul 26, 2022, at 8:34 PM, Samuel Gutierrez > wrote: > > I am working on a json post request where I need to mix AND and OR clauses > for example: > Condition1 AND Condi

Re: Retain Data Import Handler In Solr9.0

2022-07-22 Thread Dave
Oh look into perls fork manager module, https://metacpan.org/pod/Parallel::ForkManager . Only trick is each time it spawns a process you have to redeclare the dbh and any stored procedures but it’s a small price to pay for being able to simply adjust the number of parallel jobs it will do

Re: Retain Data Import Handler In Solr9.0

2022-07-22 Thread Dave
Not to mention using dynamic fields on the fly in the indexer, applying code logic to the documents and just having full control over it has a lot of benefits to the point that a DIH was a cute idea when it came out but in reality it was just hand holding > On Jul 22, 2022, at 2:19 PM, dmitri m

Re: Autoscaling

2022-07-17 Thread Dave
Well to start you should just have one shard. 1 million documents is barely anything justifying sharding it out. So it’s really quite easy to balance one shard and one server > On Jul 17, 2022, at 1:26 PM, Kaminski, Adi > wrote: > > So what would be the recommendation then to have balanced s

Re: Autoscaling

2022-07-17 Thread Dave
oblems when you put them into service. > > Yes, agree about containers. Containers are great for CPU-only applications. > They just aren’t designed for persistent data. > > wunder > Walter Underwood > wun...@wunderwood.org > http://observer.wunderwood.org/ (my blog)

Re: Autoscaling

2022-07-17 Thread Dave
Three nodes with nginx in front will handle well over 50k searches a day on a half terabyte index, but only one node is to serve the searches the rest are backups. I would never put solr in a container > On Jul 17, 2022, at 10:44 AM, Shawn Heisey wrote: > > On 7/17/22 07:40, Ronen Nussbaum w

Re: Unsubscribe me

2022-07-13 Thread Dave
To be clear I don’t want to unsubscribe > On Jul 13, 2022, at 5:18 AM, Boitumelo Molekwa > wrote: > > Also me please > > -Original Message- > From: Nicolas Franck > Sent: 12 July 2022 10:29 PM > To: users@solr.apache.org > Subject: Unsubscribe me > > > CAUTION: This email originat

Re: Unsubscribe me

2022-07-12 Thread Dave
Can I unsubscribe anyone with a certain string in their email? Anything with hein should no longer be subscribed > On Jul 12, 2022, at 6:23 PM, Gus Heck wrote: > > Have you checked that the mails are actually going to the address you are > unsubscribing from? If you are getting mails forwarde

Re: Transfer to a new server

2022-07-11 Thread Dave
Ideally yes, but I feel that pain of trying to rebuild a 500gb index and another 500 of stored full text along side of it, plus another couple hundred gigs of a supporting index, and all from different sources. But yes it should be able to be done in a week or two. I never upgrade the solr se

Re: Transfer to a new server

2022-07-11 Thread Dave
You could put it on a hard drive and mail it. Or just, over the internet using replication > On Jul 11, 2022, at 7:45 AM, Thomas Corthals wrote: > > Hello Mike, > > If possible, just rebuild it from the original source on the new server. > > Regards, > > Thomas > > Op ma 11 jul. 2022 om

Re: Solr eats up all the memory

2022-07-06 Thread Dave
Not sure about ku but docker you can simply mount the ssd into the service as an alias with the volumes. Unless you have no control over the metal then this could work? > On Jul 6, 2022, at 8:34 PM, dmitri maziuk wrote: > > On 2022-07-06 2:59 PM, Shawn Heisey wrote: >> If the mounted filesy

Re: Solr eats up all the memory

2022-07-06 Thread Dave
In my experience yea it will just be slow, but it’s hard to test truthfully slow without a couple tens of thousands of searches to measure against. It won’t fail fail, just read the disk. So. Get an ssd to put the index on and then poof, you have a really fast disk to read from > On Jul 6, 202

Re: Solr eats up all the memory

2022-07-05 Thread Dave
com/deicool >> LinkedIn: www.linkedin.com/in/deicool >> >> "Plant a Tree, Go Green" >> >> Make In India : http://www.makeinindia.com/home >> >> >>> On Tue, Jul 5, 2022 at 4:43 PM Dave wrote: >>> >>> Exactly. You could have the be

Re: Solr eats up all the memory

2022-07-05 Thread Dave
you have tuned the software to a point where you can't >>> tune >>>> anymore, you can then turn your eyes to hardware. >>>> >>>> Deepak >>>> "The greatness of a nation can be judged by the way its animals are >>> tre

Re: Solr eats up all the memory

2022-07-04 Thread Dave
Also for $115 I can buy a terabyte of a Samsung ssd, which helps a lot. It comes to a point where money on hardware will outweigh money on engineering man power hours, and still come to the same conclusion. As much ram as your rack can take and as big and fast of a raid ssd drive it can take. Re

Re: Regarding Solr auto recovery

2022-06-22 Thread Dave
Theoretically if this script gets executed, solr is already dead and the memory is retrieved > On Jun 22, 2022, at 11:04 AM, Poorna Murali wrote: > > Thanks Shawn for the clarification! > >> On 2022/06/22 14:03:43 Shawn Heisey wrote: >>> On 6/22/22 04:40, Poorna Murali wrote: >>> Thanks every

Re: Semantic Knowledge Graph theoric question

2022-06-22 Thread Dave
If you really want to have fun you build that index using the significant phrases plus the ner and boost accordingly and I have about 90% certainty if you do it well, you hit the mark. Amhik > On Jun 22, 2022, at 10:08 AM, Dave wrote: > > This is the right answer. I could go more

Re: Semantic Knowledge Graph theoric question

2022-06-22 Thread Dave
/guide/8_9/stream-source-reference.html#significantterms-parameters > > > > > > > > > Joel Bernstein > http://joelsolr.blogspot.com/ > > >> On Wed, Jun 22, 2022 at 2:37 AM Danilo Tomasoni wrote: >> >> Hello Dave, first of all thank you for

Re: Semantic Knowledge Graph theoric question

2022-06-21 Thread Dave
Two hints. The ner from solr isn’t very good, and the relatedness function is iffy at best. I would take a different approach. Get the ner data as you have it now and use shingles to make a better formed complete index using stop words then use the mlt mech to see if it’s better. If it is, g

Re: Auto recovery of solr

2022-06-21 Thread Dave
In my experience if solr goes down it’s because it ran out of disk space, so if you automatically just bring it back up again it will just go down again. There are simple bash scripts you can make to run for standalone solr that will do what you want, you just need to be sure they destroy and ch

Re: Re-index after upgrade

2022-06-12 Thread Dave
You don’t need a new core/collection, just reindex everything again. Ideally since you’re using standalone (way better than cloud imo) you can use the same indexer, just do an integrity check after the fact to make sure the document counts are the same. You don’t really need to do that delete if

Re: Facet counts for first N hits

2022-06-03 Thread Dave
Yeah, my first thought would be to have the first query with no facets and a fl of just the id, limited to 1000, it’s a lot faster than you think if you only return the id and no facets. Then do a secondary search for just those ids and the facets added to that query using terms component as you
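
A sketch of the two-pass approach above, in Python, building only the request parameters. The snippet is cut off before it names the exact mechanism for the second pass, so the use of the terms query parser ({!terms f=id}) here is an assumption, as are the function names and the unique key being called "id".

```python
def first_pass_params(q, n=1000):
    """First query: ids only, no facets, limited to the first n hits."""
    return {"q": q, "fl": "id", "rows": n, "facet": "false"}


def second_pass_params(ids, facet_field):
    """Second query: restrict to those ids and compute facets on them."""
    return {
        "q": "*:*",
        # Assumed mechanism: the terms query parser as a filter on the id field.
        "fq": "{!terms f=id}" + ",".join(ids),
        "rows": 0,
        "facet": "true",
        "facet.field": facet_field,
    }
```

The ids returned by the first request feed the second, so the facet counts cover only the first N hits.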

Re: Solr becomes extremely slow after moving to another machine

2022-06-02 Thread Dave
What are you doing to warm up the new server? Need to get that index into memory with roughly the same queries you have on the other machine, for a bit. > On Jun 2, 2022, at 11:32 AM, Chang Wang wrote: > > Hi All, > > I have a machine (EC2 instance) on AWS with solr 6.6 installed. I recently

Re: Solr Highlighting full phrases instead of words

2022-05-20 Thread Dave
Did you try mergeContiguous yet and see if it produced what you wanted? > On May 20, 2022, at 8:19 AM, Endika Posadas wrote: > > Hi all, > > I am using Solr's Unified highlighter to highlight parts of a text block. > However, I have noticed that the highlighter, instead of highlighting the >

Re: Is solr what I want, or something else?

2022-04-17 Thread Dave
Solr can easily do what you want if I understand you correctly. Key terminology to use would be “document” for the expected items your search would return, in your case sounds like the folder with the text files, “fields” being the metadata points for each document, in your case sounds like text

Re: Regarding indexing data in different cores or same core with different entities.

2022-04-10 Thread Dave
This is a good place to use a filter query as well, especially if you want results from any combination of the tables > On Apr 10, 2022, at 5:05 PM, Saurabh Sharma > wrote: > > In case you are having very less data in tables then you should index all > four tables in a single core. With every

Re: Solr as a dedicated data store?

2022-04-07 Thread Dave
This is one of the most interesting and articulate emails I’ve read about the fundamentals in a long time. Saving this one :) > On Apr 7, 2022, at 9:32 PM, Gus Heck wrote: > > Solr is not a "good" primary data store. Solr is built for finding your > documents, not storing them. A good primary

Re: Solr Cloud - Query with results around 2 million records time out.

2022-04-05 Thread Dave
I’ve been able to download a response from standalone solr with over 40 million records, just takes a bit, using wget and a long timeout. I don’t know if a browser would be able to handle that size and time to download, let alone crash the browser altogether > On Apr 5, 2022, at 8:00 AM, Thomas

Re: Solr as a dedicated data store?

2022-04-04 Thread Dave
NO. I know it’s tempting but solr is a search engine not a database. You should at any point be able to destroy the search index and rebuild it from the database. Most any rdbms can do what you want, or go the nosql mongo route which is becoming popular, but never use a search engine in this w

Re: Problem with facet in SOLR

2022-04-03 Thread Dave
Other things to consider, without seeing your raw query, is make sure facet=true is in it, and ideally for facets you want a string field rather than text and docvalues/stored being true, then rerun a sample index and test again. Also facets work on dynamic fields as well, I don’t know if doc

Re: Solr dashboard - number of CPUs available

2022-03-18 Thread Dave
Again, never ever trust the result speed of a cold search. Are you warming your index? https://solr.apache.org/guide/6_6/query-settings-in-solrconfig.html > On Mar 18, 2022, at 4:23 PM, Vincenzo D'Amore wrote: > > perSegFilter > class:org.apache.solr.search.LRUCache > description:LRU Cache(m

Re: Solr dashboard - number of CPUs available

2022-03-18 Thread Dave
", > "rows":"1"}}, > > then for a while, the QTime is 0. I assume (obviously) that it is cached, > but after a while the cache expires > >> On Fri, Mar 18, 2022 at 6:22 PM Dave wrote: >> >> I’ve found that each solr instance

Re: Solr dashboard - number of CPUs available

2022-03-18 Thread Dave
I’ve found that each solr instance will take as many cores as it needs per request. Your 2 sec response sounds like you just started the server and then did that search. I never trust the first search as nothing has been put into memory yet. I like to give my jvms 31 gb each and let Linux cache

Re: [EXTERNAL] Re: [EXT] Re: Looking for expertise on comparing Solr search to Postgres full-text search

2022-03-17 Thread Dave
I’m a big believer in the right tool for the job. Like what said before if you’re doing just a field:value query or four and no complications, sure use a standard rdbms. But if you inform the client that something like Leaves And whitm* title^3 with bf:title^3 author ^2 Is possible, the conver

Re: How to run Solr on two servers for redundancy

2022-03-13 Thread Dave
keep all action on one until it falls, and never use over 31 gb heap size. This is just a trial and error and complete success option and no need of complications with zk -Dave > On Mar 13, 2022, at 3:48 PM, Sam Lee wrote: > > How do I run Apache Solr on two servers such that I will

Re: Storing logs in Apache Solr

2022-02-21 Thread Dave
Solrs stats functions are great when analyzing logs if they are pre processed. > On Feb 21, 2022, at 4:26 PM, Joel Bernstein wrote: > > We use Solr for logs analytics. This is a lot more power in Solr's math > expressions than in Elastic's aggregations and Solr also has new root cause > analys

Re: Is there an easy way to determine Lucene versions for segments?

2022-01-02 Thread Dave
“ I tried removing the check in SegmentInfos.java ( https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.11.1/lucene/core/src/java/org/apache/lucene/index/SegmentInfos.java#L321) , compiled the code and ran a full sequence of index upgrades from 5.x -> 6.x -> 7.x ->8.x. The upgrade goe

Re: Incremental backup for Standalone Solr

2021-11-15 Thread Dave
Stand-alone solr is in my opinion better if you have your own machines. Currently solr cloud with the required zoo keeper machines just makes no sense when you can just have nginx in front of a cluster of replicated servers, > On Nov 15, 2021, at 8:18 AM, Eric Pugh > wrote: > > I generally

Re: SOLR Performance on RHEL 7

2021-10-26 Thread Dave
I have always preferred completely turning off swap on solr dedicated machines, and especially if you can’t use an SSD. > On Oct 26, 2021, at 12:59 PM, Paul Russell wrote: > > Thanks for all the helpful information. > > Currently we are averaging about 5.5k requests a minute for this collect

Re: Index for text with space

2021-10-25 Thread Dave
>> Walter Underwood >> wun...@wunderwood.org >> http://observer.wunderwood.org/ (my blog) >> >>>> On Oct 23, 2021, at 4:31 AM, Dave wrote: >>> >>> Why ever would you not index less than three characters? >>> “To be or not to be” >>> Seems l

Re: Index for text with space

2021-10-23 Thread Dave
Why ever would you not index less than three characters? “To be or not to be” Seems like a significant search > On Oct 23, 2021, at 7:28 AM, son hoang wrote: > > Yep, words less than 3 chars will not be indexed. But if "Al Abbas" text can > be separated into a token "Abbas" (and "Al" but it

Re: boosting specific number of Products

2021-10-20 Thread Dave
Maybe something like bq =rank:[1 TO 20]^10 I’m afk at the moment, but seems like it makes sense > On Oct 20, 2021, at 1:44 PM, sachin gk wrote: > > Hi All, > > If a particular boost expression is boosting 100 Products, can we boost > only the top 20 products and let other ranking criteria fil
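
The suggested boost query can be sketched as request parameters in Python. This assumes the index already has a numeric field named rank holding each product's position, and that the dismax or edismax parser is in use, since bq belongs to those parsers; the function name is hypothetical.

```python
from urllib.parse import urlencode


def boosted_query(q, rank_field="rank", top=20, boost=10):
    """Build a query string that boosts documents ranked 1..top."""
    params = {
        "defType": "edismax",
        "q": q,
        "bq": f"{rank_field}:[1 TO {top}]^{boost}",
    }
    return urlencode(params)
```

Documents outside the range still match and are ordered by the other ranking criteria; only the ones inside it get the extra boost.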

Re: Solr High Availability | Multi-Datacenter approach

2021-10-08 Thread Dave
Yes. Put a proxy to hold the solr instances on your server, and simply point solrj to that proxy which has autofailover abilities already built in and you will instantly drop down the server list if one fails to respond. > On Oct 8, 2021, at 2:44 AM, HU Dong wrote: > > Hi, > > We're facing

Re: Better way to debug Solr Scoring

2021-10-07 Thread Dave
Which parts of the scoring are confusing to you? As in specifically. Solr is just a cover over lucene and the scoring has been documented for a long time: http://www.lucenetutorial.com/advanced-topics/scoring.html * Documents containing *all* the search terms are good * Matches on rare words a

Re: Is there an easy way to determine Lucene versions for segments?

2021-10-06 Thread Dave
ael Conrad wrote: > > too late it's in progress. > >> On 10/6/21 9:11 AM, Dave wrote: >> Hold on that idea then. An optimize will use three times your index size >> possibly. >> >>>> On Oct 6, 2021, at 9:02 AM, Michael Conrad wrote:

Re: Is there an easy way to determine Lucene versions for segments?

2021-10-06 Thread Dave
> -Mike > >> On 10/6/21 8:54 AM, Dave wrote: >> Personally I always do a full reindex when going to a new version, just >> safer and you should always be able to do such at any point. However if you >> got the time to spare you can do an optimize and it will force th

Re: Solr User with "Document is missing mandatory uniqueKey"

2021-10-06 Thread Dave
Also a unique id is valuable if for example you are indexing from a database, and you use the id from the table, but of course other tables you index can have the same id value, so your indexer can append it with the table name as a simple example. I can’t think of any situation where you would
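
A small Python sketch of the table-name idea above: combine the source table name with the row id so that rows from different tables never collide on the unique key. The function name and the "-" separator are arbitrary choices for illustration.

```python
def make_unique_id(table, row_id, sep="-"):
    """Combine the source table name and row id into one unique key."""
    return f"{table}{sep}{row_id}"
```

Row 42 of an authors table and row 42 of a books table then index as authors-42 and books-42 rather than overwriting each other.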
