Re: Final CFP Reminder: Community Over Code

2023-07-13 Thread David Hastings
I submitted a proposal, but I couldn't upload a PDF that I think would help. On Thu, Jul 13, 2023 at 2:39 PM Houston Putman wrote: > Hello everyone, > > Today is the last day that you can submit a presentation for > Community Over Code (formerly known as ApacheCon). > Submissions will be accepted

Re: Solr Heap Memory Settings

2023-03-09 Thread David Hastings
>Set -Xms to "I know it wants at least this much". >Set -Xmx to significantly, but not wildly, more. No, always set them to the same value no matter what. I like increments of 1024M, so I would start at 2048M and work up to 8GB and see how it performs. Having a test script that forks to how man
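A minimal sketch of the "same value, 1024M steps" advice above. The helper names `heap_flags` and `candidate_heaps` are made up for illustration and are not part of Solr; the sketch only builds the standard JVM `-Xms`/`-Xmx` flag strings you would then try one at a time.

```python
from typing import List


def heap_flags(heap_mb: int) -> str:
    """Return JVM flags with identical minimum (-Xms) and maximum (-Xmx) heap."""
    if heap_mb <= 0 or heap_mb % 1024 != 0:
        raise ValueError("heap size must be a positive multiple of 1024 MB")
    return f"-Xms{heap_mb}m -Xmx{heap_mb}m"


def candidate_heaps(start_mb: int = 2048, stop_mb: int = 8192,
                    step_mb: int = 1024) -> List[str]:
    """List the flag strings to benchmark, from start_mb up to stop_mb inclusive."""
    return [heap_flags(mb) for mb in range(start_mb, stop_mb + 1, step_mb)]


if __name__ == "__main__":
    for flags in candidate_heaps():
        print(flags)
```

Each printed line is one configuration to benchmark; how you pass the flags to the JVM depends on your own start script.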

Re: Solr 4.10 - master/slave replication

2023-03-08 Thread David Hastings
Also take a look at your Xmx and Xms values. I find this to be absolutely vital when taking into consideration the machine's own memory and the index size. You want the OS to be able to put most of the index segments into memory, and you want those two values (Xmx and Xms) to be identical, Just a t

Re: Multiple cores

2022-12-28 Thread David Hastings
This is actually something I experienced using things like MLT to get "similar" documents: the corpus has to match, or else it all goes out the window. So yeah, if you have multiple cores/collections with the same exact type of documents you can be pretty safe, but once you start mixing

Re: Slowness in Solr Optimize

2022-12-13 Thread David Hastings
aster than the > caching in a single Solr server. > > wunder > Walter Underwood > wun...@wunderwood.org > http://observer.wunderwood.org/ (my blog) > > > On Dec 13, 2022, at 11:27 AM, David Hastings < > hastings.recurs...@gmail.com> wrote: > > > > Ah, that m

Re: Slowness in Solr Optimize

2022-12-13 Thread David Hastings
Ah, that makes sense. If you can do sticky sessions and such with your balancers, plus I never had to deal with the throughput of something like Netflix, so for mine and most use cases, I still feel one very hot server is better than N warm ones. "but AWS load balancers aren’t very smart." - agre

Re: Using the fq parameter to filter for a value that is multivalued field.

2022-12-09 Thread David Hastings
Of course. Also, remember there are things to consider, like whether you want to store/retrieve it as an absolute string, including capitalization, for a facet/drop-down selection or only as a search field. Lots of nuances. -Dave

Re: Duplicate docs with same unique id on update

2022-12-08 Thread David Hastings
Interesting, this is kind of bizarre behavior. is: defaulted in the schema for 8.x? On Thu, Dec 8, 2022 at 9:31 AM Eduardo Gomez wrote: > > At first it wasn't clear to me what the problem you're having actually > > is. Then I glanced back at the message subject ... it is the only place > > you

Re: Solr Contributor Bootcamp announced to coincide with ApacheCon USA

2022-10-18 Thread David Hastings
I am definitely down for either option. I've been wanting to contribute for a while now. On Tue, Oct 18, 2022 at 12:18 PM Nazerke S wrote: > both options work for me. > > > On Tue, Oct 18, 2022 at 1:53 AM preeti kumari > wrote: > > > Option 1 works for me too. > > > > Thanks > > > > On Tue, 18

Re: Identifying SOLR [query] performance issue (or "how to scale up")

2022-09-21 Thread David Hastings
| I think because the instance has 64Gbytes of RAM and I've divided this up into 32Gbytes to the JVM Reduce it to 30 or 31 for the JVM; 32 -> 64 is not a good idea. It triggers a pointer flip to 64-bit: https://blog.codecentric.de/en/2014/02/35gb-heap-less-32gb-java-jvm-memory-oddities/ On Wed, S
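A rough sketch of the arithmetic behind that advice, assuming the 31 GB ceiling mentioned in this thread as the safe limit before the JVM switches to 64-bit pointers. The function names are hypothetical, and the page-cache figure ignores whatever else runs on the machine.

```python
# Ceiling taken from the advice above: stay below ~32 GB so the JVM keeps
# compressed (32-bit) object pointers.
MAX_COMPRESSED_HEAP_GB = 31


def capped_heap_gb(requested_gb: int) -> int:
    """Clamp a requested heap size to the compressed-pointer ceiling."""
    if requested_gb <= 0:
        raise ValueError("heap size must be positive")
    return min(requested_gb, MAX_COMPRESSED_HEAP_GB)


def ram_left_for_os_gb(total_ram_gb: int, heap_gb: int) -> int:
    """RAM left outside the heap, which the OS can use to cache index files."""
    if heap_gb > total_ram_gb:
        raise ValueError("heap cannot exceed total RAM")
    return total_ram_gb - heap_gb


if __name__ == "__main__":
    heap = capped_heap_gb(32)            # the 32 GB from the quoted question
    print(heap, ram_left_for_os_gb(64, heap))
```

For the 64 GB instance in the quoted question, this gives a 31 GB heap and leaves 33 GB outside the heap.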

Re: Search without Accent

2022-09-07 Thread David Hastings
*what Markus said (just beat ya to it by a minute :)) On Wed, Sep 7, 2022 at 9:22 AM Markus Jelsma wrote: > Hi Karsten, > > You forgot to add ASCIIFoldingFilter to IndexAnalyzer, please try again > with: > > positionIncrementGap="100"> > > > > preserveOrigi

Re: Search without Accent

2022-09-07 Thread David Hastings
Don't mean to interrupt, but out of curiosity, why is ASCIIFoldingFilterFactory only in the query analyzer, not the indexer? Wouldn't it need to be in both to get the desired result? On Wed, Sep 7, 2022 at 8:59 AM Carsten Klement wrote: > > >- > Hi Markus, thank you, yes i think
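A toy illustration of why the folding has to happen on both sides. This is not Solr's ASCIIFoldingFilter: it swaps in Python's `unicodedata` to strip accents, and the "index" is just a dict from analyzed term to document id. All names here are made up for the sketch.

```python
import unicodedata
from typing import Callable, Dict


def fold_ascii(text: str) -> str:
    """Lowercase and strip combining accents (a stand-in for ASCII folding)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).lower()


def index_terms(docs: Dict[int, str], analyzer: Callable[[str], str]) -> Dict[str, int]:
    """Build a tiny term -> doc id map using the given index-time analyzer."""
    return {analyzer(text): doc_id for doc_id, text in docs.items()}


docs = {1: "Café", 2: "Crème"}
query_term = fold_ascii("café")                   # query analyzer folds to "cafe"

raw_index = index_terms(docs, str.lower)          # index analyzer does NOT fold
folded_index = index_terms(docs, fold_ascii)      # index analyzer folds too

print(raw_index.get(query_term))                  # no match: index holds "café"
print(folded_index.get(query_term))               # matches document 1
```

With folding only at query time, the query term becomes "cafe" while the index still holds "café", so nothing matches; folding at index time as well makes both sides agree.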

Re: Terms with hyphens and fuzzy search

2022-08-23 Thread David Hastings
> > to quote it. We had a quick try quoting the hyphenated term in the query > > as > > you suggested and it looks like it works (i.e. returns matches). Since > > the > > standard tokenizer splits on hyphens, I'm wondering the unquoted query > > somehow get

Re: Terms with hyphens and fuzzy search

2022-08-23 Thread David Hastings
I'm not certain about your tokenizer, of course, but shouldn't it be "terms-with-hyphens"~1? Just a syntax thing that may not have translated over email, but I'm curious. On Tue, Aug 23, 2022 at 10:12 AM Julian Hugo wrote: > Hello, > > I am getting peculiar results when querying for a term containing hyp
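A small sketch of the two spellings being discussed, built as plain strings. The helper names are hypothetical. One caveat worth keeping in mind: in the standard Lucene query syntax, `~N` after a bare term is a fuzzy edit distance, while `~N` after a quoted phrase is phrase slop (proximity), so the two forms are not equivalent.

```python
def quoted_with_tilde(term: str, n: int = 1) -> str:
    """Wrap the term in double quotes (escaping backslashes and inner quotes),
    then append ~n. On a quoted phrase, ~n is phrase slop."""
    escaped = term.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"~{n}'


def bare_with_tilde(term: str, n: int = 1) -> str:
    """Append ~n to the unquoted term. On a bare term, ~n is fuzzy distance."""
    return f"{term}~{n}"


if __name__ == "__main__":
    print(quoted_with_tilde("terms-with-hyphens"))
    print(bare_with_tilde("terms-with-hyphens"))
```

Which form returns matches depends on how the field's tokenizer treats the hyphens, which is the open question in the thread.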

Re: Retain Data Import Handler In Solr9.0

2022-07-23 Thread David Hastings
Guess it depends on how many scripts you want to maintain/things to do by hand, but in any case, it is the best route: multiple indexing services/processes, and skip the DIH altogether. It wasn't that great of an idea in the first place. It was super clever, and I appreciate the work that went in

Re: Howto restore backup from solr cloud to solr standalone after upgrading to 9.0

2022-07-14 Thread David Hastings
"But, luckily for your case: it fell off my radar and was never actually removed." if this isn't the definition of my favorite part of open source software, i don't know what is :) On Thu, Jul 14, 2022 at 8:41 AM Jason Gerlowski wrote: > Hi Michael, > > As you mentioned, the community original

Re: Solr eats up all the memory

2022-07-04 Thread David Hastings
In my experience, yes, solr should have its own hardware and be allowed to eat all of it. Never give it more than 31GB of JVM heap, and give it as much memory as possible. 64GB should work fine, but I can just go on Amazon and buy another 128GB for less than $500. The more the better, less than $

Re: Update/Reindex

2022-06-21 Thread David Hastings
You have never been able to "update" unless all fields are stored. If you have indexed-only fields, they would always be lost. It was always a read/destroy/re-index, but only for stored fields. If you have a general text non-stored field, it is gone. On Tue, Jun 21, 2022 at 10:30 AM Mike wrote:
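For reference, a minimal sketch of what an "update" request body looks like when it uses Solr's atomic-update `set` modifier. The unique key name `id`, the field name `title`, and the helper `atomic_set` are assumptions for illustration. As described above, Solr rebuilds the rest of the document from what it has stored, so anything that was indexed but not stored does not survive.

```python
import json
from typing import Any, Dict


def atomic_set(doc_id: str, field: str, value: Any) -> Dict[str, Any]:
    """Build one atomic-update document that replaces a single field's value.
    Assumes the schema's unique key field is named "id"."""
    return {"id": doc_id, field: {"set": value}}


# JSON body for a POST to the collection's update handler.
payload = json.dumps([atomic_set("doc-1", "title", "New title")])

if __name__ == "__main__":
    print(payload)
```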

Re: Auto recovery of solr

2022-06-21 Thread David Hastings
OOM errors do become annoying, so I added https://www.tecmint.com/clear-ram-memory-cache-buffer-and-swap-space-on-linux/ to the restart sh as well, but as Shawn has said, the ones deployed have never gone down, granted each is given 31GB of heap space on a 200GB+ RAM server. Sometimes it's ju

Re: Solr indexing performance tips

2022-06-08 Thread David Hastings
> * Do NOT commit after each batch of 1000 docs. Instead, commit as seldom as your requirements allow, e.g. try commitWithin=60000 to commit every minute This is the big one. Commit after the entire process is done or on a timer if you don't need NRT searching; rarely does anyone ever need that
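A minimal sketch of the "commit on a timer" idea, assuming a local Solr at `http://localhost:8983/solr` and a collection called `mycollection` (both placeholders). `commitWithin` is given in milliseconds, so one minute is 60000. The code only builds the update URL and the batches; it does not send anything.

```python
from typing import Iterable, Iterator, List
from urllib.parse import urlencode

ONE_MINUTE_MS = 60_000  # commitWithin is expressed in milliseconds


def update_url(base: str, collection: str, commit_within_ms: int = ONE_MINUTE_MS) -> str:
    """URL for the update handler that lets Solr commit on its own timer
    instead of the client committing after every batch."""
    return f"{base}/{collection}/update?" + urlencode({"commitWithin": commit_within_ms})


def batches(docs: Iterable[dict], size: int = 1000) -> Iterator[List[dict]]:
    """Yield documents in batches of `size`; no commit is tied to a batch."""
    batch: List[dict] = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


if __name__ == "__main__":
    print(update_url("http://localhost:8983/solr", "mycollection"))
```

Each batch would be posted to that URL with no explicit commit parameter, leaving Solr to fold the commits together.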

Re: Re: SolrJ compatibility

2022-05-31 Thread David Hastings
If the solr indexing packages among all the different languages suddenly need to upgrade just because of solr cloud and can’t use http requests for indexing, that would be a bad, bad thing. The cloud internals should handle anything indexing-wise beyond “here is a document and its metadata” much li

Re: Solr GC Tuning causes issues and doesn't start Solr url

2022-05-04 Thread David Hastings
On a side note, once you have the JRE figured out, as a rule of thumb I make my Xms and Xmx the exact same values, and if this is your own metal, buy more memory and raise those values up to 31GB each. On Wed, May 4, 2022 at 10:15 AM YOGENDRA SONI wrote: > If java is installed and java version is 9

Re: Regarding maximum number of documents that can be returned safely from SOLR to Java Application.

2022-04-28 Thread David Hastings
The 30+ million records I retrieved were always from a single standalone solr node, and yes, you can do that frequently and it doesn't have an impact on the rest of the searches happening, assuming you have enough memory to deal with it. There is nothing wrong with requesting every one of your docume

Re: Regarding maximum number of documents that can be returned safely from SOLR to Java Application.

2022-04-27 Thread David Hastings
Very often I have used solr to return over 30 million documents with no ramifications via a wget call and a LONG timeout. Granted, it took a while and the resulting file was multiple GB in size, but there aren't any issues with it that I ever encountered. I also used about a 31GB JVM heap and ha

Re: Snowflake vs Solr

2022-04-22 Thread David Hastings
> > > > > > > > > > On Thu, 21 Apr 2022, 20:34 Balanathagiri Ayyasamypalanivel, < > > > > bala.cit...@gmail.com> wrote: > > > > > > > > > Thanks David, for your quick response. From the overall article in > > the > > > > > sys

Re: Snowflake vs Solr

2022-04-21 Thread David Hastings
< > > bala.cit...@gmail.com> wrote: > > > > > Thanks David, for your quick response. From the overall article in the > > > system, it seems like snowflake little faster than the normal Database > > > system, so instead of sourcing the data from snowflake t

Re: Snowflake vs Solr

2022-04-21 Thread David Hastings
I don't see how they are comparable. One is a DB, the other is a search engine. There is no overlap aside from Solr indexing data from Snowflake to search it. On Thu, Apr 21, 2022 at 10:21 AM Balanathagiri Ayyasamypalanivel < bala.cit...@gmail.com> wrote: > Hi, > > Any one recently switched from S

Re: Looking for online tutorials setting up SolrCloud replication with Solr 8.11.1

2022-04-13 Thread David Hastings
I think this would be very beneficial, especially with a step-by-step on getting all three ZooKeepers up and running for a quorum. Honestly, this has been one of my major holdbacks from implementing cloud, when the simplicity of a standalone server and a couple hundred $ of SSDs and memory make up f

Re: Regarding indexing data in different cores or same core with different entities.

2022-04-10 Thread David Hastings
The different field types for the same named field I didn’t take into account. That would throw a wrench into it if one table wanted facets on a field but the other just wanted text searching on the same field name for example. Guess without context the question becomes more difficult to answer,

Re: Solr as a dedicated data store?

2022-04-08 Thread David Hastings
As long as your documents are simple in structure. A key value or an array for any given field, you’re good to go. Anything multi level, you’re out of luck. Not sure how relevant this link is still but: https://stackoverflow.com/questions/22192904/is-solr-support-complex-types-like-structure-for-mu

Re: Vulnerability on solr port

2022-04-07 Thread David Hastings
“IP address of the original server” is exactly the problem. A solr server doesn’t/shouldn’t have an IP address that exists outside of the internal network. So even if it didn’t get an IP it would have no vulnerabilities, since it’s not a real IP. The only people or machines that can touch it are

Re: Vulnerability on solr port

2022-04-07 Thread David Hastings
Yes, this looks like an IIS problem. IIS is on version 10, "Current Description . IIS 4.0 allows remote.." there is no reason IIS 4.0 should be running, ever On Thu, Apr 7, 2022 at 3:00 PM Jan Høydahl wrote: > Hi, > > Solr is not a web server that is accessible to someone on the outside o

Re: Incremental backup for Standalone Solr

2021-11-15 Thread David Hastings
That's an opinion, of course. RAM is cheap: 1 TB of memory and 3 of an SSD is less than a Honda Civic, and solr cloud has some weaknesses. I won't push one way or the other. I have used both; I just like old stuff, and nginx is better than zookeeper. On Mon, Nov 15, 2021 at 10:13 AM Shawn Heisey wrote

Re: boosting specific number of Products

2021-10-26 Thread David Hastings
H, is this a newer thing from solr 8.X? On Tue, Oct 26, 2021 at 4:35 PM Joel Bernstein wrote: > This may be what you're looking for: > > https://solr.apache.org/guide/8_8/query-re-ranking.html > > > Joel Bernstein > http://joelsolr.blogspot.com/ > > > On Thu, Oct 21, 2021 at 2:39 PM sachin g

Re: Is there an easy way to determine Lucene versions for segments?

2021-10-06 Thread David Hastings
Ah, my mistake then. For some reason, I thought the optimize would re-index the documents from the old segments into new ones, but I suppose that would only be possible for stored=true fields. The more you know! Again, I always just did a full re-index from scratch when upgrading. On Wed, Oct 6

Re: qf with multiple fields in _query_ with edismax

2021-09-10 Thread David Hastings
Do you mind if I ask why not use a post to solr? On Fri, Sep 10, 2021 at 10:18 AM Andy Coulson wrote: > Thanks Erik, > > That did the trick! My real use case will have additional predicates - I > merely trimmed it down to reproduce and illustrate the problem. I real > query will probably be some
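A minimal sketch of sending query parameters in a POST body instead of the URL, using only the Python standard library. The URL, collection name, and the example `q`/`qf` values are placeholders, not taken from the thread. The request is built but not sent.

```python
from typing import Dict
from urllib.parse import urlencode
from urllib.request import Request


def select_request(select_url: str, params: Dict[str, str]) -> Request:
    """Build a form-encoded POST for a select handler. Putting the parameters
    in the body avoids URL length limits and URL-escaping of long queries."""
    body = urlencode(params).encode("utf-8")
    return Request(
        select_url,
        data=body,  # supplying data makes urllib issue a POST
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


req = select_request(
    "http://localhost:8983/solr/mycollection/select",  # placeholder URL
    {"q": "solr heap", "defType": "edismax", "qf": "title body"},
)

if __name__ == "__main__":
    print(req.get_method(), req.data)
```

Passing `req` to `urllib.request.urlopen` would send it.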

Re: REINDEXCOLLECTION unknown field

2021-03-22 Thread David Hastings
>Surely this field should simply just be ignored? Why would solr ignore this field if you're trying to index to it? Can't you change your indexer to remove these fields as well? Solr will try to do what it's told, and if it's told to do something bad it will simply fail. You don't want it to ignore
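One way to do the "change your indexer" part, sketched with made-up field names: drop anything the target schema does not define before the document is sent. `keep_known_fields` and the example schema set are hypothetical; in practice the set of field names would come from your own schema.

```python
from typing import Any, Dict, Set


def keep_known_fields(doc: Dict[str, Any], schema_fields: Set[str]) -> Dict[str, Any]:
    """Return a copy of the document containing only fields the schema defines."""
    return {name: value for name, value in doc.items() if name in schema_fields}


schema_fields = {"id", "title", "body"}           # assumed target schema
source_doc = {"id": "1", "title": "Hello", "legacy_flag": True}

if __name__ == "__main__":
    print(keep_known_fields(source_doc, schema_fields))
```

This keeps the failure visible on the indexer side rather than asking Solr to silently discard data.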