Re: Annoying problem when running SolrCloud fully containerized.

2025-07-21 Thread Gus Heck
At present embedded zookeeper is "supported" only for initial testing, basically to ease the running of tutorials. It's not designed to form clusters that provide redundancy, nor is there much thought put into facilitating management of its data store or securing it from unwanted access. There are

Re: Advice on ways forward with or without Data Import Handler

2025-06-03 Thread Gus Heck
Perhaps try out JesterJ? It has a database connector, and I encourage you to try it out. https://github.com/nsoft/jesterj There are discussion forums, issue reporting and a discord channel if you have questions or feedback. Full disclosure: I wrote most of JesterJ, though the JDBC connector was

Re: Using ExtractRequest handler to index documents using type_leve=parent

2025-05-22 Thread Gus Heck
Adding the _root_ field without reindexing isn't supported. Having documents that are not nested (after re-index) can be made to work but it requires a deep understanding of how the querying of nested documents works at a low level in order to form correct queries for nested documents. In particula

Re: GOAWAY signal

2025-05-19 Thread Gus Heck
According to RFC 9113 GOAWAY just means that the server wants to close the HTTP connection. Solr doesn't write its own HTTP handling code, and I expect that the libraries we use (the JDK based on the class you say you are using) are follow

Re: unescape solr index content

2025-05-12 Thread Gus Heck
Obviously, there's lots we don't know about your system and your plans, but the narrow view your email gives us looks like you may misunderstand the nature of Solr. Solr is a search index, and its primary function is to help you FIND your data based on the text, or other data (i.e. spatial data) it

Re: Automatic upgrade of Solr indexes over multiple versions

2025-04-01 Thread Gus Heck
That's interesting, but brings up the question of what happens if a node (or the whole cluster) is rebooted in the middle of the process? On Mon, Mar 31, 2025 at 10:02 PM Rahul Goswami wrote: > Some good points brought up in the discussion. The implementation we have > reindexes a shard reading

Re: Automatic upgrade of Solr indexes over multiple versions

2025-03-30 Thread Gus Heck
Some thoughts: A lot depends on the use case for this sort of thing. In the case of relatively small installs that can afford to run >2x disk and have significant query latency headroom this might be useful. However, if a company is running a large cluster where maintaining excess capacity costs t

Re: Issue during creation of Dimensional Routed Alias

2025-02-21 Thread Gus Heck
Hi Marcel, Thanks for bringing this up. This looks like it should definitely have a Jira issue. I'll look at this in the near future, if you want to create the Jira that's helpful, if not I'll create one when I look into it. -Gus On Fri, Feb 21, 2025 at 11:22 AM Marcel Gawron wrote: > Hi, > th

Re: Using the NOT operator with the AND operator

2025-01-31 Thread Gus Heck
@hoss, did that replace the previous article by Erick? I can't find the old one anymore. On Thu, Jan 30, 2025 at 5:48 PM Chris Hostetter wrote: > > Obligatory reading about "boolean" queries in lucene & solr -- still very > relevant ~13 years later... > > https://lucidworks.com/post/solr-boolean

Re: Solr shutdown everyday at 11:34:06Z

2025-01-27 Thread Gus Heck
It is usually recommended to run all but the smallest non mission critical solr installs on their own hardware or at least own VM. Besides issues like this, the lucene library is optimized to use system level disk caching and the more other stuff competing for disk, the slower lucene (and therefore

Re: Relevance sorting in federated search

2025-01-17 Thread Gus Heck
I believe you will find this section of the Ref guide helpful https://solr.apache.org/guide/solr/9_7/deployment-guide/solrcloud-distributed-requests.html On Sat, Jan 18, 2025, 12:56 AM Rahul Goswami wrote: > Hello, > Let's say I have a 4 shard Solr collection. When I query the collection, > what

Re: I may fork nutch. Is it a good plan?

2025-01-08 Thread Gus Heck
Perhaps you're looking for https://grep.app/ ? It does regex search vs github and was recently acquired by Vercel. It was written by a friend of mine. On Wed, Jan 8, 2025 at 9:44 AM anon anon wrote: > Markus: I probably misunderstood your remark. > > Could it be possible to use a git clone proto

Re: CPU load highly increased after update

2024-11-29 Thread Gus Heck
Upgrade from which version? On Fri, Nov 29, 2024, 9:31 AM Patrik Peng wrote: > Hi all > > We're observing a similar load increase after the update. > > Regards, > Patrik > >

Re: RAMDirectoryFactory with Solrj 9.7.0

2024-11-26 Thread Gus Heck
IIRC the ByteBuffersDirectoryFactory is what new code should be using: https://issues.apache.org/jira/browse/SOLR-12861 On Tue, Nov 26, 2024 at 4:13 PM Péter Király wrote: > Dear all, > > I am developing an application that intensively use Apache Solr, that > among others makes library catalogue

Re: Highlighting for range queries

2024-11-19 Thread Gus Heck
Highlighting is the demarcation of a range of text? How would that apply to a field containing a number? On Mon, Nov 18, 2024, 10:41 AM Clemens, Vera wrote: > Hi, > > is it possible to get search match highlighting for range queries? For > example, if I query for integerField:[0 TO 10] and match

Re: timeAllowed in Solr 9

2024-11-11 Thread Gus Heck
> description:intern descrip > > tion:internship))^2.0 | (Synonym(title:apprentic title:apprenticeship > > title:intern title:internship))^5.0)~0.01 (keywords:intern | > > Synonym(company:apprentic company:apprenticeship company:intern > > company:internship) | (Synonym(descrip

Re: timeAllowed in Solr 9

2024-11-08 Thread Gus Heck
The mailing list usually strips out attachments. You'll need to paste it into the body of the email. On Fri, Nov 8, 2024 at 7:16 AM Dominic Humphries wrote: > Fair enough! See attached, if that doesn't work I'll send it inline... > > On Thu, 7 Nov 2024 at 18:40, G

Re: Is it possible to interrupt a long-running highlight processing using timeAllowed parameter?

2024-11-07 Thread Gus Heck
Have you tried this in Solr 9.6 or later? SOLR-17172 made a change that may help you: https://github.com/apache/solr/pull/2323/files#diff-3d7ebc10bdaedfb20fa269bc0c0a417fd326717a0b06c0b00ee19078d3894092R113 Andrzej Bialecki, Chris Hostetter and I have been working to improve this area over the la

Re: Re: timeAllowed in Solr 9

2024-11-07 Thread Gus Heck
or even 2000ms: > > $ curl -s > > 'localhost:8983/solr/landreg/select?fl=uuid&q=*:*&sort=transfer_date+desc&start=94&debug=timing&timeAllowed=200' > \ > > | jq .debug.timing.process.query > { > "time": 3954.0 > } > $ curl

Re: timeAllowed in Solr 9

2024-11-07 Thread Gus Heck
d, afaik it's single-sharded. > > Same query with facet fields removed takes just as long to run. Adding the > debug to the request generates a rather large amount of output, I believe > due to synonyms - I can send them if it's useful, but it's rather a lot? > > On Thu,

Re: timeAllowed in Solr 9

2024-11-07 Thread Gus Heck
":{ > "numDocs":7349353, > "maxDoc":7834951, > "deletedDocs":485598, > "segmentCount":31, > "segmentsFileSizeInBytes":2727, > "sizeInBytes":22066572844, > "siz

Re: timeAllowed in Solr 9

2024-11-07 Thread Gus Heck
> > > > On Wed, 6 Nov 2024 at 16:40, Dominic Humphries > wrote: > > > >> Unfortunately I don't know Java anywhere near well enough to know my way > >> around a profiler or jstack. I've confirmed JMX is enabled and I can > telnet > >> to the

Re: timeAllowed in Solr 9

2024-11-06 Thread Gus Heck
run them - I'm not sure > how to usefully interrogate solr for where its time is being spent, sorry > > Thanks > > On Wed, 6 Nov 2024 at 14:25, Gus Heck wrote: > > > There are unit tests that seem to suggest that timeAllowed still works, > can > > you prov

Re: timeAllowed in Solr 9

2024-11-06 Thread Gus Heck
There are unit tests that seem to suggest that timeAllowed still works, can you provide some more information about your use case? Particularly important is any information about where (what code) your queries are spending a lot of time in if you have it. On Wed, Nov 6, 2024 at 6:18 AM Dominic Hum

Re: Configuring files Solr for use with Data import handler

2024-10-29 Thread Gus Heck
The data import handler is no longer part of solr, so you may wish to also ask questions on their discussion boards: https://github.com/SearchScale/dataimporthandler/discussions Data Import Handler is a reasonable tool for indexing small, uncomplicated databases, but does not scale very well as sy

Re: Can I combine these 2 query filters with a logical OR?

2024-09-29 Thread Gus Heck
I don't know if it will help in this case, but I have sometimes found that the parser can have difficulty properly finding the end of a localparam with a complex following argument (your polygon intersects for example, a graph query when I last recall encountering this problem years ago) I think th

Re: FW: RE: SOLR-13510 patch installation

2024-09-27 Thread Gus Heck
Looking at the issue you linked, it appears to say that the issue is fixed in 8.1.2, and you claim to be running 8.1.1. It's very likely to be safer and easier to upgrade a single bugfix version than custom apply a patch. Why not upgrade? On Fri, Sep 27, 2024 at 12:07 PM Raju Vaddeh wrote: > Hel

Re: FW: RE: SOLR-13510 patch installation

2024-09-27 Thread Gus Heck
1... https://github.com/apache/lucene-solr/commit/7cc04f5adcbf49786c10c80d885138d53f5e1321 https://github.com/apache/lucene-solr/commits/branch_8_1/ On Fri, Sep 27, 2024 at 1:13 PM Gus Heck wrote: > Looking at the issue you linked, it appears to say that the issue is fixed > in 8.1.2, and

Re: Solr 9.7.0 returns fields in different order with fl parameter

2024-09-14 Thread Gus Heck
Unless something changed while I wasn't paying attention, order of fields in a response is not guaranteed, and some searching in Jira seems to confirm that ordering of fields in the response is a WONTFIX since 1.3: https://issues.apache.org/jira/browse/SOLR-1190 One would not normally expect order

Re: The upside of running Solr on hadoop hdfs

2024-08-31 Thread Gus Heck
I'm curious about this too. There's a bunch of difficult to maintain code in our codebase relating to HDFS and a lot of the HDFS tests are super flakey. I've had the impression it was mostly added because there was a point at which "HDFS all the things" was a fad. I haven't personally ever seen it

Re: Will solr support in AWS/Azure Cloud platform

2024-08-27 Thread Gus Heck
*Short answer:* Yes *Slightly longer answer: *Azure supports Linux so that's the easy button. If you have reasons that you must do it, Windows is also supported. *Typical annoyingly long answer from me:* Cloud provider doesn't matter much, but if you're at all able to, deploy on Linux. It's been

Re: SolrCloud Delete by Query - Issues?

2024-08-27 Thread Gus Heck
Also if you are using Block Join indexes, delete by query is a very bad idea unless the query is very carefully crafted to either ensure all children for a parent are deleted, or to avoid deleting any document that has children (it can be done, I had to do it a couple years ago, but it's really eas

Re: oom_solr.sh

2024-08-21 Thread Gus Heck
Systemd has an option to restart a service automatically. This is probably the most common solution in cases where solr is run in linux. In the Kube world, the pod should fail a health check if solr is killed and a new pod would then be created and the same persistent volume (with the index) is at

[ANNOUNCE] Apache Solr 9.6.0 released

2024-04-28 Thread Gus Heck
Smiley, Michael Gibney, Paul McArthur, Jan Høydahl, James Dyer, Eric Pugh, Andrey Bozhko, Andrzej Bialecki, Rahul Goswami, Bruno Roustant, Jason Gerlowski, Sanjay Dutt, Vincent Primault, Christine Poerschke, Gus Heck, Shawn Heisey, Vincenzo D'Amore, Yohann Callea, Julien Pilourdault, Wei Wan

Re: [EXTERNAL] Unique key not being generated by UUIDUpdateProcessorFactory

2024-04-13 Thread Gus Heck
It's also often not a good idea to use a generated UUID as a document ID because if the corpus is ever re-indexed (to accept new schema changes, or for upgrade to a new major version for example) all the document id's will change and any users or software trying to act on the prior id's will have d

Re: Keep empty fields in 9.5

2024-04-04 Thread Gus Heck
Storing a space, whitespace or empty string for a field is generally a bad practice. Doing so makes it impossible to query for documents that don't contain the field using the normal syntax (i.e. q=*:* -myField:*) On Thu, Apr 4, 2024 at 9:09 AM Carsten Klement wrote: > Hi, > > we are currently u
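The "normal syntax" mentioned in this reply can be sketched as a small helper. This is only an illustration: the function names, the base URL, and the collection name `demo` are assumptions made for the example, not anything from the thread. It builds the query `*:* -myField:*` and a `/select` URL for it without contacting a server.

```python
from urllib.parse import urlencode


def missing_field_query(field: str) -> str:
    """Build a q value matching documents with no value in `field`.

    Hypothetical helper: it starts from all documents (*:*) and
    subtracts every document that has any value in the field.
    """
    return f"*:* -{field}:*"


def select_url(base_url: str, collection: str, q: str, rows: int = 10) -> str:
    """Build a /select URL for the query (base_url and collection are assumed)."""
    params = urlencode({"q": q, "rows": rows})
    return f"{base_url}/solr/{collection}/select?{params}"


if __name__ == "__main__":
    q = missing_field_query("myField")
    print(q)
    print(select_url("http://localhost:8983", "demo", q))
```

If a field holds a space or an empty string, it still counts as having a value, so a query of this shape would not return that document. That is the practical cost the reply describes.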

Re: Security Vulnerabilities in AngularJS used for Solr

2024-04-03 Thread Gus Heck
Looking at these two CVE's they both appear to represent the possibility of browser level DOS and not any compromise in access to the service. So at most a person whom you have given access to the admin UI could inhibit themselves from using that UI, or perhaps send someone else who has access a li

Re: Symlink indexing

2024-03-19 Thread Gus Heck
Though many of us will likely be happy to help, we'll need you to back up and give us some detail. There are multiple ways to index data into solr. Most of them are not part of solr, but separate tools, and custom tools and code are very common solutions. Can you describe the detailed steps you too

Re: Document routing.

2024-03-06 Thread Gus Heck
"Completely Fresh" is a non-technical term. You could mean several things, some of which are "fresher" than others: - Created a new collection with zero docs, switch alias when complete (preferred) - Sent all docs to the existing collection (not preferred for full re-index, worst for y

Re: [EXTERNAL] Re: Is this list alive? I need help

2024-02-28 Thread Gus Heck
single server. As follows: > > > > > > > > > > > > > > > > > > The three nodes are r5.xlarge and we’re not sure if those are large > > enough. The documents are not huge, from 1K to 25K each. > > > > > > > >

Re: [EXTERNAL] Re: Is this list alive? I need help

2024-02-28 Thread Gus Heck
const rsp = await axios(config); > > if(rsp.data && rsp.data.response) { > > let docs = rsp.data.response.docs; > > if(docs.length == 0) break; > > config.params.start += limit; > >

Re: 500 Exception at regular intervals after upgrading to 9.5.0

2024-02-28 Thread Gus Heck
*Here's the full exception:* * org.apache.solr.common.SolrException: org.apache.solr.client.solrj.SolrServerException* You missed the exception message which would be very useful (line above most likely) On Tue, Feb 27, 2024 at 6:29 AM Henrik Brautaset Aronsen < henrik.aron...@gmail.com> wrote:

Re: Backtick character in field data breaks streaming query

2024-02-27 Thread Gus Heck
On Tue, Feb 27, 2024 at 12:13 PM Rahul Goswami wrote: > I can submit a fix for > this. Should I open a JIRA? > Certainly! -- http://www.needhamsoftware.com (work) https://a.co/d/b2sZLD9 (my fantasy fiction book)

Re: Is this list alive? I need help

2024-02-25 Thread Gus Heck
Hi Jim, Welcome to the Solr user list, not sure why you are asking about list liveliness? I don't see prior messages from you? https://lists.apache.org/list?users@solr.apache.org:lte=1M:jim Probably the most important thing you haven't told us is the current size of your indexes. You said 20k/da

Re: 3x+ performance reduction for the prefixed wildcard fl (like fl=abc_*) in 9.5.0 compared to 9.4.1

2024-02-24 Thread Gus Heck
Likely Introduced by SOLR-17022: Support for glob patterns for fields in Export handler, Stream handler and with SelectStream streaming expression (#1996) On Sat, Feb 24, 2024 at 10:51 AM Gus Heck wrote: > Well... awesome that you have identified and documented this. Not awesome > t

Re: 3x+ performance reduction for the prefixed wildcard fl (like fl=abc_*) in 9.5.0 compared to 9.4.1

2024-02-24 Thread Gus Heck
Well... awesome that you have identified and documented this. Not awesome that it happened of course. Definitely Jira worthy. On Fri, Feb 23, 2024 at 9:55 PM Ishan Chattopadhyaya < ichattopadhy...@gmail.com> wrote: > Awesome. Please feel free to open a JIRA issue about it. > > On Fri, 23 Feb, 202

Re: Need suggestions on performance improvement in Solr based application

2024-02-21 Thread Gus Heck
There are two features meant to support deep/large results: Export handler (Requires docvalues enabled) Cursormark (Requires serial requests passing a token from the prior request)
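The "serial requests passing a token" pattern for cursorMark can be sketched roughly as follows. This is a minimal sketch under stated assumptions: `fetch` is a hypothetical callable that sends the given parameters to a `/select` endpoint and returns the parsed JSON, and `iterate_with_cursor` is a name invented for the example. The parameter `cursorMark`, the start value `*`, and the response key `nextCursorMark` are the standard ones; the sort must include the uniqueKey field.

```python
from typing import Callable, Dict, Iterator


def iterate_with_cursor(
    fetch: Callable[[Dict[str, str]], dict],
    query: str,
    sort: str,
    rows: int = 100,
) -> Iterator[dict]:
    """Yield every matching document by following cursorMark tokens.

    `fetch` is an assumed helper: it takes the request parameters and
    returns the decoded JSON response of a /select call.
    """
    cursor = "*"  # "*" asks for the first page
    while True:
        params = {
            "q": query,
            "sort": sort,  # must include the uniqueKey field, e.g. "id asc"
            "rows": str(rows),
            "cursorMark": cursor,
        }
        rsp = fetch(params)
        for doc in rsp["response"]["docs"]:
            yield doc
        next_cursor = rsp["nextCursorMark"]
        if next_cursor == cursor:
            # The same token coming back means there is nothing more to read.
            break
        cursor = next_cursor
```

Each request depends on the token from the one before it, so the pages cannot be fetched in parallel. That is the trade-off noted above.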

Re: Indexing Solr Ref Guide

2024-02-18 Thread Gus Heck
ed to what I am doing > right now (so not digging deeper). > > > > On Fri, 16 Feb 2024 at 21:12, Gus Heck wrote: > > > Hi folks, > > > > *TLDR;* I put up a github repo (check it out): > > https://github.com/nsoft/index-solr-ref-guide > > > > *Th

Indexing Solr Ref Guide

2024-02-16 Thread Gus Heck
Hi folks, *TLDR;* I put up a github repo (check it out): https://github.com/nsoft/index-solr-ref-guide *The Details:* Last Year I announced JesterJ's 1.0 release and gave a lightning talk about it at Haystack. There were lots of folks who seemed to think it sounded cool, but I got zero useful fee

Re: my solr 8.11 is indexing 5000 only using custom code.

2023-12-16 Thread Gus Heck
remental changes from the source db > tables, such as insert, update, and delete. > > On Fri, Dec 15, 2023 at 12:58 PM Gus Heck wrote: > > > Have you considered trying an existing document ingestion framework? I > > wrote this one: https://github.com/nsoft/jesterj It already h

Re: my solr 8.11 is indexing 5000 only using custom code.

2023-12-15 Thread Gus Heck
Have you considered trying an existing document ingestion framework? I wrote this one: https://github.com/nsoft/jesterj It already has a database connector. If you do check it out and find difficulty please let me know by leaving bug reports (if bug) or feedback (if confusion) in the discussions se

Re: how to fix full processing disrupted in solr 8.11

2023-12-06 Thread Gus Heck
Looks like your client crashed out while trying to receive the response perhaps? Caused by: java.io.IOException: An established connection was aborted by the software in your host machine at sun.nio.ch.SocketDispatcher.writev0(Native Method) ~[?:?] at sun.nio.ch.SocketDispatcher.writev(Unk

Re: what is SOLR syntax to remove duplicated documents

2023-10-22 Thread Gus Heck
Echoing what Thomas says, this problem indicates your indexing system probably has a significant design flaw. For most systems, you should have a notion of document identity that is external to Solr, and that should be used as (or to deterministically generate) the id in Solr. If you don't do this
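One way to generate the Solr id deterministically from an external identity is sketched below. It is an illustration only: the helper name `solr_id_for`, the `source_system:source_key` layout, and the choice of a name-based UUID (version 5) are assumptions for the example, not something the thread prescribes.

```python
import uuid


def solr_id_for(source_system: str, source_key: str) -> str:
    """Derive a stable document id from an identity that lives outside Solr.

    Hypothetical helper: the same (source_system, source_key) pair always
    yields the same id, so re-sending a document overwrites the earlier
    copy instead of creating a duplicate.
    """
    name = f"{source_system}:{source_key}"
    return str(uuid.uuid5(uuid.NAMESPACE_URL, name))


if __name__ == "__main__":
    first = solr_id_for("crm", "customer/42")
    second = solr_id_for("crm", "customer/42")
    print(first == second)  # same input, same id
```

Because the id depends only on the source identity, a full re-index produces the same ids as before.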

Re: Cancelling an Async operation - Shard split

2023-09-28 Thread Gus Heck
Unless you are very experienced and comfortable with solr, do not edit zookeeper nodes directly. Things you should touch generally have support in bin/solr or other provided tools. If you edit the wrong things you can cause all manner of chaos, and even completely ruin the entire cluster, requiring

Re: Backup from old server and Restore to new server

2023-09-28 Thread Gus Heck
Scanned this thread, apologies if I missed something, but here's a few thoughts: To get better advice make it clear if you are running Solr in Cloud mode (a.k.a. self managed) or Legacy (a.k.a user managed). Some ways to know which quickly: 1. Is there an associated Zookeeper cluster? If yes,

Re: Deleting document on wrong shard?

2023-05-24 Thread Gus Heck
Understood, of course I've seen your name on the list for a long time. Partly my response is for the benefit of readers too, sorry if that bothered you. You of course may have good reasons, and carefully refined a design for your situation, that might not be best emulated everywhere. Living in Kube

Re: Deleting document on wrong shard?

2023-05-24 Thread Gus Heck
Often it's a better idea to index into a fresh collection when making changes that imply a full re-index. If you use an alias, the swap out of the old collection is atomic when you update the alias, requiring no front end changes at all (and swap back is easy if things aren't what you expected). Of
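The alias swap can be sketched as a single Collections API call. In this minimal sketch the helper name, the base URL, and the alias and collection names are made up for illustration. The call itself is `CREATEALIAS`, which repoints an alias that already exists. The code only builds the URL and does not contact a server.

```python
from urllib.parse import urlencode


def alias_swap_url(base_url: str, alias: str, new_collection: str) -> str:
    """Build the Collections API URL that points `alias` at `new_collection`.

    Hypothetical helper: base_url, alias and collection names are assumed.
    Issuing CREATEALIAS for an existing alias name replaces its target.
    """
    params = urlencode(
        {"action": "CREATEALIAS", "name": alias, "collections": new_collection}
    )
    return f"{base_url}/solr/admin/collections?{params}"


if __name__ == "__main__":
    # Index into products_v2 first, then repoint the alias used by clients.
    print(alias_swap_url("http://localhost:8983", "products", "products_v2"))
```

Swapping back is the same call with the old collection name, which is why the reply calls the swap easy to undo.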

Re: Help needed testing new systemd script (SOLR-14410)

2023-05-23 Thread Gus Heck
OH this is good news. I can try it out. It would be nice not to always have to write my own On Tue, May 23, 2023 at 5:13 AM Jan Høydahl wrote: > Hi all, > > We have an excellent contribution in > https://issues.apache.org/jira/browse/SOLR-14410 and > https://github.com/apache/solr/pull/428 to sw

Re: standard tokenizer seemingly splitting on dot

2023-05-02 Thread Gus Heck
That looks like a bug. Seems to be splitting if the character class before and after differ, but not if they are the same. ST XYZ123 tif SF XYZ123 tif LCF xyz123 tif and ST XYZ 123tif SF XYZ 123tif LCF xyz 123tif But... ST XYZ123.123tif SF XYZ123.123tif LCF xyz123.123tif On Tue, May 2,

Re: standard tokenizer seemingly splitting on dot

2023-05-02 Thread Gus Heck
I concur that the docs clearly state your expected behavior should be true: Standard Tokenizer This tokenizer splits the text field into tokens, treating whitespace and punctuation as delimiters. Delimiter characters are discarded, with the following exceptions: - Periods (dots) that are n

Re: Running solr service with nologin solr user

2023-04-28 Thread Gus Heck
Error message? On Fri, Apr 28, 2023 at 9:53 AM Kirk Baker < kirk.ba...@lexicalintelligence.com> wrote: > We are running Solr 9.1 on RedHat Linux. My organization's security > requirements stipulate that all system accounts have a non-interactive > shell. When I set the 'solr' user to nologin, the

Re: PROD Solr errors

2023-02-16 Thread Gus Heck
If you are 100% sure no machines or solr software were restarted, you had a networking issue that broke communications, perhaps a switch or router restart, someone re-plugging cables, or a router that got overwhelmed, (or is failing)? Unrelated side note: 48GB heap is not usually a great setting.

Re: SOLR security scan question

2023-02-15 Thread Gus Heck
Hi Razvan, Have you looked at https://solr.apache.org/security.html yet? Some of the CVE's in your list are already listed there. If you could eliminate the CVE's from your list that are already dealt with on that page then you might get more attention. As it stands, you seem to be asking us to do

Re: When to index data into Solr?

2023-01-29 Thread Gus Heck
Definitely all up front. The entire premise of search is that we do as much work at index time as possible so that queries are fast. More importantly, the whole point of the search is to discover what documents the user might want. If you don't index everything from the start you would need a proce

Re: Core reload timeout on Solr 9

2023-01-19 Thread Gus Heck
Just read through this, and don't yet have any concrete ideas better than what's been given, but I'm interested to clarify one thing you said: We are having 6 shards spread across 96 replicas. Each replica is hosted on > a dedicated EC2 instance, no more than one replica present on the same > mach

Re: Importing Data from MySql

2023-01-13 Thread Gus Heck
Not sure I'd say it's trivial. But there are lots of folks who've done it successfully. As noted, batching is important, depending on the nature of the data fault tolerance can be important too. Daily data loads are a bit different than continuous feeds of data. Also depends on to what extent one

Re: Compiling and running solr locally on mac

2022-12-26 Thread Gus Heck
Also, if you are compiling and running locally while trying to develop customized solr (or contributions) the script at dev-tools/scripts/cloud.sh may be useful. It is however designed to work with a locally running zookeeper for enhanced realism. The beginning of that script contains a long commen

Re: spatial search by zipcode

2022-12-14 Thread Gus Heck
Hi Matthew, It's worth keeping in mind that if you are *starting* with user input that has a zip code ("where zipcode is provided as a parameter?"), then it's faster at query time if you index the documents with a zip code and just match zip codes. If your documents have GPS points but not zipcod

Re: CVE-2022-40153 com.fasterxml.woodstox_woodstox-core

2022-12-03 Thread Gus Heck
Hi Billy, Thanks for bringing this up. The CVE you link is rejected ( https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2022-40153). However reading through the report here: https://github.com/x-stream/xstream/issues/304 it seems that this was part of a series of low quality auto generated CVE re

Re: How to write custom Solr plugin that can be called core container level instead of core level

2022-11-14 Thread Gus Heck
In 9.x it should be possible to write a separate servlet that can answer custom non-search queries. Then all you need to edit is web.xml. You can now get hold of a core container via org.apache.solr.servlet.CoreContainerProvider#getCoreContainer. Looking at the code, it seems you still need to liv

Re: [External] Re: Upgrade Jackson / SOLR-16443

2022-11-01 Thread Gus Heck
Hi Harry, Relevant, announced CVE's are listed here https://solr.apache.org/security.html and that page links a wiki page where false positives are usually listed. -Gus On Tue, Nov 1, 2022 at 1:31 PM Silverman, Harry wrote: > Thanks for your reply, and I understand. > > It is a separate depart

Re: Cannot get nested objects to index

2022-10-25 Thread Gus Heck
Your images are not showing, and clicking through doesn't get me anything other than 404. The list settings strip out images, and images are not searchable when the message gets archived anyway. Images are usually harder to work with (they often don't look good if reading mail from a phone for exam

Re: Advice in order to optimise resource usage of a huge server

2022-10-07 Thread Gus Heck
The ideal jvm size will be influenced by the latency sensitivity of the application. Large VM's are mostly good if you need to hold large data objects in memory, otherwise they fill up with large numbers of small objects and that leads to long GC pauses (GC time relates to the number, not the size

Re: Advice in order to optimise resource usage of a huge server

2022-10-06 Thread Gus Heck
It depends... on your data, on your usage, etc. The best answers are obtained by testing various configurations, if possible by replaying captured query load from production. There is (for all java programs) an advantage to staying under 32 GB RAM, but without an idea of the number of machines you

Re: Fastest way to index data to solr

2022-09-29 Thread Gus Heck
70 million can be a lot or a little. Doc count is not even half the story. How much storage space do these documents occupy in the database? Is the text tweet sized, or multi-megabyte sized clobs, or links files on a file store that need to be fetched and parsed (or OCR'd or converted from audio/vi

Re: Fastest way to index data to solr

2022-09-29 Thread Gus Heck
> > * Do NOT commit during the bulk load, wait until the end > Unless something changed this is slightly risky. It can lead to very large transaction logs and very long playback of the tx log on startup. If Solr goes down during indexing to something like an OOM, it could take a very long time for

Re: getting started?

2022-08-23 Thread Gus Heck
If you're moving towards mocking up a production system I'd move away from schemaless mode, as it enables both explosion of the number of fields if you get bad or unexpected data, and is prone to difficult to fix errors where it misidentifies numbers/strings ... particularly if a string field happe

Re: Solr dynamic reconfiguration of zookeeper ensemble

2022-08-22 Thread Gus Heck
Hi HariBabu, Typically a minor version upgrade is safe, but only someone truly familiar with your system can give you a solid answer to that question. You and/or your team may find the list of changes helpful in making that assessment: https://solr.apache.org/docs/8_11_2/changes/Changes.html -Gu

Re: sorl version 6.3 - log4j question

2022-08-11 Thread Gus Heck
Hi Ardian, You will want to review the various CVE's related to log4j1.2.17 to evaluate your risk level. The log4j2 vulnerabilities (i.e. log4shell) are not relevant to 6.3. There are several 1.2 vulnerabilities, but most of them are only activated by the use of some less common logging configurat

Re: solr backup location 8.11.1

2022-08-05 Thread Gus Heck
If it doesn't apply the defaults that's the bug right there I think. On Fri, Aug 5, 2022 at 2:10 PM Shawn Heisey wrote: > On 8/5/22 11:56, Thomas Woodard wrote: > > Yup, I absolutely did typo when I tried to do it as a default. I'll > update > > my issue to correct that. > > It will be interesti

Re: solr backup location 8.11.1

2022-08-05 Thread Gus Heck
Just looked at some other handler configurations, I think you may suffer from a typo... should /var/i8s/backup/solr/${i8s.environment}/${ solr.core.name} have been /var/i8s/backup/solr/${i8s.environment}/${ solr.core.name} (note the s) On Fri, Aug 5, 2022 at 1:05 PM Thom

Re: ExternalFileField2, massively scalable external file fields

2022-07-28 Thread Gus Heck
> > Maybe in the future we end up having a core part of Solr some sort of > offline processing capability so folks don’t have to deploy “yet another > system” ;-) > > This is something I've felt as a major gap for solr in general, however I don't think it makes sense to bake it directly in as part

Re: Re: Solr faceting

2022-07-18 Thread Gus Heck
Hi Poorna, I think it would be helpful if you backed up a bit and told us exactly what version of solr you are using, exactly what api you are calling (examples please). Finally please detail *how* you are looking at the field cache. You also haven't told us much about where you are in the process

Re: Unsubscribe me

2022-07-12 Thread Gus Heck
Have you checked that the mails are actually going to the address you are unsubscribing from? If you are getting mails forwarded from a prior address you would need to send the unsubscribe from that address. On Tue, Jul 12, 2022 at 5:57 PM Brad Burke wrote: > Ha. Same here. I have tried ever w

Re: Error indexing files(html, pdf) using SOLR Cell Tika

2022-07-12 Thread Gus Heck
Based on the error message I think you want to use "literal._uniqueid" not " literal.id". Your schema which possibly no longer has a field named "id" is requiring a field name "_uniqueid". See https://solr.apache.org/guide/8_11/uploading-data-with-solr-cell-using-apache-tika.html#using-literals-to

Re: Transfer to a new server

2022-07-11 Thread Gus Heck
and > another 500 of stored full text along side of it, plus another couple > hundred gigs of a supporting index, and all from different sources. > > But yes it should be able to be done in a week or two. I never upgrade > the solr servers without a full reindex. > > > On Jul 11,

Re: Transfer to a new server

2022-07-11 Thread Gus Heck
After you complete this move you should address this. It's not good to run a server that can't be re-indexed within a reasonable amount of time. Such a situation means you will never be able to take advantage of new index features, never be able to change the way existing fields are analyzed, and a

Re: Solr eats up all the memory

2022-07-05 Thread Gus Heck
Search generally trades memory/disk to achieve speed. Thus it tends to use the available JVM memory, and it also benefits greatly from excess memory that the OS can dedicate to caching disk information. For this reason, while it is certainly *possible* to run solr on the same machine as your PHP se

Re: Re-index after upgrade

2022-06-14 Thread Gus Heck
Alias switching is a very good option for cloud users, and one of the benefits of using cloud. OP is on "user managed" (also previously known as "standalone" and then briefly as "legacy mode") so this is not available. Order of preference: 1) build 100% new into new index (much better than the res

Re: Re-index after upgrade

2022-06-12 Thread Gus Heck
What Thomas said, if possible... Definitely set up a test system if you have the resources. Building a new index from scratch ensures that nothing is lurking unconverted and allows you to move to a newer index format. One specific cost of re-indexing into the old index is that the index upgrader t

Re: Solr compatibility with Oracle Database 19c Database

2022-06-08 Thread Gus Heck
Also note that use of Data Import Handler (DIH) is not supported by the Solr community anymore. DIH has become a separate project ( https://github.com/rohitbemax/dataimporthandler) and seems to be in need of some folks who care enough to contribute fixes to it. Using another tool or custom code to

Re: Re: Unique key field

2022-06-07 Thread Gus Heck
he default schemas... so if you have an old default schema as your original source, these things may not be up to date. On Tue, Jun 7, 2022 at 11:23 AM Gus Heck wrote: > check your schema version attribute > > > https://github.com/apache/solr/blob/main/solr/server/solr/configsets/_default/c

Re: Re: Unique key field

2022-06-07 Thread Gus Heck
check your schema version attribute https://github.com/apache/solr/blob/main/solr/server/solr/configsets/_default/conf/managed-schema.xml#L41 On Tue, Jun 7, 2022 at 9:43 AM Poorna Murali wrote: > If docValues are enabled by default for string field, then the sort queries > on the field will not

Re: Re: Regarding solr field cache

2022-05-26 Thread Gus Heck
You said you had no soft commits *in your config*, so my guess is your indexing process is issuing a commit that includes opening a new searcher. Otherwise the indexed data would not become visible. Hopefully it's doing that at some interval (i.e. systems that receive new data periodically and comm

Re: Growing cores after upgrade to 8.11.1

2022-05-18 Thread Gus Heck
Your link leads to a signup page with advertising for clothing. Please don't do that. On Wed, May 18, 2022 at 1:35 PM Jesús Roca wrote: > Hello, > > We are having problema > > We have a cluster with Solr 8 (15 nodes running RHEL) and ZooKeeper 3.6.2 > (5 nodes) and only one collection of around

Re: Solr 4.10.2 - EOL

2022-05-10 Thread Gus Heck
On Tue, May 10, 2022 at 12:14 PM Walter Underwood wrote: > Related to that, the version numbers here should be updated now that 9.x > is out. > Not quite out, but the vote just passed, so soon ;) > > https://solr.apache.org/downloads.html#about-versions-and-support < > https://solr.apache.org/

Re: Wrong Results for parent blockjoin

2022-04-29 Thread Gus Heck
e could be parents with no children? On Fri, Apr 29, 2022 at 1:21 PM Mikhail Khludnev wrote: > Hello, Gus. > > On Fri, Apr 29, 2022 at 6:55 PM Gus Heck wrote: > > > Also if you have an index with a mixture of > > hierarchical documents and other non block/join d

Re: Wrong Results for parent blockjoin

2022-04-29 Thread Gus Heck
The confusing thing about the block mask is that it is actually defining the set of things that are "Not Children" as opposed to "Are Parents" ... so in cases where you have more than 2 levels, and you want to treat a middle level as a parent to the lower levels this becomes an important distinctio

Re: De-split / Merge shards without creating collection from scratch

2022-04-21 Thread Gus Heck
> > "which are causing some problems" Can you describe the problems? Why do you think merging into a single shard will help? "Divided into monthly collections" Is this via Time Routed Aliases or have you done this manually? On Thu, Apr 21, 2022 at 1:12 PM Ufuk YILMAZ wrote: > We have lots of

Re: Snowflake vs Solr

2022-04-21 Thread Gus Heck
If your application already talks to solr, Then continuing that will mean the application doesn't need to change, and probably solr needs few or no changes (unless you are also upgrading, moving or improving things in solr at the same time). The process that moved data from hadoop to solr will like
