Re: Solr Query time performance

2023-01-29 Thread marc nicole
Let's say you're right about the 200 rows being too few. From what row count should I see the difference reflected in the results as expected (Solr being faster)? On Sun, Jan 29, 2023 at 00:34, Jan Høydahl wrote: > For 200 values you need neither Spark nor Solr. A plain Java in mem filter > is much

When to index data into Solr?

2023-01-29 Thread marc nicole
Hello, I want to know whether it is common practice to index all datasets up front, or whether indexing should be performed when the data is queried. Also, is there a size limit on the data that can be indexed into Solr? Thanks.

Re: Solr Query time performance

2023-01-29 Thread Andy Lester
> On Jan 29, 2023, at 4:45 AM, marc nicole wrote: > > Let's say you're right about the 200 rows being too few. From which row > count I can see the difference reflected in the results as it should (Solr > faster)? It depends on how much data is in each record, but I'd think 10,000 - 100,000

Re: When to index data into Solr?

2023-01-29 Thread Gus Heck
Definitely all up front. The entire premise of search is that we do as much work at index time as possible so that queries are fast. More importantly, the whole point of search is to discover what documents the user might want. If you don't index everything from the start you would need a proce

Re: When to index data into Solr?

2023-01-29 Thread marc nicole
So to sum up, indexing happens when the data is stored, right? Much appreciated. On Sun, Jan 29, 2023 at 17:59, Gus Heck wrote: > Definitely all up front. The entire premise of search is that we do as much > work at index time as possible so that queries are fast. More importantly, > the whole poi

Re: Solr Query time performance

2023-01-29 Thread marc nicole
Much appreciated. On Sun, Jan 29, 2023 at 17:47, Andy Lester wrote: > > > > On Jan 29, 2023, at 4:45 AM, marc nicole wrote: > > > > Let's say you're right about the 200 rows being too few. From which row > > count I can see the difference reflected in the results as it should > (Solr > > fas

Re: Solr Query time performance

2023-01-29 Thread Dave
You can have 40+ million documents and a half-terabyte index and still get sub-second results without needing Spark, SolrCloud, or sharding. Don't overthink it until it becomes a real issue. > On Jan 29, 2023, at 1:53 PM, marc nicole wrote: > > Much appreciated. > >> On Sun, Jan 29, 20

Re: When to index data into Solr?

2023-01-29 Thread Dave
And make sure you can always reindex the entire data set at any given moment. Solr/search isn't meant to be a data store, and it isn't meant to be a reliable one. You should be able to destroy and recreate the index whenever needed. > On Jan 29, 2023, at 1:53 PM, marc nicole wrote: > > so to sum up, it's indexation at data

How to index a csv dataset into Solr using SolrJ

2023-01-29 Thread marc nicole
Hi guys, I can't find a reference on how to index a dataset.csv file into Solr using SolrJ. https://solr.apache.org/guide/6_6/using-solrj.html Thanks.

Re: How to index a csv dataset into Solr using SolrJ

2023-01-29 Thread Jan Høydahl
Read the CSV in your app, create a Solr doc from each line, and send the docs to Solr in suitably sized batches. You can use a CSV library or just parse each line yourself if the format is fixed. If you need to post CSV directly to Solr you'd use a plain HTTP POST with a CSV content type, but in most cases your app
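
The parse-and-batch step described above could look roughly like the following. This is a minimal sketch, not code from the thread: the class name CsvToSolrDocs, the method toBatches, and the sample columns are all hypothetical, and it assumes a fixed, simple CSV format with a header row and no quoted or embedded commas (use a CSV library otherwise). It uses only the JDK; the comments note where the SolrJ calls would go.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: turns CSV lines into batches of field maps.
final class CsvToSolrDocs {

    // First line is treated as the header; each later line becomes one
    // field-name -> value map. Maps are grouped into lists of batchSize.
    static List<List<Map<String, String>>> toBatches(List<String> lines, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<Map<String, String>>> batches = new ArrayList<>();
        if (lines.isEmpty()) {
            return batches;
        }
        // Assumption: plain comma-separated values, no quoting.
        String[] header = lines.get(0).split(",", -1);
        List<Map<String, String>> current = new ArrayList<>();
        for (String line : lines.subList(1, lines.size())) {
            if (line.isBlank()) {
                continue; // skip empty lines
            }
            String[] values = line.split(",", -1);
            Map<String, String> doc = new LinkedHashMap<>();
            for (int i = 0; i < header.length && i < values.length; i++) {
                doc.put(header[i].trim(), values[i].trim());
            }
            current.add(doc);
            if (current.size() == batchSize) {
                batches.add(current);
                current = new ArrayList<>();
            }
        }
        if (!current.isEmpty()) {
            batches.add(current); // final partial batch
        }
        return batches;
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for the lines of dataset.csv.
        List<String> lines = List.of("id,title", "1,First", "2,Second", "3,Third");
        for (List<Map<String, String>> batch : toBatches(lines, 2)) {
            // With SolrJ on the classpath, each map would become a
            // SolrInputDocument (one addField(name, value) call per entry),
            // and the batch would be sent with SolrClient.add(docs).
            // Call commit() once after the last batch.
            System.out.println("batch of " + batch.size() + ": " + batch);
        }
    }
}
```

Keeping the parsing separate from the Solr calls makes the batching easy to test without a running Solr instance, and the batch size can be tuned later without touching the parsing logic.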

Re: How to index a csv dataset into Solr using SolrJ

2023-01-29 Thread marc nicole
The Java code should perform the post. Do you have a piece of code that illustrates this? Thanks. On Sun, Jan 29, 2023 at 20:29, Jan Høydahl wrote: > Read csv in your app, create a Solr doc from each line and ingest to Solr > in fitting batches. You can use a csv library or just parse each lin

Add custom configset to Solr Operator on OpenShift/Kubernetes

2023-01-29 Thread Christopher Tate
Hello Solr Operator Users, I was wondering how I would configure a custom configset, with a custom managed-schema and other configs, using the Solr Operator, like my configset here, which I deploy to Red Hat OpenShift? https://github.com/computate-org/smartabyar-smartvillage/blob/main/openshift/kusto