Re: Regarding maximum number of documents that can be returned safely from SOLR to Java Application.

2022-04-28 Thread Srijan
It would be nice to have more clarity regarding the problem you're trying
to solve. A few questions:

1. Why do you need to return so many search results at the same time? If
it's a typical search use case, could you not work with some manageable list
of documents, say 50 or 100? But I'm guessing this is not a typical search
that you're planning to support. In that case, you have to be careful with
how this functionality is being exposed. Like Vincenzo said, you will run
into problems if the endpoint is being hit frequently and you really won't
have a good handle on how much memory to allocate to your JVM.

2. Is Solr your primary source of data? If not, could you retrieve just the
identifiers from Solr and then go to your primary data source for
additional data? We support an 'export' use case where users can export
thousands of documents at the same time. And our strategy is to get "only the
ids" of the matching docs from Solr and then use an async operation to go
back to our DB and download/export all the DB objects using the same set of
identifiers. On the Solr end, we don't have a problem because we limit our
payload (fields being returned) to only document identifiers (and even in
this case we have a limit - 20k).
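As a rough illustration, that ids-only fetch can be sketched in SolrJ roughly as below. The base URL, query, and the 20k cap are placeholders from our setup, not something to copy verbatim, and this obviously needs a running Solr to execute:

```java
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class IdExport {
    public static void main(String[] args) throws Exception {
        // Hypothetical core name and query; adjust for your schema.
        try (HttpSolrClient solr =
                 new HttpSolrClient.Builder("http://localhost:8983/solr/docs").build()) {
            SolrQuery q = new SolrQuery("category:reports");
            q.setFields("id");   // limit the payload to document identifiers only
            q.setRows(20000);    // hard cap, mirroring the 20k limit mentioned above
            QueryResponse rsp = solr.query(q);
            // Hand these ids to an async job that exports the full objects from the DB.
            rsp.getResults().forEach(doc -> System.out.println(doc.getFieldValue("id")));
        }
    }
}
```

The point of the pattern is that Solr only ever serializes a list of small id strings; the heavy payload comes from the primary data store.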



On Thu, Apr 28, 2022 at 2:44 AM Vincenzo D'Amore  wrote:

> Ok, but the OP has to know that doing this often can be a serious issue.
> For example if you are implementing an endpoint that can be called 10/100
> times per hour, each call will result in a few humongous objects allocated
> in the JVM.
>


Re: Regarding maximum number of documents that can be returned safely from SOLR to Java Application.

2022-04-28 Thread matthew sporleder
As far as I know there is no practical upper limit to the number of
documents, only a limit to the amount of memory available in your server
and client. (+ network timeouts, etc)

Deep paging slows down as you page deeper, so use cursors in that case;
otherwise just test until you hit OOM.
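For reference, a cursor-based loop in SolrJ looks roughly like this (core name and field names are placeholders; note that cursors require a deterministic sort ending on the uniqueKey field):

```java
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.params.CursorMarkParams;

public class CursorDump {
    public static void main(String[] args) throws Exception {
        try (HttpSolrClient solr =
                 new HttpSolrClient.Builder("http://localhost:8983/solr/mycore").build()) {
            SolrQuery q = new SolrQuery("*:*");
            q.setRows(1000);                            // page size: tune against your heap
            q.setSort(SolrQuery.SortClause.asc("id"));  // cursors need a sort on the uniqueKey
            String cursor = CursorMarkParams.CURSOR_MARK_START;
            while (true) {
                q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursor);
                QueryResponse rsp = solr.query(q);
                rsp.getResults().forEach(doc -> System.out.println(doc.getFieldValue("id")));
                String next = rsp.getNextCursorMark();
                if (cursor.equals(next)) break;         // cursor unchanged => no more results
                cursor = next;
            }
        }
    }
}
```

Unlike start/rows paging, each page here costs about the same no matter how deep you go.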

On Wed, Apr 27, 2022 at 4:35 PM Neha Gupta  wrote:

> Hi Andy,
>
> I have different cores with different number of documents.
>
> 1) Core 1: - 227625 docs and each document having approx 10 String fields.
>
> 2) Core 2: - Approx 3.5 million documents and each having 3 string fields.
>
> So my question is: if I request, let's say, approximately 10K documents
> in one request using SolrJ, will that be OK? By safe here I mean the
> approximate maximum number of documents that I can request without
> causing any problem in receiving a response from Solr.
>
> Is that enough to answer the question?
>
> On 27/04/2022 22:26, Andy Lester wrote:
> >
> >> On Apr 27, 2022, at 3:23 PM, Neha Gupta  wrote:
> >>
> >> Just for information I will be firing queries from Java application to
> SOLR using SOLRJ and would like to know how much maximum documents (i.e
> maximum number of rows that i can request in the query) can be returned
> safely from SOLR.
> > It’s impossible to answer that. First, how do you mean “safe”? How big
> are your documents?
> >
> > Let’s turn it around. Do you have a number in mind where you’re
> wondering if Solr can handle it? Like you’re thinking “Can Solr handle 10
> million documents averaging 10K each”?  That’s much easier to address.
> >
> > Andy


Re: Regarding maximum number of documents that can be returned safely from SOLR to Java Application.

2022-04-28 Thread Andy Lester
> 1. Why do you need to return so many search results at the same time? If
> it's a typical search usecase, could you not work with some manageable list
> of documents, say 50/100? But I'm guessing this is not a typical search
> that you're planning to support.


I’d just like to point out that Neha may have a use case that isn’t the typical 
“do a keyword search and return search results, Google-like, 10 at a time.” I 
know that may well be the most common use case for Solr, but some of us don’t do 
that.

For example, we use Solr to find up to 2500 matching documents, return their 
IDs and some facets, and then the app takes it from there to do the 
presentation and paging. For us, Solr can’t do the paging we need it to do. 
Getting back thousands of records that are fairly small (just some IDs) is 
something Solr does just fine at.

Andy

Wrong Results for parent blockjoin

2022-04-28 Thread James Greene
My team is in the process of moving from Solr 6.6 to 8.11.1 and we have
noticed some weirdness (wrong parent docs in the result) when using the
{!parent} block join query parser. We have multiple 'root' entities
configured in DIH and I'm wondering if this could be the cause, or if
there is a bug at play in the block join. Any more info on how to
diagnose the issue is appreciated!

---
Example data:

[
  {
    "_root_": "/t2/1/",
    "doc_id": "/t2/1/",
    "doc_type": "t2",
    "t2_id": 1,
    "chldrn": [
      {
        "_root_": "/t2/1/",
        "_nest_path_": "/chldrn#1",
        "doc_id": "/t2/chld/1/",
        "doc_type": "chld",
        "chld_name": "DEF",
        "chld_t2_id": 1
      }
    ]
  },
  {
    "_root_": "/p1/1/",
    "doc_id": "/p1/1/",
    "doc_type": "p1",
    "p1_id": 1,
    "chldrn": [
      {
        "_root_": "/p1/1/",
        "_nest_path_": "/chldrn#1",
        "doc_id": "/p1/chld/1/",
        "doc_type": "chld",
        "chld_name": "ABC",
        "chld_p1_id": 1
      },
      {
        "_root_": "/p1/1/",
        "_nest_path_": "/chldrn#2",
        "doc_id": "/p1/chld/2/",
        "doc_type": "chld",
        "chld_name": "DEF",
        "chld_p1_id": 1
      }
    ]
  }
]


---
Queries giving the wrong result:

q={!parent which=doc_type:t2}chld_name:ABC

q={!parent which=doc_type:t2}(doc_type:chld AND chld_name:ABC)

q={!parent which=doc_type:t2 v=$qq}chld_name:ABC
?qq=doc_type:chld


---
I found an old thread saying that child docs shouldn't have the same
field name as the parent doc (even with different values):
https://stackoverflow.com/questions/36602638/solr-returning-incorrect-results-when-filtering-child-docuements
But I got the same results when trying to filter children using a
different field:

q={!parent which=doc_type:t2}(_nest_path_:/chldrn AND chld_name:ABC)

I would expect there would be no match since the parent (doc_type:t2) does
not have a child (chld_name:ABC) but i'm actually getting t2 in the result:
[
  {
    "_root_": "/t2/1/",
    "doc_id": "/t2/1/",
    "doc_type": "t2",
    "t2_id": 1,
    "chldrn": [
      {
        "_root_": "/t2/1/",
        "_nest_path_": "/chldrn#1",
        "doc_id": "/t2/chld/1/",
        "doc_type": "chld",
        "chld_name": "DEF",
        "chld_t2_id": 1
      }
    ]
  }
]

---
Debug for query returning the wrong document when 0 docs are expected:

"debug":{
"rawquerystring":"{!parent which=doc_type:t2}chld_name:ABC",
"querystring":"{!parent which=doc_type:t2}chld_name:ABC",
"parsedquery":"AllParentsAware(ToParentBlockJoinQuery
(+chld_name:abc))",
"parsedquery_toString":"ToParentBlockJoinQuery (+chld_name:abc)",
"explain":{
  "/t2/1/":"\n0.0 = Score based on 1 child docs in range from 0 to 3,
best match:\n  0.0 = ConstantScore(chld_name:abc)^0.0\n"},
"QParser":"BlockJoinParentQParser",
...
}


---
If I query using a different parent doc_type (doc_type:p1) and child name
(chld_name:DEF) I get the expected result (0 docs returned) using query:

q={!parent which=doc_type:p1}chld_name:DEF


---
If I query using a different parent doc_type (doc_type:p1) and child name
(chld_name:ABC) I get the expected result (1 doc returned) using query:

q={!parent which=doc_type:p1}chld_name:ABC

^^Debug output for the query returning the expected 1 doc ("docs in range
from 2 to 3", whereas the original problematic query has "0 to 3", whatever
that means):
"debug":{
"rawquerystring":"{!parent which=doc_type:p1}chld_name:ABC",
"querystring":"{!parent which=doc_type:p1}chld_name:ABC",
"parsedquery":"AllParentsAware(ToParentBlockJoinQuery
(+chld_name:abc))",
"parsedquery_toString":"ToParentBlockJoinQuery (+chld_name:abc)",
"explain":{
  "/t2/1/":"\n0.0 = Score based on 2 child docs in range from 2 to 3,
best match:\n  0.0 = ConstantScore(chld_name:abc)^0.0\n"},
"QParser":"BlockJoinParentQParser",
...
}


---
I have a 'work around' which seems to do the trick but it feels hacky and I
wonder if having to qualify the child docs more will affect query
performance. If I further qualify the child doc using a field that doesn't
exist in the other child docs I get the expected (0 matches) result with
query:

q={!parent which=doc_type:t2}(chld_name:ABC AND chld_t2_id:*)
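One thing worth checking, as an assumption on my part rather than a confirmed diagnosis of your index: the `which` filter of {!parent} is supposed to match *every* root-level document in the index, not just the parent type you want back. Documents that `which` does not match are treated as children of the next matching parent in index order, so with multiple root entities, `which=doc_type:t2` can cause a matching p1 child to be attributed to a t2 root, which would explain the wrong hit. Something along these lines may make the workaround unnecessary (field and type names taken from your sample data):

```
# 'which' covers all root entity types; restrict to t2 parents afterwards
q={!parent which="doc_type:(t2 OR p1)"}(+doc_type:chld +chld_name:ABC)
fq=doc_type:t2
```

If your schema has a marker common to all roots (or the absence of _nest_path_), that makes an even safer `which` filter than enumerating the types.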


---
What's also interesting is that if I remove the child doc
{"doc_id":"/p1/chld/1/","chld_name":"ABC"} of parent
{"doc_id":"/p1/1/","doc_type":"p1"} out of the index so that my c

Re: Problem with indexing a String field in SOLR.

2022-04-28 Thread Alessandro Benedetti
Hi Neha,
My shot in the dark:
Have you indexed any document containing that field?

Are you using dynamic fields? (exact field name should have priority over
dynamic fields, but just to double-check).
Can you show us your schema? (at least the part related to that definition?)

Cheers
--
*Alessandro Benedetti*
CEO @ Sease Ltd.
*Apache Lucene/Solr Committer*
*Apache Solr PMC Member*

e-mail: a.benede...@sease.io


*Sease* - Information Retrieval Applied
Consulting | Training | Open Source

Website: Sease.io 
LinkedIn  | Twitter
 | Youtube
 | Github



On Wed, 27 Apr 2022 at 22:54, Neha Gupta  wrote:

> Dear Solr Community,
>
> I have a very weird situation with SOLR indexing and even after spending a
> day i am not able to find a proper reason so i request for your help.
>
> I tried to index a string field by name "host_common_name". I created the
> field in the schema (schema got updated as well) via SOLR Admin GUI and
> after data import this field seems to be not getting indexed.
>
> After searching i found out that in the Admin GUI, if i select this field
> then only Properties values are being shown while for other fields which
> are getting properly indexed along with properties, schema and indexed
> information is also shown.
>
>
>
>
>
> I tried several ways, like deleting the whole schema and then creating the
> new one and so on, but still this field with this name is not getting
> indexed.
>
> At last just as a try i created a different field with different name
> "hcn" and with this name field is getting indexed and in Admin Gui all
> values are being shown like properties, schema and so on.
>
>
> So i was just wondering what can be the issue with the name
> "host_common_name". Did anyone came across similar issue? and would like to
> share some information on this.
>
> Thanks in advance for all the help this community always offers.
>
>
> Regards
> Neha Gupta
>


Re: Regarding maximum number of documents that can be returned safely from SOLR to Java Application.

2022-04-28 Thread Christopher Schultz

Neha,

On 4/27/22 16:35, Neha Gupta wrote:

I have different cores with different number of documents.

1) Core 1: - 227625 docs and each document having approx 10 String fields.

2) Core 2: - Approx 3.5 million documents and each having 3 string fields.


We still have no idea about the size of the documents you are talking 
about. Your "3 string fields" could still be gigabytes of data per 
document. But maybe you meant "short string fields between 0 and 255 
characters" or something like that. But if that's what you meant, you 
should have said that.


So my question is if i request in one request lets say approximate 10K 
documents using SOLRJ will that be OK.


Solr will be fine. Will your application be able to handle that much data?


By safe here i mean approx. maximum number of documents that i can
request without causing any problem in receiving a response from
SOLR.
This depends entirely upon your application. If you request 10k 
documents, and each document requires 1MiB of memory, and you store 
every document from Solr in memory in your application, then you will 
require 10 GiB of heap space just for that one response. If you have 
multiple threads making those kinds of requests to Solr all at the same 
time, you will need 10GiB * N threads of heap space.
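That arithmetic, as a trivial sketch (the 1 MiB-per-document figure is purely illustrative, not a measurement):

```java
public class HeapEstimate {
    // Worst-case heap needed to buffer whole responses in memory:
    // docs per response * bytes per doc * concurrent requests.
    static long heapBytes(long docs, long bytesPerDoc, int threads) {
        return docs * bytesPerDoc * threads;
    }

    public static void main(String[] args) {
        long oneMiB = 1024L * 1024L;
        // 10k docs at 1 MiB each, one thread: ~10 GiB of heap.
        System.out.println(heapBytes(10_000, oneMiB, 1) + " bytes");
        // Four such requests in flight concurrently: ~40 GiB.
        System.out.println(heapBytes(10_000, oneMiB, 4) + " bytes");
    }
}
```

The estimate only holds if the application really buffers everything; streaming the response changes the picture entirely.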



Is that enough to answer the question?


Is anyone going to ever look at 10k worth of documents at a time? That 
seems like quite a lot.


Maybe your use-case isn't a typical "search for products in a sales 
catalog and show them 50-at-a-time to a web user".


Knowing what your use-case is would be very helpful to answer the 
question "is this a good idea?"


-chris


On 27/04/2022 22:26, Andy Lester wrote:



On Apr 27, 2022, at 3:23 PM, Neha Gupta  wrote:

Just for information I will be firing queries from Java application 
to SOLR using SOLRJ and would like to know how much maximum documents 
(i.e  maximum number of rows that i can request in the query) can be 
returned safely from SOLR.
It’s impossible to answer that. First, how do you mean “safe”? How big 
are your documents?


Let’s turn it around. Do you have a number in mind where you’re 
wondering if Solr can handle it? Like you’re thinking “Can Solr handle 
10 million documents averaging 10K each”?  That’s much easier to address.


Andy


Re: Problem with indexing a String field in SOLR.

2022-04-28 Thread Walter Underwood
Try searching on that field and/or returning that field. I’ve seen some 
issues with the schema browser not showing data that I know is in the index. I 
think it is related to docvalues, but I haven’t nailed it down.
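Concretely, something like this (core name is a placeholder) shows whether any document actually has a value in the field, independent of what the schema browser displays:

```
# any docs with a value in host_common_name?
http://localhost:8983/solr/mycore/select?q=host_common_name:[*%20TO%20*]&fl=id,host_common_name&rows=5
```

numFound=0 here would point at indexing/import rather than at the Admin UI.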

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Apr 28, 2022, at 7:37 AM, Alessandro Benedetti  
> wrote:
> 
> Hi Neha,
> My shot in the dark:
> Have you indexed any document containing that field?
> 
> Are you using dynamic fields? (exact field name should have priority over 
> dynamic fields, but just to double-check).
> Can you show us your schema? (at least the part related to that definition?)
> 
> Cheers
> --
> Alessandro Benedetti
> CEO @ Sease Ltd.
> Apache Lucene/Solr Committer
> Apache Solr PMC Member
> 
> e-mail: a.benede...@sease.io 
> 
> 
> Sease - Information Retrieval Applied
> Consulting | Training | Open Source
> 
> Website: Sease.io 
> LinkedIn  | Twitter 
>  | Youtube 
>  | Github 
> 
> 
> On Wed, 27 Apr 2022 at 22:54, Neha Gupta  > wrote:
> Dear Solr Community,
> 
> I have a very weird situation with SOLR indexing and even after spending a 
> day i am not able to find a proper reason so i request for your help.
> 
> I tried to index a string field by name "host_common_name". I created the 
> field in the schema (schema got updated as well) via SOLR Admin GUI and after 
> data import this field seems to be not getting indexed.
> 
> After searching i found out that in the Admin GUI, if i select this field 
> then only Properties values are being shown while for other fields which are 
> getting properly indexed along with properties, schema and indexed 
> information is also shown.
> 
> 
> 
> 
> 
> 
> 
> 
> 
> I tried several ways like deleting the whole schema and then creating the new 
> one and so no but still this field with this name is not getting indexed.
> 
> At last just as a try i created a different field with different name "hcn" 
> and with this name field is getting indexed and in Admin Gui all values are 
> being shown like properties, schema and so on.
> 
> 
> So i was just wondering what can be the issue with the name 
> "host_common_name". Did anyone came across similar issue? and would like to 
> share some information on this.
> 
> Thanks in advance for all the help this community always offers.
> 
> 
> Regards
> Neha Gupta
> 



Re: Problem with indexing a String field in SOLR.

2022-04-28 Thread Rahul Goswami
Neha,
As Alessandro already mentioned, please share your schema if possible.
A wild guess is that sometimes a field is defined as indexed=true
stored=false which gives the impression that the document is missing the
field. Taking a look at the schema would help clarify that.

Thanks,
Rahul

On Thu, Apr 28, 2022 at 10:38 AM Alessandro Benedetti 
wrote:

> Hi Neha,
> My shot in the dark:
> Have you indexed any document containing that field?
>
> Are you using dynamic fields? (exact field name should have priority over
> dynamic fields, but just to double-check).
> Can you show us your schema? (at least the part related to that
> definition?)
>
> Cheers
> --
> *Alessandro Benedetti*
> CEO @ Sease Ltd.
> *Apache Lucene/Solr Committer*
> *Apache Solr PMC Member*
>
> e-mail: a.benede...@sease.io
>
>
> *Sease* - Information Retrieval Applied
> Consulting | Training | Open Source
>
> Website: Sease.io 
> LinkedIn  | Twitter
>  | Youtube
>  | Github
> 
>
>
> On Wed, 27 Apr 2022 at 22:54, Neha Gupta  wrote:
>
>> Dear Solr Community,
>>
>> I have a very weird situation with SOLR indexing and even after spending
>> a day i am not able to find a proper reason so i request for your help.
>>
>> I tried to index a string field by name "host_common_name". I created the
>> field in the schema (schema got updated as well) via SOLR Admin GUI and
>> after data import this field seems to be not getting indexed.
>>
>> After searching i found out that in the Admin GUI, if i select this field
>> then only Properties values are being shown while for other fields which
>> are getting properly indexed along with properties, schema and indexed
>> information is also shown.
>>
>>
>>
>>
>>
>> I tried several ways like deleting the whole schema and then creating the
>> new one and so no but still this field with this name is not getting
>> indexed.
>>
>> At last just as a try i created a different field with different name
>> "hcn" and with this name field is getting indexed and in Admin Gui all
>> values are being shown like properties, schema and so on.
>>
>>
>> So i was just wondering what can be the issue with the name
>> "host_common_name". Did anyone came across similar issue? and would like to
>> share some information on this.
>>
>> Thanks in advance for all the help this community always offers.
>>
>>
>> Regards
>> Neha Gupta
>>
>


Re: Problem with indexing a String field in SOLR.

2022-04-28 Thread Walter Underwood
The original post had a screenshot from the schema browser showing StrField, 
indexed=true, stored=true, omitTermFreqAndPositions=true, omitNorms=true, 
sortMissingLast=true.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Apr 28, 2022, at 8:17 AM, Rahul Goswami  wrote:
> 
> Neha,
> As Alessandro already mentioned, please share your schema if possible.
> A wild guess is that sometimes a field is defined as indexed=true
> stored=false which gives the impression that the document is missing the
> field. Taking a look at the schema would help clarify that.
> 
> Thanks,
> Rahul
> 
> On Thu, Apr 28, 2022 at 10:38 AM Alessandro Benedetti 
> wrote:
> 
>> Hi Neha,
>> My shot in the dark:
>> Have you indexed any document containing that field?
>> 
>> Are you using dynamic fields? (exact field name should have priority over
>> dynamic fields, but just to double-check).
>> Can you show us your schema? (at least the part related to that
>> definition?)
>> 
>> Cheers
>> --
>> *Alessandro Benedetti*
>> CEO @ Sease Ltd.
>> *Apache Lucene/Solr Committer*
>> *Apache Solr PMC Member*
>> 
>> e-mail: a.benede...@sease.io
>> 
>> 
>> *Sease* - Information Retrieval Applied
>> Consulting | Training | Open Source
>> 
>> Website: Sease.io 
>> LinkedIn  | Twitter
>>  | Youtube
>>  | Github
>> 
>> 
>> 
>> On Wed, 27 Apr 2022 at 22:54, Neha Gupta  wrote:
>> 
>>> Dear Solr Community,
>>> 
>>> I have a very weird situation with SOLR indexing and even after spending
>>> a day i am not able to find a proper reason so i request for your help.
>>> 
>>> I tried to index a string field by name "host_common_name". I created the
>>> field in the schema (schema got updated as well) via SOLR Admin GUI and
>>> after data import this field seems to be not getting indexed.
>>> 
>>> After searching i found out that in the Admin GUI, if i select this field
>>> then only Properties values are being shown while for other fields which
>>> are getting properly indexed along with properties, schema and indexed
>>> information is also shown.
>>> 
>>> 
>>> 
>>> 
>>> 
>>> I tried several ways like deleting the whole schema and then creating the
>>> new one and so no but still this field with this name is not getting
>>> indexed.
>>> 
>>> At last just as a try i created a different field with different name
>>> "hcn" and with this name field is getting indexed and in Admin Gui all
>>> values are being shown like properties, schema and so on.
>>> 
>>> 
>>> So i was just wondering what can be the issue with the name
>>> "host_common_name". Did anyone came across similar issue? and would like to
>>> share some information on this.
>>> 
>>> Thanks in advance for all the help this community always offers.
>>> 
>>> 
>>> Regards
>>> Neha Gupta
>>> 
>> 



Question about Zookeeper architecture

2022-04-28 Thread Heller, George A III CTR (USA)
Hopefully this is the appropriate forum for a Zookeeper architecture question.

 

 

I have two servers, a Primary server and a failover server. Right now my 
Zookeeper is on the primary server, so if it goes down Solr would not work.

 

Should I run ZooKeeper on both servers? Should I create a third 
server and put ZooKeeper on it? What happens if the third server goes down?

 

Thanks,

George





Re: Question about Zookeeper architecture

2022-04-28 Thread matthew sporleder
Ideally you should run ZooKeeper on three (small) servers, separate from
Solr.

You should always have an odd number of zk servers so they can vote and not
tie.
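For reference, a minimal three-node ensemble sketched in zoo.cfg terms (hostnames and paths are placeholders):

```
# zoo.cfg -- identical on all three ZooKeeper hosts
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2181
server.1=zk1.example.com:2888:3888
server.2=zk2.example.com:2888:3888
server.3=zk3.example.com:2888:3888
```

Each host additionally needs a myid file in dataDir containing its own server number. With three nodes the ensemble tolerates one failure (quorum is 2 of 3); if a second node goes down, ZooKeeper loses quorum and SolrCloud stops accepting updates until quorum is restored.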

On Thu, Apr 28, 2022 at 2:26 PM Heller, George A III CTR (USA)
 wrote:

> Hopefully this is the appropriate forum for a Zookeeper architecture
> question.
>
>
>
>
>
> I have two servers, a Primary server and a failover server. Right now my
> Zookeeper is on the primary server, so if it goes down Solr would not work.
>
>
>
> Should I run ZooKeeper on both servers? Should I create a
> third server and put ZooKeeper on it? What happens if
> the third server goes down?
>
>
>
> Thanks,
>
> George
>


Re: Problem with indexing a String field in SOLR.

2022-04-28 Thread Neha Gupta

Hi Walter,

I already tried returning that field in the response but it was not present.


Thanks and Regards

Neha Gupta

On 28/04/2022 17:16, Walter Underwood wrote:

Try searching for that field and/or returning that fields. I’ve seen some 
issues with the schema browser not showing data that I know is in the index. I 
think it is related to docvalues, but I haven’t nailed it down.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


On Apr 28, 2022, at 7:37 AM, Alessandro Benedetti  wrote:

Hi Neha,
My shot in the dark:
Have you indexed any document containing that field?

Are you using dynamic fields? (exact field name should have priority over 
dynamic fields, but just to double-check).
Can you show us your schema? (at least the part related to that definition?)

Cheers
--
Alessandro Benedetti
CEO @ Sease Ltd.
Apache Lucene/Solr Committer
Apache Solr PMC Member

e-mail: a.benede...@sease.io 


Sease - Information Retrieval Applied
Consulting | Training | Open Source

Website: Sease.io 
LinkedIn  | Twitter  
| Youtube  | Github 


On Wed, 27 Apr 2022 at 22:54, Neha Gupta mailto:neha.gu...@uni-jena.de>> wrote:
Dear Solr Community,

I have a very weird situation with SOLR indexing and even after spending a day 
i am not able to find a proper reason so i request for your help.

I tried to index a string field by name "host_common_name". I created the field 
in the schema (schema got updated as well) via SOLR Admin GUI and after data import this 
field seems to be not getting indexed.

After searching i found out that in the Admin GUI, if i select this field then 
only Properties values are being shown while for other fields which are getting 
properly indexed along with properties, schema and indexed information is also 
shown.









I tried several ways like deleting the whole schema and then creating the new 
one and so no but still this field with this name is not getting indexed.

At last just as a try i created a different field with different name "hcn" and 
with this name field is getting indexed and in Admin Gui all values are being shown like 
properties, schema and so on.


So i was just wondering what can be the issue with the name "host_common_name". 
Did anyone came across similar issue? and would like to share some information on this.

Thanks in advance for all the help this community always offers.


Regards
Neha Gupta





Re: Problem with indexing a String field in SOLR.

2022-04-28 Thread Neha Gupta

Hello Allessandro,

I indexed a field with the same name, but in a different core.

I am not using dynamic fields, and the schema is as below.

Thanks

Neha Gupta

On 28/04/2022 16:37, Alessandro Benedetti wrote:

Hi Neha,
My shot in the dark:
Have you indexed any document containing that field?

Are you using dynamic fields? (exact field name should have priority 
over dynamic fields, but just to double-check).
Can you show us your schema? (at least the part related to that 
definition?)


Cheers
--
*Alessandro Benedetti*
CEO @ Sease Ltd.
/Apache Lucene/Solr Committer/
/Apache Solr PMC Member/

e-mail: a.benede...@sease.io/
/

*Sease* - Information Retrieval Applied
Consulting | Training | Open Source

Website: Sease.io 
LinkedIn  | Twitter 
 | Youtube 
 | Github 




On Wed, 27 Apr 2022 at 22:54, Neha Gupta  wrote:

Dear Solr Community,

I have a very weird situation with SOLR indexing and even after
spending a day i am not able to find a proper reason so i request
for your help.

I tried to index a string field by name "host_common_name". I
created the field in the schema (schema got updated as well) via
SOLR Admin GUI and after data import this field seems to be not
getting indexed.

After searching i found out that in the Admin GUI, if i select
this field then only Properties values are being shown while for
other fields which are getting properly indexed along with
properties, schema and indexed information is also shown.





I tried several ways like deleting the whole schema and then
creating the new one and so no but still this field with this name
is not getting indexed.

At last just as a try i created a different field with different
name "hcn" and with this name field is getting indexed and in
Admin Gui all values are being shown like properties, schema and
so on.


So i was just wondering what can be the issue with the name
"host_common_name". Did anyone came across similar issue? and
would like to share some information on this.

Thanks in advance for all the help this community always offers.


Regards
Neha Gupta


Re: Regarding maximum number of documents that can be returned safely from SOLR to Java Application.

2022-04-28 Thread Neha Gupta

First of all Thanks to all who have replied to this question.

Just to make things clear, my use case is not a typical one, i.e. I am not 
going to show the first 50 or 100 results.


My use case is to create a CSV file (matrix-like) from whatever the 
user filters in the web application, and the resulting set can range 
from hundreds to millions of documents.


Firing SolrRequests again and again and asking for results (maybe 10-100 
at a time) from the web application will increase the time until 
the CSV file is done.


So I just want to know from your experience: what is the optimal maximum 
number of documents that I can request from Solr in one go, so that the 
number of requests from the web application to Solr is minimal?


Documents mainly consist of string fields of up to 255 characters.

I am trying out different values for the rows parameter in the 
request at my end, but I just want to hear from the Solr community about 
the advantages and disadvantages of doing this, or a better way of 
doing it, as I am totally new to Solr.


Also, I will be using SolrJ, so pointers in that direction will be most 
helpful.



Regards

Neha Gupta


On 28/04/2022 16:53, Christopher Schultz wrote:

Neha,

On 4/27/22 16:35, Neha Gupta wrote:

I have different cores with different number of documents.

1) Core 1: - 227625 docs and each document having approx 10 String 
fields.


2) Core 2: - Approx 3.5 million documents and each having 3 string 
fields.


We still have no idea about the size of the documents you are talking 
about. Your "3 string fields" could still be gigabytes of data per 
document. But maybe you meant "short string fields between 0 and 255 
characters" or something like that. But if that's what you meant, you 
should have said that.


So my question is if i request in one request lets say approximate 
10K documents using SOLRJ will that be OK.


Solr will be fine. Will your application be able to handle that much 
data?



By safe here i mean approx. maximum number of documents that i can
request without causing any problem in receiving a response from
SOLR.
This depends entirely upon your application. If you request 10k 
documents, and each document requires 1MiB of memory, and you store 
every document from Solr in memory in your application, then you will 
require 10 GiB of heap space just for that one response. If you have 
multiple threads making those kinds of requests to Solr all at the 
same time, you will need 10GiB * N threads of heap space.



Is that enough to answer the question?


Is anyone going to ever look at 10k worth of documents at a time? That 
seems like quite a lot.


Maybe your use-case isn't a typical "search for products in a sales 
catalog and show them 50-at-a-time to a web user".


Knowing what your use-case is would be very helpful to answer the 
question "is this a good idea?"


-chris


On 27/04/2022 22:26, Andy Lester wrote:


On Apr 27, 2022, at 3:23 PM, Neha Gupta  
wrote:


Just for information I will be firing queries from Java application 
to SOLR using SOLRJ and would like to know how much maximum 
documents (i.e  maximum number of rows that i can request in the 
query) can be returned safely from SOLR.
It’s impossible to answer that. First, how do you mean “safe”? How 
big are your documents?


Let’s turn it around. Do you have a number in mind where you’re 
wondering if Solr can handle it? Like you’re thinking “Can Solr 
handle 10 million documents averaging 10K each”? That’s much easier 
to address.


Andy


Stop a long running query

2022-04-28 Thread Rahul Goswami
Hello,
I am using Solr 7.7.2. Is it possible to stop a long running request ?
Using the "timeAllowed" parameter would return partial results, but I want
the query to outright terminate and ideally throw an exception so as to not
utilize additional resources.

Thanks,
Rahul


Re: Regarding maximum number of documents that can be returned safely from SOLR to Java Application.

2022-04-28 Thread Christopher Schultz

Neha,

On 4/28/22 16:54, Neha Gupta wrote:
Just to make things clear, my use case is not a typical one, i.e. I am not 
going to show the first 50 or 100 results.


My use case is to create a CSV file (matrix-like) from whatever the 
user filters in the web application, and the resulting set can range 
from hundreds to millions of documents.


Firing SolrRequests again and again and asking for results (maybe 10-100 
at a time) from the web application will increase the time until 
the CSV file is done.


So I just want to know from your experience: what is the optimal maximum 
number of documents that I can request from Solr in one go, so that the 
number of requests from the web application to Solr is minimal?


If you use cursors in Solr, I'm not sure it really matters too much how 
many documents you request at once. Honestly, your application is likely 
to be the bottleneck.


But I'm assuming that you are going to stream-to-disk or at least 
stream-to-client so maybe that doesn't matter, either.



Documents mainly consists of string fields upto 255 characters.


This doesn't really matter much, as long as you are streaming everything 
and not buffering.


I am trying out different values for the rows parameter in the 
request at my end, but I just want to hear from the Solr community about 
the advantages and disadvantages of doing so, or a better way of 
doing this, as I am totally new to Solr.


Also, I will be using SolrJ, so pointers in that direction will be most 
helpful.


You could also probably just ... try it. There are no benchmarks better 
than ones against your own environment.


-chris



On 28/04/2022 16:53, Christopher Schultz wrote:

Neha,

On 4/27/22 16:35, Neha Gupta wrote:

I have different cores with different numbers of documents.

1) Core 1: 227,625 docs, each having approx. 10 string fields.


2) Core 2: approx. 3.5 million documents, each having 3 string 
fields.


We still have no idea about the size of the documents you are talking 
about. Your "3 string fields" could still be gigabytes of data per 
document. But maybe you meant "short string fields between 0 and 255 
characters" or something like that. But if that's what you meant, you 
should have said that.


So my question is: if I request, let's say, approximately 
10K documents in one request using SolrJ, will that be OK?


Solr will be fine. Will your application be able to handle that much 
data?



By safe here I mean the approx. maximum number of documents that I can
request without causing any problem in receiving a response from
Solr.
This depends entirely upon your application. If you request 10k 
documents, and each document requires 1MiB of memory, and you store 
every document from Solr in memory in your application, then you will 
require 10 GiB of heap space just for that one response. If you have 
multiple threads making those kinds of requests to Solr all at the 
same time, you will need 10GiB * N threads of heap space.



Is that enough to answer the question?


Is anyone ever going to look at 10k documents at a time? That 
seems like quite a lot.


Maybe your use-case isn't a typical "search for products in a sales 
catalog and show them 50-at-a-time to a web user".


Knowing what your use-case is would be very helpful to answer the 
question "is this a good idea?"


-chris


On 27/04/2022 22:26, Andy Lester wrote:



On Apr 27, 2022, at 3:23 PM, Neha Gupta wrote:

Just for information I will be firing queries from Java application 
to SOLR using SOLRJ and would like to know how much maximum 
documents (i.e  maximum number of rows that i can request in the 
query) can be returned safely from SOLR.
It’s impossible to answer that. First, how do you mean “safe”? How 
big are your documents?


Let’s turn it around. Do you have a number in mind where you’re 
wondering if Solr can handle it? Like you’re thinking “Can Solr 
handle 10 million documents averaging 10K each”? That’s much easier 
to address.


Andy


Re: Regarding maximum number of documents that can be returned safely from SOLR to Java Application.

2022-04-28 Thread Vincenzo D'Amore
> Firing SolrRequest again and again and asking for results (may be 10-100
at a time) from web application will increase the amount of time until
the CSV file is done.

Even if your assumption were correct, is exporting a CSV file really such a
time-critical task? I don't think the gain would be that large.
If performance is that important, you should use SolrJ to build something
that reads from a stream (the Solr export API or streaming API) and
writes directly into an output stream that produces the CSV file.

-- 
Vincenzo D'Amore
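
A stream-to-stream sketch of the kind Vincenzo suggests, using SolrJ's `SolrStream` against the `/export` handler, might look like this. The node URL, core name, and field names (`id`, `title`) are hypothetical; note that `/export` requires docValues on all `fl` and `sort` fields.

```java
import java.io.PrintWriter;
import org.apache.solr.client.solrj.io.Tuple;
import org.apache.solr.client.solrj.io.stream.SolrStream;
import org.apache.solr.common.params.ModifiableSolrParams;

public class StreamingCsvExport {
    public static void main(String[] args) throws Exception {
        ModifiableSolrParams params = new ModifiableSolrParams();
        params.set("q", "*:*");
        params.set("qt", "/export");     // the export handler streams sorted, fully-matched results
        params.set("fl", "id,title");    // hypothetical fields; must have docValues
        params.set("sort", "id asc");
        SolrStream stream = new SolrStream("http://localhost:8983/solr/mycore", params);
        try (PrintWriter csv = new PrintWriter("export.csv")) {
            stream.open();
            Tuple tuple;
            while (!(tuple = stream.read()).EOF) {
                // One tuple at a time: nothing accumulates in client memory.
                csv.println(tuple.getString("id") + "," + tuple.getString("title"));
            }
        } finally {
            stream.close();
        }
    }
}
```

Reading tuples and writing CSV rows in the same loop means neither the client nor Solr ever materializes the full result set.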


Re: Regarding maximum number of documents that can be returned safely from SOLR to Java Application.

2022-04-28 Thread Yonik Seeley
It depends ;-)

If you are directly querying a single Solr node, then the additional memory
usage is (max_results * 4) if not retrieving scores.  It's just
a single int per document to keep track of the docids that matched the
query.  Documents are "streamed" to the client... the
actual stored fields for each document are only loaded when needed to write
to the output stream. If one is retrieving scores as well,
then the memory usage is (max_results * 8) (4 bytes for the int id, 4 bytes
for the float score.)

However, if one is using distributed search, then the entire response *is*
aggregated in memory before sending back to the client.
So if you are using SolrCloud and wish to do big bulk operations like this,
target individual nodes with distrib=false.

-Yonik


On Wed, Apr 27, 2022 at 4:23 PM Neha Gupta  wrote:

> Dear Solr Community,
>
> I would like to know what is the safe number of documents that can be
> returned from a SOLR.
>
> Just for information I will be firing queries from Java application to
> SOLR using SOLRJ and would like to know how much maximum documents (i.e
> maximum number of rows that i can request in the query) can be returned
> safely from SOLR.
>
> It would be great if you can please share your experience with regard to
> the same.
>
>
> Thanks and Regards
> Neha Gupta
>
>
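
Putting Yonik's server-side numbers (4 bytes per matched doc, 8 with scores) next to Chris's earlier client-side heap arithmetic, a small sketch; the helper names are mine, the constants come from the two emails.

```java
public class SolrMemoryEstimate {

    /** Server-side tracking cost: one int per matched doc, plus a float if scores are kept. */
    static long serverBytes(long numDocs, boolean withScores) {
        return numDocs * (withScores ? 8L : 4L);
    }

    /** Client-side cost if every document is buffered in memory at once, across N threads. */
    static long clientBytes(long numDocs, long bytesPerDoc, int concurrentThreads) {
        return numDocs * bytesPerDoc * concurrentThreads;
    }

    public static void main(String[] args) {
        // 10 million matched docs cost the server ~40 MB of doc-id tracking...
        System.out.println(serverBytes(10_000_000L, false));
        // ...but buffering 10k one-MiB docs per thread in 4 threads needs ~40 GiB on the client.
        System.out.println(clientBytes(10_000L, 1L << 20, 4));
    }
}
```

The asymmetry is the point: the server side scales cheaply with result count, while a client that buffers whole documents is usually the first thing to run out of heap.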


Re: Regarding maximum number of documents that can be returned safely from SOLR to Java Application.

2022-04-28 Thread David Hastings
The 30+ million records I retrieved were always from a single standalone
Solr node, and yes, you can do that frequently; it doesn't have an impact
on the rest of the searches happening, assuming you have enough memory to
deal with it. There is nothing wrong with requesting every one of your
documents as well as every single stored field. It simply just works.

So, like was stated before, just try it: make a run and see what
happens.

On Thu, Apr 28, 2022 at 7:39 PM Yonik Seeley  wrote:

> It depends ;-)
>
> If you are directly querying a single Solr node, then the additional memory
> usage is (max_results * 4) if not retrieving scores.  It's just
> a single int per document to keep track of the docids that matched the
> query.  Documents are "streamed" to the client... the
> actual stored fields for each document are only loaded when needed to write
> to the output stream. If one is retrieving scores as well,
> then the memory usage is (max_results * 8) (4 bytes for the int id, 4 bytes
> for the float score.)
>
> However, if one is using distributed search, then the entire response *is*
> aggregated in memory before sending back to the client.
> So if you are using SolrCloud and wish to do big bulk operations like this,
> target individual nodes with distrib=false.
>
> -Yonik
>
>
> On Wed, Apr 27, 2022 at 4:23 PM Neha Gupta  wrote:
>
> > Dear Solr Community,
> >
> > I would like to know what is the safe number of documents that can be
> > returned from a SOLR.
> >
> > Just for information I will be firing queries from Java application to
> > SOLR using SOLRJ and would like to know how much maximum documents (i.e
> > maximum number of rows that i can request in the query) can be returned
> > safely from SOLR.
> >
> > It would be great if you can please share your experience with regard to
> > the same.
> >
> >
> > Thanks and Regards
> > Neha Gupta
> >
> >
>