Re: Index for text with space

2021-10-25 Thread son hoang
Is there any way in the query so that I do not need to reindex the whole data?

On 2021/10/23 15:39:18, Walter Underwood  wrote: 
> Agreed. There is a simple fix: index all the words. Also, stop using
> EdgeNGramFilter. That is only used for completion, not word search.
> 
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
> 
> > On Oct 23, 2021, at 4:31 AM, Dave  wrote:
> > 
> > Why ever would you not index words shorter than three characters?
> > “To be or not to be”
> > seems like a significant search.
> > 
> >> On Oct 23, 2021, at 7:28 AM, son hoang  wrote:
> >> 
> >> Yep, words shorter than 3 characters will not be indexed. But if the text
> >> "Al Abbas" can be split into the token "Abbas" (and "Al", which is not
> >> indexed because it has only 2 characters), can we apply an OR condition
> >> in the query?
> >> 
> >>> On 2021/10/22 14:37:51, Andy C  wrote: 
> >>> The issue looks to me to be with the use of EdgeNGramFilterFactory in your
> >>> field type. You have configured it with minGramSize="3" and have not
> >>> specified preserveOriginal="true".
> >>> 
> >>> So words less than 3 characters will not be indexed, and therefore can't
> >>> be searched.
> >>> 
> >>> See
> >>> https://solr.apache.org/guide/8_8/filter-descriptions.html#edge-n-gram-filter
> >>> 
> >>> - Andy -
> >>> 
>  On Fri, Oct 22, 2021 at 10:12 AM son hoang  wrote:
>  
>  Thanks, Thamizh
>  
>  It seems that having StandardTokenizerFactory in my index analyzer is
>  causing the issue.
>  
>  I do not want to reindex. Is there any other solution? Should I use an OR
>  query, so that searching for "Al Abbas" returns "Al Abbas" (e.g., through
>  an OR match on "Abbas")?
>  
>  Thanks
>  
>  On 2021/10/21 07:56:20, Thamizhazhagan B 
>  wrote:
> > Hi,
> > 
> > Create a copy field as below and use it in your query:
> > 
> > 
> >   indexed="true" termVectors="true" termPositions="true" 
>  termOffsets="true"/>
> > 
> >   sortMissingLast="true" omitNorms="true" positionIncrementGap="100"
>  multiValued="false">
> >   
> > 
> >   ignoreCase="true"/>
> > 
> >   
> >   
> > 
> >   ignoreCase="true"/>
> >   ignoreCase="true" synonyms="synonyms.txt"/>
> > 
> >   
> > 
> > 
> > Thanks,
> > Thamizh
> > 
> > 
> > -Original Message-
> > From: son hoang 
> > Sent: Thursday, October 21, 2021 8:19 AM
> > To: users@solr.apache.org
> > Subject: Index for text with space
> > 
> > Caution: This email came from outside Kaiser Permanente. Do not open
>  attachments or click on links if you do not recognize the sender.
> > 
> > __
> > Hello
> > 
> > I have a config like this:
> > 
> > 
> >   
> >   
> >   
> >   
> >> maxGramSize="15"/>
> >   
> >   
> >   
> >   
> >   
> >   
> >   
> >   
> > 
> > Using this config:
> > 
> > 1. When I search for "Abbas", the result for "Al Abbas" appears.
> > 
> > 2. When I search for "Al Abbas" in the search field, I get no results.
> > 
> > It seems that "Al Abbas" is not indexed. What should I do in the config
> > so that #2 returns the result?
> > 
> > Many thanks
> > NOTICE TO RECIPIENT:  If you are not the intended recipient of this
>  e-mail, you are prohibited from sharing, copying, or otherwise using or
>  disclosing its contents.  If you have received this e-mail in error, 
>  please
>  notify the sender immediately by reply e-mail and permanently delete this
>  e-mail and any attachments without reading, forwarding or saving them.
>  v.173.295  Thank you.
> > 
>  
> >>> 
> 
> 


Re: Index for text with space

2021-10-25 Thread Dave
You can preprocess the query to remove anything that is not indexed (terms
shorter than 3 characters), but that initial schema decision was a mistake
and should be remedied with a reindex.
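A minimal sketch of the query-side workaround described above, assuming the minimum indexed term length is the minGramSize="3" mentioned earlier in the thread and that splitting on whitespace is close enough to the query analyzer's tokenizer (the real field type may also split on punctuation). It only drops terms; it does not make the short words searchable.

```python
# Assumption: matches minGramSize="3" from the field type discussed in this thread.
MIN_INDEXED_LEN = 3


def drop_unindexed_terms(user_query: str, min_len: int = MIN_INDEXED_LEN) -> str:
    """Remove whitespace-separated terms shorter than min_len.

    With minGramSize=3 and no preserveOriginal, such terms never reach the
    index, so keeping them in an AND query guarantees zero hits.
    Splitting on whitespace is a simplification of the real query analyzer.
    """
    kept = [term for term in user_query.split() if len(term) >= min_len]
    return " ".join(kept)


if __name__ == "__main__":
    print(drop_unindexed_terms("Al Abbas"))            # Abbas
    print(drop_unindexed_terms("To be or not to be"))  # not
```

The second call illustrates Dave's earlier point: almost the whole phrase disappears, which is why fixing the schema and reindexing is the real remedy.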

> On Oct 25, 2021, at 8:36 AM, son hoang  wrote:
> 
> Is there any way in the query so that I do not need to reindex the whole 
> data?


Re: Index for text with space

2021-10-25 Thread Andy C
I would think your problem goes beyond 1- and 2-character words not being
indexed.

With your current field type definition, if someone searches for "can" it
will retrieve documents that contain any word that starts with "can", such as
"candidate", "canadian", "cantina", etc.

Is this really the desired search behavior?
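To see why "can" matches "candidate", here is a rough Python approximation of the terms an edge n-gram filter emits at index time, using minGramSize="3" and maxGramSize="15" from the original config. Tokenization and lowercasing are left out. The `preserve_original` flag follows the reference guide's description of preserveOriginal: keep the original term when it is shorter than minGramSize or longer than maxGramSize.

```python
from typing import List


def edge_ngrams(token: str, min_gram: int = 3, max_gram: int = 15,
                preserve_original: bool = False) -> List[str]:
    """Approximate the terms an edge n-gram filter emits for one token.

    Every prefix whose length lies in [min_gram, max_gram] is emitted.
    With preserve_original=True the whole token is also kept when its
    length falls outside that range.
    """
    grams = [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]
    if preserve_original and not (min_gram <= len(token) <= max_gram):
        grams.append(token)
    return grams


if __name__ == "__main__":
    print(edge_ngrams("candidate"))  # 'can' is among the indexed terms
    print(edge_ngrams("al"))         # nothing is indexed for a 2-letter word
    print(edge_ngrams("al", preserve_original=True))  # the original is kept
```

Because "can" is one of the terms stored for "candidate", "canadian" and "cantina", a query for "can" matches all three. Because "al" produces no terms at all, an AND query for "al abbas" cannot match.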

On Mon, Oct 25, 2021 at 8:48 AM Dave  wrote:

> You can pre process the query to remove anything not indexed (less than 3
> characters) but that initial scheme decision was a mistake, and should be
> remedied and reindexed.


Child doc question

2021-10-25 Thread Stephen Lewis Bianamara
Hi SOLR Community,

I'm experimenting with Solr 8.10 and trying to get a query pattern with
child docs to work. An example of a nested document structure I'd like to
search is below. In this example there will only be two levels: posts
(type:post) and their /comments children.
{
  "id": "post1",
  "type": "post",
  "post_en": "I put lemon on my apple slices to keep them fresh",
  "comments": [
    {
      "id": "comment1",
      "type": "comment",
      "comment_en": "Lime works too"
    },
    {
      "id": "comment2",
      "type": "comment",
      "comment_en": "Does it work for pears?"
    }
  ]
}

What I'd like is to be able to do a keyword search for /lemon apple/ and
return only the parent; /lemon lime/ and return the parent and comment1;
/lemon pear/ and return the parent and comment2; /lime pear/ and return the
parent, comment1, and comment2. And /lime gum/ should return nothing (as if
it were an AND query). Additionally, all of this should be ranked by relevance.
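The requirements above can be restated as a small executable spec. This is plain Python over the single example document, not Solr and not a block join query: every query term must appear somewhere in the post or its comments, the post is always returned, and a comment is returned only if it contains a query term. The plural-stripping "stemmer" is a crude, hypothetical stand-in for whatever English analysis post_en and comment_en actually use, and relevance ranking is not modeled.

```python
from typing import Dict, List, Set

# The example document from the message above, flattened for the model.
POST = {"id": "post1", "text": "I put lemon on my apple slices to keep them fresh"}
COMMENTS = [
    {"id": "comment1", "text": "Lime works too"},
    {"id": "comment2", "text": "Does it work for pears?"},
]


def terms(text: str) -> Set[str]:
    """Lowercase, drop punctuation, and strip a trailing 's' (naive stemming)."""
    words = "".join(c if c.isalnum() else " " for c in text.lower()).split()
    return {w[:-1] if w.endswith("s") and len(w) > 3 else w for w in words}


def search(query: str) -> List[str]:
    """Return the post id plus matching comment ids, or [] if any term is missing."""
    q = terms(query)
    comment_terms: Dict[str, Set[str]] = {c["id"]: terms(c["text"]) for c in COMMENTS}
    # AND semantics across the whole family: each query term must occur
    # in the post or in at least one of its comments.
    family = terms(POST["text"]).union(*comment_terms.values())
    if not q <= family:
        return []
    # Always return the parent; return a comment only if it matches a term.
    return [POST["id"]] + [cid for cid, t in comment_terms.items() if q & t]


if __name__ == "__main__":
    for query in ["lemon apple", "lemon lime", "lemon pear", "lime pear", "lime gum"]:
        print(query, "->", search(query))
```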

I've tried a few combinations of nested docs from this documentation, but
am having trouble getting this to work. I wonder if I'm asking more of the
block join / child doc transformer than it currently supports, or perhaps I'm
just missing something. Can someone familiar with nested documents help me
out? I've included my schema below as well.

Thanks!
Stephen



  









  
  id
  




  



  
  



  

  



Re: Significant performance hit on replication compared to older version

2021-10-25 Thread Dominic Humphries
I think I found it!

I didn't realise, but we have boost files for the core I'm testing and the
boost is applied after replication! Setting the contents of the files to
empty completely removes the post-replication performance problem we were
seeing.

So now my question becomes "Why is boosting taking so much longer for the
upgrade?"

Since the upgrade has its own independent set of data, I'm wondering if
it's as simple as this: the IDs it's trying to boost don't exist, and it
takes longer to find out that an item is missing than to find one that does.
I believe I can point an 8.9.0 follower at an 8.3.1 leader; that seems like
the next logical step. If there's no performance hit when it has the same
data as the 8.3.1 replica, then that's almost certainly the problem.

Fingers crossed!

On Sun, 24 Oct 2021 at 10:26, Deepak Goel  wrote:

> There could be some testing and cooling happening post-replication. I will
> have to dig a bit more into the code.
>
> Deepak
> "The greatness of a nation can be judged by the way its animals are treated
> - Mahatma Gandhi"
>
> +91 73500 12833
> deic...@gmail.com
>
> Facebook: https://www.facebook.com/deicool
> LinkedIn: www.linkedin.com/in/deicool
>
> "Plant a Tree, Go Green"
>
> Make In India : http://www.makeinindia.com/home
>
>
> On Thu, Oct 21, 2021 at 9:57 PM Dominic Humphries
>  wrote:
>
> > One more tidbit: I just tried leaving replication off for a few hours and
> > then triggering a "big" replication run so I could see the distinct
> > stages.
> >
> >    - Beginning replication didn't cause any performance degradation.
> >    - Several minutes of downloading the replication files saw no
> >      degradation.
> >    - Only after downloading had completed did we start to see performance
> >      issues in our tests.
> >    - But we saw the "number of docs/timestamp of latest file" both jump
> >      almost immediately after downloading completed and never move again.
> >    - But the performance degradation continued for about seven more
> >      minutes, even though replication was clearly finished at this point.
> >
> > Is there some kind of reindexing/optimization step that Solr can run
> > post-replication? At this point it's about my only remaining suspect.
> >
>


Re: Significant performance hit on replication compared to older version

2021-10-25 Thread Dominic Humphries
Performance with the replica pulling from 8.3.1 was actually worse. And,
looking at the data in the databases and the boost file contents, I'm
doubtful it's a problem of incompatible boost files. I think the performance
of importing/applying the boosts really is what's responsible for the issue
we see. I'm not sure what else to test to verify or disprove this.



Fwd: Is there a strict instruction on how to establish backups on S3 from a Solr instance (8.10.1 version)

2021-10-25 Thread Dmytro Kinakh
Hi there,

I’m new to Solr, so let me know in case I’m missing anything here.

I’m following this guide, but no luck so far:
https://solr.apache.org/guide/8_10/making-and-restoring-backups.html

Here is what I’ve done so far.

In solr.xml I’ve added a backup section:

  solr-backups
  us-east-1

After that, I’ve added the S3 plugin (via Ansible, as I usually do with
everything else):

- name: Copy SOLR S3 module to server
  copy:
    src: /opt/solr/dist/solr-s3-repository-8.10.1.jar
    dest: /opt/solr/server/solr-webapp/webapp/WEB-INF/lib
    owner: solr
    group: solr
    remote_src: yes
  become_user: root

Finally, I’m trying to load that plugin in solrconfig.xml:

To test backups I’m running the following commands (create a backup, then
check that it was created properly):

http://MY-SERVER.us-east-1.elb.amazonaws.com:8983/solr/schools/replication?command=backup

http://MY-SERVER.us-east-1.elb.amazonaws.com:8983/solr/schools/replication?command=details&wt=xml

But afterwards (the backup is ~200MB) I don’t see anything in my S3 bucket.
I also don’t see anything in the Logging section saying that uploading the
backup to my bucket failed.

Are there any other steps I’ve missed? At the very least, I would expect a
message saying that something failed while trying to create the backup.

Best regards,
Dmytro Kinakh
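One thing that may be worth ruling out (a hypothesis, not a confirmed diagnosis): the backup requests above do not name a repository. As far as I recall from the 8.x backup guide, `replication?command=backup` accepts a `repository` parameter, and when none is given (and no repository is marked as the default in solr.xml) the backup is written to the local filesystem rather than to S3. The sketch below only builds the request URLs. The repository name `s3` and the backup name `schools-test` are hypothetical placeholders, since the repository declaration in solr.xml is not visible in the post.

```python
from typing import Optional
from urllib.parse import urlencode

# Host and core come from the post; repository and backup names are placeholders.
SOLR_BASE = "http://MY-SERVER.us-east-1.elb.amazonaws.com:8983/solr"


def backup_url(core: str, repository: str, name: str,
               location: Optional[str] = None) -> str:
    """Build a standalone-mode backup request that names the repository.

    `repository` must match the name declared in solr.xml (assumed here).
    `location` is optional and is only added when given.
    """
    params = {"command": "backup", "repository": repository, "name": name}
    if location is not None:
        params["location"] = location
    return f"{SOLR_BASE}/{core}/replication?{urlencode(params)}"


def details_url(core: str) -> str:
    """Build the request used to check the status of the last backup."""
    params = {"command": "details", "wt": "xml"}
    return f"{SOLR_BASE}/{core}/replication?{urlencode(params)}"


if __name__ == "__main__":
    print(backup_url("schools", "s3", "schools-test"))  # hypothetical names
    print(details_url("schools"))
```

In SolrCloud mode the equivalent request would go through the Collections API BACKUP action, which also takes a `repository` parameter.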



Re: Significant performance hit on replication compared to older version

2021-10-25 Thread Charlie Hull

Hi Dominic,

Could you clarify what you mean by boost files in this context? Just 
curious


Charlie

On 25/10/2021 17:11, Dominic Humphries wrote:

Performance with the replica pulling from 8.3.1 was actually worse. And
looking at the data in the databases and the boost file contents, I'm
dubious it's a problem of incompatible boost files. I think the performance
of importing/applying the boosts really is what's responsible for the issue
we see. Not sure what else to test to verify or disprove this..


--
Charlie Hull - Managing Consultant at OpenSource Connections Limited 

Founding member of The Search Network  
and co-author of Searching the Enterprise 


tel/fax: +44 (0)8700 118334
mobile: +44 (0)7767 825828

OpenSource Connections Europe GmbH | Pappelallee 78/79 | 10437 Berlin
Amtsgericht Charlottenburg | HRB 230712 B
Geschäftsführer: John M. Woodell | David E. Pugh
Finanzamt: Berlin Finanzamt für Körperschaften II


Receiving NPE during search.

2021-10-25 Thread Michael Conrad
We have run across an issue where certain searches result in a vague
NPE. Any help pointing us in the right direction would be appreciated.


### Example Log Entries:

2021-10-25 18:08:21.683 ERROR (qtp1659840424-39975) [c:alldata s:shard1 r:core_node64 x:alldata_shard1_replica_n63] o.a.s.h.RequestHandlerBase java.lang.NullPointerException

2021-10-25 18:08:21.683 ERROR (qtp1659840424-39975) [c:alldata s:shard1 r:core_node64 x:alldata_shard1_replica_n63] o.a.s.s.HttpSolrCall null:java.lang.NullPointerException

2021-10-25 18:09:01.673 ERROR (qtp1659840424-40471) [c:alldata s:shard1 r:core_node64 x:alldata_shard1_replica_n63] o.a.s.h.RequestHandlerBase java.lang.NullPointerException

2021-10-25 18:09:01.674 ERROR (qtp1659840424-40471) [c:alldata s:shard1 r:core_node64 x:alldata_shard1_replica_n63] o.a.s.s.HttpSolrCall null:java.lang.NullPointerException


### Example Json Response:

{
  "responseHeader":{
    "zkConnected":true,
    "status":500,
    "QTime":37,
    "params":{
      "q":"issuedate_tdt:[\"2014-04-18T04:00:00Z\" TO \"2014-04-19T04:00:00Z\"}",
      "defType":"edismax",
      "distrib":"true",
      "_stateVer_":"alldata:2543",
      "qf":"_text_txt_en_split^0.1 _text_txt^0.5 _text_ws^2",
      "fl":"id",
      "start":"0",
      "q.op":"AND",
      "rows":"5"}},
  "error":{
    "trace":"java.lang.NullPointerException\n",
    "code":500}}



Re: Receiving NPE during search.

2021-10-25 Thread Rahul Goswami
Can you please share the complete exception stack trace from the log ?

- Rahul



Re: Receiving NPE during search.

2021-10-25 Thread Michael Conrad
I changed the global log level from WARN to INFO and still don't see a 
stack trace.



2021-10-25 18:38:47.448 ERROR (qtp1659840424-40619) [c:alldata s:shard1 r:core_node64 x:alldata_shard1_replica_n63] o.a.s.h.RequestHandlerBase java.lang.NullPointerException

2021-10-25 18:38:47.449 ERROR (qtp1659840424-40619) [c:alldata s:shard1 r:core_node64 x:alldata_shard1_replica_n63] o.a.s.s.HttpSolrCall null:java.lang.NullPointerException





On 10/25/21 2:16 PM, Rahul Goswami wrote:

Can you please share the complete exception stack trace from the log ?

- Rahul


XPathEntityProcessor

2021-10-25 Thread Scott Derrick

Trying to figure out why the XPath processor is only processing one entity
when there are multiple entities.

I have the following in my config file

   
  
   

There are multiple entities at the xpath="/TEI/text/body//placeName"

but only the last one is stored and searchable.

thanks,

Scott


Re: Significant performance hit on replication compared to older version

2021-10-25 Thread Atita Arora
To me, this performance sink seems more like the work of a merge policy,
which would consume resources after documents are added to the index and
segments are reorganized according to the custom or default policy.



On Mon, Oct 25, 2021 at 8:04 PM Charlie Hull <
ch...@opensourceconnections.com> wrote:

> Hi Dominic,
>
> Could you clarify what you mean by boost files in this context? Just
> curious
>
> Charlie