Re: RE: Index for text with space

2021-10-23 Thread son hoang
Yes, words shorter than 3 characters are not indexed. But if the text "Al Abbas"
is split into the token "Abbas" (plus "Al", which is dropped because it has only
2 characters), can we then apply an OR condition in the query?
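
To make the OR idea concrete, here is a rough Python sketch of what the analysis chain is assumed to do. It is a simulation, not Solr code: the actual field type did not survive in this thread, so whitespace tokenization, lowercasing, and a query analyzer without the n-gram filter are all assumptions, and the helper names (edge_ngrams, index_terms, matches) are made up for illustration. Only minGramSize="3", maxGramSize="15" and preserveOriginal come from the thread.

```python
def edge_ngrams(token, min_gram=3, max_gram=15, preserve_original=False):
    """Return the leading-edge n-grams of one token, shortest first."""
    grams = [token[:n] for n in range(min_gram, min(len(token), max_gram) + 1)]
    # Mimics preserveOriginal="true": keep tokens outside the gram size range.
    if preserve_original and not (min_gram <= len(token) <= max_gram):
        grams.append(token)
    return grams


def index_terms(text, **kwargs):
    """Assumed index-time analysis: split on whitespace, lowercase, edge n-gram."""
    terms = set()
    for token in text.lower().split():
        terms.update(edge_ngrams(token, **kwargs))
    return terms


def matches(query, indexed, operator="AND"):
    """Assumed query-time analysis (no n-grams): all (AND) or any (OR) terms must exist."""
    hits = [term in indexed for term in query.lower().split()]
    return all(hits) if operator == "AND" else any(hits)


indexed = index_terms("Al Abbas")
print(sorted(indexed))                       # ['abb', 'abba', 'abbas']
print(matches("Abbas", indexed))             # True
print(matches("Al Abbas", indexed, "AND"))   # False: "al" was never indexed
print(matches("Al Abbas", indexed, "OR"))    # True: "abbas" alone matches

# With preserveOriginal the short token survives, but that requires a re-index.
reindexed = index_terms("Al Abbas", preserve_original=True)
print(matches("Al Abbas", reindexed, "AND"))  # True
```

Under these assumptions an OR query does return the document without re-indexing, but it would also return any other document that matches "Abbas" alone.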

On 2021/10/22 14:37:51, Andy C  wrote: 
> The issue looks to me to be with the use of EdgeNGramFilterFactory in your
> field type. You have configured it with minGramSize="3" and have not
> specified preserveOriginal="true".
> 
> So words less than 3 characters will not be indexed, and therefore can't be
> searched.
> 
> See
> https://solr.apache.org/guide/8_8/filter-descriptions.html#edge-n-gram-filter
> 
> - Andy -
> 
> On Fri, Oct 22, 2021 at 10:12 AM son hoang  wrote:
> 
> > Thanks, Thamiz
> >
> > It seems that using StandardTokenizerFactory at index time is causing the
> > issue.
> >
> > I do not want to re-index. Is there any solution? Should I use an "OR" query
> > so that a search for "Al Abbas" returns the "Al Abbas" document (e.g. via an
> > OR match on "Abbas")?
> >
> > Thanks
> >
> > On 2021/10/21 07:56:20, Thamizhazhagan B 
> > wrote:
> > > Hi,
> > >
> > > Create a copy field as below and use this copy field in your query.
> > >
> > > ... indexed="true" termVectors="true" termPositions="true" termOffsets="true"/>
> > >
> > > ... sortMissingLast="true" omitNorms="true" positionIncrementGap="100" multiValued="false">
> > > ... ignoreCase="true"/>
> > >
> > > ... ignoreCase="true"/>
> > > ... ignoreCase="true" synonyms="synonyms.txt"/>
> > >
> > > Thanks,
> > > Thamizh
> > >
> > >
> > > -Original Message-
> > > From: son hoang 
> > > Sent: Thursday, October 21, 2021 8:19 AM
> > > To: users@solr.apache.org
> > > Subject: Index for text with space
> > >
> > > Hello
> > >
> > > I have a config like this:
> > >
> > > ... maxGramSize="15"/>
> > >
> > > Using this config:
> > >
> > > 1. When I search for "Abbas", the result for "Al Abbas" appears.
> > >
> > > 2. When I search for "Al Abbas" in the search field, I get no results.
> > >
> > > It seems that "Al Abbas" is not indexed. What should I change in the config
> > > so that #2 returns the result?
> > >
> > > Many thanks


Re: Index for text with space

2021-10-23 Thread Dave
Why ever would you not index words shorter than three characters?
“To be or not to be” seems like a significant search.



Re: Index for text with space

2021-10-23 Thread Walter Underwood
Agreed. There is a simple fix: index all the words. Also, stop using
EdgeNGramFilterFactory. It is only meant for completion, not word search.
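
For contrast, this is roughly the completion use case that edge n-grams are meant for. It is a simplified Python sketch rather than Solr code: the sample names, the `suggest` dictionary and the `complete` helper are all hypothetical, and whitespace tokenization with lowercasing is assumed.

```python
def edge_ngrams(token, min_gram=3, max_gram=15):
    """Return the leading-edge n-grams of one token, shortest first."""
    return [token[:n] for n in range(min_gram, min(len(token), max_gram) + 1)]


# Hypothetical sample data: each n-gram points back to the full names it came from.
names = ["Al Abbas", "Abbott", "Alvarez"]
suggest = {}
for name in names:
    for token in name.lower().split():
        for gram in edge_ngrams(token):
            suggest.setdefault(gram, set()).add(name)


def complete(prefix):
    """Look up what the user has typed so far as a single exact term."""
    return sorted(suggest.get(prefix.lower(), set()))


print(complete("abb"))  # ['Abbott', 'Al Abbas']
print(complete("alv"))  # ['Alvarez']
print(complete("al"))   # []: shorter than the minimum gram size
```

Partial input matches because every prefix was stored at index time. That suits type-ahead, but it is also why whole short words such as "Al" never reach the index.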

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)




Any downsides to using RAFDirectory instead of SimpleFSDirectory?

2021-10-23 Thread Rahul Goswami
Hello,
I posted this question on the java-lucene group, but I am not sure whether my
email was noticed or made it through, hence posting here. Apologies if this is
against the guidelines/best practices.

I am running Solr on a Windows server, where mmap doesn't work well, so I
have been using SimpleFSDirectory.

It was working well for the most part, but we recently encountered a
ClosedChannelException caused by the file descriptor being closed during IO.
From the Javadoc of SimpleFSDirectory:

https://lucene.apache.org/core/7_7_2/core/org/apache/lucene/store/SimpleFSDirectory.html
*NOTE:* Accessing this class either directly or indirectly from a thread
while it's interrupted can close the underlying file descriptor immediately
if at the same time the thread is blocked on IO. The file descriptor will
remain closed and subsequent access to SimpleFSDirectory will throw a
ClosedChannelException. If your application uses either Thread.interrupt()
or Future.cancel(boolean) you should use the legacy RAFDirectory from the
Lucene misc module in favor of SimpleFSDirectory.


We eventually found the culprit interrupting call, but it led me to wonder
whether RAFDirectory is, in general, a worthy replacement for
SimpleFSDirectory. Are there any downsides to RAFDirectory (vs.
SimpleFSDirectory) that led to it being considered legacy?

Thanks,
Rahul