Re: AWS S3 bucket backup issue

2022-02-18 Thread Houston Putman
I've opened a PR to make the documentation much clearer for this:

https://github.com/apache/solr/pull/652

Please leave any notes if you think it can be improved!

- Houston

On Tue, Feb 8, 2022 at 12:18 PM Eric Charles  wrote:

> I was not pre-creating the folder in the S3 bucket in my previous tests.
> When I create that `temp1` folder (without `/` at the start), the
> 4 variants Houston just pasted work fine.
>
> I think the confusion (and I have been fooled again just now) is that
> location=s3://temp1 does not contain the bucket name (it is just the s3
> prefix and the folder).
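Eric's point is easy to see with a generic URI parser. Read as an ordinary URI, `s3://temp1` puts `temp1` in the host slot, where a bucket name normally goes, whereas (per this thread) Solr's S3 repository treats the value as a folder inside the bucket configured for the repository. A minimal sketch using only Python's standard `urllib.parse`, showing the conventional reading:

```python
from urllib.parse import urlparse

# Conventional URI reading of the "location" values discussed in this thread.
# In an ordinary S3 URI the netloc would be the bucket; Solr's S3 backup
# repository instead takes the bucket from its own configuration and treats
# the whole value as a path inside that bucket.
for location in ["s3://temp1", "s3:///temp1", "s3:/temp1", "temp1"]:
    parts = urlparse(location)
    print(f"{location!r:16} scheme={parts.scheme!r:5} "
          f"netloc={parts.netloc!r:8} path={parts.path!r}")
```

Only `s3://temp1` yields a non-empty netloc, which is why it is tempting (and wrong here) to read `temp1` as the bucket.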
>
> On 2022/02/07 17:59:09 Houston Putman wrote:
> > Sorry for the back and forth, everyone. I'm trying to understand the
> > issue, but I absolutely cannot replicate this locally.
> >
> > In the S3 console, I create a directory named "temp1/" *(Please note, this
> > is not the same as "/temp1/"; the actual folder in the bucket CANNOT start
> > with a "/")* under the bucket that I am using.
> > When I use the following Solr APIs, they ALL work for me:
> >
> > http://localhost:8983/solr/admin/collections?action=BACKUP&name=test-s3-test&repository=default&collection=test&location=temp1
> >
> > http://localhost:8983/solr/admin/collections?action=BACKUP&name=test-s3-test&repository=default&collection=test&location=/temp1
> >
> > http://localhost:8983/solr/admin/collections?action=BACKUP&name=test-s3-test&repository=default&collection=test&location=s3://temp1
> >
> > http://localhost:8983/solr/admin/collections?action=BACKUP&name=test-s3-test&repository=default&collection=test&location=s3:///temp1
> >
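The four requests above differ only in the `location` value. A small Python sketch that builds (but does not send) those BACKUP URLs with `urllib.parse.urlencode`; the host, port and other parameter values are copied from Houston's examples and are placeholders for your own cluster. `urlencode` percent-encodes the `:` and `/` characters in `location`, and standard URL decoding restores the original value on the server side.

```python
from urllib.parse import urlencode

# Placeholder base URL taken from the examples in this thread.
SOLR = "http://localhost:8983/solr/admin/collections"


def backup_url(name: str, collection: str, location: str,
               repository: str = "default") -> str:
    """Build a Collections API BACKUP request URL (the request is not sent)."""
    params = {
        "action": "BACKUP",
        "name": name,
        "repository": repository,
        "collection": collection,
        "location": location,  # any of the four variants resolves to "temp1/"
    }
    return f"{SOLR}?{urlencode(params)}"


for loc in ["temp1", "/temp1", "s3://temp1", "s3:///temp1"]:
    print(backup_url("test-s3-test", "test", loc))
```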
> > The only possible reason I can think of for so many people to be seeing
> > errors is that they are pre-creating directories in S3 that have "/"
> > prepended to the name.
> > This is already documented in the ref guide
> > <https://solr.apache.org/guide/8_11/making-and-restoring-backups.html#s3backuprepository>,
> > but we can think of verbiage to make it more clear.
> >
> > I understand that we can make it so that "location" is not a required
> > property and default it to "/", but that doesn't make sense for all backup
> > repository types, such as a fileSystem backup.
> >
> > - Houston
> >
> > On Mon, Feb 7, 2022 at 8:54 AM Michael Conrad wrote:
> >
> > > The restore process allows restoring to a collection with a different
> > > name than the original, so I would expect the requirement for the
> > > collection to already exist to be legit.
> > >
> > > The documentation does say (or at least implies) that, to restore, the
> > > number of shards in the destination collection has to match the backup.
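Restoring under a different name, as Michael describes, comes down to passing a different `collection` value to the RESTORE action. A Python sketch that only builds the URL; `person_restored` is a hypothetical target name and the other values are borrowed from Sergio's backup example in this thread. Whether the target must already exist, and with how many shards, is exactly what this thread is unsure about, so check the reference guide for your Solr version.

```python
from urllib.parse import urlencode

SOLR = "http://localhost:8983/solr/admin/collections"  # placeholder host/port


def restore_url(backup_name: str, target_collection: str, location: str,
                repository: str) -> str:
    """Build a Collections API RESTORE request URL (the request is not sent).

    `target_collection` may differ from the collection that was backed up.
    """
    params = {
        "action": "RESTORE",
        "name": backup_name,              # name used when the backup was taken
        "collection": target_collection,  # collection to restore into
        "repository": repository,
        "location": location,
    }
    return f"{SOLR}?{urlencode(params)}"


# "person_restored" is a hypothetical target collection name.
print(restore_url("personbackup", "person_restored", "s3:/", "s3"))
```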
> > >
> > > On 2/5/22 12:02, Eric Charles wrote:
> > > > ... and thanks to Houston and the other participants in this thread.
> > > >
> > > > I have just restored the backup, and it works fine. The only point is
> > > > that I had to create the collection before restoring (I had dropped it).
> > > > I wonder whether I have to recreate the collection with the same number
> > > > of shards?
> > > >
> > > > On 2022/02/05 16:55:26 Eric Charles wrote:
> > > >> I ran into similar questions getting it working. I first opened an
> > > >> issue on the solr-operator (https://github.com/apache/solr-operator/issues/404)
> > > >> and then was able to replicate the issue on my laptop, so I came to the
> > > >> conclusion that the issue was at the Solr level, not in solr-operator.
> > > >>
> > > >> Without Sergio's reply, I would not have been able to make sense of the
> > > >> parameters and get it working. Thanks!
> > > >>
> > > >> On 2022/01/27 14:01:46 Sergio García Maroto wrote:
> > > >>> Hi,
> > > >>>
> > > >>> Correct. Using "location=backupfolder" didn't work for me.
> > > >>>
> > > >>> The only way I made it work is with location=s3:/
> > > >>> Below is my sample URL, which works well.
> > > >>>
> > > >>> http://servername:8983/solr/admin/collections?action=BACKUP&name=personbackup&collection=person&repository=s3&location=s3:/
> > > >>>
> > > >>>
> > > >>> On Tue, 25 Jan 2022 at 17:23, Houston Putman wrote:
> > > >>>
> > > >>>> Thanks for all of the information, everybody. I want to determine if
> > > >>>> this is actually a bug before we release 9.0.
> > > >>>>
> > > >>>> First, I want to clear up the usage of the "location" parameter:
> > > >>>>
> > > >>>>    - It is required, but you can provide "/" as an "empty" directory,
> > > >>>>    much like "s3:/".
> > > >>>>    - You don't have to include "s3:/" or "s3://". You can use "/dir",
> > > >>>>    "dir", "s3:/dir" or "s3://dir". All of these options will
> > > >>>>    eventually be converted to the "dir/" directory in your bucket.
> > > >>>>    - The s3 repository does not allow for directory names starting
> > > >>>>    with "/". In general, this is to allow all of the above ^ examples
> > > >>>>    to compute to the same thing without users being confused about
> > > >>>>    how many '/'s they need after "s3:". Now I see

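The location rules Houston lists can be sketched as a small normalization function. This is a hypothetical illustration written from the rules stated in this thread, not Solr's actual S3BackupRepository code: strip an optional `s3:` prefix, drop any leading slashes, and add a trailing slash to non-empty paths.

```python
def normalize_s3_location(location: str) -> str:
    """Hypothetical sketch of the location rules described in this thread.

    Not Solr's implementation: "/dir", "dir", "s3:/dir" and "s3://dir" all
    become "dir/", and "/" or "s3:/" mean the bucket root ("").
    """
    path = location
    if path.startswith("s3:"):
        path = path[len("s3:"):]  # drop the scheme prefix if present
    path = path.lstrip("/")       # any number of leading slashes is ignored
    if path and not path.endswith("/"):
        path += "/"               # S3 "directories" are key prefixes ending in '/'
    return path


for loc in ["dir", "/dir", "s3:/dir", "s3://dir", "s3:///dir", "/", "s3:/"]:
    print(f"{loc!r:12} -> {normalize_s3_location(loc)!r}")
```

Because leading slashes are always stripped, a folder created in the S3 console with a key starting with "/" (for example "/temp1/") can never match the computed prefix "temp1/", which is consistent with the errors discussed above.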
unexpected wildcard results

2022-02-18 Thread Matthew Roth
Hi List,

We are noting unexpected wildcard results. For example, the following query

 text:"witch*"

matches witch and witches, but not witchcraft. We would expect witchcraft
to match as well. I suspect the issue may lie with the field definition.


Can anyone offer any insight?

Best,
Matt


Re: unexpected wildcard results

2022-02-18 Thread Joel Bernstein
This is a great tool for understanding how analyzers are handling specific
terms: https://solr.apache.org/guide/8_8/analysis-screen.html

You'll be able to see how witchcraft was added to the index.
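The same analysis is also available over HTTP from Solr's field analysis handler, which standard configurations register implicitly at `/analysis/field`. A Python sketch that builds such a request, assuming that handler is available; `mycollection` is a hypothetical collection name, and the field name `text` comes from Matt's query. The response lists the tokens produced at each index-time and query-time analysis stage.

```python
from urllib.parse import urlencode


def field_analysis_url(base: str, collection: str, field: str,
                       index_value: str, query_value: str) -> str:
    """Build a request for the field analysis handler (the request is not sent)."""
    params = {
        "analysis.fieldname": field,
        "analysis.fieldvalue": index_value,  # text run through the index analyzer
        "analysis.query": query_value,       # text run through the query analyzer
        "wt": "json",
    }
    return f"{base}/{collection}/analysis/field?{urlencode(params)}"


# "mycollection" is a hypothetical collection name.
print(field_analysis_url("http://localhost:8983/solr", "mycollection", "text",
                         "witch witches witchcraft", "witch*"))
```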

Joel Bernstein
http://joelsolr.blogspot.com/


On Fri, Feb 18, 2022 at 3:47 PM Matthew Roth  wrote:

> Hi List,
>
> We are noting unexpected wildcard results. For example, the following query
>
>  text:"witch*"
>
> matches witch and witches, but not witchcraft. We would expect witchcraft
> to match as well. I suspect the issue may lie with the field definition.
>
> [field type definition only partly preserved in the archive; surviving
> attribute fragments, in order:]
> positionIncrementGap="100"
> ignoreCase="true" words="lang/stopwords_en.txt"
> protected="protwords.txt"
> ignoreCase="true" expand="true"
> ignoreCase="true" words="lang/stopwords_en.txt"
> protected="protwords.txt"
>
>
> Can anyone offer any insight?
>
> Best,
> Matt
>


Re: unexpected wildcard results

2022-02-18 Thread Andy C
I think the issue is the double quotes around your query string. Try
searching for text:witch* instead.

It appears that when the term is surrounded by double quotes, the * is
treated as text and not as a syntax character (wildcard). You can verify
this by running the query in the Solr Admin UI with the "debugQuery" box
checked.
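The same debugQuery comparison can be made over HTTP by sending both forms of the query with `debugQuery=true` and comparing the `parsedquery` entry in each response's `debug` section. A Python sketch that only builds the two `/select` URLs; `mycollection` and the host are hypothetical placeholders.

```python
from urllib.parse import urlencode


def select_url(base: str, collection: str, q: str) -> str:
    """Build a /select request with debugQuery enabled (the request is not sent)."""
    params = {"q": q, "debugQuery": "true", "wt": "json"}
    return f"{base}/{collection}/select?{urlencode(params)}"


base = "http://localhost:8983/solr"  # placeholder host/port
# Quoted form: parsed as a phrase, so the '*' goes through the query analyzer.
print(select_url(base, "mycollection", 'text:"witch*"'))
# Unquoted form: parsed as a wildcard (prefix) query.
print(select_url(base, "mycollection", "text:witch*"))
```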

This results in it actually searching for just 'witch', because * characters
are not indexed by your field type.

The Porter stemmer indexes both 'witch' and 'witches' as 'witch', but
'witchcraft' as 'witchcraft'. So when the query searches for 'witch', it
matches the original text 'witch' and 'witches' but not 'witchcraft'.
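Andy's reasoning can be checked with a toy model. The stem table below is hard-coded from his description and is not the real Porter stemmer; the model also simplifies the query analyzer to "drop the `*`, then stem". It contrasts the exact-term match that the quoted query turns into with a prefix match over the indexed terms.

```python
# Toy model only: STEMS is a hard-coded stand-in for the Porter stemmer,
# limited to the three words from this thread.
STEMS = {"witch": "witch", "witches": "witch", "witchcraft": "witchcraft"}

docs = {1: "witch", 2: "witches", 3: "witchcraft"}

# Index time: each one-word document is stored as its stemmed term.
index = {doc_id: STEMS[word] for doc_id, word in docs.items()}


def phrase_query(raw: str) -> set:
    """Quoted "witch*": the '*' is dropped and the rest is stemmed,
    so this becomes an exact match on the term 'witch'."""
    stripped = raw.strip("*")
    term = STEMS.get(stripped, stripped)
    return {d for d, t in index.items() if t == term}


def wildcard_query(prefix: str) -> set:
    """Unquoted witch*: a prefix match over the (unstemmed) indexed terms."""
    return {d for d, t in index.items() if t.startswith(prefix)}


print(sorted(phrase_query("witch*")))   # [1, 2]
print(sorted(wildcard_query("witch")))  # [1, 2, 3]
```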

- Andy -

On Fri, Feb 18, 2022 at 4:48 PM Joel Bernstein  wrote:

> This is a great tool for understanding how analyzers are handling specific
> terms: https://solr.apache.org/guide/8_8/analysis-screen.html
>
> You'll be able to see how witchcraft was added to the index.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/