Re: Edismax skips first part of phrase when q contains explicit field and parenthesis

2021-05-11 Thread Thomas Karampelas
Bumping this in case someone that has any idea missed it.

On Wed, Mar 31, 2021 at 11:14 AM Thomas Karampelas 
wrote:

> Hi,
>
> I run solr 8.4.1 and I issue the following query on edismax parser:
> *defType=edismax&q=Title:(word1 for word2) &pf=Title&q.op=AND*
>
> The parsed query edismax comes out with is the following:
> +(
>  +(
> +(+Title:word1 +Title_en:word2)))
> (+(Title:\"for word2\"))
>
> Firstly, I expected the odd-looking nested MUST operators, since I have read
> that they are added when AND is the default operator. Also, the *for* term
> is correctly missing from the first main clause, since I have stopword
> filtering in my analysis chain.
> However, what puzzles me is that pf skips the first word of my query.
> This does not happen if I add spaces after the opening and before the
> closing parenthesis, like this: *Title:( word1 for word2 )*.
>
> I took a look at the code and found why this happens: inside
> org.apache.solr.search.ExtendedDismaxQParser#addPhraseFieldQueries, pf
> ignores clauses that are assigned to a field. The first part (*(word1*)
> has Title as its field but the others do not, so it is dropped. However, I
> cannot really understand the reasoning behind it. Is this to be expected or
> is this a bug?
>
> I know that I could use the qf parameter to target the field directly, but
> the above query could be extended to something like Title:(word1 for word2)
> OR Abstract:(word3), which I do not know how to express via qf. Also, I
> expected such syntax to work as an alternative in any case.
>
> Thanks,
> Thomas
>


Re: Edismax skips first part of phrase when q contains explicit field and parenthesis

2021-05-11 Thread Alessandro Benedetti
>
> query could be extended to something like Title:(word1 for word2)
> OR Abstract:(word3) which I do not know how to express it via qf


How would you like your pf to work with this?
What is the final query you are aiming for?
In your case it would probably be better to go fully "custom" and write
your own query instead of relying on the pf parameter.

I suspect pf was born in dismax (where the input is supposed to be just a
free-text query).
I doubt it is compatible at all with Lucene syntax in the main query (which
edismax supports).

Cheers
--
Alessandro Benedetti
Apache Lucene/Solr Committer
Director, R&D Software Engineer, Search Consultant

www.sease.io


On Tue, 11 May 2021 at 10:28, Thomas Karampelas 
wrote:

> Bumping this in case someone that has any idea missed it.
>


Re: text_en_splitting with quotes not matching when there are 2 adjacent stopwords

2021-05-11 Thread Alessandro Benedetti
Hi Drini,
I would recommend investigating the code a bit. That token filter is meant
to flatten multiple terms at the same position, to keep things simple, so it
seems suspicious that what actually happens is a merge of two adjacent
tokens that generates incorrect positions.
Have you checked the position and positionLength attributes of the generated
tokens?
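
To make the position question concrete, here is a minimal sketch that takes
the two token lists reported in the quoted message below and compares the
distance between "mark" and "crown" in each. It assumes every list entry
occupies exactly one position, which is how the analysis screen output reads
but has not been checked against the real position attributes.

```python
# Token streams as reported in the quoted message; "_" marks a position
# left empty by a removed stopword. One entry per position is an assumption.
INDEX_TOKENS = ["_", "mark", "_", "crown"]
QUERY_TOKENS = ["_", "mark", "_", "_", "crown"]


def term_positions(tokens):
    """Map each surviving term to its position, skipping stopword holes."""
    return {tok: pos for pos, tok in enumerate(tokens) if tok != "_"}


def position_gap(tokens, first, second):
    """Distance in positions between two terms of the same stream."""
    positions = term_positions(tokens)
    return positions[second] - positions[first]


index_gap = position_gap(INDEX_TOKENS, "mark", "crown")
query_gap = position_gap(QUERY_TOKENS, "mark", "crown")

# An exact phrase query (no slop) needs the same gap on both sides.
phrase_can_match = index_gap == query_gap
print(index_gap, query_gap, phrase_can_match)
```

If the gaps differ, as they do for these two lists, an exact phrase query
cannot line up with the indexed positions.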

Cheers
--
Alessandro Benedetti
Apache Lucene/Solr Committer
Director, R&D Software Engineer, Search Consultant

www.sease.io


On Thu, 6 May 2021 at 19:54, Drini Cami  wrote:

> Hello! I have a question about the text_en_splitting fieldType (solr 8.8.2,
> very vanilla schema). I noticed that it was failing for queries like
> `title:"The Mark of the Crown"`, but succeeding for queries like
> `title:The Mark of the Crown`. Using the solr analysis tool, I noticed that
> the index analyzer converts "The Mark of the Crown" to
> `[_, mark, _, crown]`, but the query analyzer converts it to
> `[_, mark, _, _, crown]`. I then noticed the index analyzer has as a final
> filter FlattenGraphFilterFactory, which seems to combine adjacent `_`. I
> tried also adding FlattenGraphFilterFactory to the query analyzer and that
> fixed the issue. Is this a reasonable solution? If so, should that be the
> default? Or am I using the wrong fieldType altogether?
>
> Thank you,
>
> Drini
>


Re: Edismax skips first part of phrase when q contains explicit field and parenthesis

2021-05-11 Thread Thomas Karampelas
Thanks for the answer Alessandro.

Well, I would expect it to extract the query text from the query (i.e.
separate it from the field definition), take *word1 for word2*, and add
it as a phrase against the Title field. Essentially:

 +(
  +(
 +(+Title:word1 +Title_en:word2)))
 (+(Title:\"word1 for word2\"))

As I said, going through the code it seems that only the first word is
tagged as belonging to the Title field. Then, to form the phrase query,
edismax omits everything that is tagged as belonging to a field, and so ends
up skipping the first word. This is very puzzling and it looks buggy to me,
but I might be missing something from the big picture.
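
A toy model of the behaviour described above, written in Python rather than
taken from Solr's source: it splits the raw query on whitespace, tags a
clause as fielded when it starts with a `name:` prefix, and builds the phrase
only from the untagged clauses. The function names and the regular expression
are illustrative assumptions; the real logic lives in
ExtendedDismaxQParser#addPhraseFieldQueries and may differ in detail.

```python
import re

# Assumption: a clause counts as "fielded" when it starts with "name:".
FIELD_PREFIX = re.compile(r"^\w+:")


def split_clauses(q):
    """Split the raw query on whitespace and tag clauses with a field prefix."""
    return [(tok, bool(FIELD_PREFIX.match(tok))) for tok in q.split()]


def toy_phrase_text(q):
    """Build the phrase text from the clauses that are NOT tagged with a field."""
    words = []
    for tok, has_field in split_clauses(q):
        if has_field:
            continue  # mirrors the reported behaviour: fielded clauses are skipped
        word = tok.strip("()")
        if word:
            words.append(word)
    return " ".join(words)


print(toy_phrase_text("Title:(word1 for word2)"))    # for word2
print(toy_phrase_text("Title:( word1 for word2 )"))  # word1 for word2
```

In this model the space after the opening parenthesis makes *Title:(* a
clause of its own, so *word1* is no longer tagged and survives into the
phrase, which matches what I observe.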

I can see your point regarding pf and lucene syntax being at odds, as pf
originated with dismax, but since it is an integral feature of the edismax
parser as well I expected it to work.

Regarding creating the query manually, we do have a custom parser at the
moment, but I was looking into migrating to edismax.

Thanks.
Thomas

On Tue, May 11, 2021 at 1:44 PM Alessandro Benedetti 
wrote:

> >
> > query could be extended to something like Title:(word1 for word2)
> > OR Abstract:(word3) which I do not know how to express it via qf
>
>
> How would you like your pf to work with this?
> What is the final query you are aiming for?
> In your case it would probably be better to go fully "custom" and write
> your own query instead of relying on the pf parameter.
>
> I suspect pf was born in dismax (where the input is supposed to be just a
> free-text query).
> I doubt it is compatible at all with Lucene syntax in the main query (which
> edismax supports).
>
> Cheers
> --
> Alessandro Benedetti
> Apache Lucene/Solr Committer
> Director, R&D Software Engineer, Search Consultant
>
> www.sease.io
>
>


Re: Security: Better secure defaults?

2021-05-11 Thread Gus Heck
Perhaps Solr should come up with a basic auth wrapper requiring a randomly
generated token from the logs as a password printed at the very end of
startup messages. This of course needs to show up in zookeeper too so that
inter-node requests work. Nice if the UI at some point handles it, but as a
temporary "until you set this up" type of feature, letting the browser
throw up a 401 based login seems fine. This of course could be disabled
either by a configuration in security.json or a system property named
something like no.security.at.all

From a first tutorial perspective requests via the admin ui (or direct
browser url) only get asked once per session, and sending a basic auth
header is a very normal thing in curl. (and people who don't like typing
don't use curl anyway). Things like Postman also handle this smoothly.
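
As a rough sketch of what the client side of this would involve, the snippet
below generates a random token and builds the basic auth header a tool like
curl would send. The user name "solr" and both function names are made up
for illustration; nothing here is an existing Solr feature.

```python
import base64
import secrets


def make_startup_token(num_bytes=16):
    """Hypothetical: the random password Solr would print at startup."""
    return secrets.token_urlsafe(num_bytes)


def basic_auth_header(user, password):
    """Build the value of an HTTP Authorization header for basic auth."""
    raw = f"{user}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


token = make_startup_token()
header = basic_auth_header("solr", token)  # "solr" is a made-up user name
print("Authorization:", header)
```

The same credentials passed as user and password to curl or Postman produce
an identical header, so the extra step for a tutorial user stays small.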

Additionally it might be good to add a header to query responses something
like:

 "insecure": [
"This cluster is running without https, communications with and among
this cluster are easily spied upon by third parties. Configuring https
removes this message",
"This cluster is running with default log token basic auth. Anyone with
access to the logs can gain full control of Solr. Configuring security.json
with an authentication plugin removes this message",
"This cluster is running such that every user is a super-user and can
create/delete/update all collections and any data or configuration.
Configuring an authorization plugin in security.json removes this message"
]

possibly also messages about zookeeper acls, or whatever else we think is
important.

All such messages should be removable via properties like:
"no.security.advice.all", "no.security.advice.https",
"no.security.advice.authn", "no.security.advice.authz" etc. for backwards
compatibility and dev/testing of course.
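
A minimal sketch of how such gating could work, assuming the property names
above and a plain dictionary standing in for system properties. The message
texts are shortened and the function name is hypothetical; none of this
exists in Solr today.

```python
# Shortened advisory texts, keyed by the suffix of the proposed property name.
ADVICE = {
    "https": "This cluster is running without https.",
    "authn": "This cluster is running with default log token basic auth.",
    "authz": "This cluster is running such that every user is a super-user.",
}


def insecure_messages(detected, props):
    """Return advisories for detected issues that no property has silenced."""
    if props.get("no.security.advice.all") == "true":
        return []
    return [
        ADVICE[issue]
        for issue in detected
        if props.get("no.security.advice." + issue) != "true"
    ]


detected = ["https", "authn", "authz"]
print(insecure_messages(detected, {}))
print(insecure_messages(detected, {"no.security.advice.https": "true"}))
print(insecure_messages(detected, {"no.security.advice.all": "true"}))
```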

This should ensure that the users (or at least one user in the
organization) will be aware of their own insecure practices.

-Gus


On Fri, May 7, 2021 at 3:53 PM David Smiley  wrote:

> > I would like to be able to define core specific permissions with
> rule-based
> > authorization in security.json in the same way you can do for
> collections.
>
> PRs/Patches welcome... but I think you're going to have to accept migrating
> to SolrCloud.  SolrCloud has gotten better year over year.
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>
>
> On Fri, May 7, 2021 at 3:39 AM Thomas Corthals 
> wrote:
>
> > I would like to be able to define core specific permissions with
> rule-based
> > authorization in security.json in the same way you can do for
> collections.
> >
> > Thomas
> >
> > Op do 6 mei 2021 om 23:25 schreef David Smiley :
> >
> > > I'm reaching out to our user community to get opinions on what Solr
> > should
> > > do to be more secure-by-default.
> > >
> > > TL;DR: Solr 9 has better secure-by-defaults, but maybe we should do
> > > more, like have Solr pick some of its default settings dependent on a
> > > new env=dev|prod.
> > >
> > > I was shown a glimpse of a massive list of Solr servers exposed on the
> > > public internet by a security researcher.  I'm kinda blown away that so
> > > many people would be so careless.  I think Solr could and should run
> with
> > > better "secure-by-default" settings.
> > >
> > > The situation will be much better in Solr 9 -- and I'll give a
> shout-out
> > of
> > > thanks to Rob Muir for helping make this so.  Here's a couple prominent
> > > ones:
> > > * Solr's Jetty now binds to localhost by default, configurable via
> > > SOLR_JETTY_HOST.  Before 9, you can configure a similar thing in the
> > Jetty
> > > config files.  SOLR-13985
> > > * Java's SecurityManager sandbox is enabled by default. -- SOLR-13984.
> > > This option also exists in Solr since 8.5, toggle-able
> > > via SOLR_SECURITY_MANAGER_ENABLED.  Mostly this prevents the worst of
> > > security bugs -- RCE.
> > >
> > > I wonder if users will promptly set SOLR_JETTY_HOST=0.0.0.0 to get
> > anything
> > > done?  I think so... but it's something, protecting some users.
> > >
> > > Perhaps Solr ought to default to requiring a username/password?  I've
> > heard
> > > this suggestion and it's an obvious one even if some of us (me
> included)
> > > worry that it would make it too annoying to play with Solr when getting
> > > started.  I think the concerns could be mitigated based on the
> approach.
> > > If Solr had an opt-in env=dev setting, for example, then Solr could not
> > > insist on authentication, whereas a default env=prod would insist.  Of
> > > course the authentication or lack thereof could be explicitly
> configured
> > or
> > > disabled at the user's prerogative.  What I like about an "env" setting
> > is
> > > that many other settings could be gated on this as well.
> > >
> > > I particularly like the idea of an env=dev|prod setting because a
> variety
> > > of settings in Solr could have a default that is dependent on this
> value.
> > > I

Re: Permission "all" gets evaluated before more specific ones

2021-05-11 Thread Luca Fregolon
Hi Jason,
thank you for your reply.
I'm sorry I didn't see it before, I was going to write the same answer
that you posted.
I checked the source code of the Authorization Plugin and the problem
is the distinction between core and collections (in standalone mode
and Solr cloud respectively).
In fact, RuleBasedAuthorizationPlugin just checks for collections,
which are not defined in Solr standalone mode.
I think I was wrong to say that everything was working: I probably only
checked what I was allowed to do, and did not check whether I was denied
specific operations (since before that I was denied every operation).
Thank you again for your support.
Kind regards,
Luca
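
The core-versus-collection explanation above can be illustrated with a toy
model in Python. It is a deliberate simplification, not the plugin's real
algorithm: it assumes the first matching permission decides, that a request
matching no permission is allowed, and that a standalone request carries no
collection name (represented as None). The rules are the ones from my
security.json quoted below.

```python
PERMISSIONS = [
    {"collection": ["col1", "col2", "col3", "col4", "col5"],
     "role": ["chatbot", "console"], "path": "/select"},
    {"collection": ["col5"], "role": ["chatbot"], "path": "/update"},
    {"collection": ["col1", "col2", "col3", "col4"],
     "role": ["console"], "path": "/update"},
    {"name": "all", "role": ["superadmin"]},
]


def matches(perm, collection, path):
    """The "all" rule matches anything; others need collection and path."""
    if perm.get("name") == "all":
        return True
    return collection in perm["collection"] and path == perm["path"]


def is_allowed(role, collection, path, permissions=PERMISSIONS):
    """First matching permission decides (simplifying assumption)."""
    for perm in permissions:
        if matches(perm, collection, path):
            return role in perm["role"]
    return True  # assumption: nothing matched, so nothing forbids the request


# With a collection name the specific rules apply.
print(is_allowed("chatbot", "col1", "/select"))                   # True
# Without one (standalone, in this model) only "all" matches.
print(is_allowed("chatbot", None, "/select"))                     # False
# Dropping "all" lets everything through, including unwanted updates.
print(is_allowed("chatbot", None, "/update", PERMISSIONS[:-1]))   # True
```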

On 2021/05/10 17:06:25, Jason Gerlowski  wrote:
> Hi Luca,
>
> Your permissions look correct, generally speaking.  What version of Solr
> are you running?
>
> There are some known problems using the RuleBasedAuthorizationPlugin in
> standalone mode - see https://issues.apache.org/jira/browse/SOLR-13097 for
> more details.  Normally I would suspect that you're running into those, but
> it seems like you're saying that without the "all" permission then your
> other collection-specific permissions work just fine?
>
> Best,
>
> Jason
>
> On Thu, Apr 29, 2021 at 2:34 PM Luca Fregolon  wrote:
>
> > Hello,
> > I am trying to configure Solr authentication using Basic
> > Authentication and Role Based Authorization. I've been facing issues
> > configuring the authorization part, while the authentication part
> > works fine. My goal is to define three groups, containing one user
> > each. One user (chatbot) should have read permission on all
> > collections and should be able to write on only one collection.
> > Another user should have read permissions on all the collections and
> > write permissions on all the collections but one, which is the one the
> > other user is allowed to write on.
> > Then there is a user (superadmin) that should be able to do everything.
> >
> > I am using Solr 8, in standalone mode.
> > I tried to write the following security.json file but every request
> > made by chatbot and console users gets rejected and the log points out
> > that superadmin is the only group allowed to perform the request.
> > If I delete the "all" rule, everything works as supposed to but I
> > cannot have a privileged user. This, in my opinion, seems not coherent
> > with what is written in the reference guide about the permission
> > priority (
> > https://solr.apache.org/guide/8_8/rule-based-authorization-plugin.html).
> > I did a lot of research before posting here but I didn't find any
> > solutions, so I would appreciate any help to sort it out.
> >
> > {
> >   "authentication": {
> >     "class": "solr.BasicAuthPlugin",
> >     "blockUnknown": true,
> >     "credentials": {
> >       "superadmin-user":"...",
> >       "chatbot-user":"...",
> >       "console-user":"..."
> >     }
> >   },
> >   "authorization": {
> >     "class": "solr.RuleBasedAuthorizationPlugin",
> >     "user-role": {
> >       "chatbot-user": "chatbot",
> >       "console-user": "console",
> >       "superadmin-user": "superadmin"
> >     },
> >     "permissions": [
> >       {"collection":["col1", "col2", "col3", "col4", "col5"],
> >        "role":["chatbot","console"], "path":"/select"},
> >       {"collection":"col5", "role":"chatbot", "path":"/update"},
> >       {"collection":["col1", "col2", "col3", "col4"],
> >        "role":"console", "path":"/update"},
> >       {"name":"all", "role":"superadmin"}
> >     ]
> >   }
> > }
> >
> > Luca
> >
>