Re: Edismax skips first part of phrase when q contains explicit field and parenthesis
Bumping this in case anyone who has an idea missed it.

On Wed, Mar 31, 2021 at 11:14 AM Thomas Karampelas wrote:
> Hi,
>
> I run solr 8.4.1 and I issue the following query on edismax parser:
> *defType=edismax&q=Title:(word1 for word2) &pf=Title&q.op=AND*
>
> The parsed query edismax comes out with is the following:
> +(
> +(
> +(+Title:word1 +Title_en:word2)))
> (+(Title:"for word2"))
>
> Firstly, I expected the strange multiple MUST operators since I have read
> they are added when using AND as a default op. Also, in the first main
> clause the *for* term is correctly missing, since I have stopword
> filtering in my analysis chain.
> However, what puzzles me is the fact that pf is skipping the first word
> of my query. This does not happen if I add spaces after the opening and
> before the closing parenthesis, like this: *Title:( word1 for word2 )*.
>
> I took a look at the code and found why it does this: it seems that pf
> ignores the first part (*(word1*) because it ignores clauses assigned to
> fields, inside
> org.apache.solr.search.ExtendedDismaxQParser#addPhraseFieldQueries, and the
> first part has Title as its field but the others do not. However, I cannot
> really understand the reasoning behind it. Is this to be expected or is
> this a bug?
>
> I know that I could use the qf parameter to target the field directly, but
> the above query could be extended to something like Title:(word1 for word2)
> OR Abstract:(word3), which I do not know how to express via qf. Also, I
> expected such syntax to work as an alternative in any case.
>
> Thanks,
> Thomas
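For anyone who wants to reproduce the comparison, here is a minimal sketch of how the two variants of the request could be URL-encoded. The parameter values come from the message above; `debugQuery=true` is an addition (assumed here so that Solr returns the parsed query), and the helper name `edismax_params` is purely illustrative.

```python
from urllib.parse import urlencode


def edismax_params(q: str) -> str:
    """URL-encode the edismax parameters from the message for a given q."""
    return urlencode({
        "defType": "edismax",
        "q": q,
        "pf": "Title",
        "q.op": "AND",
        # Assumption: not in the original request; added to see the parsed query.
        "debugQuery": "true",
    })


# The variant that loses "word1" in the phrase query, per the report.
compact = edismax_params("Title:(word1 for word2)")
# The variant with spaces inside the parentheses, reported to work.
spaced = edismax_params("Title:( word1 for word2 )")

print(compact)
print(spaced)
```

The resulting strings can be appended to a `/select?` URL of whichever core or collection is being tested.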
Re: Edismax skips first part of phrase when q contains explicit field and parenthesis
> query could be extended to something like Title:(word1 for word2)
> OR Abstract:(word3) which I do not know how to express it via qf

How would you like your pf to work with this?
What is the final query you are aiming for?
Probably in your case it would be better to go fully "custom" and write your query yourself instead of relying on the pf parameter.

I suspect pf was born in dismax (where only a free-text query is expected as input), and I doubt it is compatible at all with the Lucene syntax in the main query (which edismax supports).

Cheers
--
Alessandro Benedetti
Apache Lucene/Solr Committer
Director, R&D Software Engineer, Search Consultant
www.sease.io

On Tue, 11 May 2021 at 10:28, Thomas Karampelas wrote:
> Bumping this in case someone that has any idea missed it.
Re: text_en_splitting with quotes not matching when there are 2 adjacent stopwords
Hi Drini,

I would recommend investigating the code a bit. That token filter is meant to flatten multiple terms at the same position (to put it very simply), so it seems suspicious that it would merge two adjacent tokens and generate incorrect positions.
Have you checked the position and positionLength attributes of the generated tokens?

Cheers
--
Alessandro Benedetti
Apache Lucene/Solr Committer
Director, R&D Software Engineer, Search Consultant
www.sease.io

On Thu, 6 May 2021 at 19:54, Drini Cami wrote:
> Hello! I have a question about the text_en_splitting fieldType (solr 8.8.2,
> very vanilla schema). I noticed that it was failing for queries like
> `title:"The Mark of the Crown"`, but succeeding for queries like
> `title:The Mark of the Crown`. Using the solr analysis tool, I noticed that
> the index analyzer converts "The Mark of the Crown" to `[_, mark, _, crown]`,
> but the query analyzer converts it to `[_, mark, _, _, crown]`. I then
> noticed the index analyzer has FlattenGraphFilterFactory as a final filter,
> which seems to combine adjacent `_`. I tried also adding
> FlattenGraphFilterFactory to the query analyzer and that fixed the issue.
> Is this a reasonable solution? If so, should that be the default? Or am I
> using the wrong fieldType altogether?
>
> Thank you,
>
> Drini
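To see why the two analyzer outputs quoted above cannot match as a phrase, here is a toy model of position-based phrase matching. It is not Lucene code: the stopword list, the helper names, and the hard-coded index positions (taken from the `[_, mark, _, crown]` output in the message) are all assumptions for illustration.

```python
STOPWORDS = {"the", "of"}  # assumption: tiny stand-in for the real stopword list


def positions_with_gaps(text):
    """Drop stopwords but keep each surviving token's original position."""
    return [(tok, pos)
            for pos, tok in enumerate(text.lower().split())
            if tok not in STOPWORDS]


def phrase_matches(indexed, query):
    """True if one shift aligns every query token with the same indexed token."""
    index_map = {}
    for tok, pos in indexed:
        index_map.setdefault(tok, set()).add(pos)
    first_tok, first_pos = query[0]
    for start in index_map.get(first_tok, ()):
        shift = start - first_pos
        if all(pos + shift in index_map.get(tok, ()) for tok, pos in query):
            return True
    return False


# Query side as reported: [_, mark, _, _, crown]
query_tokens = positions_with_gaps("The Mark of the Crown")
# Index side as reported: [_, mark, _, crown] (one gap fewer than expected)
indexed_as_reported = [("mark", 1), ("crown", 3)]
# Index side if both gaps had been preserved
indexed_consistent = positions_with_gaps("The Mark of the Crown")

print(phrase_matches(indexed_as_reported, query_tokens))
print(phrase_matches(indexed_consistent, query_tokens))
```

In this model the phrase only matches when the distance between `mark` and `crown` is the same on both sides, which is why inspecting the position attributes on the index side is the natural next step.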
Re: Edismax skips first part of phrase when q contains explicit field and parenthesis
Thanks for the answer, Alessandro.

Well, I would expect it to extract the query text from the query (i.e. extract it from the field definition), take the text word1 for word2, and add it as a phrase against the Title field. Essentially:
+(
+(
+(+Title:word1 +Title_en:word2)))
(+(Title:"word1 for word2"))

As I said, going through the code it seems that only the first word is tagged as belonging to the Title field. Then, to form the phrase query, edismax omits everything that is tagged as belonging to a field, and so ends up skipping the first word. This is very puzzling and it looks buggy to me, but I might be missing something from the big picture.

I can see your point regarding pf and Lucene syntax being at odds, as pf originated with dismax, but since it is an integral feature of the edismax parser as well, I expected it to work.

Regarding creating the query manually, we do have a custom parser at the moment, but I was looking into migrating to edismax.

Thanks.
Thomas

On Tue, May 11, 2021 at 1:44 PM Alessandro Benedetti wrote:
> > query could be extended to something like Title:(word1 for word2)
> > OR Abstract:(word3) which I do not know how to express it via qf
>
> how would you like your pf to work with this?
> What is the final query you aim to?
> Probably in your case it would be better to fully go "custom" and write
> your query instead of relying on the pf parameter.
>
> I suspect pf was born in the dismax (where just free text query is supposed
> to be in the input)
> I doubt it is compatible at all with Lucene syntax in the main query (which
> is supported by the edismax).
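The behaviour Thomas describes can be mimicked with a toy model. This is not the Solr implementation of `addPhraseFieldQueries`; it only assumes, as the messages in this thread suggest, that the query is split into whitespace-separated clauses, that a clause starting with `field:` counts as assigned to a field, and that the phrase is built from the remaining clauses. The function name and the regular expression are hypothetical.

```python
import re

# Assumption: a clause is "fielded" if it starts with something like "Title:".
FIELD_PREFIX = re.compile(r"^\w+:")


def toy_phrase_text(q):
    """Build the phrase text from the clauses that carry no explicit field."""
    clauses = q.split()
    unfielded = [c for c in clauses if not FIELD_PREFIX.match(c)]
    # Strip grouping parentheses and drop clauses that were only parentheses.
    words = [c.strip("()") for c in unfielded]
    return " ".join(w for w in words if w)


# "Title:(word1" is one fielded clause, so word1 never reaches the phrase.
print(toy_phrase_text("Title:(word1 for word2)"))
# "Title:(" is the fielded clause here, so all three words survive.
print(toy_phrase_text("Title:( word1 for word2 )"))
```

Under these assumptions the first call yields the truncated phrase and the second yields the full one, which mirrors the difference reported between the two spellings of the query.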
Re: Security: Better secure defaults?
Perhaps Solr should come up with a basic auth wrapper that requires, as a password, a randomly generated token printed in the logs at the very end of the startup messages. This of course needs to show up in zookeeper too so that inter-node requests work. It would be nice if the UI handled it at some point, but as a temporary "until you set this up" type of feature, letting the browser throw up a 401-based login seems fine. This could of course be disabled either by a configuration in security.json or by a system property named something like no.security.at.all

From a first-tutorial perspective, requests via the admin UI (or a direct browser URL) only get asked once per session, and sending a basic auth header is a very normal thing in curl (and people who don't like typing don't use curl anyway). Things like Postman also handle this smoothly.

Additionally, it might be good to add a header to query responses, something like:

    "insecure": [
      "This cluster is running without https, communications with and among this cluster are easily spied upon by third parties. Configuring https removes this message",
      "This cluster is running with default log token basic auth. Anyone with access to the logs can gain full control of Solr. Configuring security.json with an authentication plugin removes this message",
      "This cluster is running such that every user is a super-user and can create/delete/update all collections and any data or configuration. Configuring an authorization plugin in security.json removes this message"
    ]

possibly also messages about zookeeper ACLs, or whatever else we think is important. All such messages should be removable via properties like "no.security.advice.all", "no.security.advice.https", "no.security.advice.authn", "no.security.advice.authz" etc., for backwards compatibility and dev/testing of course.

This should ensure that the users (or at least one user in the organization) will be aware of their own insecure practices.

-Gus

On Fri, May 7, 2021 at 3:53 PM David Smiley wrote:
> > I would like to be able to define core specific permissions with rule-based
> > authorization in security.json in the same way you can do for collections.
>
> PRs/Patches welcome... but I think you're going to have to accept migrating
> to SolrCloud. SolrCloud has gotten better year over year.
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>
> On Fri, May 7, 2021 at 3:39 AM Thomas Corthals wrote:
> > I would like to be able to define core specific permissions with rule-based
> > authorization in security.json in the same way you can do for collections.
> >
> > Thomas
> >
> > On Thu, 6 May 2021 at 23:25, David Smiley wrote:
> > > I'm reaching out to our user community to get opinions on what Solr should
> > > do to be more secure-by-default.
> > >
> > > TL;DR: Solr 9 has better secure-by-defaults, but maybe we should do more,
> > > like have Solr pick some of its default settings dependent on a new
> > > env=dev|prod.
> > >
> > > I was shown a glimpse of a massive list of Solr servers exposed on the
> > > public internet by a security researcher. I'm kinda blown away that so
> > > many people would be so careless. I think Solr could and should run with
> > > better "secure-by-default" settings.
> > >
> > > The situation will be much better in Solr 9 -- and I'll give a shout-out of
> > > thanks to Rob Muir for helping make this so. Here are a couple of prominent
> > > ones:
> > > * Solr's Jetty now binds to localhost by default, configurable via
> > > SOLR_JETTY_HOST. Before 9, you can configure a similar thing in the Jetty
> > > config files. SOLR-13985
> > > * Java's SecurityManager sandbox is enabled by default. -- SOLR-13984.
> > > This option also exists in Solr since 8.5, toggle-able
> > > via SOLR_SECURITY_MANAGER_ENABLED. Mostly this prevents the worst of
> > > security bugs -- RCE.
> > >
> > > I wonder if users will promptly set SOLR_JETTY_HOST=0.0.0.0 to get anything
> > > done? I think so... but it's something, protecting some users.
> > >
> > > Perhaps Solr ought to default to requiring a username/password? I've heard
> > > this suggestion and it's an obvious one even if some of us (me included)
> > > worry that it would make it too annoying to play with Solr when getting
> > > started. I think the concerns could be mitigated based on the approach.
> > > If Solr had an opt-in env=dev setting, for example, then Solr could not
> > > insist on authentication, whereas a default env=prod would insist. Of
> > > course the authentication or lack thereof could be explicitly configured or
> > > disabled at the user's prerogative. What I like about an "env" setting is
> > > that many other settings could be gated on this as well.
> > >
> > > I particularly like the idea of an env=dev|prod setting because a variety
> > > of settings in Solr could have a default that is dependent on this value.
> > > I
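Gus's "insecure" response header proposal can be sketched as follows. Nothing here exists in Solr: the function name, its arguments, and the shortened message texts are hypothetical, and only the `no.security.advice.*` property names are taken from his message.

```python
# Shortened stand-ins for the three advisory messages in the proposal.
ADVICE = {
    "https": "This cluster is running without https. "
             "Configuring https removes this message",
    "authn": "This cluster is running with default log token basic auth. "
             "Configuring an authentication plugin removes this message",
    "authz": "Every user is a super-user. "
             "Configuring an authorization plugin removes this message",
}


def insecure_advice(https_enabled, authn_configured, authz_configured,
                    properties=frozenset()):
    """Return the advisory messages that would go in the "insecure" list."""
    if "no.security.advice.all" in properties:
        return []
    checks = [("https", https_enabled),
              ("authn", authn_configured),
              ("authz", authz_configured)]
    return [ADVICE[key]
            for key, configured in checks
            if not configured
            and f"no.security.advice.{key}" not in properties]


# A fresh, unsecured install would carry all three messages.
print(insecure_advice(False, False, False))
# Silencing one message with its property leaves the other two.
print(insecure_advice(False, False, False, {"no.security.advice.https"}))
```

The point of the design is that each message disappears either when the underlying problem is fixed or when someone explicitly opts out, so silence is always a deliberate choice.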
Re: Permission "all" gets evaluated before more specific ones
Hi Jason,

thank you for your reply. I'm sorry I didn't see it earlier; I was about to write the same answer that you posted.
I checked the source code of the authorization plugin, and the problem is the distinction between cores and collections (in standalone mode and SolrCloud respectively). In fact, RuleBasedAuthorizationPlugin only checks for collections, which are not defined in Solr standalone mode.
I think I was wrong in saying that everything was working: I probably only checked what I was allowed to do, not whether specific operations were denied (since before that I was denied every operation).

Thank you again for your support.

Kind regards,
Luca

On 2021/05/10 17:06:25, Jason Gerlowski wrote:
> Hi Luca,
>
> Your permissions look correct, generally speaking. What version of Solr
> are you running?
>
> There are some known problems using the RuleBasedAuthorizationPlugin in
> standalone mode - see https://issues.apache.org/jira/browse/SOLR-13097 for
> more details. Normally I would suspect that you're running into those, but
> it seems like you're saying that without the "all" permission then your
> other collection-specific permissions work just fine?
>
> Best,
>
> Jason
>
> On Thu, Apr 29, 2021 at 2:34 PM Luca Fregolon wrote:
> > Hello,
> > I am trying to configure Solr authentication using Basic
> > Authentication and Rule-Based Authorization. I've been facing issues
> > configuring the authorization part, while the authentication part
> > works fine. My goal is to define three groups, containing one user
> > each. One user (chatbot) should have read permission on all
> > collections and should be able to write on only one collection.
> > Another user should have read permissions on all the collections and
> > write permissions on all the collections but one, which is the one the
> > other user is allowed to write on.
> > Then there is a user (superadmin) that should be able to do everything.
> >
> > I am using Solr 8, in standalone mode.
> > I tried to write the following security.json file, but every request
> > made by the chatbot and console users gets rejected, and the log points out
> > that superadmin is the only group allowed to perform the request.
> > If I delete the "all" rule, everything works as it is supposed to, but I
> > cannot have a privileged user. This, in my opinion, does not seem consistent
> > with what is written in the reference guide about permission
> > priority (
> > https://solr.apache.org/guide/8_8/rule-based-authorization-plugin.html).
> > I did a lot of research before posting here but I didn't find any
> > solutions, so I would appreciate any help to sort it out.
> >
> > {
> >   "authentication": {
> >     "class": "solr.BasicAuthPlugin",
> >     "blockUnknown": true,
> >     "credentials": {
> >       "superadmin-user":"...",
> >       "chatbot-user":"...",
> >       "console-user":"..."
> >     }
> >   },
> >   "authorization": {
> >     "class": "solr.RuleBasedAuthorizationPlugin",
> >     "user-role": {
> >       "chatbot-user": "chatbot",
> >       "console-user": "console",
> >       "superadmin-user": "superadmin"
> >     },
> >     "permissions": [
> >       {"collection":["col1", "col2", "col3", "col4", "col5"],
> >        "role":["chatbot","console"], "path":"/select"},
> >       {"collection":"col5", "role":"chatbot", "path":"/update"},
> >       {"collection":["col1", "col2", "col3", "col4"],
> >        "role":"console", "path":"/update"},
> >       {"name":"all", "role":"superadmin"}
> >     ]
> >   }
> > }
> >
> > Luca
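The symptoms in this thread can be reproduced with a deliberately simplified model of rule matching. This is not the RuleBasedAuthorizationPlugin algorithm: it assumes a plain first-match scan in list order, assumes that a request matching no permission is allowed, and represents a standalone-mode request as having no collection (`None`). The function names are hypothetical; the permissions are the ones from the security.json above.

```python
PERMISSIONS = [
    {"collection": ["col1", "col2", "col3", "col4", "col5"],
     "role": ["chatbot", "console"], "path": "/select"},
    {"collection": "col5", "role": "chatbot", "path": "/update"},
    {"collection": ["col1", "col2", "col3", "col4"],
     "role": "console", "path": "/update"},
    {"name": "all", "role": "superadmin"},
]


def as_list(value):
    """Accept either a single string or a list of strings."""
    return value if isinstance(value, list) else [value]


def toy_authorize(permissions, role, collection, path):
    """First matching permission decides; no match means allowed (assumption)."""
    for perm in permissions:
        if perm.get("name") == "all":
            matches = True
        else:
            matches = (collection in as_list(perm["collection"])
                       and path == perm["path"])
        if matches:
            return role in as_list(perm["role"])
    return True


without_all = PERMISSIONS[:-1]

# Standalone mode: no collection is known, so only "all" can match.
print(toy_authorize(PERMISSIONS, "chatbot", None, "/select"))
# Without "all", nothing matches, so nothing is denied either.
print(toy_authorize(without_all, "chatbot", None, "/update"))
# With a known collection the specific rules behave as intended.
print(toy_authorize(PERMISSIONS, "chatbot", "col1", "/select"))
print(toy_authorize(PERMISSIONS, "chatbot", "col1", "/update"))
```

In this model, removing "all" makes every request pass in standalone mode, which would explain why everything appeared to work once the rule was deleted.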