Re: Rule-Based Authorization Plugins with JWT Authentication Plugin

2022-04-05 Thread Dominique Bejean
Hi,

Replying to myself: I had to go into debug mode to see that
Set<String> userRoles = getUserRoles(principal); was returning null,
because the username attribute in the principal was the Keycloak user
UUID (the default "sub" claim value) and not the preferred_username.

I fixed this in security.json by adding "principalClaim":
"preferred_username" to the authentication settings.

{
  "authentication":{
    "blockUnknown": true,
    "class":"solr.JWTAuthPlugin",
    "principalClaim": "preferred_username",
...
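For anyone landing here later, a complete, well-formed sketch of that authentication block might look like the following. The wellKnownUrl value is an assumption, built from the Keycloak issuer URL visible in the log above; adjust it to your realm:

```json
{
  "authentication": {
    "blockUnknown": true,
    "class": "solr.JWTAuthPlugin",
    "principalClaim": "preferred_username",
    "wellKnownUrl": "https://localhost:8443/auth/realms/solr/.well-known/openid-configuration"
  }
}
```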

Dominique






On Sat, Apr 2, 2022 at 15:15, Dominique Bejean wrote:

> Hi,
>
> I am testing the JWT Authentication Plugin with a Keycloak IdP.
>
>
> Rule-Based Authorization Plugins deny access to some Solr end points.
> According to logs, I don't understand why.
>
> For example, in this log line we see that:
> the requested roles are "solr", "solr:admin", "consoleread"
> the user roles are [profile, solr, admin, email]
>
> The determineIfPermissionPermitsPrincipal method in
> RuleBasedAuthorizationPluginBase.java should allow access.
>
> Set<String> userRoles = getUserRoles(principal);
> for (String role : governingPermission.role) {
>   if (userRoles != null && userRoles.contains(role)) {
> log.debug("Governing permission [{}] allows access to role [{}];
> permitting access", governingPermission, role);
> return MatchStatus.PERMITTED;
>   }
> }
> log.info("This resource is configured to have a permission {}, The
> principal {} does not have the right role ", governingPermission,
> principal);
> return MatchStatus.FORBIDDEN;
>
> governingPermission.role = "solr", "solr:admin", "consoleread"
> userRoles = profile, solr, admin, email
>
> The "solr" role should match!?
>
>
> 2022-04-02 12:33:12.693 INFO  (qtp1406253491-27) [   ]
> o.a.s.s.RuleBasedAuthorizationPluginBase This resource is configured to
> have a permission {
>   "collection":null,
>   "path":[
> "/admin/info",
> "/admin/info/threads",
> "/admin/info/system",
> "/admin/info/properties"],
>   "role":[
> "solr",
> "solr:admin",
> "consoleread"],
>   "index":7}, The principal
> JWTPrincipalWithUserRoles{username='89c48f69-eead-44ee-bf37-741e2d661b77',
> token='*', claims={exp=1648903285, iat=1648902385,
> auth_time=1648902256, jti=1838b4df-801a-410e-903a-f4b9ced085fc, iss=
> https://localhost:8443/auth/realms/solr,
> sub=89c48f69-eead-44ee-bf37-741e2d661b77, typ=Bearer, azp=solr-app,
> nonce=25kzdeinhb9dqflogqy4id8a8zmiyvpkg,
> session_state=34ed4051-8cc1-45da-a25e-67054087af34, acr=0,
> realm_access={roles=[solr:admin]},
> resource_access={solr-app={roles=[solr:admin]}}, scope=openid profile solr
> admin email, sid=34ed4051-8cc1-45da-a25e-67054087af34, email_verified=true,
> preferred_username=solradmin, given_name=, family_name=}, roles=[profile,
> solr, admin, email]} does not have the right role
>
> Any suggestions?
>
> Regards
>
> Dominique
>
>


Re: Help with stopwords filter

2022-04-05 Thread Arif Shaon
Hi Dominique,

Thanks for replying to my query.
"is" is defined as a stopword for the _en (English) fields, but _ar (Arabic)
uses a custom analyser, and it is not clear how that analyser deals with
stopwords. I will look into it.

Thanks again.
Best
Arif

On Mon, Apr 4, 2022 at 3:42 PM Dominique Bejean 
wrote:

> Hi,
>
> Are you sure "is" is defined as a stopword at both index and query time in
> your analyzers?
>
> Dominique
>
> On Mon, Apr 4, 2022 at 09:09, Arif Shaon wrote:
>
> > Hello list,
> >
> > I am trying the following two queries, which should return the same
> > result. However, the first contains the stopword "is" and as a result it
> > returns 0 results. So it seems to me that the stopword filter is not
> > working as expected. Could someone please look at the debug reports of the
> > two queries and advise what I am doing wrong?  Any help would be
> > appreciated.
> >
> > Query 1:
> >
> > "rawquerystring":"thim day is gone",
> >   "querystring":"thim day is gone",
> >   "parsedquery":"(+(+DisjunctionMaxQuery(((i18n_content_ar:thim)^3.0
> |
> > (i18n_content_en:thim)^3.0 | (i18n_label_ar:thim)^5.0 |
> > (i18n_label_en:thim)^5.0 | (shelf_mark:thim)^80.0 |
> > (content:thim)^0.04)~0.01)
> +DisjunctionMaxQuery(((i18n_content_ar:day)^3.0
> > | (i18n_content_en:day)^3.0 | (i18n_label_ar:day)^5.0 |
> > (i18n_label_en:day)^5.0 | (shelf_mark:day)^80.0 |
> (content:day)^0.04)~0.01)
> > +DisjunctionMaxQuery(((i18n_content_ar:is)^3.0 | (i18n_label_ar:is)^5.0 |
> > (shelf_mark:is)^80.0)~0.01)
> > +DisjunctionMaxQuery(((i18n_content_ar:gone)^3.0 |
> > (i18n_content_en:gone)^3.0 | (i18n_label_ar:gone)^5.0 |
> > (i18n_label_en:gone)^5.0 | (shelf_mark:gone)^80.0 |
> > (content:gone)^0.04)~0.01)) (+DisjunctionMaxQuery(((content:\"thim day ?
> > gone\"~10)^2.0)~0.01)) (+record_type:logical^15.0)
> > (+record_type:essay^17.0))/no_coord",
> >   "parsedquery_toString":"+(+((i18n_content_ar:thim)^3.0 |
> > (i18n_content_en:thim)^3.0 | (i18n_label_ar:thim)^5.0 |
> > (i18n_label_en:thim)^5.0 | (shelf_mark:thim)^80.0 |
> > (content:thim)^0.04)~0.01 +((i18n_content_ar:day)^3.0 |
> > (i18n_content_en:day)^3.0 | (i18n_label_ar:day)^5.0 |
> > (i18n_label_en:day)^5.0 | (shelf_mark:day)^80.0 |
> (content:day)^0.04)~0.01
> > +((i18n_content_ar:is)^3.0 | (i18n_label_ar:is)^5.0 |
> > (shelf_mark:is)^80.0)~0.01 +((i18n_content_ar:gone)^3.0 |
> > (i18n_content_en:gone)^3.0 | (i18n_label_ar:gone)^5.0 |
> > (i18n_label_en:gone)^5.0 | (shelf_mark:gone)^80.0 |
> > (content:gone)^0.04)~0.01) (+((content:\"thim day ? gone\"~10)^2.0)~0.01)
> > (+(record_type:logical)^15.0) (+(record_type:essay)^17.0)",
> >   "facet-debug":{
> >  "elapse":0,
> >
> >
> > Query 2:
> >
> > "rawquerystring":"thim day gone",
> >   "querystring":"thim day gone",
> >   "parsedquery":"(+(+DisjunctionMaxQuery(((i18n_content_ar:thim)^3.0
> |
> > (i18n_content_en:thim)^3.0 | (i18n_label_ar:thim)^5.0 |
> > (i18n_label_en:thim)^5.0 | (shelf_mark:thim)^80.0 |
> > (content:thim)^0.04)~0.01)
> +DisjunctionMaxQuery(((i18n_content_ar:day)^3.0
> > | (i18n_content_en:day)^3.0 | (i18n_label_ar:day)^5.0 |
> > (i18n_label_en:day)^5.0 | (shelf_mark:day)^80.0 |
> (content:day)^0.04)~0.01)
> > +DisjunctionMaxQuery(((i18n_content_ar:gone)^3.0 |
> > (i18n_content_en:gone)^3.0 | (i18n_label_ar:gone)^5.0 |
> > (i18n_label_en:gone)^5.0 | (shelf_mark:gone)^80.0 |
> > (content:gone)^0.04)~0.01)) (+DisjunctionMaxQuery(((content:\"thim day
> > gone\"~10)^2.0)~0.01)) (+record_type:logical^15.0)
> > (+record_type:essay^17.0))/no_coord",
> >   "parsedquery_toString":"+(+((i18n_content_ar:thim)^3.0 |
> > (i18n_content_en:thim)^3.0 | (i18n_label_ar:thim)^5.0 |
> > (i18n_label_en:thim)^5.0 | (shelf_mark:thim)^80.0 |
> > (content:thim)^0.04)~0.01 +((i18n_content_ar:day)^3.0 |
> > (i18n_content_en:day)^3.0 | (i18n_label_ar:day)^5.0 |
> > (i18n_label_en:day)^5.0 | (shelf_mark:day)^80.0 |
> (content:day)^0.04)~0.01
> > +((i18n_content_ar:gone)^3.0 | (i18n_content_en:gone)^3.0 |
> > (i18n_label_ar:gone)^5.0 | (i18n_label_en:gone)^5.0 |
> > (shelf_mark:gone)^80.0 | (content:gone)^0.04)~0.01) (+((content:\"thim
> day
> > gone\"~10)^2.0)~0.01) (+(record_type:logical)^15.0)
> > (+(record_type:essay)^17.0)",
> >   "facet-debug":{
> >  "elapse":1,
> >
> > Many thanks in advance.
> >
> > Best
> > Arif
> >
>


Need help with DIH plugin SOLR

2022-04-05 Thread Neha Gupta

Dear Solr Community,

Need your help.
I am running Solr 8.11 standalone (on Windows) and want to index data
from a relational database (Postgres), so I tried to install the
DIH plugin by following the instructions given at:

https://github.com/rohitbemax/dataimporthandler

I am stuck at step "Add the configurations and reload the collection"

sh zkcli.sh -z localhost:9983 -cmd putfile 
"/configs/products.AUTOCREATED/data-config.xml" data-config.xml


I am getting this error:
*Error: Could not find or load main class org.apache.solr.cloud.ZkCLI*



Request you to please help me with this.


Thanks and Regards
Neha Gupta


Solr Cloud - Query with results around 2 million records time out.

2022-04-05 Thread Puttaganti, Venkat
Hi Team,
 I hope you are doing well. We have come across an issue/limitation with
Solr Cloud when users try to run a query that returns around 2 million records
as part of the response.
 The same query works fine and returns the results with standalone Solr.
Can you suggest whether it requires any additional configuration?

We are using Solr 8.8 and Zookeeper 3.6.2.
Heap set to 24 GB
The cloud has three nodes, with Solr and Zookeeper on the same EC2 box.

Thanks in advance.

Regards,
Venkat.


Information Classification: General


Re: Solr Cloud - Query with results around 2 million records time out.

2022-04-05 Thread Thomas Corthals
Hi Venkat,

Do you mean 2 million documents in a single response? You should really
consider pagination, preferably using a cursorMark.
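The cursorMark loop is simple to wire up. Here is a minimal sketch; the fetch function stands in for one HTTP call to /select (e.g. with requests.get) carrying sort=id asc, cursorMark=..., and a fixed rows value — the names are illustrative, not from any Solr client library:

```python
def fetch_all(fetch_page, rows=500):
    """Collect all matching docs with cursorMark pagination.

    fetch_page(cursor, rows) stands in for one Solr /select request and
    must return a dict shaped like a Solr JSON response:
    {"response": {"docs": [...]}, "nextCursorMark": "..."}.
    The underlying query must sort on something that includes the
    uniqueKey field, e.g. sort=id asc.
    """
    docs, cursor = [], "*"          # "*" is the initial cursorMark
    while True:
        page = fetch_page(cursor, rows)
        docs.extend(page["response"]["docs"])
        next_cursor = page["nextCursorMark"]
        if next_cursor == cursor:   # cursor stopped advancing: done
            return docs
        cursor = next_cursor
```

Unlike deep paging with start/rows, each request stays cheap for the cluster because Solr never has to materialize and merge the first N results again.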

Regards,

Thomas



Re: Solr Cloud - Query with results around 2 million records time out.

2022-04-05 Thread Dave
I’ve been able to download a response from standalone Solr with over 40 million
records; it just takes a while, using wget and a long timeout. I don’t know if a
browser could handle a download of that size and duration without crashing
altogether.



Re: Solr Cloud - Query with results around 2 million records time out.

2022-04-05 Thread Ere Maijala

Hi,

Doing what you describe won't work with SolrCloud. The reason is that
Solr has to merge a large number of records from the nodes to produce the
results you ask for, and this is a really resource-intensive task. Deep
paging with a limited page size isn't much better, since Solr still has
to do the huge merge. cursorMark, as Thomas suggested, is the only way to
get all these results. See the documentation for more information:

https://solr.apache.org/guide/8_8/pagination-of-results.html#performance-problems-with-deep-paging



Best,
Ere




--
Ere Maijala
Kansalliskirjasto / The National Library of Finland


solr relatedness weirdness on json facet function

2022-04-05 Thread Dan Rosher
Hi,

If I run a relatedness facet on a qf field (example below) that has
stopword removal, I get stopwords back in the JSON facet.

Does anyone know why, and whether this can be avoided?

Many thanks,
Dan

=

Details
Solr 7.7.2

http://localhost:8983/solr/collection/select?
q=my query&
defType=edismax&
qf=description&
fore={!type=$defType qf=$qf v=$q}&
back=*:*&
rows=0&
json.facet={
  "description":{
"type": "terms",
"field": "description",
"sort": { "relatedness": "desc"},
"mincount": 2,
"limit": 8,
"facet": {
"relatedness": {
"type": "func",
"func": "relatedness($fore,$back)"
}
}
  }
}


Re: Solr as a dedicated data store?

2022-04-05 Thread James Greene
The 'no' response is traditional and a bit dated.  If you have proper
backups/snapshots happening, it is totally plausible to use Solr (Lucene) as
a primary data store. If you need field/config changes, you can import a
collection from an existing collection, doing the field transforms on the
fly.

There are a growing number of products built on Lucene/Elasticsearch that act
as a primary datastore. There is no reason Solr can't be used the same way,
aside from the core devs' slow response to bugs/documentation, but that's a
topic for questioning using Solr at all.

Like all software solutions your system should be designed with redundancy
and resiliency.

Good Luck!

On Tue, Apr 5, 2022, 12:44 AM Tim Casey  wrote:

> Srijan,
>
> Comments off the top of my head, so buyer beware.
>
> Almost always you want to be able to reindex your data from a 'source'.
> This makes things like indexes not good as a data store, or a source of
> truth.  The reasons for this vary.  Indexes age out data because there is
> frequently a weight towards more recent items, indexes need to be reindexed
> for new info to index/issues during indexing/processing, and the list would
> go on.
>
> I have built an index data POJO store in lucene a *long* time ago.  It is
> doable to hydrate a stored object into a language level object, such as a
> java object instance.  It is fairly straightforward to data model from a
> 'common' type of data model into an index as a data model.  But, it is not
> quite the same query expectations and so on.  It is not that far, but
> again, this is not what the primary focus of an inverted index is.  The
> primary focus is to take unstructured language data and return results in a
> hopefully well ordered list.
>
> So, the first you might do is treat the different sources of data as
> different clusters with a different topology.  You might stripe the data
> less and have it be more nodes than you might otherwise because you will do
> less indexing with it, than you might a normal index.  Once you make a
> decision to separate out the data, then you have to contend with two
> different indexes having references to the same 'documents' with some id to
> tie them together and you would lose the ability to do any form of in-index
> join using document ids.  If you keep all the data in the same index, then
> you might be in a situation where the common answer is reindex and you
> would not know what to do about the "metadata".
>
> I strongly suspect what you want is to have a way to either maintain the
> metadata within the index and use it simply as you would along with the
> documents.  As you spider, keep the info about the document with the
> document contents.  I cannot think of a reason to keep all of the data in a
> kinda weird separate space.  If you want to be more sophisticated, then
> you can build an ETL which takes documents and forms indexable units, store
> the indexable units for reindexing.  This is usually pretty quick and
> separates out the crawling, ETL and indexing/query pieces, for all that
> means.   This is more complicated, but would be a bit more standard in how
> people think about it.
>
> tim
>
>
>
> On Mon, Apr 4, 2022 at 7:32 PM Shawn Heisey  wrote:
>
> > On 4/4/2022 5:52 AM, Srijan wrote:
> > > I am working on designing a Solr based enterprise search solution. One
> > > requirement I have is to track crawled data from various different data
> > > sources with metadata like crawled date, indexing status and so on. I
> am
> > > looking into using Solr itself as my data store and not adding a
> separate
> > > database to my stack. Has anyone used Solr as a dedicated data store?
> How
> > > did it compare to an RDBMS?
> >
> > As you've been told, Solr is NOT a database.  It is most definitely not
> > equivalent in any way to an RDBMS.  If you want the kinds of things an
> > RDBMS is good for, you should use an RDBMS, not Solr.
> >
> > Handling ever-changing search requirements in Solr is typically going to
> > require the kinds of schema changes that need a full reindex.  So you
> > probably wouldn't be able to use the same Solr index for your data
> > storage as you do for searching anyway.
> >
> > If you're going to need to set up two Solr installs to handle your
> > needs, you should probably NOT use Solr for the storage role.  Use
> > something that has been tested and hardened against data loss. Solr does
> > do its best to never lose data, but guaranteed data durability is not
> > one of its design goals.  The changes that would be required to make
> > that guarantee would most likely have an extremely adverse effect on
> > search performance.
> >
> > Solr's core functionality has always been search.  Search is what it's
> > good at, and that's what will be optimized in future versions ... not
> > any kind of database functionality.
> >
> > Thanks,
> > Shawn
> >
> >
>


Re: Need help with DIH plugin SOLR

2022-04-05 Thread James Greene
Standalone mode does not use ZooKeeper; you do not need to upload configs
using zkcli.sh.
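For standalone Solr, data-config.xml simply goes into the core's conf/ directory next to solrconfig.xml, and the handler is registered there. A minimal sketch — the lib path, JDBC URL, credentials, table, and field names are all placeholders, and the DIH jar from that repo must be on the core's lib path:

```xml
<!-- In solrconfig.xml: load the DIH jar and register the handler -->
<lib dir="/path/to/dataimporthandler/lib" regex=".*\.jar"/>
<requestHandler name="/dataimport"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config.xml</str>
  </lst>
</requestHandler>

<!-- conf/data-config.xml: minimal Postgres example -->
<dataConfig>
  <dataSource driver="org.postgresql.Driver"
              url="jdbc:postgresql://localhost:5432/mydb"
              user="solr" password="secret"/>
  <document>
    <entity name="product" query="SELECT id, name FROM products">
      <field column="id" name="id"/>
      <field column="name" name="name"/>
    </entity>
  </document>
</dataConfig>
```

After editing, reload the core (Core Admin UI, or /solr/admin/cores?action=RELOAD&core=yourcore) and trigger the import via /dataimport?command=full-import.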





Re: Solr Cloud - Query with results around 2 million records time out.

2022-04-05 Thread James Greene
Also make sure the timeout isn't at the transport layer. People often get
failures and, after much headache, realize that some transport-layer timeout
of 300s killed the response stream.



Re: Solr as a dedicated data store?

2022-04-05 Thread Markus Jelsma
> The 'no' response is traditional and a bit dated.

Agreed, we have been using Solr as a main data store for many years for
some usecases. But, we only store either logs or data that we can reproduce
or regenerate.

The original message was about storing a CrawlDB; in that case, storing it
in Solr is fine, since the data is easy to reproduce in case of disaster.




Re: solr relatedness weirdness on json facet function

2022-04-05 Thread Michael Gibney
Both `qf` and `relatedness` should be orthogonal to your question, if I
understand correctly. Given that your question is mainly about which terms are
included at all (never mind ranking), the only thing that should determine
that is the field and fieldType config for the terms facet "field" property,
i.e. "description". Can you share that information, including the index-time
analysis chain config?
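Worth noting for the thread: a term only stays out of a terms facet if it never makes it into the index, i.e. the StopFilter has to run in the index-time analyzer, not just at query time. A fieldType sketch along those lines, assuming a stopwords.txt in the core's conf/ directory:

```xml
<fieldType name="text_en_stop" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

If the index-time chain lacks the StopFilter, the stopwords are indexed terms like any other, and the facet (relatedness or not) will happily return them.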



Solr - Spring cloud – CVE-2022-22963

2022-04-05 Thread Rajath Banagi Ravindra
Hi,

We are using Solr 7.5 and Solr 6.4 in our applications. We wanted to check
whether there is any impact from the vulnerability CVE-2022-22963. I did some
googling and didn't find any clear answers. Are Solr instances vulnerable, and
is any action needed on our end? Please let me know.

Regards
Rajath



http://www.mindtree.com/email/disclaimer.html


Re: Solr - Spring cloud – CVE-2022-22963

2022-04-05 Thread Mike Drob
Rajath,

All of our known and disclosed security vulnerabilities are posted to
https://solr.apache.org/security.html

If something is not listed there, then it is either not known, not
disclosed, or not a problem. If you believe you have discovered a new
vulnerability, please follow the reporting guidelines outlined on that site.

Thank you,

Mike



Solr Core not showing in Dashboard

2022-04-05 Thread Dharambir Singh
Hi Team,

I am facing an issue: I have a few cores present in ~\solr-7.5.0\server\solr,
but they are not getting listed in the Core Selector. Any pointers on where I
should check why they are not displayed in the Sitecore dashboard?




Sent from Mail for Windows

DISCLAIMER: The information transmitted, including any attachments, is intended 
only for the person or entity to which it is addressed and contains 
confidential and/or privileged material. Sharing this message or its contents 
with a third party without prior written consent is strictly prohibited. If you 
receive this email by mistake, please advise the sender and delete it 
immediately. Email transmission cannot be guaranteed to be secure or virus-free 
as information could be intercepted, corrupted, lost or destroyed as a result 
of the transmission process. Therefore, you should check the email for threats 
with proper software, as the Company does not accept liability for any damage 
inflicted by viewing the content of this email. Views or opinions presented in 
this email are solely those of the author and do not necessarily represent 
those of the Company. Through this email, no employee or agent is authorized to 
conclude/commit any new or incidental terms, which are not part of the original 
contract with any client or vendor, other than by way of duly executed and 
signed amendment contract between the parties in accordance with the agreed 
protocol of the contract. Coforge Limited and its related entities consider the 
privacy of its clients to be of utmost importance and work to protect it. 
Please be aware that the Company monitors email communications through our 
networks for compliance purposes. The Privacy Policy of the Company can be 
accessed from our website www.coforge.com.


Re: Solr Core not showing in Dashboard

2022-04-05 Thread Charlie Hull
Sitecore is a commercial product and thus outside the scope of this list 
which is focused on Solr.


Best

Charlie



--
Charlie Hull - Managing Consultant at OpenSource Connections Limited
Founding member of The Search Network  
and co-author of Searching the Enterprise 


tel/fax: +44 (0)8700 118334
mobile: +44 (0)7767 825828

OpenSource Connections Europe GmbH | Pappelallee 78/79 | 10437 Berlin
Amtsgericht Charlottenburg | HRB 230712 B
Geschäftsführer: John M. Woodell | David E. Pugh
Finanzamt: Berlin Finanzamt für Körperschaften II




Re: Atomic indexing without whole document getting indexed again

2022-04-05 Thread gnandre
Hi, here are the relevant fields from the schema.





There are no copyfields for views_count.

Here are the corresponding atomic indexing and commit requests:

curl http://solr:8983/solr/answers/update -d '[{"id" :
"answers:question:8029","views_count" : {"set":111}}]'
curl "http://solr:8983/solr/answers/update?commit=true"

It DOES change the value successfully. To verify if it is doing atomic
indexing or in-place update, I changed the name of one other field
from

to

and reloaded the schema.

Now, when I send the above-mentioned atomic indexing request, I get the
following error message:

{
  "responseHeader":{
"status":400,
"QTime":7},
  "error":{
"metadata":[
  "error-class","org.apache.solr.common.SolrException",
  "root-error-class","org.apache.solr.common.SolrException"],
"msg":"ERROR: [doc=answers:question:8029] unknown field 'asset_type'",
"code":400}}

So, I believe that it is still trying to index other fields as well from
their stored values and it is not an in-place update. What am I missing?
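For reference, the Solr documentation lists the conditions under which an atomic update is executed in-place instead of as a full reindex: the updated field must be a single-valued, non-indexed, non-stored docValues field that is not the target of a copyField, and the `_version_` field must likewise be a non-indexed, non-stored, single-valued docValues field. A small checker sketch — field properties modeled as plain dicts; this is an illustration of the documented rules, not Solr's actual decision code:

```python
def inplace_eligible(field, version_field, is_copyfield_target=False):
    """Rough check of the documented in-place update conditions."""
    def dv_only(f):
        # docValues-only: docValues enabled, not indexed/stored, single-valued
        return (f.get("docValues", False) and not f.get("indexed", False)
                and not f.get("stored", False) and not f.get("multiValued", False))
    return dv_only(field) and dv_only(version_field) and not is_copyfield_target

views_count = {"docValues": True, "indexed": False, "stored": False, "multiValued": False}
version = {"docValues": True, "indexed": True, "stored": False, "multiValued": False}

# With an indexed _version_ field, the update falls back to a full atomic update:
print(inplace_eligible(views_count, version))  # False
```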

On Fri, Apr 1, 2022 at 9:50 PM Shawn Heisey  wrote:

> On 3/31/22 13:36, gnandre wrote:
> > Here is what I tried to confirm if it is still doing atomic indexing and
> > not in-place indexing. I changed one other unrelated field's name and
> > reloaded the schema.
> > Now, when I performed the indexing just for the field that I wanted to
> > update in-place, it should not have complained about this other unrelated
> > field as it wouldn't bother indexing it.
> > But it did complain with 'unknown field' for the unrelated field. So that
> > tells me it is still doing atomic indexing and trying to index the whole
> > document with all fields.
> >
> > Is my understanding correct? If so, then why are in-place updates still
> not
> > working?
>
> Can you share your schema, the atomic update request you are sending,
> and an idea of what the contents of all the fields in the existing
> document are?
>
> Thanks,
> Shawn
>
>


Re: Atomic indexing without whole document getting indexed again

2022-04-05 Thread Shawn Heisey

On 4/5/22 10:53, gnandre wrote:

Hi, here are the relevant fields from the schema.





There are no copyfields for views_count.

Here are the corresponding atomic indexing and commit requests:

curl http://solr:8983/solr/answers/update -d '[{"id": "answers:question:8029", "views_count": {"set":111}}]'
curl "http://solr:8983/solr/answers/update?commit=true"


Can you do some testing when there is no other indexing activity? What 
I'd like to see is a long directory listing of the index directory 
before an update like that, and then a long directory listing after an 
update like that.  To get the kind of listing I'm after, you would use 
"ls -al" on a POSIX system like Linux, and "dir" in a command prompt on 
Windows.
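The before/after comparison Shawn asks for can also be scripted, so nothing is missed between the two listings. A self-contained sketch using only the standard library — in practice `snapshot()` would be pointed at the core's `data/index` directory; the temporary directory and file names below are just a stand-in demonstration:

```python
import os
import tempfile

def snapshot(path):
    """Map each file name to (size, mtime_ns) so two listings can be diffed."""
    return {name: (os.stat(os.path.join(path, name)).st_size,
                   os.stat(os.path.join(path, name)).st_mtime_ns)
            for name in sorted(os.listdir(path))}

def diff(before, after):
    added = sorted(set(after) - set(before))
    removed = sorted(set(before) - set(after))
    changed = sorted(n for n in set(before) & set(after) if before[n] != after[n])
    return added, removed, changed

# Demo with a throwaway directory standing in for data/index:
with tempfile.TemporaryDirectory() as d:
    with open(os.path.join(d, "_0.fdt"), "w") as f:
        f.write("stored fields")
    before = snapshot(d)
    with open(os.path.join(d, "_0.fdt"), "a") as f:   # an existing file grows
        f.write(" more")
    with open(os.path.join(d, "_0_1.dvd"), "w") as f:  # a new file appears
        f.write("dv update")
    added, removed, changed = diff(before, snapshot(d))
    print(added, removed, changed)
```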



It DOES change the value successfully. To verify if it is doing atomic
indexing or in-place update, I changed the name of one other field
from

to

and reloaded the schema.

Now, when I send the above-mentioned atomic indexing request, I get the
following error message:

{
  "responseHeader":{
    "status":400,
    "QTime":7},
  "error":{
    "metadata":[
      "error-class","org.apache.solr.common.SolrException",
      "root-error-class","org.apache.solr.common.SolrException"],
    "msg":"ERROR: [doc=answers:question:8029] unknown field 'asset_type'",
    "code":400}}

So, I believe that it is still trying to index other fields as well from
their stored values and it is not an in-place update. What am I missing?


It is entirely possible that the code that does atomic or in-place 
updates checks the existing document against the current schema, and 
throws that error even for in-place updates.  I think it would have to 
do that to figure out whether it CAN do an in-place update.  I am not 
sure which part of the source code I would even need to check to figure 
that out.  But if you can do the test above, I should be able to tell 
you whether the update was fully atomic or in-place.


Thanks,
Shawn



Re: Atomic indexing without whole document getting indexed again

2022-04-05 Thread gnandre
Thanks, Shawn.

I conducted the test that you mentioned.

Here is the diff - https://www.diffchecker.com/sdsMiGW5

Left hand side is the state before the in-place update. Right hand side is
the state after the in-place update.

>


Re: Atomic indexing without whole document getting indexed again

2022-04-05 Thread Matthew Lapointe
Hi,

I encountered a similar issue recently trying to differentiate between
atomic and in-place updates. I ended up enabling debug logging for
the DirectUpdateHandler2 class via Solr UI → Logging → Level options. Then
the logs should print something like "DirectUpdateHandler2 updateDocValues"
for an in-place update, or "DirectUpdateHandler2 updateDocument" for an
atomic update.
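The same DEBUG level can be set without the UI through Solr's Logging API, which accepts a `set=<logger>:<level>` form body on the `/admin/info/logging` endpoint. A sketch that builds the request — host and port assumed from the curl commands elsewhere in this thread; the send itself is commented out since it needs a live instance:

```python
from urllib.parse import urlencode

# Solr's Logging API changes a logger's level at runtime:
# POST /solr/admin/info/logging with a form body "set=<logger>:<level>".
base = "http://solr:8983/solr"  # assumed host/port from this thread
logger = "org.apache.solr.update.DirectUpdateHandler2"
endpoint = base + "/admin/info/logging"
body = urlencode({"set": logger + ":DEBUG"})

# To actually send it (requires a running Solr):
# import urllib.request
# urllib.request.urlopen(endpoint, data=body.encode())
print(endpoint)
print(body)
```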

Not sure if this applies to your setup, but in our case atomic updates were
initially being used because we have a route.field defined and our Solr
version did not yet have the fix for SOLR-13081.

Matthew



Re: Atomic indexing without whole document getting indexed again

2022-04-05 Thread gnandre
Thanks, Matthew.

I tried debugging as you suggested. It seems that it is still doing atomic
update instead of in-place update.
I am not using SolrCloud, so I don't think that SOLR-13081 is applicable in
my situation. I am using Solr 8.5.2 in standalone mode.
I am not sure why in-place updates are still not getting triggered :(

solr_1   | 2022-04-05 19:37:22.453 DEBUG
(qtp825658265-16) [   x:answers] o.a.s.u.DirectUpdateHandler2
updateDocument(add{_version_=1729298371656548352,id=answers:question:8029})
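The distinction Matthew describes can be checked mechanically against a captured log line. A small convenience sketch — the marker strings come from Matthew's message above; this is just for scanning logs, not anything Solr itself provides:

```python
def classify_update(log_line):
    """Classify a DirectUpdateHandler2 DEBUG line as in-place or atomic."""
    if "DirectUpdateHandler2" not in log_line:
        return "unrelated"
    if "updateDocValues" in log_line:
        return "in-place"   # only docValues were rewritten
    if "updateDocument" in log_line:
        return "atomic"     # the full document was reindexed
    return "unknown"

# The log line from this message:
line = ("2022-04-05 19:37:22.453 DEBUG (qtp825658265-16) [   x:answers] "
        "o.a.s.u.DirectUpdateHandler2 updateDocument(add{_version_="
        "1729298371656548352,id=answers:question:8029})")
print(classify_update(line))  # atomic
```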



Re: Atomic indexing without whole document getting indexed again

2022-04-05 Thread Matthew Lapointe
That's odd! The only other thing I can think to check would be to verify
that the "id" field is configured as the unique key field for the
collection.

Matthew



Re: Atomic indexing without whole document getting indexed again

2022-04-05 Thread gnandre
It is configured as the unique key field:

<uniqueKey>id</uniqueKey>


Re: Atomic indexing without whole document getting indexed again

2022-04-05 Thread Matthew Lapointe
Are there any update request processors defined that could be adding
default values?
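For anyone checking their solrconfig.xml: a processor chain that injects default values would look roughly like this — an illustrative sketch using `solr.DefaultValueUpdateProcessorFactory`, not taken from the poster's actual configuration:

```xml
<updateRequestProcessorChain name="add-defaults">
  <!-- Adds views_count=0 to any incoming document that lacks it;
       a chain like this makes an update touch fields beyond the one set. -->
  <processor class="solr.DefaultValueUpdateProcessorFactory">
    <str name="fieldName">views_count</str>
    <str name="value">0</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```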



Re: Atomic indexing without whole document getting indexed again

2022-04-05 Thread Shawn Heisey

On 4/5/2022 12:38 PM, gnandre wrote:

I conducted the test that you mentioned.

Here is the diff - https://www.diffchecker.com/sdsMiGW5

Left hand side is the state before the in-place update. Right hand side is
the state after the in-place update.


That looks very strange to me.  If it were doing a full add/delete type 
of atomic update, I would expect there to be a new segment, but that 
shows all the same segments and new timestamps on the files for the oyyo 
segment, with the fdt file (which I believe has stored field data) 
changing size.  Not what I thought would happen for EITHER scenario.  
But I have to admit that I am not completely sure how things happen at 
the Lucene level for in-place updates.
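For readers interpreting such directory listings, the common Lucene index file extensions can be summarized as a lookup table — standard Lucene file-format knowledge; the .fdt entry is the one discussed above:

```python
import os

# Common Lucene index file extensions (per the Lucene file-format docs);
# handy when diffing index directory listings before and after an update.
LUCENE_EXTENSIONS = {
    ".fdt": "stored field data",
    ".fdx": "stored field index",
    ".tim": "term dictionary",
    ".tip": "term dictionary index",
    ".doc": "postings (doc ids and frequencies)",
    ".dvd": "per-document docValues data",
    ".dvm": "docValues metadata",
    ".nvd": "norms data",
    ".si":  "per-segment metadata",
}

def describe(filename):
    return LUCENE_EXTENSIONS.get(os.path.splitext(filename)[1], "unknown")

print(describe("_oyyo.fdt"))  # stored field data
```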


These listings show the user and group for those files as 8983 ... so 
the OS where you gathered this info is NOT the system where Solr is 
actually running.  It could be either a container situation like Docker, 
or a network filesystem.  I believe that uid/gid 8983 is used for the 
solr user in the docker images available for solr.


Thanks,
Shawn