date:20230504

facet domain change with blockChildren not working as expected

2023-05-04 Thread Igor Blanco


I have a document with nested documents indexed as this:

{

    'id':1,
    'creation_date':'2023-02-09T07:18:59Z',
    'update_date':'2023-05-03T14:37:08Z',
    'dictionary':{
    'id':'DIC1',
    'lang_ids':[
    2,
    3
    ]
    },
    'contexts':[
    ],
    'definitions':[
    {
    'id':'DFN1',
    'lang_id':2,
    'definition':'una definición'
    }
    ],
    'denominations':[
    {
    'id':'DNM1',
    'lang_id':2,
    'denomination':'Casa',
    'feminine_form':'no procede',
    'masculine_form':'no procede'
    }
    ],
    'illustrations':[
    ],
    'notes':[
    ],
    'observations':[
    ],
    'videos':[
    ]

}


I want to find that document and get a facet that lists each of the lang_ids 
in the dictionary subdocument.


So I try a query like this:

http://0.0.0.0:8983/solr/index_cards/select?facet=true&indent=true&json.facet=%7B%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%22working_language_ids%22%3A%20%7B%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%22domain%22%3A%20%7B%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%22blockChildren%22%3A%22id%3ADIC*%22%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%7D%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%22type%22%3A%20%22terms%22%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%22field%22%3A%20%22lang_ids%22%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%22limit%22%3A%20-1%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%7D%0A%20%20%20%20%20%20%20%20%20%20%20%20%7D&q.op=OR&q=id%3A1&useParams=


   q parameter is => id:1

   json.facet parameter is =>

   {
    "working_language_ids": {
    "domain": {
   "blockChildren":"id:DIC*"
   },
    "type": "terms",
    "field": "lang_ids",
    "limit": -1
    }
    }

The result returns the expected parent document, but 
working_language_ids facet is empty:


"facets":{
    "count":1,
    "working_language_ids":{
  "buckets":[]}}


If I query "id:DIC*" directly, so that I get the dictionary subdocuments 
instead of the parent, and omit the "domain" section of the facet, it 
returns the expected result. So the problem does not seem to be in the 
indexing of lang_ids itself nor in the facet definition, but most 
probably in the use of "blockChildren".


Any clue will be much appreciated, thanks.


--


 IgorBlanco

Director desarrollo a medida | Neurrirako garapenen zuzendaria

Binovo IT Human Project




943 569 206  | 690229375 

ibla...@binovo.es 

binovo.es 

Astigarragako Bidea, 2 - 2º izda. Oficina 10-11, 20180 Oiartzun






Re: facet domain change with blockChildren not working as expected

2023-05-04 Thread Mikhail Khludnev

Hello Igor.
I'm not sure parent/child docs are indexed well in this particular case.
But I spot one detail in the ref guide ...  exclusively matches all parent
documents in the collection.
Presumably it should be  "blockChildren":"id:[0 TO 9]"
Beforehand, check that this query matches only parent documents.



-- 
Sincerely yours
Mikhail Khludnev
https://t.me/MUST_SEARCH
A caveat: Cyrillic!

Re: standard tokenizer seemingly splitting on dot

2023-05-04 Thread Mikhail Khludnev

Raised https://github.com/apache/lucene/issues/12264.
Let's look at what devs say.
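
For reference, the split reported in this thread is consistent with the Unicode UAX #29 word-break rules that StandardTokenizer is documented to follow: a period between two letters, or between two digits, does not break a token, but a period between a digit and a letter does. Below is a simplified Python sketch of only those rules for plain ASCII input. It is an illustration under those assumptions, not Lucene's implementation, and the function names are made up for the example.

```python
def classify(ch):
    """Very rough stand-in for the UAX #29 word-break classes."""
    if ch.isalpha():
        return "letter"
    if ch.isdigit():
        return "digit"
    if ch == ".":
        return "mid"  # the period joins letter.letter or digit.digit only
    return "other"


def tokenize(text):
    """Split text using a simplified subset of the UAX #29 rules."""
    tokens = []
    current = ""
    for i, ch in enumerate(text):
        kind = classify(ch)
        if kind in ("letter", "digit"):
            current += ch
            continue
        if kind == "mid" and current and i + 1 < len(text):
            before = classify(text[i - 1])
            after = classify(text[i + 1])
            # Keep the period only when both neighbours are letters,
            # or both neighbours are digits.
            if before == after and before in ("letter", "digit"):
                current += ch
                continue
        # Any other character ends the current token.
        if current:
            tokens.append(current)
        current = ""
    if current:
        tokens.append(current)
    return tokens


print(tokenize("ab00c.tif"))  # ['ab00c.tif']
print(tokenize("ab003.tif"))  # ['ab003', 'tif']
print(tokenize("test7.com"))  # ['test7', 'com']
```

In this sketch "ab003.tif" splits because the period sits between a digit and a letter, which matches the behaviour described in the quoted messages.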

On Wed, May 3, 2023 at 6:13 PM Bill Tantzen 
wrote:

> Shawn,
> No, email addresses are not preserved -- from the docs:
>
>
> - The "@" character is among the set of token-splitting punctuation, so
>   email addresses are not preserved as single tokens.
>
>
> but the non-split on "test.com" vs the split on "test7.com" is unexpected!
> ~~Bill
>
>
> On Wed, May 3, 2023 at 10:04 AM Shawn Heisey  wrote:
>
> > On 5/2/23 15:30, Bill Tantzen wrote:
> > > This works as I expected:
> > > ab00c.tif -- tokenizes as it should with a value of ab00c.tif
> > >
> > > This doesn't work as I expected
> > > ab003.tif -- tokenizes with a result of ab003 and tif
> >
> > I got the same behavior with ICUTokenizer, which uses ICU4J for Unicode
> > handling.  I am pretty sure ICU4J is IBM's implementation of Unicode.  I
> > think StandardTokenizer is using a different implementation.
> >
> > I'm on Solr 9.3.0-SNAPSHOT ... the ICU analysis components it uses
> > reference icu4j version 70.1, which is dated Oct 28, 2021 on Maven
> > Central.
> >
> > Two different Unicode implementations are doing exactly the same thing.
> > Is it a bug, or expected behavior?  It does mean filenames are sometimes
> > not being handled in the way you expect.
> >
> > I ran another check ... I had thought that StandardTokenizer preserved
> > email addresses as a single token ... but I am seeing that t...@test.com
> > is split into two terms.  It splits t...@test7.com into three terms.
> >
> > Thanks,
> > Shawn
> >
>
>
> --
> Human wheels spin round and round
> While the clock keeps the pace... -- John Mellencamp
> 
> Bill Tantzen, University of Minnesota Libraries
> 612-626-9949 (U of M), 612-325-1777 (cell)
>


-- 
Sincerely yours
Mikhail Khludnev
https://t.me/MUST_SEARCH
A caveat: Cyrillic!

Skip applying boost queries

2023-05-04 Thread Rajani Maski

Hi Solr Users,

Is there a feature that tells the query processor to skip applying bq and
bf once the number of docs matched reaches some threshold x? Certain queries
that match a large set of docs spend 60-70% of total QTime applying "bq" and
"boost", so the interest is in skipping them when the number of matched docs
reaches x.


Thanks,
Rajani

Re: Debug time spent in aggregating the search results

2023-05-04 Thread Rajani Maski

Hi Hoss,

Do I need to enable something to retrieve this metric? I tried querying it with
http://localhost:8983/solr/admin/metrics?wt=json&prefix=QUERY./select.distrib.requestTimes

but there is no "distrib" entry in the response; there are only the plain
select metrics, such as QUERY./select.requestTimes, and others.
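
A minimal Python sketch of how one might check which metric keys a response actually contains. It assumes the metrics response has already been parsed from JSON into a dict shaped like {"metrics": {registry: {key: value}}}; the sample payload and the registry name below are hypothetical, not taken from a real server.

```python
def find_metric_keys(metrics_response, needle):
    """Return sorted (registry, key) pairs whose key contains needle."""
    found = []
    for registry, metrics in metrics_response.get("metrics", {}).items():
        for key in metrics:
            if needle in key:
                found.append((registry, key))
    return sorted(found)


# Hypothetical payload, only to exercise the helper.
sample = {
    "metrics": {
        "solr.core.example": {
            "QUERY./select.requestTimes": {"count": 3},
            "QUERY./select.distrib.requestTimes": {"count": 0},
        }
    }
}

print(find_metric_keys(sample, ".distrib."))
```

Running the same helper with "QUERY./select" as the needle lists every select-related key, which makes it easy to see whether a "distrib" variant is present at all.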



On Wed, Apr 19, 2023 at 5:26 PM Chris Hostetter 
wrote:

> : Hi Solr Users,
> :
> : Is there a metric endpoint or a debug/explain type query param that
> : returns average time spent in aggregating the search results from shards?
>
> Sort of?
>
> Metrics like "QUERY./select.distrib.requestTimes" tell you the stats on
> handling a "distributed" request -- which is when a core is responsible for
> sending out "per-shard" requests and merging the responses.
>
> But it doesn't *only* include the "time spent in aggregating the search
> results from shards" ... it also includes the time spent determining which
> requests to send to which shards, and waiting for the responses to those
> (frequently concurrent) requests.
>
>
> -Hoss
> http://www.lucidworks.com/
>

Re: facet domain change with blockChildren not working as expected

2023-05-04 Thread Igor Blanco


Hi Mikhail,

Thanks for your response. It worked, and I think it put me on the 
right path, but I'm still a bit confused.


I thought that blockChain allowed me to change the domain to all the 
children of the main resulting documents and that the "id:DIC*" filter 
would limit the children in the domain to those whose id starts with 
DIC, which are only the dictionary ones.


But after trying your suggestion, rereading the doc, and thanks to your 
pointer, I am starting to understand that what I have to provide to 
blockChildren is the query that matches the parent documents. I've come up 
with something like this:


 "blockChildren":"-_nest_path_:*"

It seems to work.
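
As a rough sketch of the request described above, the following Python builds the json.facet parameter with blockChildren set to a query that matches parent documents. The helper name is made up for illustration; the field name, facet name, and the "-_nest_path_:*" filter are the ones quoted in this thread.

```python
import json
from urllib.parse import urlencode


def child_terms_facet(parent_filter, field):
    """Terms facet over the children of the documents matched by q.

    parent_filter must match all parent documents, not the children.
    """
    return {
        "working_language_ids": {
            "type": "terms",
            "field": field,
            "limit": -1,
            "domain": {"blockChildren": parent_filter},
        }
    }


params = {
    "q": "id:1",
    "json.facet": json.dumps(child_terms_facet("-_nest_path_:*", "lang_ids")),
}
query_string = urlencode(params)
print(query_string)
```

The resulting query string can be appended to the /select URL of the collection.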

But in my case this works because only the dictionary subdocuments have a 
"lang_ids" field... but what if this field were also present in 
other subdocuments? Is there a way to reduce the new domain to only the 
children that comply with some kind of filter?


Thanks a lot.







--


 IgorBlanco

Director desarrollo a medida | Neurrirako garapenen zuzendaria

Binovo IT Human Project




943 569 206  | 690229375 

ibla...@binovo.es 

binovo.es 

Astigarragako Bidea, 2 - 2º izda. Oficina 10-11, 20180 Oiartzun






Edismax parsing when using pf parameter

2023-05-04 Thread Mónica Marrero

Hi,

I have found what I think is an inconsistent behaviour of the query parser
when using the pf parameter.

I am testing with the techproducts collection in Solr v9.2.1,
defType=edismax, pf=text name

a) Query= name:george AND (game AND thrones)
Result= "parsedquery_toString":"+(+name:george +(+(text:game)
+(text:thrones))) (text:\"game thrones\" | name:\"game thrones\")"

b) Query= name:(george AND martin) AND (game AND thrones)
Result= "parsedquery_toString":"+(+(+name:george +name:martin)
+(+(text:game) +(text:thrones))) (text:\"*martin* game thrones\" |
name:\"*martin* game thrones\")"

I understand why in query a) pf is only applied to the keywords *game* and
*thrones* (the keywords in the query with no explicit field assigned), but
following the same reasoning, I would expect the same behaviour for query b),
and that is not the case (whereas the qf parameter does behave as I expected).
Any idea why this happens?
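
To make the comparison easy to reproduce, here is a small Python sketch that builds the two debug requests. The base URL is an assumption (a local Solr with the techproducts example); the query strings and the defType and pf values are the ones given above.

```python
from urllib.parse import urlencode

# Assumed location of the techproducts example collection.
BASE_URL = "http://localhost:8983/solr/techproducts/select"


def debug_url(query):
    """URL that asks Solr to return the parsed form of an edismax query."""
    params = {
        "q": query,
        "defType": "edismax",
        "pf": "text name",
        "debug": "query",  # adds parsedquery_toString to the response
    }
    return BASE_URL + "?" + urlencode(params)


url_a = debug_url("name:george AND (game AND thrones)")
url_b = debug_url("name:(george AND martin) AND (game AND thrones)")
print(url_a)
print(url_b)
```

Comparing the parsedquery_toString values of the two responses shows which terms end up inside the pf phrase.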

Best,

Mónica


becoming a solr specialist

2023-05-04 Thread ufuk yılmaz

Hi all,

First of all forgive me if asking this here is inappropriate, but I couldn’t 
think of a better place where all Solr experts gather.

I have been working as the main “Solr person” on a project since 2018 where 
Solr sat at the very core of things. It’s mainly a distributed data (20+ TB) 
analysis system where data is aggregated/analyzed using Solr’s multilevel JSON 
faceting and streaming expressions. Search and relevancy were less important.

As that project came to an end, I am looking for another position where I can 
make use of my existing knowledge and keep building on top of that, since 
switching to something unrelated to Solr feels like so much of my previous 
effort is going to be a waste.

In this mailing list, I often see many advanced uses of Solr, so I feel there’s 
still a very long way to go and many new things to learn. But when searching 
for open positions, I find very few openings related to Solr; most of them 
just mention it alongside Elastic as a complementary tool.

Can you point me in a direction where Solr-specific experience could be useful? 
Or is it too narrow an area?

Thanks for reading
Ufuk yilmaz
~~


Re: Skip applying boost queries

2023-05-04 Thread Alessandro Benedetti

The first thing that comes to my mind is the reranking query capability:
https://solr.apache.org/guide/solr/latest/query-guide/query-re-ranking.html

I am not fully satisfied by the way the feature manages the final scoring,
but I believe it can be helpful in your case!
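
A minimal Python sketch of what such a request could look like, assuming the expensive boost is moved out of bq and into a re-rank query. The helper name, the example queries, and the default values are hypothetical. Note that this does not skip the boost based on the number of matches; it only limits the boost to the top reRankDocs documents.

```python
from urllib.parse import urlencode


def rerank_params(main_query, boost_query, rerank_docs=1000, rerank_weight=2.0):
    """Request parameters that apply boost_query only to the top hits."""
    rq = "{{!rerank reRankQuery=$rqq reRankDocs={} reRankWeight={}}}".format(
        rerank_docs, rerank_weight
    )
    return {"q": main_query, "rq": rq, "rqq": boost_query}


# Hypothetical main query and boost query.
params = rerank_params("laptop", "category:featured")
print(params["rq"])
print(urlencode(params))
```

With this shape the main query is scored without the boost, and the boost cost is bounded by reRankDocs rather than by the size of the full result set.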

Cheers
--
*Alessandro Benedetti*
Director @ Sease Ltd.
*Apache Lucene/Solr Committer*
*Apache Solr PMC Member*

e-mail: a.benede...@sease.io


*Sease* - Information Retrieval Applied
Consulting | Training | Open Source

Website: Sease.io 




Re: becoming a solr specialist

2023-05-04 Thread Dave

Send me a personal email


Re: Skip applying boost queries

2023-05-04 Thread Rajani Maski

Nice, this should work. Thank you so much, appreciate it.


Re: facet domain change with blockChildren not working as expected

2023-05-04 Thread Mikhail Khludnev

> Is there a way to reduce the new domain to only the
> children that comply with some kind of filter?

Sure. You can apply a "filter" under "domain" to restrict the domain to a
certain child type. Please check
https://solr.apache.org/guide/solr/latest/query-guide/json-faceting-domain-changes.html#adding-domain-filters
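
A hedged Python sketch of that suggestion: the facet domain first switches to the children via blockChildren and is then narrowed with a "filter". The helper name is made up; the "id:DIC*" child filter and the "-_nest_path_:*" parent filter are the queries mentioned earlier in this thread and would need to be adapted to the actual schema.

```python
import json


def filtered_child_facet(parent_filter, child_filter, field):
    """Terms facet over only those children that match child_filter."""
    return {
        "type": "terms",
        "field": field,
        "limit": -1,
        "domain": {
            "blockChildren": parent_filter,  # must match parent documents
            "filter": child_filter,          # narrows the child domain
        },
    }


json_facet = json.dumps(
    {
        "working_language_ids": filtered_child_facet(
            "-_nest_path_:*", "id:DIC*", "lang_ids"
        )
    },
    indent=2,
)
print(json_facet)
```

The printed JSON can be sent as the json.facet parameter together with the main query.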

PS. we should definitely add blockChain to Solr to modernize it.  Thanks
for the clue!


Re: becoming a solr specialist

2023-05-04 Thread Doug Whitfield

We are hiring Solr folks in India (Pune, specifically) and the US. We need 
people with a broad skillset, not necessarily someone who has been working with 
Solr since 2004.

I know a bit about the progress of hiring in India, but very little in the US 
(although I know one name got passed on to the hiring manager in the US). If 
interested (and this goes for anyone on the list), get back to me ASAP as we 
are already narrowing down interview candidates in India.

In any case, might be a good fit if the geography is right.


Re: becoming a solr specialist

2023-05-04 Thread Eric Pugh

Relevance Slack has an active #jobs channel as well. The invite link is
www.opensourceconnections.com/slack; once you have joined, add the #jobs channel.


___
Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467 | 
http://www.opensourceconnections.com  | 
My Free/Busy   
Co-Author: Apache Solr Enterprise Search Server, 3rd Ed 



Vector as LTR Field Value Feature Type

2023-05-04 Thread Rajani Maski

Hi Solr Users,

Can the LTR field value feature be a vector field or a field with
comma-separated numeric values? Is that supported? Querying each field
separately adds to the query cost, so I am wondering whether it is possible
to put the field values together in one field at index time and query that
as a single feature. Thoughts?
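
For context, this is a sketch of the per-field approach the question wants to avoid: one FieldValueFeature definition per field. The field names and feature names below are hypothetical, and the sketch does not settle whether a vector or multi-valued field can be used as a single feature.

```python
import json

# Feature class shipped with the Solr LTR module.
FIELD_VALUE_FEATURE = "org.apache.solr.ltr.feature.FieldValueFeature"


def field_value_features(field_names):
    """One LTR feature definition per numeric field."""
    return [
        {
            "name": "value_of_" + field,
            "class": FIELD_VALUE_FEATURE,
            "params": {"field": field},
        }
        for field in field_names
    ]


# Hypothetical numeric fields.
features = field_value_features(["popularity", "price", "rating"])
print(json.dumps(features, indent=2))
```

Each entry is read separately at query time, which is the per-field cost described above.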

Thanks,
Rajani

Re: standard tokenizer seemingly splitting on dot

2023-05-04 Thread Rahul Goswami

Bill,
Do you have a WordDelimiterFilterFactory in the analysis chain (with the
"preserveOriginal" attribute likely set to 0)?
That would split the token on the period downstream in the analysis chain
even if StandardTokenizer doesn't.

-Rahul


Re: standard tokenizer seemingly splitting on dot

2023-05-04 Thread Bill Tantzen

Rahul,
No I do not, but note that this behavior has been observed by others and
reported as a possible issue.
Thank you!
~~Bill


-- 
Human wheels spin round and round
While the clock keeps the pace... -- John Mellencamp

Bill TantzenUniversity of Minnesota Libraries
612-626-9949 (U of M)612-325-1777 (cell)
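
One plausible explanation for the examples quoted in this thread (nobody here confirms it) is the Unicode word-boundary rules in UAX #29, which StandardTokenizer implements: a "." between two letters, or between two digits, does not break a word, but a "." between a digit and a letter does. The sketch below is a deliberately reduced Python model of just that rule, handling only letters, digits and "."; the function name is made up for illustration and this is not Lucene's code.

```python
def simplified_uax29_tokens(text):
    """Reduced model of UAX #29 word breaking for letters, digits and '.'.

    A '.' is kept inside a token only when it sits between two letters or
    between two digits; every other non-alphanumeric character splits.
    """
    tokens, current = [], ""
    for i, ch in enumerate(text):
        if ch.isalnum():
            current += ch
            continue
        prev_ch = text[i - 1] if i > 0 else ""
        next_ch = text[i + 1] if i + 1 < len(text) else ""
        joins_letters = prev_ch.isalpha() and next_ch.isalpha()
        joins_digits = prev_ch.isdigit() and next_ch.isdigit()
        if ch == "." and current and (joins_letters or joins_digits):
            current += ch  # '.' glues letter.letter or digit.digit
        else:
            if current:
                tokens.append(current)
            current = ""
    if current:
        tokens.append(current)
    return tokens


# "user" is a placeholder local part; the addresses in the thread are obfuscated.
for sample in ["ab00c.tif", "ab003.tif", "user@test.com", "user@test7.com"]:
    print(sample, simplified_uax29_tokens(sample))
```

Under this model ab00c.tif stays whole while ab003.tif splits after the digit, and the test7.com address yields three terms, which is the pattern Bill and Shawn describe.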

Re: Debug time spent in aggregating the search results

2023-05-04 Thread Chris Hostetter



: Do I need to enable something to retrieve this metric?  I tried to query it
: 
: http://localhost:8983/solr/admin/metrics?wt=json&prefix=QUERY./select.distrib.requestTimes
: 
: but there is no "distrib" in the response, there are only select

Hrm what version of solr are you running?


Here's 8.11 ...

$ ./bin/solr -e cloud -noprompt
...
$ curl -sS 'http://localhost:8983/solr/admin/metrics?nodes=all&wt=json&prefix=QUERY./select.distrib.requestTimes'
{
  "responseHeader":{
"status":0,
"QTime":35},
  "127.0.1.1:7574_solr":{
"responseHeader":{
  "status":0,
  "QTime":23},
"metrics":{
  "solr.core.gettingstarted.shard2.replica_n6":{
"QUERY./select.distrib.requestTimes":{
  "count":0,
  "meanRate":0.0,
  "1minRate":0.0,
  "5minRate":0.0,
  "15minRate":0.0,
  "min_ms":0.0,
  "max_ms":0.0,
  "mean_ms":0.0,
  "median_ms":0.0,
  "stddev_ms":0.0,
  "p75_ms":0.0,
  "p95_ms":0.0,
  "p99_ms":0.0,
  "p999_ms":0.0}},
  "solr.core.gettingstarted.shard1.replica_n2":{
"QUERY./select.distrib.requestTimes":{
  "count":0,
  "meanRate":0.0,
  "1minRate":0.0,
  "5minRate":0.0,
  "15minRate":0.0,
  "min_ms":0.0,
  "max_ms":0.0,
  "mean_ms":0.0,
  "median_ms":0.0,
  "stddev_ms":0.0,
  "p75_ms":0.0,
  "p95_ms":0.0,
  "p99_ms":0.0,
  "p999_ms":0.0,
  "127.0.1.1:8983_solr":{
"responseHeader":{
  "status":0,
  "QTime":2},
"metrics":{
  "solr.core.gettingstarted.shard2.replica_n4":{
"QUERY./select.distrib.requestTimes":{
  "count":0,
  "meanRate":0.0,
  "1minRate":0.0,
  "5minRate":0.0,
  "15minRate":0.0,
  "min_ms":0.0,
  "max_ms":0.0,
  "mean_ms":0.0,
  "median_ms":0.0,
  "stddev_ms":0.0,
  "p75_ms":0.0,
  "p95_ms":0.0,
  "p99_ms":0.0,
  "p999_ms":0.0}},
  "solr.core.gettingstarted.shard1.replica_n1":{
"QUERY./select.distrib.requestTimes":{
  "count":0,
  "meanRate":0.0,
  "1minRate":0.0,
  "5minRate":0.0,
  "15minRate":0.0,
  "min_ms":0.0,
  "max_ms":0.0,
  "mean_ms":0.0,
  "median_ms":0.0,
  "stddev_ms":0.0,
  "p75_ms":0.0,
  "p95_ms":0.0,
  "p99_ms":0.0,
  "p999_ms":0.0}
$ curl -sS 'http://localhost:8983/solr/gettingstarted/select?omitHeader=true&q=*:*'
{
  "response":{"numFound":0,"start":0,"maxScore":0.0,"numFoundExact":true,"docs":[]
  }}
$ curl -sS 'http://localhost:8983/solr/admin/metrics?nodes=all&wt=json&prefix=QUERY./select.distrib.requestTimes'
{
  "responseHeader":{
"status":0,
"QTime":25},
  "127.0.1.1:7574_solr":{
"responseHeader":{
  "status":0,
  "QTime":8},
"metrics":{
  "solr.core.gettingstarted.shard2.replica_n6":{
"QUERY./select.distrib.requestTimes":{
  "count":0,
  "meanRate":0.0,
  "1minRate":0.0,
  "5minRate":0.0,
  "15minRate":0.0,
  "min_ms":0.0,
  "max_ms":0.0,
  "mean_ms":0.0,
  "median_ms":0.0,
  "stddev_ms":0.0,
  "p75_ms":0.0,
  "p95_ms":0.0,
  "p99_ms":0.0,
  "p999_ms":0.0}},
  "solr.core.gettingstarted.shard1.replica_n2":{
"QUERY./select.distrib.requestTimes":{
  "count":0,
  "meanRate":0.0,
  "1minRate":0.0,
  "5minRate":0.0,
  "15minRate":0.0,
  "min_ms":0.0,
  "max_ms":0.0,
  "mean_ms":0.0,
  "median_ms":0.0,
  "stddev_ms":0.0,
  "p75_ms":0.0,
  "p95_ms":0.0,
  "p99_ms":0.0,
  "p999_ms":0.0,
  "127.0.1.1:8983_solr":{
"responseHeader":{
  "status":0,
  "QTime":6},
"metrics":{
  "solr.core.gettingstarted.shard2.replica_n4":{
"QUERY./select.distrib.requestTimes":{
  "count":0,
  "meanRate":0.0,
  "1minRate":0.0,
  "5minRate":0.0,
  "15minRate":0.0,
  "min_ms":0.0,
  "max_ms":0.0,
  "mean_ms":0.0,
  "median_ms":0.0,
  "stddev_ms":0.0,
  "p75_ms":0.0,
  "p95_ms":0.0,
  "p99_ms":0.0,
  "p999_ms":0.0}},
  "solr.core.gettingstarted.shard1.replica_n1":{
"QUERY./select.distrib.requestTimes":{
  "count":1,
  "meanRate":0.004266352892634931,
  "1minRate":0.013536188363841833,
  "5minRate":0.0031973351962583784,
  "15minRate":0.001095787094460976,
  "min_ms":383.362809,
  "max_ms":383.362809,
  "mean_ms":383.362809,
  "median_ms":383.362809,
  "stddev_ms":0.0,
  "p75_ms":383.362809,
  "p95_ms":383.362809,
  "p99_ms":383.362809,
  "p999_m

Re: becoming a solr specialist

2023-05-04 Thread Joel Bernstein

I'll ping you on LinkedIn.


Joel Bernstein
http://joelsolr.blogspot.com/


On Thu, May 4, 2023 at 1:53 PM Eric Pugh 
wrote:

> Relevance Slack has an active #jobs channel as well. The magic invite
> link is http://www.opensourceconnections.com/slack and then add #jobs.
>
> > On May 4, 2023, at 1:25 PM, Doug Whitfield wrote:
> >
> > We are hiring Solr folks in India (Pune, specifically) and the US. We
> > need people with a broad skillset, not necessarily someone who has been
> > working with Solr since 2004.
> >
> > I know a bit about the progress of hiring in India, but very little in
> > the US (although I know one name got passed on to the hiring manager in
> > the US). If interested (and this goes for anyone on the list), get back
> > to me ASAP as we are already narrowing down interview candidates in India.
> >
> > In any case, might be a good fit if the geography is right.
> >
> > From: ufuk yılmaz
> > Date: Thursday, 4 May 2023 at 10:23
> > To: solr-user
> > Subject: becoming a solr specialist
> >
> > Hi all,
> >
> > First of all forgive me if asking this here is inappropriate, but I
> > couldn't think of a better place where all Solr experts gather.
> >
> > I have been working as the main "solr person" at a project since 2018
> > where Solr sat at the very core of things. It's mainly a distributed data
> > (20+ TB) analysis system where data is aggregated/analyzed using Solr's
> > multilevel JSON faceting and streaming expressions. Search and relevancy
> > were less important.
> >
> > As that project came to an end, I am looking for another position where
> > I can make use of my existing knowledge and keep building on top of that,
> > since switching to something unrelated to Solr feels like so much of my
> > previous effort is going to be a waste.
> >
> > In this mailing list, I often see many advanced uses of Solr, so I feel
> > there's still a very long way to go and many new things to learn. But when
> > searching for open positions, there are very few openings related to Solr;
> > most of them just mention it alongside Elastic as a complementary tool.
> >
> > Can you point me in a direction where Solr-specific experience could be
> > useful? Or is it too narrow an area?
> >
> > Thanks for reading
> > Ufuk yilmaz
>
> ___
> Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467 |
> http://www.opensourceconnections.com | My Free/Busy <http://tinyurl.com/eric-cal>
> Co-Author: Apache Solr Enterprise Search Server, 3rd Ed <
> https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-enterprise-search-server-third-edition-raw>
>

Re: Edismax parsing when using pf parameter

2023-05-04 Thread Chris Hostetter


I agree this looks like a bug, would you please go ahead and file a jira?

It seems like maybe it's an off-by-one error (which is just ... ugh.)

$ curl -sS http://localhost:8983/solr/techproducts/select -d 'debug=query' \
    -d 'defType=edismax' -d 'pf=text name' \
    -d 'q=content:(XXX AND YYY AND ZZZ) AND (AAA AND BBB)' | grep '"parsedquery'
"parsedquery":"+(+(+content:xxx +content:yyy +content:zzz) +(+DisjunctionMaxQuery((text:aaa)) +DisjunctionMaxQuery((text:bbb)))) DisjunctionMaxQuery((name:\"yyy zzz aaa bbb\" | text:\"yyy zzz aaa bbb\"))",
"parsedquery_toString":"+(+(+content:xxx +content:yyy +content:zzz) +(+(text:aaa) +(text:bbb))) (name:\"yyy zzz aaa bbb\" | text:\"yyy zzz aaa bbb\")",




: Date: Thu, 4 May 2023 17:04:50 +0200
: From: Mónica Marrero 
: Reply-To: users@solr.apache.org
: To: users@solr.apache.org
: Subject: Edismax parsing when using pf parameter
: 
: Hi,
: 
: I have found what I think is an inconsistent behaviour of the query parser
: when using the pf parameter.
: 
: I am testing with the techproducts collection in Solr v9.2.1,
: defType=edismax, pf=text name
: 
: a) Query= name:george AND (game AND thrones)
: Result= "parsedquery_toString":"+(+name:george +(+(text:game)
: +(text:thrones))) (text:\"game thrones\" | name:\"game thrones\")"
: 
: b) Query= name:(george AND martin) AND (game AND thrones)
: Result= "parsedquery_toString":"+(+(+name:george +name:martin)
: +(+(text:game) +(text:thrones))) (text:\"martin game thrones\" |
: name:\"martin game thrones\")"
: 
: I understand why in query a) pf is only applied to the keywords game and
: thrones (keywords in the query with no explicit field assigned), but
: following the same reasoning, I would expect the same behaviour for query b)
: and that is not the case: martin is included in the phrase (and it works as
: I expected for the parameter qf). Any idea why this happens?
: 
: Best,
: 
: Mónica
: 

-Hoss
http://www.lucidworks.com/
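
A quick way to check for the stray term in other queries is to pull the phrase clauses out of the parsedquery_toString debug value and compare them with the unfielded keywords. The Python sketch below does only that string comparison; the helper names are hypothetical, and it assumes the debug string has already been JSON-decoded (plain quotes rather than \").

```python
import re


def pf_phrases(parsed_query_to_string):
    """Map each field to the list of terms in its quoted phrase clause."""
    return {
        field: phrase.split()
        for field, phrase in re.findall(r'(\w+):"([^"]+)"', parsed_query_to_string)
    }


def unexpected_phrase_terms(parsed_query_to_string, unfielded_terms):
    """Return phrase terms, per field, that are not among the unfielded keywords."""
    extras = {}
    for field, terms in pf_phrases(parsed_query_to_string).items():
        stray = [t for t in terms if t not in unfielded_terms]
        if stray:
            extras[field] = stray
    return extras


# The two debug strings reported in the thread, after JSON decoding.
query_a = ('+(+name:george +(+(text:game) +(text:thrones))) '
           '(text:"game thrones" | name:"game thrones")')
query_b = ('+(+(+name:george +name:martin) +(+(text:game) +(text:thrones))) '
           '(text:"martin game thrones" | name:"martin game thrones")')

print(unexpected_phrase_terms(query_a, ["game", "thrones"]))
print(unexpected_phrase_terms(query_b, ["game", "thrones"]))
```

For query a) nothing is flagged; for query b) both the text and name phrases are flagged as carrying martin.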

Re: Solr logs (hits value) and memory allocation

2023-05-04 Thread Joel Bernstein

It would also depend on the query.

For example, collapse keeps a Map of group heads gathered during the query.
A large result set and a high-cardinality group field would result in more
memory usage.


Joel Bernstein
http://joelsolr.blogspot.com/


On Wed, May 3, 2023 at 3:11 PM Kevin Risden  wrote:

> Here is an example calculation of bytes -> number of entries held from the
> bitset.
>
> (2864256-12-12)/24 = 119343 long objects = 22913856 entries
>
> The above is from a cluster where each query is generating a bitset of size
> 2864256 bytes - ~2.8 MB on heap. This is for 22 million results in the
> resultset. There is some algorithmic stuff to decide whether this is a sparse
> bitset or a fixed bitset - over a certain size result this is always a
> fixed bitset [1]. It grows based on number of documents in the resultset
> for the shard.
>
> This is easily viewable with a profiler like async-profiler where bitsets
> are created for each query. I recently looked at this in
> https://issues.apache.org/jira/browse/SOLR-16555 where filtercache bitsets
> were being recreated over and over if there were multiple fq clauses.
> SOLR-16555 drastically reduced heap usage on the cluster I was working on
> (you can see some of the metrics on the PR from before/after)
>
> If you have a shard with 200M documents - I think that bitset could be
> ~20MB per bitset per query.
>
> [1]
>
> https://github.com/apache/solr/blame/main/solr/core/src/java/org/apache/solr/search/DocSetUtil.java#L46
>
> PS - for G1 GC almost all of these big bitsets are humongous allocations
> (due to G1 region size), and I don't know whether that is a problem or not.
> It's something I'd like to look at further, but I haven't had time to
> benchmark or look at other approaches.
>
> Kevin Risden
>
>
> On Wed, May 3, 2023 at 1:14 PM Vincenzo D'Amore wrote:
>
> > Hi Markus,
> >
> > thanks for your explanation.
> > What if I submit a query q=*:*&rows=0 and there are 200M documents in
> > the solr core? Will I allocate an array of ScoreDoc objects that big?
> >
> >
> >
> > On Wed, May 3, 2023 at 5:32 PM Markus Jelsma wrote:
> >
> > > Hello Vincenzo,
> > >
> > > Yes. Last time I checked, an array of ScoreDoc objects is created for
> > > each query with the size of the numFound for the local core/replica.
> > > This should be clearly visible in VisualVM. This happens in
> > > SolrIndexSearcher.
> > >
> > > Regards,
> > > Markus
> > >
> > > On Wed, May 3, 2023 at 17:20, Vincenzo D'Amore wrote:
> > >
> > > > Hi all,
> > > >
> > > > Just asking if there could be some correlation between the amount of
> > > > memory allocated by a Solr query and the number of *hits* reported
> > > > in the solr logs.
> > > > I haven't found anything in the Solr documentation.
> > > >
> > > > Do you know if there is some advice for the hits value?
> > > >
> > > > Thanks,
> > > > Vincenzo
> > > >
> > > > --
> > > > Vincenzo D'Amore
> > > >
> > >
> >
> >
> > --
> > Vincenzo D'Amore
> >
>
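
Kevin's arithmetic above can be replayed in a few lines of Python. This is only a sketch: it assumes a fixed bitset keeps one bit per document in 64-bit words, and it treats the two 12-byte terms in his formula as fixed overhead; the function names are made up for illustration.

```python
HEADER_BYTES = 12 + 12  # the two 12-byte terms subtracted in Kevin's formula


def bits_held(bitset_bytes, header_bytes=HEADER_BYTES):
    """Number of document slots (one bit each) in a bitset of this heap size."""
    return (bitset_bytes - header_bytes) * 8


def fixed_bitset_bytes(max_doc, header_bytes=HEADER_BYTES):
    """Approximate heap size of a one-bit-per-document bitset of 64-bit words."""
    words = (max_doc + 63) // 64  # round up to whole 64-bit words
    return words * 8 + header_bytes


print(bits_held(2864256))               # entries for the 2864256-byte bitset
print(fixed_bitset_bytes(200_000_000))  # bytes for a 200M-document shard
```

With these assumptions the 2864256-byte bitset maps to the same 22913856 entries Kevin quotes, and a 200M-document shard comes out at about 25 MB per bitset, the same order of magnitude as his ~20MB estimate.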

Re: Debug time spent in aggregating the search results

2023-05-04 Thread Rajani Maski

Solr version 9.1.1


Query -
solr/admin/metrics?nodes=all&wt=json&prefix=QUERY./select.distrib.requestTimes

{
  "responseHeader":{
"status":0,
"QTime":7},
  "ip:8983_solr":{
"responseHeader":{
  "status":0,
  "QTime":2},
"metrics":{}},
  "ip:8983_solr":{
"responseHeader":{
  "status":0,
  "QTime":2},
"metrics":{}},
  "ip:8983_solr":{
"responseHeader":{
  "status":0,
  "QTime":2},
"metrics":{}}}


Query without "distrib"
solr/admin/metrics?nodes=all&wt=json&prefix=QUERY./select.requestTimes


{
  "responseHeader":{
"status":0,
"QTime":10},
  "10.146.38.84:8983_solr":{
"responseHeader":{
  "status":0,
  "QTime":5},
"metrics":{
  "solr.core.test.shard1.replica_n1":{
"QUERY./select.requestTimes":{
  "count":403,
  "meanRate":4.906764808325113E-4,
  "1minRate":2.964393875E-314,
  "5minRate":1.4821969375E-313,
  "15minRate":4.44659081257E-313,
  "min_ms":0.0,
  "max_ms":0.0,
  "mean_ms":0.0,
  "median_ms":0.0,
  "stddev_ms":0.0,
  "p75_ms":0.0,
  "p95_ms":0.0,
  "p99_ms":0.0,
  "p999_ms":0.0}},

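
To compare the two metric keys from this thread across replicas, the nodes=all response can be flattened into one row per core. The Python sketch below assumes the response has the shape shown in the outputs above (one entry per node, each with a "metrics" map keyed by core); the function name is hypothetical, and the sample dictionary is a hand-trimmed copy of values posted in the thread.

```python
def summarize_request_times(metrics_response, metric_key):
    """Return (node, core, count, mean_ms) for every core exposing metric_key."""
    rows = []
    for node, payload in metrics_response.items():
        if node == "responseHeader":
            continue  # top-level header, not a node entry
        for core, metrics in payload.get("metrics", {}).items():
            timer = metrics.get(metric_key)
            if timer is not None:
                rows.append((node, core, timer["count"], timer["mean_ms"]))
    return rows


# Trimmed sample: one node with the timer present, one node with empty metrics.
sample = {
    "responseHeader": {"status": 0, "QTime": 25},
    "127.0.1.1:8983_solr": {
        "responseHeader": {"status": 0, "QTime": 6},
        "metrics": {
            "solr.core.gettingstarted.shard1.replica_n1": {
                "QUERY./select.distrib.requestTimes": {
                    "count": 1,
                    "mean_ms": 383.362809,
                }
            }
        },
    },
    "ip:8983_solr": {"responseHeader": {"status": 0, "QTime": 2}, "metrics": {}},
}

for row in summarize_request_times(sample, "QUERY./select.distrib.requestTimes"):
    print(row)
```

An empty result for a node, as in the 9.1.1 output above, simply means no core on that node reported the requested key.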


Re: Backing up Solr to a specific path

2023-05-04 Thread Lewis Blackwell

Hi everyone,
Is there a way to set the path when backing up my solr core outside the
solr directory? I tried updating the solr.xml file with a specific path but
it was not recognized. I am using solr 9.0

Thanks,
Lewis

Re: facet domain change with blockChildren not working as expected

2023-05-04 Thread Igor Blanco

Oh great, thanks for the tip.

And yes, hehe, blockChain in SOLR would be great for all those folks
who are frantically searching for their wallets' lost passwords. :P

Thanks for your help.

On 4 May 2023 at 19:17, Mikhail Khludnev wrote:

Is there a way to reduce the new domain to only the
children that comply with some kind of filter?

Sure. You can apply "filter" under "domain" to restrict a certain child
type. Check
https://solr.apache.org/guide/solr/latest/query-guide/json-faceting-domain-changes.html#adding-domain-filters
please.

PS. we should definitely add blockChain to Solr to modernize it. Thanks
for the clue!
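
Putting the two pieces of advice from this thread together might look like the Python sketch below: the parent query Igor reports as working ("-_nest_path_:*") goes in blockChildren, and a "filter" under "domain" narrows the children, here reusing his "id:DIC*" pattern for the dictionary subdocuments. The helper name is hypothetical, and whether this exact combination returns the expected buckets on his index is an assumption, not something verified in the thread.

```python
import json
from urllib.parse import urlencode


def child_facet_params(query, parent_filter, child_filter, field):
    """Build select parameters for a terms facet over filtered child documents."""
    facet = {
        "working_language_ids": {
            "type": "terms",
            "field": field,
            "limit": -1,
            "domain": {
                # Must match ALL parent documents, per the ref guide.
                "blockChildren": parent_filter,
                # Applied after the domain switches to the children.
                "filter": child_filter,
            },
        }
    }
    return {"q": query, "json.facet": json.dumps(facet)}


params = child_facet_params("id:1", "-_nest_path_:*", "id:DIC*", "lang_ids")
print(urlencode(params))  # query string to append to .../select?
```

Building the JSON with json.dumps and letting urlencode do the escaping avoids hand-encoding the long URL shown in the original question.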

On Thu, May 4, 2023 at 5:58 PM Igor Blanco wrote:

Hi Mikhail,

Thanks for your response. It worked and I think that it put me on the
right path but I'm still a bit confused

I thought that blockChain allowed me to change the domain to all the
children of the main resulting documents and that the "id:DIC*" filter
would limit the children in the domain to those whose id starts with
DIC, which are only the dictionary ones.

But after trying your suggestion, rereading the doc, and thanks to your
pointer, I am starting to understand that what I have to provide to
blockChildren is the query that matches the parent documents. I've come up
with something like this:

"blockChildren":"-_nest_path_:*"

It seems to work.

But in my case this works because only the dictionary subdocuments have a
"lang_ids" field... but what if this field were also present in other
subdocuments? Is there a way to reduce the new domain to only the children
that comply with some kind of filter?

Thanks a lot.

On 4 May 2023 at 10:58, Mikhail Khludnev wrote:

Hello Igor.
I'm not sure parent/child docs are indexed well in this particular case.
But I spot one detail in the ref guide: "... exclusively matches all
parent documents in the collection."
Presumably it should be "blockChildren":"id:[0 TO 9]"
Beforehand, check that this query matches only parent documents.


RE: Backing up Solr to a specific path

2023-05-04 Thread DAVID MARTIN NIETO


From: Lewis Blackwell
Sent: Friday, May 5, 2023 3:55
To: users@solr.apache.org
Subject: Re: Backing up Solr to a specific path

Hi everyone,
Is there a way to set the path when backing up my solr core outside the
solr directory? I tried updating the solr.xml file with a specific path but
it was not recognized. I am using solr 9.0

Thanks,
Lewis
