Re: Random Field - # digits

2021-08-31 Thread rgamarra
hi,

> Random ≠ unique.

Agreed, they are not the same. I don't want a tie breaker; I want to know
how many ties I would face.

The implementation where it's being used has other, subsequent sort
criteria. So the question can be rephrased as whether the subsequent sort
clauses have any effect or not.

For example, given

sort= random_1234 DESC, price DESC

At the end of the day, does "price DESC" have any effect (which
comes down to how often ties in the random value happen)?

I took a glimpse at
https://github.com/apache/solr/blob/main/solr/core/src/java/org/apache/solr/schema/RandomSortField.java
and I conclude that
- an int is being used.
- it's a hash of the doc id + seed, rather than a random number generator
with a certain distribution.
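
A back-of-the-envelope estimate of how often such ties occur can be sketched as follows. It assumes (the thread does not confirm this) that the hash behaves like an independent, uniform draw from all 2^32 int values; the function names are made up for illustration.

```python
import math


def expected_tied_pairs(n_docs: int, bits: int = 32) -> float:
    """Expected number of document pairs that share the same sort value,
    assuming each value is an independent uniform draw from 2**bits."""
    return n_docs * (n_docs - 1) / 2 / 2**bits


def prob_any_tie(n_docs: int, bits: int = 32) -> float:
    """Poisson approximation of the chance that at least one tie exists."""
    return 1.0 - math.exp(-expected_tied_pairs(n_docs, bits))


if __name__ == "__main__":
    # Print the estimate for a few result-set sizes.
    for n in (10_000, 1_000_000, 10_000_000):
        print(n, round(expected_tied_pairs(n), 3), round(prob_any_tie(n), 4))
```

Under that assumption, a result set of one million documents would contain roughly 116 tied pairs, so a secondary clause such as price DESC would decide the relative order of only a tiny fraction of the documents.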

Best. Thanks.


--
Rodolfo Federico Gamarra


On Tue, Aug 31, 2021 at 3:00 AM Thomas Corthals wrote:

> Hi Rodolfo
>
> Random ≠ unique. If you really need a tie breaker, you'll have to sort on
> the uniqueKey field.
>
> What is your use case here? When using a cursor, sorting on a random field
> will yield confusing results.
>
> Thomas
>
> On Mon, 30 Aug 2021 at 17:33, rgamarra wrote:
>
> > Hi there! I'm using random fields (eg sort=random_1234 DESC) as a tie
> > breaker.
> >
> > I'm wondering how many digits the underlying random sequence uses for
> > each generated number.
> >
> > My result sets may contain (in principle) millions of results, so I would
> > like to have an estimate of possible clashes (ie two results ending up
> > with the same random number, and then being a tie in the result set).
> >
> > Best regards.
> >
> > --
> > Rodolfo Federico Gamarra
> >
>


Re: Random Field - # digits

2021-08-31 Thread Andrew Hankinson
You could use the UUIDUpdateProcessorFactory to automatically add a UUID to 
each document and use that as the tie-breaker field.

https://solr.apache.org/guide/8_1/update-request-processors.html#uuidupdateprocessorfactory

The chance of a UUID collision is well understood, and collisions are highly unlikely.

https://en.wikipedia.org/wiki/Universally_unique_identifier#Collisions
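
To compare a UUID with a 32-bit random sort value, the sketch below uses the standard birthday-bound approximation to estimate how many values must be drawn before a collision becomes likely. It assumes a random (version 4) UUID with 122 random bits and uniformly distributed values; whether UUIDUpdateProcessorFactory produces version 4 UUIDs is an assumption here, and the function name is illustrative.

```python
import math


def draws_for_collision(bits: int, p: float = 0.5) -> float:
    """Approximate number of uniform draws from 2**bits possible values
    needed for at least one collision to occur with probability p."""
    return math.sqrt(2 * 2**bits * math.log(1 / (1 - p)))


if __name__ == "__main__":
    print(f"122-bit UUID: {draws_for_collision(122):.3e} draws for a 50% chance")
    print(f"32-bit value: {draws_for_collision(32):.0f} draws for a 50% chance")
```

With these assumptions, a 50% chance of any collision takes about 2.7e18 UUIDs, but only about 77,000 values drawn from a 32-bit range.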






Re: Random Field - # digits

2021-08-31 Thread rgamarra
Thanks. It's not what I need, but I'll keep it in mind.

Sorry, my statement was not clear. I have already replied to Thomas with
further details.

Thank you all

rodolfo



"The request took too long" cause exceptions in upgrade testing

2021-08-31 Thread Dominic Humphries
I'm trying to upgrade from 8.3.1 to 8.8.1, and we're seeing slower
performance and a higher rate of failed requests when testing the upgrade.

The main culprit seems to be when we load the service heavily enough to
start causing "The request took too long to iterate over doc/point values"
warnings. On 8.3.1 these warnings are just that and no more, but on 8.8.1
the warning gets a full stack trace in the logs (see below). AIUI (my
Java's not great) this indicates an uncaught exception that could easily
explain the performance and request failures. But I don't understand why
we're getting such an exception on 8.8.1 when we don't on 8.3.1. Is there
anything I can do about this?

Thanks
Dom

2021-08-31 14:02:37.967 WARN  (qtp1037163664-65) [   x:jobs_UK]
o.a.s.h.c.SearchHandler Query:   =>
org.apache.lucene.index.ExitableDirectoryReader$ExitingReaderException: The
request took too long to iterate over doc values. Timeout: timeoutAt:
25316267453971 (System.nanoTime(): 25317582929106),
DocValues=org.apache.lucene.codecs.lucene80.Lucene80DocValuesProducer$8@18ff1305
at
org.apache.lucene.index.ExitableDirectoryReader$ExitableFilterAtomicReader.checkAndThrow(ExitableDirectoryReader.java:319)
org.apache.lucene.index.ExitableDirectoryReader$ExitingReaderException: The
request took too long to iterate over doc values. Timeout: timeoutAt:
25316267453971 (System.nanoTime(): 25317582929106),
DocValues=org.apache.lucene.codecs.lucene80.Lucene80DocValuesProducer$8@18ff1305

at
org.apache.lucene.index.ExitableDirectoryReader$ExitableFilterAtomicReader.checkAndThrow(ExitableDirectoryReader.java:319)
~[?:?]
at
org.apache.lucene.index.ExitableDirectoryReader$ExitableFilterAtomicReader.access$100(ExitableDirectoryReader.java:73)
~[?:?]
at
org.apache.lucene.index.ExitableDirectoryReader$ExitableFilterAtomicReader$1.advance(ExitableDirectoryReader.java:127)
~[?:?]
at
org.apache.lucene.queries.function.valuesource.FloatFieldSource$1.getValueForDoc(FloatFieldSource.java:67)
~[?:?]
at
org.apache.lucene.queries.function.valuesource.FloatFieldSource$1.exists(FloatFieldSource.java:83)
~[?:?]
at
org.apache.solr.handler.component.NumericStatsValues.accumulate(StatsValuesFactory.java:484)
~[?:?]
at
org.apache.solr.handler.component.StatsField.computeLocalValueSourceStats(StatsField.java:472)
~[?:?]
at
org.apache.solr.handler.component.StatsField.computeLocalStatsValues(StatsField.java:431)
~[?:?]
at
org.apache.solr.handler.component.StatsComponent.process(StatsComponent.java:60)
~[?:?]
at
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:355)
~[?:?]
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:216)
~[?:?]
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2646) ~[?:?]
at
org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:794) ~[?:?]
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:567)
~[?:?]
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:427)
~[?:?]
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:357)
~[?:?]
at
org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:201)
~[jetty-servlet-9.4.34.v20201102.jar:9.4.34.v20201102]
at
org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1601)
~[jetty-servlet-9.4.34.v20201102.jar:9.4.34.v20201102]
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:548)
~[jetty-servlet-9.4.34.v20201102.jar:9.4.34.v20201102]
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
~[jetty-server-9.4.34.v20201102.jar:9.4.34.v20201102]
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:602)
~[jetty-security-9.4.34.v20201102.jar:9.4.34.v20201102]
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
~[jetty-server-9.4.34.v20201102.jar:9.4.34.v20201102]
at
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)
~[jetty-server-9.4.34.v20201102.jar:9.4.34.v20201102]
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1612)
~[jetty-server-9.4.34.v20201102.jar:9.4.34.v20201102]
at
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
~[jetty-server-9.4.34.v20201102.jar:9.4.34.v20201102]
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1434)
~[jetty-server-9.4.34.v20201102.jar:9.4.34.v20201102]
at
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
~[jetty-server-9.4.34.v20201102.jar:9.4.34.v20201102]
at
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:501)
~[jetty-servlet-9.4.34.v20201102.jar:9.4.34.v20201102]
at
org.eclipse.jetty.serv
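
The exception in the trace, ExitingReaderException, is raised by Lucene's ExitableDirectoryReader when a request runs past its time budget, which in Solr is normally set through the timeAllowed request parameter. The sketch below assumes timeAllowed is in play here (the message does not say so) and shows how a client might set it explicitly and detect a truncated response through the partialResults flag in the response header. The base URL, the stats field name "salary" and both function names are hypothetical; only the core name jobs_UK comes from the log.

```python
from urllib.parse import urlencode


def build_select_url(base_url: str, core: str, params: dict,
                     time_allowed_ms: int) -> str:
    """Build a /select URL with an explicit timeAllowed (milliseconds)."""
    query = dict(params, timeAllowed=time_allowed_ms)
    return f"{base_url}/{core}/select?{urlencode(query)}"


def is_partial(response: dict) -> bool:
    """True if a parsed JSON response is flagged as partial results."""
    header = response.get("responseHeader", {})
    return bool(header.get("partialResults", False))


# Hypothetical request: a stats query similar to the one in the trace.
url = build_select_url(
    "http://localhost:8983/solr",
    "jobs_UK",
    {"q": "*:*", "stats": "true", "stats.field": "salary"},
    time_allowed_ms=5000,
)
```

Checking is_partial on each response during load testing would show whether the failed requests line up with timed-out ones.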

solr cloud vers 7.6.0 Requested node XXXXX:8090_solr is not part of the cluster

2021-08-31 Thread Jeff Courtade
Hi,

I have a large Solr deployment running Solr 7.6.0:

128 servers, with 5 ZooKeepers on separate systems.

Each Solr host runs 2 Solr instances, one holding a primary shard replica
and one holding a secondary:
port 8080 is always the primary (aka master) replica
port 8090 is always a secondary replica

These are all NRT replicas.

To add insult to injury, we never start the secondary replicas
except by accident, as the servers cannot handle the load of both Solr
instances running at the same time.

So we have 128 primary shard replicas, each with a secondary replica that is
down all the time.

We are seeing this in our cloud console under Nodes:

Requested node XXX:8090_solr is not part of the cluster

So the primary replica for that shard is fine and the 8090 replica
shows up in the console as down like all the other ones on port 8080.

The Solr cluster is answering all queries.

Does this make any difference at all to the cluster being able to
serve requests?

How could we jumpstart that node so this error goes away?
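
One way to see which replicas refer to nodes that are not currently in the cluster is to compare the live_nodes list with each replica's node_name in the output of the Collections API CLUSTERSTATUS action (/solr/admin/collections?action=CLUSTERSTATUS&wt=json). The sketch below works on an already parsed response; the host names, collection name and function name are hypothetical, and it only reports the mismatch rather than repairing anything.

```python
def replicas_on_missing_nodes(cluster_status: dict) -> list:
    """Return (collection, shard, replica, node_name) tuples for replicas
    whose node is absent from the cluster's live_nodes list."""
    cluster = cluster_status["cluster"]
    live = set(cluster.get("live_nodes", []))
    missing = []
    for coll_name, coll in cluster.get("collections", {}).items():
        for shard_name, shard in coll.get("shards", {}).items():
            for replica_name, replica in shard.get("replicas", {}).items():
                node = replica.get("node_name")
                if node not in live:
                    missing.append((coll_name, shard_name, replica_name, node))
    return sorted(missing)


# Hypothetical, heavily trimmed CLUSTERSTATUS response for one shard.
SAMPLE = {
    "cluster": {
        "live_nodes": ["host1:8080_solr"],
        "collections": {
            "docs": {
                "shards": {
                    "shard1": {
                        "replicas": {
                            "core_node1": {"node_name": "host1:8080_solr",
                                           "state": "active"},
                            "core_node2": {"node_name": "host1:8090_solr",
                                           "state": "down"},
                        }
                    }
                }
            }
        },
    }
}

if __name__ == "__main__":
    for entry in replicas_on_missing_nodes(SAMPLE):
        print(entry)
```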


Having problem with implicit field "_root_"

2021-08-31 Thread Ed Yu
We have a Solr setup from a very old version (1.4) and we are upgrading it to
8.9. We are stuck: I believe the schema.xml is syntactically free of errors,
but loading it now gives this error:

Caused by: org.apache.solr.common.SolrException: Could not load conf for core 
nwr_col: Can't load schema /var/solr/data/nwr_col/conf/schema.xml: _root_ field 
must be defined using the exact same fieldType as the uniqueKey field (id) 
uses: uuid

We have a field “id”
   
   

So I added:

   

And now I got the following error:

Caused by: org.apache.solr.common.SolrException: Could not load conf for core 
nwr_col: Can't load schema /var/solr/data/nwr_col/conf/schema.xml: [schema.xml] 
Duplicate field definition for '_root_' 
[[[_root_{type=string,properties=indexed,omitNorms,omitTermFreqAndPositions,sortMissingLast,docValues,useDocValuesAsStored,uninvertible}]]]
 and [[[_root_{type=uuid,default=NEW,properties=useDocValuesAsStored}]]]

Sounds like redefining the _root_ field is not allowed.

So I need to know:


  1.  How can we redefine _root_ to fix the above error?
  2.  Preferably, is there a way to disable the parent/child document
feature so that the _root_ field is not needed?

Sorry for such a noob question.

Regards,
Ed.




Re: Having problem with implicit field "_root_"

2021-08-31 Thread Srijan
Check the attached thread. It might be what you're looking for.

On Tue, Aug 31, 2021 at 4:30 PM Ed Yu wrote:

> We have a solr setup from a very old version (1.4) and we are upgrading it
> to 8.9. We are stuck at the point that I think the schema.xml is
> syntactically free of errors but now giving an error:
>
> Caused by: org.apache.solr.common.SolrException: Could not load conf for
> core nwr_col: Can't load schema /var/solr/data/nwr_col/conf/schema.xml:
> _root_ field must be defined using the exact same fieldType as the
> uniqueKey field (id) uses: uuid
>
> We have a field “id”
> default="NEW"/>
>
>
> So I added:
>
> docValues="false" default="NEW" />
>
> And now I got the following error:
>
> Caused by: org.apache.solr.common.SolrException: Could not load conf for
> core nwr_col: Can't load schema /var/solr/data/nwr_col/conf/schema.xml:
> [schema.xml] Duplicate field definition for '_root_'
> [[[_root_{type=string,properties=indexed,omitNorms,omitTermFreqAndPositions,sortMissingLast,docValues,useDocValuesAsStored,uninvertible}]]]
> and [[[_root_{type=uuid,default=NEW,properties=useDocValuesAsStored}]]]
>
> Sounds like redefining the _root_ field is not allowed.
>
> So I need to know:
>
>
>   1.  How can we redefine _root_ to fix the above error?
>   2.  Preferably, is there a way we can disable this parent child document
> feature to avoid the need of the _root_ field?
>
> Sorry for such a noob question.
>
> Regards,
> Ed.
>
>
>
--- Begin Message ---
The follow-up here from JIRA is that, as of Solr 8.0, you must not add "_root_" 
to a schema for an existing collection.  Solr uses this field instead of the 
uniqueKey for certain identity checks.  Chaos will ensue if you add it later.  
I shall update the ref guide to add a warning.

~ David

On 2021/06/08 15:14:18, Andreas Hubold  wrote: 
> Hi,
> 
> with Solr 8.6.3 we developed a new feature that uses partial update to 
> add some nested documents to existing index documents.
> 
> Because we didn't have nested documents so far, we've added the _root_ 
> and _nest_path_ fields to the schema, but of course these were unset for 
> existing documents.
> 
>  docValues="true" />
>  name="_nest_path_" class="solr.NestPathField" />
> 
> With 8.6.3 it worked fine to use partial updates to add nested
> documents to existing docs. The nested documents themselves were never
> changed here; we're just setting the nested documents for existing
> top-level documents.
> 
> I could also see that the _root_ field was correctly updated for both 
> root and child documents.
> 
> Now we've updated to Solr 8.8.2 and still want to use old indices where 
> the _root_ field isn't set for all documents. But now adding nested 
> documents doesn't work anymore:
> 
> Caused by: org.apache.solr.common.SolrException: Attempted an 
> atomic/partial update to a child doc without indicating the _root_ somehow.
>      at 
> org.apache.solr.handler.component.RealTimeGetComponent.getInputDocument(RealTimeGetComponent.java:746)
>      at 
> org.apache.solr.update.processor.DistributedUpdateProcessor.getUpdatedDocument(DistributedUpdateProcessor.java:689)
>      at 
> org.apache.solr.update.processor.DistributedUpdateProcessor.doVersionAdd(DistributedUpdateProcessor.java:373)
>      at 
> org.apache.solr.update.processor.DistributedUpdateProcessor.lambda$versionAdd$0(DistributedUpdateProcessor.java:336)
>      at 
> org.apache.solr.update.VersionBucket.runWithLock(VersionBucket.java:50)
>      at 
> org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:336)
>      at 
> org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:222)
>      at 
> org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
> 
> This check was introduced with 
> https://issues.apache.org/jira/browse/SOLR-14923
> 
> I know, I could reindex everything, but I'd really really like to avoid 
> this.
> Is there some other kind of workaround that I could use with Solr 8.8.2?
> 
> Or would it be possible to change the check, so that it only throws an 
> exception if there's an existing(!) _root_ value in the indexed document 
> that doesn't match?
> 
> Thanks,
> Andreas
> 
> 
--- End Message ---


Re: Having problem with implicit field "_root_"

2021-08-31 Thread Dominique Bejean
Hi,

I suggest you define both the id and _root_ fields as string, and populate
id with a UUID generated by UUIDUpdateProcessorFactory in an
updateRequestProcessorChain.

See Solr Wiki - https://cwiki.apache.org/confluence/display/solr/uniquekey

Dominique
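
The two rules quoted in the error messages earlier in this thread (only one definition per field name, and _root_ using the same fieldType as the uniqueKey field) can be checked on a schema file before restarting Solr. The sketch below is a rough stand-in for that check, not Solr's own validation; the example schema follows the suggestion above (both fields as string), and its attribute values and the function name are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal schema: id and _root_ both use the string type.
SCHEMA = """<schema name="example" version="1.6">
  <fieldType name="string" class="solr.StrField"/>
  <field name="id" type="string" indexed="true" stored="true" required="true"/>
  <field name="_root_" type="string" indexed="true" stored="false"/>
  <uniqueKey>id</uniqueKey>
</schema>"""


def check_root_field(schema_xml: str) -> list:
    """Return a list of problems: duplicate <field> names, or a _root_
    field whose type differs from the uniqueKey field's type."""
    root = ET.fromstring(schema_xml)
    problems = []
    types = {}
    for field in root.iter("field"):
        name = field.get("name")
        if name in types:
            problems.append(f"duplicate field definition for '{name}'")
        types[name] = field.get("type")
    unique_key = (root.findtext("uniqueKey") or "").strip()
    if "_root_" in types and types["_root_"] != types.get(unique_key):
        problems.append(
            f"_root_ is type '{types['_root_']}' but uniqueKey field "
            f"'{unique_key}' is type '{types.get(unique_key)}'"
        )
    return problems


if __name__ == "__main__":
    print(check_root_field(SCHEMA) or "no problems found")
```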


