Indexing Files in the Filelist

2021-07-26 Thread Patricia Schierling (zeroseven design studios)
Hi everyone, 

We are not using a PDF indexing extension like Tika, but we would like to index 
the file names in the file list (not the alt tags, really only the file name). 
Is Solr able to do this by default? Can you let me know which configuration we 
would need?

-- 

Viele Grüße aus Ulm | kind regards

Patricia Schierling 

Tel.: +49 731 715732-164
E-Mail: p.schierl...@zeroseven.de

- : - - - - - - - - - - - - - - - - - - - - - - - - : -

zeroseven design studios GmbH
Frauenstraße 83
89073 Ulm

Tel.  +49 7 31 71 57 32 - 100
Fax  +49 7 31 71 57 32 - 290

www.zeroseven.de
blog.zeroseven.de
xing.com/pages/zerosevendesignstudios
linkedin.com/company/zeroseven-design-studios
instagram.com/zeroseven_design_studios
pinterest.de/zeroseven
facebook.com/zeroseven.design.studios

Handelsregister Ulm HRB 41 44
Geschäftsführer: Thomas Seruset, Sebastian Feurle

- : - - - - - - - - - - - - - - - - - - - - - - - - : -



Changing the solr url

2021-07-26 Thread Endika Posadas
Hi,

I have a SolrCloud deployment with multiple shards and replicas. After a 
while, the URL where the Solr instances are deployed has changed, so every Solr 
node is down.

I would like to update the URL of the Solr nodes to the new one so I can 
bring the cluster back up. After reading the documentation, I couldn't find an 
API to do so. Is there a way to update the cluster URLs without having to 
manually update ZooKeeper?

Thanks


Re: Changing the solr url

2021-07-26 Thread Shawn Heisey

On 7/26/2021 5:19 AM, Endika Posadas wrote:

I have a SolrCloud deployment with multiple shards and replicas. After a 
while, the URL where the Solr instances are deployed has changed, so every Solr 
node is down.

I would like to update the URL of the Solr nodes to the new one so I can 
bring the cluster back up. After reading the documentation, I couldn't find an 
API to do so. Is there a way to update the cluster URLs without having to 
manually update ZooKeeper?


This sounds weird.  How have the URLs changed?  I'm not aware of any 
situation where the URL can change unless someone does something that 
changes the deployment, or something is done outside of Solr.


I have NEVER heard of SolrCloud URLs changing without outside 
influence.  There is work on time-based collections, but even that 
should give you a stable URL to access the data.


Thanks,
Shawn



Re: Indexing Files in the Filelist

2021-07-26 Thread Dave
This is again one of those situations where you just need to code it in your 
indexer, which is independent of Solr. Any language can do this; you just need 
a dedicated field (or a dynamic one) for this purpose and put the file name in it.
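
A minimal sketch of what such an indexer could look like, assuming SolrJ, a
core named "files", and a dynamic string field "file_name_s" (the URL, core and
field names are placeholders, not anything Solr provides out of the box):

import java.io.File;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class FileNameIndexer {
    public static void main(String[] args) throws Exception {
        try (SolrClient solr =
                 new HttpSolrClient.Builder("http://localhost:8983/solr/files").build()) {
            File[] files = new File(args[0]).listFiles();    // directory to index
            if (files == null) return;
            for (File f : files) {
                if (!f.isFile()) continue;
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", f.getAbsolutePath());     // path as the unique key
                doc.addField("file_name_s", f.getName());    // only the file name, no content
                solr.add(doc);
            }
            solr.commit();
        }
    }
}

The point is that Solr only stores what the indexer sends it, so "index only
the file name" is purely a decision made on the indexing side.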

> On Jul 26, 2021, at 7:09 AM, Patricia Schierling (zeroseven design studios) 
>  wrote:
> 
> Hi everyone, 
> 
> we are not using a PDF indexing extension like Tika, but we would like to 
> index the file names in the file list (not the alt tags, really only the file 
> name). Is Solr able to do this by default? Can you let me know which 
> configuration we would need?


Re: The filter cache is not reclaimed

2021-07-26 Thread Alessandro Benedetti
Hi Dawn,
from your config, any time you open a new searcher you are auto-warming the
*most recent 100 keys* of your cache (while losing the other entries).
How often do you open a searcher? (Soft commit, or hard commit with
openSearcher='true'?)

From your comment "found that the Filter cache can not be released": when
do you expect Solr to release the memory associated with the filter cache?

A Caffeine cache removes entries from the cache if they are not used for
maxIdleTimeSec="600":
"Specifies that each entry should be automatically removed from the cache
once a fixed duration has elapsed after the entry's creation, the most
recent replacement of its value, or its last access. Access time is reset
by all cache read and write operations."
com.github.benmanes.caffeine.cache.Caffeine#expireAfterAccess(java.time.Duration)
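
For reference, this is roughly how those two behaviours map onto the
filterCache entry quoted below (the values are the ones from Dawn's config;
the comments are only a summary of the above, not a recommendation):

<!-- autowarmCount="100": on every new searcher, the 100 most recently used
     filter entries are re-executed into the new cache; everything else is
     dropped with the old searcher.
     maxIdleTimeSec="600": Caffeine's expireAfterAccess; an entry untouched
     for 10 minutes becomes eligible for eviction, but the heap it used is
     only reclaimed once the cleanup thread and the GC get to it. -->
<filterCache class="solr.CaffeineCache"
             maxRamMB="200"
             maxIdleTimeSec="600"
             cleanupThread="true"
             autowarmCount="100"/>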

Cheers
--
Alessandro Benedetti
Apache Lucene/Solr Committer
Director, R&D Software Engineer, Search Consultant

www.sease.io


On Fri, 23 Jul 2021 at 03:18, Dawn  wrote:

> Hi:
> solr 8.7.0
> In my online service, memory continues to grow. A memory dump shows that the
> filter cache cannot be released and keeps occupying memory.
>
> The filter cache class is CaffeineCache. I tried adjusting the GC
> policy and the filter cache parameters (maxRamMB, maxIdleTimeSec, cleanupThread,
> autowarmCount), but it doesn't solve the problem.
>
> Are there any other parameters about the cache that can be
> adjusted?
>
> <filterCache class="solr.CaffeineCache"
>              maxRamMB="200"
>              maxIdleTimeSec="600"
>              cleanupThread="true"
>              autowarmCount="100"/>


Re: Result set order when searching on "*" (asterisk character)

2021-07-26 Thread Alessandro Benedetti
Hi,
to add to what Michael specified:



*"is determined by the order of docs asserialized in the Lucene index --
and that order is arbitrary, and can varyacross different replicas of the
same "shard" of the index."*

Until segment merge happens in the background, the internal Lucene ID for a
document aligns with the order of indexing.
So you may be tricked to rely on this property i.e. the final tie-breaker
is the time of indexing.
But updates, deletions, background merges(and potentially other internal
mechanisms that I missed) happen all the time, so you should always be
responsible for solving your score ties and never rely on the internal
Lucene ID for a document to do that.
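
A minimal SolrJ sketch of such an explicit tie-breaker (the URL, collection
name and field names are placeholders; any unique field works as the
secondary sort):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class StableOrderQuery {
    public static void main(String[] args) throws Exception {
        try (HttpSolrClient solr =
                 new HttpSolrClient.Builder("http://localhost:8983/solr/products").build()) {
            SolrQuery q = new SolrQuery("*:*");
            // 'id' (the uniqueKey) breaks score ties deterministically,
            // so the order no longer depends on internal Lucene doc IDs.
            q.set("sort", "score desc, id asc");
            QueryResponse rsp = solr.query(q);
            rsp.getResults().forEach(d -> System.out.println(d.getFieldValue("id")));
        }
    }
}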

Cheers
--
Alessandro Benedetti
Apache Lucene/Solr Committer
Director, R&D Software Engineer, Search Consultant

www.sease.io


On Thu, 22 Jul 2021 at 17:35, Michael Gibney 
wrote:

> No sort option configured generally defaults to score (and currently does
> so even in cases such as the "*:*" case (MatchAllDocsQuery) where sort is
> guaranteed to be irrelevant; see:
> https://issues.apache.org/jira/browse/SOLR-14765).
>
> But functionally speaking that doesn't really matter: in the event of a
> main-sort "tie" (and in this case what you have is essentially "one big
> tie") or no sort at all, the order is determined by the order of docs as
> serialized in the Lucene index -- and that order is arbitrary, and can vary
> across different replicas of the same "shard" of the index.
>
> If stability is desired (and in many cases it is), you could try adding a
> default `sort` param of, e.g.: `sort=score,id` (with `id` as a unique,
> explicit tie-breaker). There are other options for handling this situation
> and nuances that you may want to account for somehow; but they all stem
> from the direct answer to your question, which is that in the event of tie
> or no sort, the order of returned results is arbitrary and unstable.
>
> On Thu, Jul 22, 2021 at 11:11 AM Steven White 
> wrote:
>
> > I don't have any sort option configured.  The score I'm getting back is
> 1.0
> > for each hit item.
> >
> > Does anyone know about Lucene's internal functionality to help me
> > understand what the returned order is?
> >
> > Steven
> >
> > On Wed, Jul 21, 2021 at 10:52 AM Vincenzo D'Amore 
> > wrote:
> >
> > > If no sort options are configured, just try adding the score field:
> > > you'll see that all the documents are ordered by score, which is usually 1
> > > when there is no query clause.
> > >
> > > On Wed, Jul 21, 2021 at 4:36 PM Steven White 
> > wrote:
> > >
> > > > Hi everyone,
> > > >
> > > > When I search on "*" (asterisk character) what's the result sort
> order
> > > > based on?
> > > >
> > > > Thanks
> > > >
> > > > Steven
> > > >
> > >
> > >
> > > --
> > > Vincenzo D'Amore
> > >
> >
>


Re: min_popularity alternative for Solr Relatedness and Semantic Knowledge Graphs

2021-07-26 Thread Alessandro Benedetti
Hi Kerwin,
I was taking a look at your question and the
*org.apache.solr.search.facet.RelatednessAgg* code; my comments are inline below:
--
Alessandro Benedetti
Apache Lucene/Solr Committer
Director, R&D Software Engineer, Search Consultant

www.sease.io


On Thu, 22 Jul 2021 at 08:27, Kerwin  wrote:

> Hi Solr users,
>
> I have a question on the relatedness and Semantic Knowledge Graphs feature
> in Solr.
> While the results are good with the out of box provision, I need some
> tweaking on the ability to specify filters or parameters based on only the
> foreground count. Right now only the min_popularity parameter is available,
> which applies to both the foreground dataset and the background one.

so far so good

> The
> white paper from Trey Grainger and his team mentions that the z-score is
> used to calculate the score. As per my understanding, the z-score assumes a
> normal distribution and is applicable when the sample size > 30, which I
> assume refers to the foreground count.

I don't have time right now to go through the paper, but the only place I
found the '30' magic number in the class is within this
method: org.apache.solr.search.facet.RelatednessAgg#computeRelatedness.
It's not even defined as a constant, nor as a variable driven by a param, so
it's not possible to change it unless we improve the code.

> So I would like to control this value with a
> parameter or filter. Right now I am getting the approximate count by doing
> a reverse calculation on the foreground popularity and the background size
> to get the foreground count. Kindly correct me if my understanding is
> different from what it should be.
>
What I recommend is to take a look at the code references I put above and write
a contribution of your own to add the additional configuration, with an
explanation.
As a committer, I would be happy to review such work and merge it if it
improves the relatedness aggregation (we could take the occasion to also
rename some of the variables, which do not align with Java naming conventions:
'min_pop' => minPopularity, etc.).
Cheers
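
For reference, a sketch of the JSON Facet request shape this thread is about,
using the existing min_popularity option (field names, queries and the 0.0005
threshold are placeholders; a foreground-only minimum count, as asked for
above, is not currently available):

{
  "query": "*:*",
  "params": {
    "fore": "category:books",
    "back": "*:*"
  },
  "facet": {
    "related_terms": {
      "type": "terms",
      "field": "keywords_s",
      "limit": 10,
      "sort": { "r1": "desc" },
      "facet": {
        "r1": {
          "type": "func",
          "func": "relatedness($fore,$back)",
          "min_popularity": 0.0005
        }
      }
    }
  }
}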


Re: MultipleAdditiveTreeModel

2021-07-26 Thread Alessandro Benedetti
I didn't get any additional notification (or maybe I missed it).
Has the Jira been created yet?
Boolean features are quite common around Learning To Rank use cases.
I do believe this contribution can be useful.
If you don't have time to create the Jira or contribute the pull request,
no worries, just let us know and we (committers) will organize to do it.
Thanks for your help; without the effort of our users, Apache Solr wouldn't
be the same.
Cheers
--
Alessandro Benedetti
Apache Lucene/Solr Committer
Director, R&D Software Engineer, Search Consultant

www.sease.io


On Fri, 16 Jul 2021 at 20:29, Roopa Rao  wrote:

> Spyros, thank you for verifying this, we are planning to do something
> similar.
>
> Thanks,
> Roopa
>
> On Fri, Jul 16, 2021 at 12:09 PM Spyros Kapnissis 
> wrote:
>
> > Hello,
> >
> > Just to verify this, we had come across the exact same issue when
> > converting an XGBoost model to MultipleAdditiveTrees. This was an issue
> > specifically with the categorical features that take on integer values.
> We
> > ended up subtracting 0.5 from the threshold value on any such split point
> > on the converted model, so that it would output the same score as the
> input
> > model.
> >
> > On Fri, Jul 16, 2021, 18:19 Roopa Rao  wrote:
> >
> > > Okay, thank you for the input
> > >
> > > Roopa
> > >
> > > On Fri, Jul 16, 2021 at 5:55 AM Alessandro Benedetti <
> > a.benede...@sease.io
> > > >
> > > wrote:
> > >
> > > > Hi Roopa,
> > > > I was not able to find why that slack was added.
> > > > I am not sure why we would like to change the threshold.
> > > > I would recommend creating a Jira issue and tag at least myself,
> > > Christine
> > > > Poerschke and Diego Ceccarelli, so we can discuss and potentially
> open
> > a
> > > > pull request.
> > > >
> > > > Cheers
> > > >
> > > >
> > > > --
> > > > Alessandro Benedetti
> > > > Apache Lucene/Solr Committer
> > > > Director, R&D Software Engineer, Search Consultant
> > > >
> > > > www.sease.io
> > > >
> > > >
> > > > On Thu, 15 Jul 2021 at 22:24, Roopa Rao  wrote:
> > > >
> > > > > Hi All,
> > > > >
> > > > > In LTR for MultipleAdditiveTreeModel what is the purpose of adding
> > > > > NODE_SPLIT_SLACK
> > > > > to the threshold?
> > > > >
> > > > > Reference: org.apache.solr.ltr.model.MultipleAdditiveTreesModel
> > > > >
> > > > > private static final float NODE_SPLIT_SLACK = 1E-6f;
> > > > >
> > > > >
> > > > > public void setThreshold(float threshold) { this.threshold =
> > threshold
> > > +
> > > > > NODE_SPLIT_SLACK; }
> > > > >
> > > > > We have a feature which can return 0.0 or 1.0
> > > > >
> > > > > And model with this tree:
> > > > >
> > > > >
> > >
> is_xyz_feature,threshold=0.9994,left=0.0010180053,right=-0.0057609854
> > > > >
> > > > > However, when Solr actually scores it, it evaluates it as follows:
> > > > > is_xyz_feature: 1.0 <= 1.01, Go Left
> > > > >
> > > > > So it always goes left, which is incorrect.
> > > > >
> > > > > Thanks,
> > > > > Roopa
> > > > >
> > > >
> > >
> >
>


Re: Changing the solr url

2021-07-26 Thread Endika Posadas
Hi,

Sorry for the misunderstanding. The URLs didn't change; I moved the 
deployment to different hosts. But now I have no way of telling Solr that 
what was hostA is now called hostB.

thanks
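
One way to avoid ending up here again is to register each node in ZooKeeper
under a stable alias instead of the machine's own hostname. A sketch, assuming
the solr.in.sh startup config is used (all values below are placeholders):

# In solr.in.sh (or pass -Dhost=... at startup):
SOLR_HOST="solr-node-1.search.internal"    # stable alias recorded in cluster state
SOLR_PORT="8983"
ZK_HOST="zk1:2181,zk2:2181,zk3:2181/solr"  # ZooKeeper ensemble

For the current outage, pointing DNS aliases that match the old registered
names at the new machines (and starting Solr there with the old host values)
may bring the replicas back without hand-editing the cluster state in
ZooKeeper.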

On 2021/07/26 12:54:43, Shawn Heisey  wrote: 
> 
> This sounds weird.  How have the URLs changed?  I'm not aware of any 
> situation where the URL can change unless someone does something that 
> changes the deployment, or something is done outside of Solr.
> 
> I have NEVER heard of SolrCloud URLs changing without outside 
> influence.  There is the work on time based collections, but even that 
> should give you a stable URL to access the data.
> 
> Thanks,
> Shawn
> 
> 


Re: MultipleAdditiveTreeModel

2021-07-26 Thread Roopa Rao
Hi Alessandro,
I haven't created a Jira for this; we solved it in a similar way to what Spyros
described, by changing the threshold in the model.
It would still be good to understand why the SLACK is added.

Thanks,
Roopa

On Mon, Jul 26, 2021 at 10:52 AM Alessandro Benedetti 
wrote:

> I didn't get any additional notification (or maybe I missed it).
> Has the Jira been created yet?
> Boolean features are quite common around Learning To Rank use cases.
> I do believe this contribution can be useful.
> If you don't have time to create the Jira or contribute the pull request,
> no worries, just let us know and we (committers) will organize to do it.
> Thanks for your help. without the effort of our users, Apache Solr wouldn't
> be the same.
> Cheers
> --
> Alessandro Benedetti
> Apache Lucene/Solr Committer
> Director, R&D Software Engineer, Search Consultant
>
> www.sease.io
>


Re: CDCR replacement -- Timelines

2021-07-26 Thread Natarajan, Rajeswari

Hi Anshum,

Thanks for your reply. I looked at the PR you shared. It involves 
processing the MirroredSolrRequest. How is the Solr request coming to the primary 
SolrCloud being intercepted and sent to Kafka? I am interested in this part.
We can send the same Solr request from the application to Kafka and mirror it 
in the other DC. But if it gets sent to Kafka from the primary DC SolrCloud, any 
publish-pipeline errors can be avoided.

Regards,
Rajeswari


On 7/16/21, 11:22 AM, "Anshum Gupta"  wrote:

Hi Rajeswari,

    There's some effort around CDCR, and that would finally all be in the
    solr-sandbox (https://github.com/apache/solr-sandbox) repository. The
    intention is for the replacement CDCR mechanism to have its own release
    cadence and independent binary release. The dependency on Solr for the
    proposal I have isn't too much outside of SolrJ, so those releases will
    most likely not be 100% related and planned together.

My current draft is in my fork and currently evolving. It's based on
something that we've been using at my workplace for a while now. The
evolving draft can be found here:
https://github.com/apache/solr-sandbox/pull/4/files

Hope this helps.

On Thu, Jul 15, 2021 at 9:02 AM Natarajan, Rajeswari
 wrote:

> Hi ,
> Saw in Solr Documentations that Solr community is working to identify 
the
> best recommended replacement in time for 9.0.
> What are the timelines for 9.0 release . We are currently using CDCR 
and
> have now started looking at the alternatives.
>
>
> Thanks,
> Rajeswari
>
>

-- 
Anshum Gupta




Re: MultipleAdditiveTreeModel

2021-07-26 Thread Spyros Kapnissis
Hi Alessandro, Roopa, I also agree that this issue should be further
investigated and fixed. Please let me know if you need any help opening the
Jira ticket and providing more details.
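
For anyone hitting the same thing, here is a sketch of that workaround
expressed as a Solr LTR model definition: the single split from Roopa's
example, with the threshold rewritten to 0.5 so that even after
NODE_SPLIT_SLACK (1E-6) is added, a feature value of 0.0 still goes left and
1.0 goes right. The model name and tree weight are placeholders:

{
  "class": "org.apache.solr.ltr.model.MultipleAdditiveTreesModel",
  "name": "myAdjustedModel",
  "features": [
    { "name": "is_xyz_feature" }
  ],
  "params": {
    "trees": [
      {
        "weight": "1.0",
        "root": {
          "feature": "is_xyz_feature",
          "threshold": "0.5",
          "left":  { "value": "0.0010180053" },
          "right": { "value": "-0.0057609854" }
        }
      }
    ]
  }
}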

On Mon, Jul 26, 2021, 21:04 Roopa Rao  wrote:

> Hi Alessandro,
> I haven't created a Jira for this; we solved it in a similar way to what Spyros
> described, by changing the threshold in the model.
> It would still be good to understand why the SLACK is added.
>
> Thanks,
> Roopa


Re: CDCR replacement -- Timelines

2021-07-26 Thread Anshum Gupta
Hi Rajeswari,

Not sure what's up with my mailbox, but this thread ends up in spam more
often than not.

You can certainly send the updates to a source topic, which can then be
mirrored and replicated, but that would require external versioning. There are a
few other challenges with that approach, but if you'd be interested in
participating and discussing this more, please feel free to join the ASF
Slack or comment on the PR/JIRA.
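
For the interception question in the quoted mail below, a conceptual sketch
only (this is NOT the solr-sandbox code): one place where updates reaching the
primary cluster could be copied off to Kafka is an UpdateRequestProcessor
configured into the update chain. The broker address, topic name and naive
serialization are placeholders, and a real cross-DC setup would also need
versioning plus delete/commit handling:

import java.io.IOException;
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;
import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

public class KafkaMirrorUpdateProcessorFactory extends UpdateRequestProcessorFactory {

  // One producer shared for the factory's lifetime (placeholder settings).
  private final KafkaProducer<String, String> producer = createProducer();

  private static KafkaProducer<String, String> createProducer() {
    Properties p = new Properties();
    p.put("bootstrap.servers", "kafka:9092");
    p.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    p.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    return new KafkaProducer<>(p);
  }

  @Override
  public UpdateRequestProcessor getInstance(SolrQueryRequest req, SolrQueryResponse rsp,
                                            UpdateRequestProcessor next) {
    return new KafkaMirrorProcessor(next, producer);
  }

  static class KafkaMirrorProcessor extends UpdateRequestProcessor {
    private final KafkaProducer<String, String> producer;

    KafkaMirrorProcessor(UpdateRequestProcessor next, KafkaProducer<String, String> producer) {
      super(next);
      this.producer = producer;
    }

    @Override
    public void processAdd(AddUpdateCommand cmd) throws IOException {
      SolrInputDocument doc = cmd.getSolrInputDocument();
      // Naive serialization for illustration; the sandbox PR wraps the whole
      // request as a MirroredSolrRequest instead.
      producer.send(new ProducerRecord<>("solr-updates",
          String.valueOf(doc.getFieldValue("id")), doc.toString()));
      super.processAdd(cmd);   // continue the normal local update chain
    }
  }
}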




On Mon, Jul 26, 2021 at 12:25 PM Natarajan, Rajeswari
 wrote:

>
> Hi Anshum,
>
> Thanks for your reply. I looked at the PR you shared.  It involves
> processing the MirroredSolrRequest .How is the solr request coming to
> primary solrcloud being intercepted and sent to Kafka , interested in this
> part.
> We can send the same solr request from application to  kafka and
> mirror it in other DC. But from primary DC solrcloud if it gets sent to
> Kafka  , any  publish pipeline errors can be avoided.
>
> Regards,
> Rajeswari
>

-- 
Anshum Gupta


RE: Print Solr Responses || SOLR 7.5

2021-07-26 Thread Akreeti Agarwal

Hi,

What do you mean by a table? Where should I store the entire result?
Also, is there any way to track or print responses in the log, or to generate 
request IDs?

Regards,
Akreeti Agarwal

-Original Message-
From: Dave 
Sent: Friday, July 23, 2021 8:09 PM
To: users@solr.apache.org
Cc: solr-u...@lucene.apache.org
Subject: Re: Print Solr Responses || SOLR 7.5


Assuming you have an interface to Solr between your app and the Solr server, why 
not just store the entire result set in JSON format in a table? It's fast, 
reliable, and does exactly what you want, yes?
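
A sketch of that idea (the Solr URL, core and query are placeholders): fetch
the response as JSON and keep the raw body; the file write at the end could
just as well be an insert into a database table:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;

public class SolrResponseArchiver {
    public static void main(String[] args) throws Exception {
        String url = "http://localhost:8983/solr/mycore/select?q=*:*&wt=json&rows=10";
        HttpResponse<String> rsp = HttpClient.newHttpClient()
                .send(HttpRequest.newBuilder(URI.create(url)).GET().build(),
                      HttpResponse.BodyHandlers.ofString());
        // Archive the raw JSON response body, e.g. one file per query.
        Path out = Path.of("solr-response-" + System.currentTimeMillis() + ".json");
        Files.writeString(out, rsp.body());
        System.out.println("stored " + out);
    }
}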

> On Jul 23, 2021, at 10:34 AM, Gora Mohanty  wrote:
>
> On Fri, 23 Jul 2021 at 11:05, Akreeti Agarwal
> 
> wrote:
>
>> Hi All,
>>
>> I am using Solr 7.5 with a master/slave architecture in my project. Just
>> wanted to know: is there any way we can print the responses generated by
>> Solr queries to a log file, i.e. as queries come in, store each response
>> in a file?
>> If it can be done, can it be done on a timed basis, say every 5 minutes?
>>
>
> Depending on your configuration, these requests should be logged to
> the Solr logs. The location of the logs depends on the logging
> configuration, and how you are running Solr. For the built-in Jetty,
> there should be a logs/ sub-directory
>
>
>>
>> Please help, as I am not able to resolve an issue occurring in the prod
>> environment.
>> Issue:
>> Solr is returning a response with a "*", which is causing a parsing issue on
>> the API side:
>>
>> {* "response":{"numFound":1,"start":0,"docs":[.
>>
>
> Your question is not clear: "Parsing issue at API side" for what API?
> If it is an issue with your code trying to parse the Solr response,
> surely that's up to you to fix.
>
> Are you sure that you are asking Solr for a desirable return format,
> e.g., JSON? Please see
> https://solr.apache.org/guide/7_5/response-writers.html
> for the various return formats. You can specify the return format with the
> "wt" parameter in the query to Solr. It is also probably advisable to use a
> client API:
> https://solr.apache.org/guide/7_5/client-apis.html
>
> Regards,
> Gora