Thank you, Spyros!
I'll take it from there.

Cheers
--------------------------
Alessandro Benedetti
Apache Lucene/Solr Committer
Director, R&D Software Engineer, Search Consultant

www.sease.io


On Wed, 28 Jul 2021 at 21:00, Spyros Kapnissis <ska...@gmail.com> wrote:

> Hi Alessandro, Roopa, I created the ticket here:
> https://issues.apache.org/jira/browse/SOLR-15569 . I don't think I have
> permission to add people though, so please tag whomever you feel is
> necessary.
> Please let me know if you need any more info, thanks!
>
> On Tue, Jul 27, 2021 at 1:00 PM Alessandro Benedetti <a.benede...@sease.io> wrote:
>
> > Hi Spyros, Roopa,
> > if you can create the Jira ticket with all the details you gathered, that
> > would be much appreciated.
> > If you tag me, Christine Poerschke, and Diego Ceccarelli at least, we'll
> > take over from there!
> > Thanks!
> > --------------------------
> > Alessandro Benedetti
> > Apache Lucene/Solr Committer
> > Director, R&D Software Engineer, Search Consultant
> >
> > www.sease.io
> >
> >
> > On Mon, 26 Jul 2021 at 21:29, Spyros Kapnissis <ska...@gmail.com> wrote:
> >
> > > Hi Alessandro, Roopa, I also agree that this issue should be further
> > > investigated and fixed. Please let me know if you need any help opening
> > > the Jira ticket or providing more details.
> > >
> > > On Mon, Jul 26, 2021, 21:04 Roopa Rao <roop...@gmail.com> wrote:
> > >
> > > > Hi Alessandro,
> > > > I haven't created a Jira ticket for this; we solved it in a similar
> > > > way to what Spyros described, by changing the threshold in the model.
> > > > Yes, it would be good to understand why the slack is added.
> > > >
> > > > Thanks,
> > > > Roopa
> > > >
> > > > On Mon, Jul 26, 2021 at 10:52 AM Alessandro Benedetti <a.benede...@sease.io> wrote:
> > > >
> > > > > I didn't get any additional notification (or maybe I missed it).
> > > > > Has the Jira been created yet?
> > > > > Boolean features are quite common in Learning To Rank use cases.
> > > > > I do believe this contribution can be useful.
> > > > > If you don't have time to create the Jira or contribute the pull
> > > > > request, no worries, just let us know and we (committers) will
> > > > > organize to do it.
> > > > > Thanks for your help: without the effort of our users, Apache Solr
> > > > > wouldn't be the same.
> > > > > Cheers
> > > > > --------------------------
> > > > > Alessandro Benedetti
> > > > > Apache Lucene/Solr Committer
> > > > > Director, R&D Software Engineer, Search Consultant
> > > > >
> > > > > www.sease.io
> > > > >
> > > > >
> > > > > On Fri, 16 Jul 2021 at 20:29, Roopa Rao <roop...@gmail.com> wrote:
> > > > >
> > > > > > Spyros, thank you for verifying this, we are planning to do
> > > > > > something similar.
> > > > > >
> > > > > > Thanks,
> > > > > > Roopa
> > > > > >
> > > > > > On Fri, Jul 16, 2021 at 12:09 PM Spyros Kapnissis <ska...@gmail.com> wrote:
> > > > > >
> > > > > > > Hello,
> > > > > > >
> > > > > > > Just to confirm this, we had come across the exact same issue when
> > > > > > > converting an XGBoost model to MultipleAdditiveTrees. This was an
> > > > > > > issue specifically with the categorical features that take on
> > > > > > > integer values. We ended up subtracting 0.5 from the threshold
> > > > > > > value at any such split point in the converted model, so that it
> > > > > > > would output the same score as the input model.
> > > > > > >
> > > > > > > On Fri, Jul 16, 2021, 18:19 Roopa Rao <roop...@gmail.com> wrote:
> > > > > > >
> > > > > > > > Okay, thank you for the input
> > > > > > > >
> > > > > > > > Roopa
> > > > > > > >
> > > > > > > > On Fri, Jul 16, 2021 at 5:55 AM Alessandro Benedetti <a.benede...@sease.io> wrote:
> > > > > > > >
> > > > > > > > > Hi Roopa,
> > > > > > > > > I was not able to find out why that slack was added.
> > > > > > > > > I am not sure why we would want to change the threshold.
> > > > > > > > > I would recommend creating a Jira issue and tagging at least
> > > > > > > > > myself, Christine Poerschke and Diego Ceccarelli, so we can
> > > > > > > > > discuss and potentially open a pull request.
> > > > > > > > >
> > > > > > > > > Cheers
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > --------------------------
> > > > > > > > > Alessandro Benedetti
> > > > > > > > > Apache Lucene/Solr Committer
> > > > > > > > > Director, R&D Software Engineer, Search Consultant
> > > > > > > > >
> > > > > > > > > www.sease.io
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Thu, 15 Jul 2021 at 22:24, Roopa Rao <roop...@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > Hi All,
> > > > > > > > > >
> > > > > > > > > > In LTR, for MultipleAdditiveTreesModel, what is the purpose
> > > > > > > > > > of adding NODE_SPLIT_SLACK to the threshold?
> > > > > > > > > >
> > > > > > > > > > Reference: org.apache.solr.ltr.model.MultipleAdditiveTreesModel
> > > > > > > > > >
> > > > > > > > > > private static final float NODE_SPLIT_SLACK = 1E-6f;
> > > > > > > > > >
> > > > > > > > > > public void setThreshold(float threshold) {
> > > > > > > > > >   this.threshold = threshold + NODE_SPLIT_SLACK;
> > > > > > > > > > }
> > > > > > > > > >
> > > > > > > > > > We have a feature which can return 0.0 or 1.0
> > > > > > > > > >
> > > > > > > > > > And a model with this tree:
> > > > > > > > > >
> > > > > > > > > > is_xyz_feature,threshold=0.99999994,left=0.0010180053,right=-0.0057609854
> > > > > > > > > >
> > > > > > > > > > However, when Solr actually scores it, the split is
> > > > > > > > > > evaluated as follows:
> > > > > > > > > > is_xyz_feature:1.0<= 1.000001, Go Left
> > > > > > > > > >
> > > > > > > > > > So it always goes left, which is incorrect.
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > > Roopa
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

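For reference, the behaviour discussed in this thread can be reproduced
outside Solr. The following is only a minimal sketch, not Solr's code: the
class SplitSlackDemo and the methods goesLeft and side are hypothetical
names, and it assumes a tree node sends a document left when
featureValue <= threshold + NODE_SPLIT_SLACK, as the scoring trace in
Roopa's original message suggests. It compares the split with and without
the slack, and with the threshold lowered by 0.5 as Spyros described.

```java
// Minimal sketch, not Solr's implementation; class and method names are hypothetical.
class SplitSlackDemo {
    // Same constant as quoted from MultipleAdditiveTreesModel in the thread.
    static final float NODE_SPLIT_SLACK = 1E-6f;

    // Assumed split rule: go left when the feature value is <= the effective threshold.
    static boolean goesLeft(float featureValue, float threshold, boolean addSlack) {
        float effective = addSlack ? threshold + NODE_SPLIT_SLACK : threshold;
        return featureValue <= effective;
    }

    static String side(boolean left) {
        return left ? "left" : "right";
    }

    public static void main(String[] args) {
        float threshold = 0.99999994f;   // threshold from the model in the thread
        float[] values = {0.0f, 1.0f};   // the feature returns 0.0 or 1.0
        for (float v : values) {
            System.out.println("value=" + v
                + " noSlack=" + side(goesLeft(v, threshold, false))
                + " withSlack=" + side(goesLeft(v, threshold, true))
                // Workaround from the thread: lower the threshold by 0.5.
                + " shiftedWithSlack=" + side(goesLeft(v, threshold - 0.5f, true)));
        }
    }
}
```

Under these assumptions, the slack pushes the effective threshold just above
1.0, so both 0.0 and 1.0 go left; lowering the threshold by 0.5 keeps it well
away from either feature value, so the slack no longer changes the outcome.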