Thank you, Alessandro! I also created a PR with a test case for the issue
and a potential fix. Thanks!

On Tue, Aug 3, 2021 at 1:27 PM Alessandro Benedetti <a.benede...@sease.io> wrote:

> Thank you, Spyros!
> I'll take it from there.
>
> Cheers
> --------------------------
> Alessandro Benedetti
> Apache Lucene/Solr Committer
> Director, R&D Software Engineer, Search Consultant
>
> www.sease.io
>
>
> On Wed, 28 Jul 2021 at 21:00, Spyros Kapnissis <ska...@gmail.com> wrote:
>
> > Hi Alessandro, Roopa, I created the ticket here:
> > https://issues.apache.org/jira/browse/SOLR-15569
> > I don't think I have permission to add people, though, so please tag
> > whoever you feel is necessary.
> > Please let me know if you need any more info. Thanks!
> >
> > On Tue, Jul 27, 2021 at 1:00 PM Alessandro Benedetti <a.benede...@sease.io> wrote:
> >
> > > Hi Spyros, Roopa,
> > > if you could create the Jira ticket with all the details you gathered,
> > > that would be much appreciated.
> > > If you tag at least me, Christine Poerschke, and Diego Ceccarelli, we'll
> > > take over from there!
> > > Thanks!
> > > --------------------------
> > > Alessandro Benedetti
> > > Apache Lucene/Solr Committer
> > > Director, R&D Software Engineer, Search Consultant
> > >
> > > www.sease.io
> > >
> > >
> > > On Mon, 26 Jul 2021 at 21:29, Spyros Kapnissis <ska...@gmail.com> wrote:
> > >
> > > > Hi Alessandro, Roopa, I also agree that this issue should be investigated
> > > > further and fixed. Please let me know if you need any help opening the
> > > > Jira ticket or any more details.
> > > >
> > > > On Mon, Jul 26, 2021, 21:04 Roopa Rao <roop...@gmail.com> wrote:
> > > >
> > > > > Hi Alessandro,
> > > > > I haven't created a Jira ticket for this; we solved it the same way
> > > > > Spyros described, by changing the threshold in the model.
> > > > > Yes, it would be good to understand why NODE_SPLIT_SLACK is added.
> > > > >
> > > > > Thanks,
> > > > > Roopa
> > > > >
> > > > > On Mon, Jul 26, 2021 at 10:52 AM Alessandro Benedetti <a.benede...@sease.io> wrote:
> > > > >
> > > > > > I didn't get any further notification (or maybe I missed it).
> > > > > > Has the Jira ticket been created yet?
> > > > > > Boolean features are quite common in Learning To Rank use cases,
> > > > > > so I do believe this contribution can be useful.
> > > > > > If you don't have time to create the Jira ticket or contribute the
> > > > > > pull request, no worries: just let us know and we (committers) will
> > > > > > take care of it.
> > > > > > Thanks for your help. Without the effort of our users, Apache Solr
> > > > > > wouldn't be the same.
> > > > > > Cheers
> > > > > > --------------------------
> > > > > > Alessandro Benedetti
> > > > > > Apache Lucene/Solr Committer
> > > > > > Director, R&D Software Engineer, Search Consultant
> > > > > >
> > > > > > www.sease.io
> > > > > >
> > > > > >
> > > > > > On Fri, 16 Jul 2021 at 20:29, Roopa Rao <roop...@gmail.com> wrote:
> > > > > >
> > > > > > > Spyros, thank you for verifying this. We are planning to do
> > > > > > > something similar.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Roopa
> > > > > > >
> > > > > > > On Fri, Jul 16, 2021 at 12:09 PM Spyros Kapnissis <ska...@gmail.com> wrote:
> > > > > > >
> > > > > > > > Hello,
> > > > > > > >
> > > > > > > > Just to confirm this: we came across the exact same issue when
> > > > > > > > converting an XGBoost model to MultipleAdditiveTreesModel. It
> > > > > > > > affected specifically the categorical features that take on
> > > > > > > > integer values. We ended up subtracting 0.5 from the threshold
> > > > > > > > value at every such split point in the converted model, so that
> > > > > > > > it would output the same score as the input model.
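A minimal Java sketch of this workaround, slightly generalized. It assumes the source model sends a value left when value < t (as XGBoost does, to my understanding), that the converted node sends it left when value <= t plus a small slack, and that you already know which features are integer-valued. For an integer x, x < t holds exactly when x <= ceil(t) - 0.5, which is t - 0.5 whenever t is a whole number; the ceil form is my own variant so that fractional thresholds such as 1.2 also convert correctly. IntegerSplitAdjust, adjustedThreshold, convert and agrees are hypothetical names, not part of Solr or XGBoost.

```java
import java.util.Set;

public class IntegerSplitAdjust {
    // Copy of the NODE_SPLIT_SLACK value quoted further down in this thread.
    static final float SLACK = 1E-6f;

    // For an integer x: x < t  <=>  x <= ceil(t) - 1  <=>  x <= ceil(t) - 0.5.
    // When t is already a whole number this is simply t - 0.5.
    static float adjustedThreshold(float t) {
        return (float) Math.ceil(t) - 0.5f;
    }

    // Rewrite only the thresholds of features known to be integer-valued.
    static float convert(String feature, float t, Set<String> integerFeatures) {
        return integerFeatures.contains(feature) ? adjustedThreshold(t) : t;
    }

    // True if "x < t" (source model) and "x <= adjusted + SLACK"
    // (converted model, as described in this thread) agree for x in [-5, 5].
    static boolean agrees(float t) {
        float stored = adjustedThreshold(t) + SLACK;
        for (int x = -5; x <= 5; x++) {
            if ((x < t) != (x <= stored)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Set<String> integerFeatures = Set.of("is_xyz_feature");
        float t = convert("is_xyz_feature", 0.99999994f, integerFeatures);
        System.out.println("adjusted threshold: " + t); // prints 0.5
        for (float raw : new float[] {0.99999994f, 1.0f, 1.2f, 2.5f, -0.5f}) {
            System.out.println(raw + " agrees: " + agrees(raw));
        }
    }
}
```

Every threshold checked in main should report true. Subtracting 0.5 directly gives the same result for whole-number thresholds and for thresholds just below one (such as 0.99999994), but not for something like 1.2, where t - 0.5 = 0.7 would send x = 1 right instead of left.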
> > > > > > > >
> > > > > > > > On Fri, Jul 16, 2021, 18:19 Roopa Rao <roop...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > Okay, thank you for the input.
> > > > > > > > >
> > > > > > > > > Roopa
> > > > > > > > >
> > > > > > > > > On Fri, Jul 16, 2021 at 5:55 AM Alessandro Benedetti <a.benede...@sease.io> wrote:
> > > > > > > > >
> > > > > > > > > > Hi Roopa,
> > > > > > > > > > I was not able to find out why that slack was added.
> > > > > > > > > > I am not sure why we would want to change the threshold.
> > > > > > > > > > I would recommend creating a Jira issue and tagging at least
> > > > > > > > > > myself, Christine Poerschke and Diego Ceccarelli, so we can
> > > > > > > > > > discuss and potentially open a pull request.
> > > > > > > > > >
> > > > > > > > > > Cheers
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > --------------------------
> > > > > > > > > > Alessandro Benedetti
> > > > > > > > > > Apache Lucene/Solr Committer
> > > > > > > > > > Director, R&D Software Engineer, Search Consultant
> > > > > > > > > >
> > > > > > > > > > www.sease.io
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Thu, 15 Jul 2021 at 22:24, Roopa Rao <roop...@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi All,
> > > > > > > > > > >
> > > > > > > > > > > In LTR, what is the purpose of adding NODE_SPLIT_SLACK to the
> > > > > > > > > > > threshold in MultipleAdditiveTreesModel?
> > > > > > > > > > >
> > > > > > > > > > > Reference: org.apache.solr.ltr.model.MultipleAdditiveTreesModel
> > > > > > > > > > >
> > > > > > > > > > > private static final float NODE_SPLIT_SLACK = 1E-6f;
> > > > > > > > > > >
> > > > > > > > > > > public void setThreshold(float threshold) {
> > > > > > > > > > >   this.threshold = threshold + NODE_SPLIT_SLACK;
> > > > > > > > > > > }
> > > > > > > > > > >
> > > > > > > > > > > We have a feature that can return 0.0 or 1.0, and a model with
> > > > > > > > > > > this tree:
> > > > > > > > > > >
> > > > > > > > > > > is_xyz_feature,threshold=0.99999994,left=0.0010180053,right=-0.0057609854
> > > > > > > > > > >
> > > > > > > > > > > However, when Solr actually scores it, it evaluates the node as
> > > > > > > > > > > follows:
> > > > > > > > > > > is_xyz_feature:1.0<= 1.000001, Go Left
> > > > > > > > > > >
> > > > > > > > > > > So it always goes left, which is incorrect.
> > > > > > > > > > >
> > > > > > > > > > > Thanks,
> > > > > > > > > > > Roopa
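A minimal Java sketch of the float arithmetic behind this report. It assumes, as the log line above shows, that the slack is added to the stored threshold and that a node sends a value left when value <= threshold; it also assumes the source model tests value < threshold, as in the XGBoost conversion Spyros describes higher up in this thread. SlackDemo, solrGoesLeft and xgboostGoesLeft are hypothetical names, not Solr or XGBoost code.

```java
public class SlackDemo {
    // Same value as the constant quoted above.
    static final float NODE_SPLIT_SLACK = 1E-6f;

    // Node test as described in this thread: the slack is added when the
    // threshold is set, then the value goes left if value <= threshold.
    static boolean solrGoesLeft(float value, float threshold) {
        float stored = threshold + NODE_SPLIT_SLACK;
        return value <= stored;
    }

    // Assumed source-model test: go left only if value < threshold.
    static boolean xgboostGoesLeft(float value, float threshold) {
        return value < threshold;
    }

    public static void main(String[] args) {
        float threshold = 0.99999994f; // split value from the tree above
        float feature = 1.0f;          // the boolean feature is "true"

        // 0.99999994f is the largest float below 1.0, so adding 1E-6f
        // pushes the stored threshold above 1.0.
        System.out.println("stored threshold:  " + (threshold + NODE_SPLIT_SLACK));
        System.out.println("Solr goes left:    " + solrGoesLeft(feature, threshold));    // true
        System.out.println("XGBoost goes left: " + xgboostGoesLeft(feature, threshold)); // false
    }
}
```

With the feature at 0.0 both tests go left, so only the 1.0 case disagrees, which is why the node appears to go left every time.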
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
