Re: MultipleAdditiveTreeModel

Spyros Kapnissis Wed, 28 Jul 2021 12:00:49 -0700

Hi Alessandro, Roopa, I created the ticket here:
https://issues.apache.org/jira/browse/SOLR-15569 . I don't think I have
permission to add people though, so please tag whomever you feel is
necessary.
Pls let me know if you need any more info, thanks!


On Tue, Jul 27, 2021 at 1:00 PM Alessandro Benedetti <a.benede...@sease.io>
wrote:

> Hi Spyros, Roopa,
> if you can create the Jira ticket with all the details you gathered, that
> would be much appreciated.
> If you tag me, Christine Poerschke, and Diego Ceccarelli at least, we'll
> take over from there!
> Thanks!
> --------------------------
> Alessandro Benedetti
> Apache Lucene/Solr Committer
> Director, R&D Software Engineer, Search Consultant
>
> www.sease.io
>
>
> On Mon, 26 Jul 2021 at 21:29, Spyros Kapnissis <ska...@gmail.com> wrote:
>
> > Hi Alessandro, Roopa, I also agree that this issue should be further
> > investigated and fixed. Please let me know if you need any help opening
> > the Jira ticket or providing more details.
> >
> > On Mon, Jul 26, 2021, 21:04 Roopa Rao <roop...@gmail.com> wrote:
> >
> > > Hi Alessandro,
> > > I haven't created a Jira for this; we solved it in a similar way to what
> > > Spyros described, by changing the threshold in the model.
> > > Yes, it would be good to understand why the SLACK is added.
> > >
> > > Thanks,
> > > Roopa
> > >
> > > On Mon, Jul 26, 2021 at 10:52 AM Alessandro Benedetti <a.benede...@sease.io>
> > > wrote:
> > >
> > > > I didn't get any additional notification (or maybe I missed it).
> > > > Has the Jira been created yet?
> > > > Boolean features are quite common around Learning To Rank use cases.
> > > > I do believe this contribution can be useful.
> > > > If you don't have time to create the Jira or contribute the pull request,
> > > > no worries, just let us know and we (committers) will organize to do it.
> > > > Thanks for your help. Without the effort of our users, Apache Solr
> > > > wouldn't be the same.
> > > > Cheers
> > > > --------------------------
> > > > Alessandro Benedetti
> > > > Apache Lucene/Solr Committer
> > > > Director, R&D Software Engineer, Search Consultant
> > > >
> > > > www.sease.io
> > > >
> > > >
> > > > On Fri, 16 Jul 2021 at 20:29, Roopa Rao <roop...@gmail.com> wrote:
> > > >
> > > > > Spyros, thank you for verifying this, we are planning to do something
> > > > > similar.
> > > > >
> > > > > Thanks,
> > > > > Roopa
> > > > >
> > > > > On Fri, Jul 16, 2021 at 12:09 PM Spyros Kapnissis <ska...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hello,
> > > > > >
> > > > > > > Just to verify this, we came across the exact same issue when
> > > > > > > converting an XGBoost model to MultipleAdditiveTrees. This was an issue
> > > > > > > specifically with categorical features that take on integer values. We
> > > > > > > ended up subtracting 0.5 from the threshold value at any such split point
> > > > > > > in the converted model, so that it would output the same score as the
> > > > > > > input model.
> > > > > >
> > > > > > On Fri, Jul 16, 2021, 18:19 Roopa Rao <roop...@gmail.com> wrote:
> > > > > >
> > > > > > > Okay, thank you for the input
> > > > > > >
> > > > > > > Roopa
> > > > > > >
> > > > > > > On Fri, Jul 16, 2021 at 5:55 AM Alessandro Benedetti <a.benede...@sease.io>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi Roopa,
> > > > > > > > I was not able to find why that slack was added.
> > > > > > > > I am not sure why we would like to change the threshold.
> > > > > > > > > I would recommend creating a Jira issue and tagging at least myself,
> > > > > > > > > Christine Poerschke and Diego Ceccarelli, so we can discuss and
> > > > > > > > > potentially open a pull request.
> > > > > > > >
> > > > > > > > Cheers
> > > > > > > >
> > > > > > > >
> > > > > > > > --------------------------
> > > > > > > > Alessandro Benedetti
> > > > > > > > Apache Lucene/Solr Committer
> > > > > > > > Director, R&D Software Engineer, Search Consultant
> > > > > > > >
> > > > > > > > www.sease.io
> > > > > > > >
> > > > > > > >
> > > > > > > > > On Thu, 15 Jul 2021 at 22:24, Roopa Rao <roop...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > Hi All,
> > > > > > > > >
> > > > > > > > > > In LTR for MultipleAdditiveTreeModel what is the purpose of adding
> > > > > > > > > > NODE_SPLIT_SLACK to the threshold?
> > > > > > > > > >
> > > > > > > > > > Reference: org.apache.solr.ltr.model.MultipleAdditiveTreesModel
> > > > > > > > >
> > > > > > > > > private static final float NODE_SPLIT_SLACK = 1E-6f;
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > public void setThreshold(float threshold) { this.threshold = threshold + NODE_SPLIT_SLACK; }
> > > > > > > > >
> > > > > > > > > We have a feature which can return 0.0 or 1.0
> > > > > > > > >
> > > > > > > > > > And a model with this tree:
> > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > >
> > >
> is_xyz_feature,threshold=0.99999994,left=0.0010180053,right=-0.0057609854
> > > > > > > > >
> > > > > > > > > > However, when Solr actually scores it, it evaluates the split as follows:
> > > > > > > > > > is_xyz_feature:1.0<= 1.000001, Go Left
> > > > > > > > > >
> > > > > > > > > > So it always goes left, which is incorrect.
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Roopa
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
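
The behaviour described in the thread can be reproduced with a small standalone sketch. It is not the Solr class itself: the class name NodeSplitSlackDemo and the helpers effectiveThreshold and goesLeft are hypothetical, and the split rule (feature value <= threshold goes left) is assumed from the "1.0<= 1.000001, Go Left" trace quoted above. Only the NODE_SPLIT_SLACK constant and the "threshold + NODE_SPLIT_SLACK" setter come from the thread.

```java
public class NodeSplitSlackDemo {
    // Constant as quoted in the thread from MultipleAdditiveTreesModel.
    static final float NODE_SPLIT_SLACK = 1E-6f;

    // Mirrors the quoted setter: stored threshold = model threshold + slack.
    // (Hypothetical helper, for illustration only.)
    static float effectiveThreshold(float modelThreshold) {
        return modelThreshold + NODE_SPLIT_SLACK;
    }

    // Assumed split rule, taken from the "<= ... Go Left" trace in the thread.
    static boolean goesLeft(float featureValue, float modelThreshold) {
        return featureValue <= effectiveThreshold(modelThreshold);
    }

    public static void main(String[] args) {
        float original = 0.99999994f;      // threshold from the model in the thread
        float adjusted = original - 0.5f;  // workaround: subtract 0.5 for 0/1 features

        for (float value : new float[] {0.0f, 1.0f}) {
            System.out.println("feature=" + value
                + " original threshold -> " + (goesLeft(value, original) ? "left" : "right")
                + ", adjusted threshold -> " + (goesLeft(value, adjusted) ? "left" : "right"));
        }
    }
}
```

With the original threshold, both 0.0 and 1.0 satisfy the comparison once the slack is added, so the boolean feature never reaches the right branch. With 0.5 subtracted from the threshold, 0.0 goes left and 1.0 goes right, which matches the workaround Spyros describes for integer-valued features.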
