Thank you, Alessandro! I also created a PR with a test case for the issue and a potential fix. Thanks!
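For reference, the interim workaround described further down in this thread (Spyros, 16 July 2021) was to subtract 0.5 from the threshold of every split on an integer-valued feature in the converted XGBoost model. The Java sketch below checks that idea on the tree from the original question. It makes two assumptions. First, the source model sends a sample left when value < threshold (strict, as in XGBoost's `[f<t]` text dumps). Second, Solr sends it left when value <= threshold + NODE_SPLIT_SLACK, as the quoted log line shows. `IntegerSplitAdjuster` and its methods are hypothetical names, and this is not the change in the PR mentioned above.

```java
/**
 * Hypothetical helper sketching the thread's workaround: when converting a
 * split on an integer-valued feature, store (sourceThreshold - 0.5). For
 * thresholds at or just below an integer (like 0.99999994 in the thread),
 * the assumed Solr test "value <= threshold + slack" then routes integer
 * values the same way as the source model's strict "value < threshold".
 */
final class IntegerSplitAdjuster {
    // Value quoted from MultipleAdditiveTreesModel in the thread.
    static final float NODE_SPLIT_SLACK = 1E-6f;

    private IntegerSplitAdjuster() {}

    /** Threshold to write into the converted model for an integer-valued feature. */
    static float adjustedThreshold(float sourceThreshold) {
        return sourceThreshold - 0.5f;
    }

    /** Assumed Solr-side test: left when value <= modelThreshold + slack. */
    static boolean solrGoesLeft(float value, float modelThreshold) {
        return value <= modelThreshold + NODE_SPLIT_SLACK;
    }

    /** Assumed source-model test (XGBoost style): left when value < threshold. */
    static boolean sourceGoesLeft(float value, float sourceThreshold) {
        return value < sourceThreshold;
    }

    public static void main(String[] args) {
        float source = 0.99999994f; // threshold from the tree in the original question
        float converted = adjustedThreshold(source);
        for (float v : new float[] {0.0f, 1.0f}) {
            System.out.println(v + ": source left=" + sourceGoesLeft(v, source)
                    + ", solr left (adjusted)=" + solrGoesLeft(v, converted)
                    + ", solr left (unadjusted)=" + solrGoesLeft(v, source));
        }
    }
}
```

With the adjusted threshold (about 0.49999994), 0.0 goes left and 1.0 goes right under both tests, while the unadjusted threshold sends 1.0 left in Solr. The adjustment only lines up with the strict test for thresholds at or just below an integer, or at a half-integer midpoint; an arbitrary threshold such as 1.2 would still be misrouted.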
On Tue, Aug 3, 2021 at 1:27 PM Alessandro Benedetti <a.benede...@sease.io> wrote:
> Thank you, Spyros!
> I'll take it from there.
>
> Cheers
> --------------------------
> Alessandro Benedetti
> Apache Lucene/Solr Committer
> Director, R&D Software Engineer, Search Consultant
>
> www.sease.io
>
>
> On Wed, 28 Jul 2021 at 21:00, Spyros Kapnissis <ska...@gmail.com> wrote:
>
> > Hi Alessandro, Roopa, I created the ticket here:
> > https://issues.apache.org/jira/browse/SOLR-15569
> > I don't think I have permission to add people, though, so please tag
> > whoever you feel is necessary.
> > Please let me know if you need any more info, thanks!
> >
> > On Tue, Jul 27, 2021 at 1:00 PM Alessandro Benedetti <a.benede...@sease.io> wrote:
> >
> > > Hi Spyros, Roopa,
> > > if you can create the Jira ticket with all the details you gathered, that
> > > would be much appreciated.
> > > If you tag me, Christine Poerschke, and Diego Ceccarelli at least, we'll
> > > take over from there!
> > > Thanks!
> > > --------------------------
> > > Alessandro Benedetti
> > > Apache Lucene/Solr Committer
> > > Director, R&D Software Engineer, Search Consultant
> > >
> > > www.sease.io
> > >
> > >
> > > On Mon, 26 Jul 2021 at 21:29, Spyros Kapnissis <ska...@gmail.com> wrote:
> > >
> > > > Hi Alessandro, Roopa, I also agree that this issue should be further
> > > > investigated and fixed. Please let me know if you need any help opening
> > > > the Jira ticket and providing more details.
> > > >
> > > > On Mon, Jul 26, 2021, 21:04 Roopa Rao <roop...@gmail.com> wrote:
> > > >
> > > > > Hi Alessandro,
> > > > > I haven't created a Jira for this; we solved it in a similar way to what
> > > > > Spyros described, by changing the threshold in the model.
> > > > > Yes, it would be good to understand why the slack is added.
> > > > >
> > > > > Thanks,
> > > > > Roopa
> > > > >
> > > > > On Mon, Jul 26, 2021 at 10:52 AM Alessandro Benedetti <a.benede...@sease.io> wrote:
> > > > >
> > > > > > I didn't get any additional notification (or maybe I missed it).
> > > > > > Has the Jira been created yet?
> > > > > > Boolean features are quite common in Learning To Rank use cases.
> > > > > > I do believe this contribution can be useful.
> > > > > > If you don't have time to create the Jira or contribute the pull request,
> > > > > > no worries, just let us know and we (committers) will arrange to do it.
> > > > > > Thanks for your help; without the effort of our users, Apache Solr
> > > > > > wouldn't be the same.
> > > > > > Cheers
> > > > > > --------------------------
> > > > > > Alessandro Benedetti
> > > > > > Apache Lucene/Solr Committer
> > > > > > Director, R&D Software Engineer, Search Consultant
> > > > > >
> > > > > > www.sease.io
> > > > > >
> > > > > >
> > > > > > On Fri, 16 Jul 2021 at 20:29, Roopa Rao <roop...@gmail.com> wrote:
> > > > > >
> > > > > > > Spyros, thank you for verifying this; we are planning to do something
> > > > > > > similar.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Roopa
> > > > > > >
> > > > > > > On Fri, Jul 16, 2021 at 12:09 PM Spyros Kapnissis <ska...@gmail.com> wrote:
> > > > > > >
> > > > > > > > Hello,
> > > > > > > >
> > > > > > > > Just to verify this: we had come across the exact same issue when
> > > > > > > > converting an XGBoost model to MultipleAdditiveTrees. This was an issue
> > > > > > > > specifically with the categorical features that take on integer values.
> > > > > > > > We ended up subtracting 0.5 from the threshold value on any such split
> > > > > > > > point in the converted model, so that it would output the same score as
> > > > > > > > the input model.
> > > > > > > >
> > > > > > > > On Fri, Jul 16, 2021, 18:19 Roopa Rao <roop...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > Okay, thank you for the input.
> > > > > > > > >
> > > > > > > > > Roopa
> > > > > > > > >
> > > > > > > > > On Fri, Jul 16, 2021 at 5:55 AM Alessandro Benedetti <a.benede...@sease.io> wrote:
> > > > > > > > >
> > > > > > > > > > Hi Roopa,
> > > > > > > > > > I was not able to find why that slack was added.
> > > > > > > > > > I am not sure why we would like to change the threshold.
> > > > > > > > > > I would recommend creating a Jira issue and tagging at least me,
> > > > > > > > > > Christine Poerschke and Diego Ceccarelli, so we can discuss and
> > > > > > > > > > potentially open a pull request.
> > > > > > > > > >
> > > > > > > > > > Cheers
> > > > > > > > > > --------------------------
> > > > > > > > > > Alessandro Benedetti
> > > > > > > > > > Apache Lucene/Solr Committer
> > > > > > > > > > Director, R&D Software Engineer, Search Consultant
> > > > > > > > > >
> > > > > > > > > > www.sease.io
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Thu, 15 Jul 2021 at 22:24, Roopa Rao <roop...@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi All,
> > > > > > > > > > >
> > > > > > > > > > > In LTR, what is the purpose of adding NODE_SPLIT_SLACK to the
> > > > > > > > > > > threshold in MultipleAdditiveTreesModel?
> > > > > > > > > > >
> > > > > > > > > > > Reference: org.apache.solr.ltr.model.MultipleAdditiveTreesModel
> > > > > > > > > > >
> > > > > > > > > > > private static final float NODE_SPLIT_SLACK = 1E-6f;
> > > > > > > > > > >
> > > > > > > > > > > public void setThreshold(float threshold) {
> > > > > > > > > > >   this.threshold = threshold + NODE_SPLIT_SLACK;
> > > > > > > > > > > }
> > > > > > > > > > >
> > > > > > > > > > > We have a feature that can return 0.0 or 1.0,
> > > > > > > > > > > and a model with this tree:
> > > > > > > > > > >
> > > > > > > > > > > is_xyz_feature,threshold=0.99999994,left=0.0010180053,right=-0.0057609854
> > > > > > > > > > >
> > > > > > > > > > > However, when Solr actually scores it, it evaluates the split as follows:
> > > > > > > > > > > is_xyz_feature:1.0<= 1.000001, Go Left
> > > > > > > > > > >
> > > > > > > > > > > So it always goes left, which is incorrect.
> > > > > > > > > > >
> > > > > > > > > > > Thanks,
> > > > > > > > > > > Roopa
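The routing in the original question can be reproduced with plain float arithmetic. 0.99999994 is the largest float below 1.0, so adding the 1E-6 slack stores a threshold just above 1.0, and a feature value of 1.0 then passes the `<=` test. The Java sketch below assumes a node routes left when the feature value is less than or equal to the stored threshold, as the log line `1.0<= 1.000001, Go Left` indicates. `SlackRoutingDemo` is a hypothetical class that only mirrors the quoted `setThreshold()`, not Solr's actual tree code.

```java
/**
 * Hypothetical reproduction of the routing in the original question. It mirrors
 * the quoted setThreshold() and assumes a node routes left when
 * featureValue <= storedThreshold, as the log line "1.0<= 1.000001" suggests.
 */
final class SlackRoutingDemo {
    // Value quoted from MultipleAdditiveTreesModel in the thread.
    static final float NODE_SPLIT_SLACK = 1E-6f;

    private SlackRoutingDemo() {}

    /** Threshold as stored by the quoted setThreshold(): raw value plus slack. */
    static float storedThreshold(float threshold) {
        return threshold + NODE_SPLIT_SLACK;
    }

    /** True when a sample with this feature value is routed to the left child. */
    static boolean goesLeft(float featureValue, float storedThreshold) {
        return featureValue <= storedThreshold;
    }

    public static void main(String[] args) {
        float raw = 0.99999994f; // largest float below 1.0f, taken from the tree
        float stored = storedThreshold(raw);

        System.out.println("stored threshold = " + stored);
        // With the slack, both possible feature values are routed left.
        System.out.println("0.0 goes left: " + goesLeft(0.0f, stored));
        System.out.println("1.0 goes left: " + goesLeft(1.0f, stored));
        // Comparing against the raw threshold, 1.0 would be routed right.
        System.out.println("1.0 goes left (no slack): " + goesLeft(1.0f, raw));
    }
}
```

Running `main` reports that both 0.0 and 1.0 go left with the slack, while 1.0 goes right against the raw threshold. The exact digits printed for the stored threshold depend on `Float.toString`.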