Hi Alessandro,

I haven't created a Jira issue for this. We solved it in a similar way to what Spyros described, by changing the threshold in the model. Yes, it would be good to understand why the slack is added.
Thanks,
Roopa

On Mon, Jul 26, 2021 at 10:52 AM Alessandro Benedetti <a.benede...@sease.io> wrote:

> I didn't get any additional notification (or maybe I missed it).
> Has the Jira been created yet?
> Boolean features are quite common around Learning To Rank use cases.
> I do believe this contribution can be useful.
> If you don't have time to create the Jira or contribute the pull request,
> no worries, just let us know and we (committers) will organize to do it.
> Thanks for your help. Without the effort of our users, Apache Solr wouldn't
> be the same.
> Cheers
> --------------------------
> Alessandro Benedetti
> Apache Lucene/Solr Committer
> Director, R&D Software Engineer, Search Consultant
>
> www.sease.io
>
>
> On Fri, 16 Jul 2021 at 20:29, Roopa Rao <roop...@gmail.com> wrote:
>
> > Spyros, thank you for verifying this, we are planning to do something
> > similar.
> >
> > Thanks,
> > Roopa
> >
> > On Fri, Jul 16, 2021 at 12:09 PM Spyros Kapnissis <ska...@gmail.com> wrote:
> >
> > > Hello,
> > >
> > > Just to verify this, we had come across the exact same issue when
> > > converting an XGBoost model to MultipleAdditiveTrees. This was an issue
> > > specifically with the categorical features that take on integer values.
> > > We ended up subtracting 0.5 from the threshold value on any such split
> > > point on the converted model, so that it would output the same score as
> > > the input model.
> > >
> > > On Fri, Jul 16, 2021, 18:19 Roopa Rao <roop...@gmail.com> wrote:
> > >
> > > > Okay, thank you for the input
> > > >
> > > > Roopa
> > > >
> > > > On Fri, Jul 16, 2021 at 5:55 AM Alessandro Benedetti <a.benede...@sease.io> wrote:
> > > >
> > > > > Hi Roopa,
> > > > > I was not able to find why that slack was added.
> > > > > I am not sure why we would like to change the threshold.
> > > > > I would recommend creating a Jira issue and tag at least myself,
> > > > > Christine Poerschke and Diego Ceccarelli, so we can discuss and
> > > > > potentially open a pull request.
> > > > >
> > > > > Cheers
> > > > >
> > > > > --------------------------
> > > > > Alessandro Benedetti
> > > > > Apache Lucene/Solr Committer
> > > > > Director, R&D Software Engineer, Search Consultant
> > > > >
> > > > > www.sease.io
> > > > >
> > > > >
> > > > > On Thu, 15 Jul 2021 at 22:24, Roopa Rao <roop...@gmail.com> wrote:
> > > > >
> > > > > > Hi All,
> > > > > >
> > > > > > In LTR for MultipleAdditiveTreesModel, what is the purpose of adding
> > > > > > NODE_SPLIT_SLACK to the threshold?
> > > > > >
> > > > > > Reference: org.apache.solr.ltr.model.MultipleAdditiveTreesModel
> > > > > >
> > > > > > private static final float NODE_SPLIT_SLACK = 1E-6f;
> > > > > >
> > > > > > public void setThreshold(float threshold) {
> > > > > >   this.threshold = threshold + NODE_SPLIT_SLACK;
> > > > > > }
> > > > > >
> > > > > > We have a feature which can return 0.0 or 1.0
> > > > > >
> > > > > > And a model with this tree:
> > > > > >
> > > > > > is_xyz_feature,threshold=0.99999994,left=0.0010180053,right=-0.0057609854
> > > > > >
> > > > > > However, when Solr actually scores it, it is taking it as follows:
> > > > > > is_xyz_feature:1.0<= 1.000001, Go Left
> > > > > >
> > > > > > So it goes left every time, which is incorrect.
> > > > > >
> > > > > > Thanks,
> > > > > > Roopa
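
For anyone reading this thread later, the behaviour discussed above can be reproduced outside Solr with a small sketch. This is not the Solr source: the class name SplitSlackSketch and its helper methods are hypothetical. It assumes only what is quoted in the thread, namely that setThreshold adds NODE_SPLIT_SLACK and that a node goes left when the feature value is <= the stored threshold. The last helper illustrates the workaround Spyros describes (subtracting 0.5 from the threshold for integer-valued features).

```java
/**
 * Illustrative sketch only; this is not the Solr source.
 * It mirrors two facts quoted in the thread: setThreshold adds
 * NODE_SPLIT_SLACK, and scoring goes left when feature <= threshold.
 */
final class SplitSlackSketch {
    // Same constant as quoted from MultipleAdditiveTreesModel.
    static final float NODE_SPLIT_SLACK = 1E-6f;

    // Mirrors the quoted setThreshold: the stored value is threshold + slack.
    static float storedThreshold(float modelThreshold) {
        return modelThreshold + NODE_SPLIT_SLACK;
    }

    // Assumed comparison, based on the "1.0<= 1.000001, Go Left" trace in the thread.
    static boolean goesLeft(float featureValue, float storedThreshold) {
        return featureValue <= storedThreshold;
    }

    // Hypothetical helper for the workaround described by Spyros:
    // subtract 0.5 from the split point of an integer-valued feature.
    static float adjustForIntegerFeature(float modelThreshold) {
        return modelThreshold - 0.5f;
    }

    public static void main(String[] args) {
        float original = 0.99999994f; // threshold from the tree in the thread

        // With the slack added, the stored threshold ends up above 1.0,
        // so both 0.0 and 1.0 satisfy "feature <= threshold".
        float stored = storedThreshold(original);
        System.out.println("stored threshold:    " + stored);
        System.out.println("1.0 goes left:       " + goesLeft(1.0f, stored));
        System.out.println("0.0 goes left:       " + goesLeft(0.0f, stored));

        // After the adjustment the split separates 0.0 from 1.0 again.
        float adjusted = storedThreshold(adjustForIntegerFeature(original));
        System.out.println("adjusted threshold:  " + adjusted);
        System.out.println("1.0 goes left (adj): " + goesLeft(1.0f, adjusted));
        System.out.println("0.0 goes left (adj): " + goesLeft(0.0f, adjusted));
    }
}
```

The adjustment is applied to the model file before it is uploaded, so the slack is still added by Solr afterwards. It is only safe for features that take integer values, since it moves the split point to the midpoint between two adjacent integers.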