You are very welcome. Feel free to reply in this thread if you have any updates, open a Jira issue on Apache Solr (tagging me), or send me a direct message.
Since Apache Solr is an open-source project, any help coming from the community is welcome, and as a committer, I would be delighted to facilitate these contributions.

Cheers
--------------------------
Alessandro Benedetti
Apache Lucene/Solr Committer
Director, R&D Software Engineer, Search Consultant

www.sease.io

On Fri, 30 Jul 2021 at 08:33, Kerwin <kerwin...@gmail.com> wrote:

> Hi Alessandro,
>
> Thank you for spending some time looking into my query. I am still trying
> to understand the use of the function under computeRelatedness that uses
> the number 30 and also some other numbers. The foreground count would help
> as an additional parameter if it were present. It will take me some time
> to work on your idea, so for now I will continue with what I have.
> Thanks again for your inputs.
>
> On Mon, Jul 26, 2021 at 8:18 PM Alessandro Benedetti <a.benede...@sease.io>
> wrote:
>
> > Hi Kerwin,
> > I was taking a look at your question and the
> > *org.apache.solr.search.facet.RelatednessAgg* code; my replies are inline:
> >
> > On Thu, 22 Jul 2021 at 08:27, Kerwin <kerwin...@gmail.com> wrote:
> >
> > > Hi Solr users,
> > >
> > > I have a question on the relatedness and Semantic Knowledge Graphs
> > > feature in Solr.
> > > While the results are good with the out-of-the-box provision, I need
> > > some tweaking on the ability to specify filters or parameters based on
> > > only the foreground count. Right now only the min_popularity parameter
> > > is available, which applies to both the foreground dataset and the
> > > background one.
> >
> > so far so good
> >
> > > The white paper from Trey Grainger and his team mentions that the z
> > > score is used to calculate the score. As per my understanding, the z
> > > score assumes a normal distribution and is applicable when sample
> > > size>30, which I assume is the foreground count.
> >
> > I don't have time right now to go through the paper, but the only place I
> > found the '30' magic number in the class is within this
> > method: org.apache.solr.search.facet.RelatednessAgg#computeRelatedness
> > It's neither defined as a constant nor as a variable driven by a param,
> > so it's not possible to change it unless we improve the code.
> >
> > > So I would like to control this value with a parameter or filter.
> > > Right now I am getting the approximate count by doing a reverse
> > > calculation on the foreground popularity and the background size to
> > > get the foreground count. Kindly correct me if my understanding is
> > > different from what it should be.
> >
> > What I recommend is to take a look at the code references I put, and
> > write a contribution on your own to add the additional configuration
> > with the explanation.
> > As a committer, I would be happy to review such work and merge it in if
> > it improves the relatedness aggregation (we could take the occasion to
> > also rename some of the variables, which seem not to align with the Java
> > standard: 'min_pop' => minPopularity, etc.).
> > Cheers
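
For anyone following the thread, the client-side workaround Kerwin describes (recovering an approximate foreground count from the foreground popularity and the background size) could look roughly like the Python sketch below. It is not Solr code. It assumes the reported foreground popularity is the foreground match count divided by the background set size, which is what the reverse calculation in the thread implies. The function names and the `min_fg_count` threshold are hypothetical, and `z_score` is the textbook binomial z-score, included only to show where a sample-size rule of thumb such as 30 would come into play; it is not claimed to match what RelatednessAgg#computeRelatedness does internally.

```python
import math


def estimate_foreground_count(foreground_popularity: float, background_size: int) -> int:
    """Approximate the foreground match count.

    Assumption (from the thread): popularity = fg_count / background_size.
    The result is approximate if the popularity value was rounded.
    """
    return round(foreground_popularity * background_size)


def z_score(fg_count: int, fg_size: int, bg_count: int, bg_size: int) -> float:
    """Textbook binomial z-score of the foreground count against the
    rate expected from the background set (illustrative only)."""
    bg_prob = bg_count / bg_size
    expected = fg_size * bg_prob
    std_dev = math.sqrt(fg_size * bg_prob * (1 - bg_prob))
    if std_dev == 0:
        return 0.0  # degenerate case: no variance in the background rate
    return (fg_count - expected) / std_dev


def passes_min_foreground_count(foreground_popularity: float,
                                background_size: int,
                                min_fg_count: int = 30) -> bool:
    """Hypothetical client-side filter: keep a bucket only if its
    estimated foreground count reaches the chosen threshold."""
    return estimate_foreground_count(foreground_popularity, background_size) >= min_fg_count


if __name__ == "__main__":
    print(estimate_foreground_count(0.0005, 100_000))
    print(passes_min_foreground_count(0.0002, 100_000))
    print(z_score(fg_count=50, fg_size=100, bg_count=1_000, bg_size=10_000))
```

A filter like this runs after the facet response comes back, so it only post-processes buckets that Solr already returned; a server-side parameter would still need the code contribution discussed above.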