Hi Kerwin,
I was taking a look at your question and the
*org.apache.solr.search.facet.RelatednessAgg* code, in line :

--------------------------
Alessandro Benedetti
Apache Lucene/Solr Committer
Director, R&D Software Engineer, Search Consultant
www.sease.io

On Thu, 22 Jul 2021 at 08:27, Kerwin <kerwin...@gmail.com> wrote:

> Hi Solr users,
>
> I have a question on the relatedness and Semantic Knowledge Graphs feature
> in Solr.
> While the results are good with the out of box provision, I need some
> tweaking on the ability to specify filters or parameters based on only the
> foreground count. Right now only the min_popularity parameter is available
> which applies to both the foreground dataset or the background one.

So far so good.

> The
> white paper from Trey Grainger and his team mention that the z score is
> used to calculate the score. As per my understanding, the z score assumes a
> normal distribution and is applicable when sample size>30 which I assume is
> the foreground count.

I don't have time right now to go through the paper, but the only place
where I found the '30' magic number in the class is within this method:

org.apache.solr.search.facet.RelatednessAgg#computeRelatedness

It is not defined as a constant, nor as a variable driven by a parameter,
so it cannot be changed unless we improve the code.

> So I would like to control this value with a
> parameter or filter. Right now I am getting the approximate count by doing
> a reverse calculation on the foreground popularity and the background size
> to get the foreground count. Kindly correct me if my understanding is
> different from what it should be.
>

What I recommend is to take a look at the code references I gave and write
a contribution of your own that adds the additional configuration, along
with an explanation.
As a committer, I would be happy to review such work and merge it if it
improves the relatedness aggregation (we could take the occasion to also
rename some of the variables, which do not seem to follow Java naming
conventions, e.g. 'min_pop' => minPopularity, etc.).

Cheers
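
P.S. For reference, the reverse calculation described in the quoted question could be sketched roughly as below. This is only an illustration, not code from Solr: the class and method names are hypothetical, and it assumes that the reported foreground popularity is the foreground count divided by the background size, which should be verified against RelatednessAgg before relying on it. Since the reported popularity may be rounded, the recovered count is approximate.

```java
// Hypothetical helper, not part of Solr.
// Assumption: foreground_popularity = foregroundCount / backgroundSize.
final class ForegroundCountSketch {
    private ForegroundCountSketch() {}

    /** Recovers an approximate foreground count from the reported popularity. */
    static long approxForegroundCount(double foregroundPopularity, long backgroundSize) {
        return Math.round(foregroundPopularity * backgroundSize);
    }

    /** Client-side filter: keep a bucket only if its approximate foreground count is large enough. */
    static boolean meetsMinForegroundCount(double foregroundPopularity,
                                           long backgroundSize,
                                           long minForegroundCount) {
        return approxForegroundCount(foregroundPopularity, backgroundSize) >= minForegroundCount;
    }

    public static void main(String[] args) {
        long backgroundSize = 10_000L;   // illustrative value
        double foregroundPop = 0.005;    // illustrative value
        System.out.println(approxForegroundCount(foregroundPop, backgroundSize));
        System.out.println(meetsMinForegroundCount(foregroundPop, backgroundSize, 30L));
    }
}
```

A proper fix would expose the threshold as a parameter of the aggregation itself instead of filtering on the client side like this.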