Lucene, Spark, HDFS question

2018-03-13 Thread Tom Hirschfeld
Hello! *Background*: My team is running a machine learning pipeline, and part of the pipeline is a scrape of a web-based Lucene application via HTTP calls. The scrape outputs a CSV file that we then upload to HDFS and use as input to a Spark ML job. *Question*: Is there a way for ou

Re: Payload TFIDF Similarity in Lucene 7.1.0

2018-03-13 Thread Erdan Genc
@Erik: I didn't know that, how can I figure out which query types support payload scoring? The class I described is wrapped into an elasticsearch plugin so I don't have full control over this. Currently I'm using the SpanTermQuery, maybe another available query type will do, so I don't need to impl

Re: Payload TFIDF Similarity in Lucene 7.1.0

2018-03-13 Thread Michael Sokolov
Also, if you are no longer using the term frequency at all, you might consider wiring your score (the one you are currently wiring into payloads) in there, in place of the term frequency. On Mar 13, 2018 6:57 AM, "Erik Hatcher" wrote: > Payloads are only scored from certain query types. What q

Re: Payload TFIDF Similarity in Lucene 7.1.0

2018-03-13 Thread Erik Hatcher
Payloads are only scored from certain query types. What query are you executing? > On Mar 13, 2018, at 04:58, Grdan Eenc wrote: > > Hej there, > > I want to extend the TFIDF Similarity class such that the term frequency is > neglected and the value in the payload used instead. Therefore I ba

Payload TFIDF Similarity in Lucene 7.1.0

2018-03-13 Thread Grdan Eenc
Hej there, I want to extend the TFIDF Similarity class such that the term frequency is neglected and the value in the payload used instead. Therefore I basically do this: @Override public float tf(float freq) { return 1f; } public float scorePayload(int doc, int start, in
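
For readers following this thread, here is a rough sketch of the scoring idea in the message above: the term frequency is ignored (`tf` always returns 1) and a float stored in the payload takes its place. It deliberately uses no Lucene classes, so it shows the arithmetic only, not how to wire it into a `Similarity`. The class name `PayloadTfSketch`, the big-endian float encoding, and the smoothed idf formula are all assumptions made for illustration and are not taken from the original post.

```java
import java.nio.ByteBuffer;

// Hypothetical, Lucene-free illustration of "payload instead of term frequency".
class PayloadTfSketch {

    // Mirrors the tf() override quoted in the message: frequency is ignored.
    static float tf(float freq) {
        return 1f;
    }

    // Assumed textbook-style smoothed idf; not claimed to match any Lucene version.
    static float idf(long docFreq, long docCount) {
        return (float) (Math.log((docCount + 1.0) / (docFreq + 1.0)) + 1.0);
    }

    // Assumed payload layout: a single 4-byte big-endian float.
    static byte[] encodeFloat(float value) {
        return ByteBuffer.allocate(4).putFloat(value).array();
    }

    static float decodeFloat(byte[] payload) {
        return ByteBuffer.wrap(payload).getFloat();
    }

    // Score for one term occurrence: the payload value scales idf, tf contributes 1.
    static float score(float freq, byte[] payload, long docFreq, long docCount) {
        return tf(freq) * idf(docFreq, docCount) * decodeFloat(payload);
    }
}
```

With this sketch, `PayloadTfSketch.score(1f, p, 3, 100)` and `PayloadTfSketch.score(50f, p, 3, 100)` give the same value for the same payload `p`, which is the behaviour the poster is after. Whether the payload is consulted at all in a real index depends on the query type, as the replies in this thread point out.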