Duplicate filtering

2016-09-19 Thread Vjeran Marcinko
Hello, I'm pretty much a Lucene newbie, so I'm looking for some short guidelines on how to implement duplicate document filtering based on a field that defines uniqueness, where the first document stays and the other duplicates are filtered out. I know some 3rd party contrib lib existed before which w
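
As a rough illustration of the "first document stays" rule described in this message, here is a minimal sketch in plain Java. It is not a Lucene API and not an answer from the thread: documents are modelled as simple maps, and the class name `FirstWinsDedup` and the choice to keep documents that lack the key field are assumptions made only for this example.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Lucene: documents are plain field->value maps.
final class FirstWinsDedup {
    // Keeps the first document seen for each value of keyField and drops
    // later documents that carry the same value.
    static List<Map<String, String>> filter(List<Map<String, String>> docs, String keyField) {
        Set<String> seen = new HashSet<>();
        List<Map<String, String>> kept = new ArrayList<>();
        for (Map<String, String> doc : docs) {
            String key = doc.get(keyField);
            // Assumption: a document without the key field is never treated as a duplicate.
            // Set.add returns false when the key has been seen before.
            if (key == null || seen.add(key)) {
                kept.add(doc);
            }
        }
        return kept;
    }
}
```

The same rule could be applied at index time (skip a document whose key is already indexed) or at search time (skip hits whose key was already returned); which one fits depends on details the truncated message does not show.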

FacetResult getTopChildren

2016-09-19 Thread Cam Bazz
Hello, FacetResult getTopChildren returns the top N facets; however, I need to return facets whose count is above a certain threshold, for example all facets that had counts > 10. Is there a way to accomplish this? I have been looking over the API docs and could not find it. I could maybe g
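
One possible workaround, offered here only as an assumption and not as something confirmed in the thread, is to request a generously large top N and then filter the returned label/count pairs on the client side. The sketch below shows just the filtering step in plain Java; the class name `FacetThreshold` is hypothetical, and in a Lucene program the input pairs would come from the `labelValues` of a `FacetResult` instead of a hand-built map.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Lucene: filters label->count pairs by a threshold.
final class FacetThreshold {
    // Returns the entries whose count is strictly greater than threshold,
    // preserving the iteration order of the input map.
    static Map<String, Integer> above(Map<String, Integer> counts, int threshold) {
        Map<String, Integer> out = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > threshold) {
                out.put(e.getKey(), e.getValue());
            }
        }
        return out;
    }
}
```

Because top-N results are ordered by count, a caller could also stop at the first entry that falls at or below the threshold instead of scanning the whole list.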

RE: Cooccurrence matrices

2016-09-19 Thread Allison, Timothy B.
Take a look at LUCENE-5317 [1] and LUCENE-5318 [2]. They're available on my GitHub site [3], and I've pushed them to Maven Central [4]. LUCENE-5318 is crazily useful as a term/phrase recommender system. I haven't documented either very well yet. I'll try to add documentation to my github site

Cooccurrence matrices

2016-09-19 Thread José Tomás Atria
Hello All, I'm trying to use Lucene to create a sliding window cooccurrence matrix. I've found some old discussion threads on this list that provide some pointers, but most of those are for really old Lucene versions, or rely on components that are no longer available. So far, I tried wa
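
To make the term "sliding window cooccurrence matrix" concrete, here is a minimal sketch in plain Java that counts unordered token pairs appearing within a fixed number of positions of each other. It is not the approach from the thread or from LUCENE-5317/5318: the class name `WindowCooccurrence`, the tab-joined pair key, and the sparse map representation are assumptions for illustration, and the token list stands in for whatever an analyzer or positional term vectors would supply.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Lucene: sparse cooccurrence counts over one token sequence.
final class WindowCooccurrence {
    // Counts unordered pairs of tokens whose positions differ by at most `window`.
    // Each pair is stored under a key of the form "smaller\tlarger" (lexicographic order).
    static Map<String, Integer> count(List<String> tokens, int window) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i < tokens.size(); i++) {
            // Only look ahead, so each pair of positions is visited exactly once.
            int end = Math.min(tokens.size(), i + window + 1);
            for (int j = i + 1; j < end; j++) {
                String a = tokens.get(i);
                String b = tokens.get(j);
                String key = a.compareTo(b) <= 0 ? a + "\t" + b : b + "\t" + a;
                counts.merge(key, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

A sparse map is used because most term pairs never cooccur; counts from several documents can be combined by merging the per-document maps.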

null Query from MultiFieldQueryParser.getFieldQuery

2016-09-19 Thread Oliver Kaleske
Hi, while updating Lucene from 6.1.0 to 6.2.0 I came across the following: We have a subclass of MultiFieldQueryParser (MFQP) for creating a custom type of Query, which calls getFieldQuery() on its base class (MFQP). For each of its search fields, this method has a Query created by calling getFiel