Re: codec: accessing term dictionary

2017-03-10 Thread Jürgen Jakobitsch
david, thanks for your input.. initially i was hoping to be able to use FST somehow in this process, but my knowledge in this area is fairly manageable.. i will give it a second thought anyway... ;-) krj

Re: codec: accessing term dictionary

2017-03-10 Thread Jürgen Jakobitsch
michael, thanks for your input.. i already extended the default codec to return the BlockTreeOrdsPostingsFormat for testing and this works nicely and i can access terms via ordinal. speed is not really the issue (some things simply take a while... ;-) ). i also don't want to index shingles, becau

Re: Dynamic Numeric Range Faceting

2017-03-10 Thread Michael McCandless
Hi Chitra, It sounds like things work for you in 6.4.1 but not in 4.10.4? Why not just upgrade to 6.4.x? DrillDownQuery is final because the class is not meant to be subclassed (it doesn't have any extension points) and is really just "sugar" for rewriting to simpler queries. Mike McCandless

Re: codec: accessing term dictionary

2017-03-10 Thread Dawid Weiss
Or you could encode those term/n-gram frequencies in one FST and then reuse it. This would be memory-saving and fairly fast (~comparable to a hash table). Dawid On Fri, Mar 10, 2017 at 11:41 AM, Michael McCandless wrote: > Yes, this is a reasonable way to use Lucene (to see terms statistics across

Re: codec: accessing term dictionary

2017-03-10 Thread Michael McCandless
Yes, this is a reasonable way to use Lucene (to see term statistics across the corpus) but it may not be performant enough for your needs. E.g. wasting memory and making a giant hash table for one-time or periodic corpus analysis may be faster. If you are looking for word n-gram stats, you could
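
For illustration only, the "giant hash table" idea mentioned above could look roughly like the sketch below. It is not from the thread and does not use Lucene at all: `NgramCounter` and its `count` method are hypothetical names, tokenization is plain whitespace splitting with lowercasing (an assumption, not what a Lucene analyzer would do), and everything is held in memory.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper (not part of Lucene): counts word n-grams across a
// collection of documents using a single in-memory hash table.
class NgramCounter {

    // Returns a map from each n-gram (n consecutive tokens joined by a single
    // space) to the number of times it occurs across all documents.
    static Map<String, Integer> count(Iterable<String> docs, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        Map<String, Integer> counts = new HashMap<>();
        for (String doc : docs) {
            // Assumption: whitespace tokenization and lowercasing are good enough.
            String normalized = doc.trim().toLowerCase(Locale.ROOT);
            if (normalized.isEmpty()) {
                continue; // nothing to count in an empty document
            }
            String[] tokens = normalized.split("\\s+");
            // Slide a window of n tokens over the document.
            for (int i = 0; i + n <= tokens.length; i++) {
                String gram = String.join(" ", Arrays.copyOfRange(tokens, i, i + n));
                counts.merge(gram, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

Calling `NgramCounter.count(docs, 2)` would give bigram counts for the whole collection. The trade-off is the one described in the message: memory grows with the number of distinct n-grams, which may be acceptable for a one-time or periodic analysis.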

Re: Range queries get misinterpreted when parsed twice via the "Standard" parsers

2017-03-10 Thread Michael McCandless
Why don't we fix this in Lucene? It sounds like your fix (overriding toQueryString for the range query nodes) is contained? Could you open an issue and add a patch? I agree it's silly to produce [ts:X ts:Y] syntax. Mike McCandless http://blog.mikemccandless.com On Thu, Mar 9, 2017 at 8:59 PM,