Re: OutOfMemoryError indexing large documents

2014-11-25 Thread Erick Erickson
Well 1> don't send 20 docs at once. Or send docs over some size N by themselves. 2> seriously consider the utility of indexing a 100+M file. Assuming it's mostly text, lots and lots and lots of queries will match it, and it'll score pretty low due to length normalization. And you probably can't re
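
A minimal sketch of point 1>, assuming the application already knows each document's size before indexing it. The class name IndexingGate, the slot count, and the size threshold are hypothetical and not part of Lucene. Small documents take one slot each, while a document over the threshold takes every slot and therefore runs by itself.

```java
import java.util.concurrent.Semaphore;

/** Hypothetical admission gate: small documents share slots, large ones run alone. */
final class IndexingGate {
    private final int slots;
    private final long largeThresholdBytes;
    private final Semaphore permits;

    IndexingGate(int slots, long largeThresholdBytes) {
        this.slots = slots;
        this.largeThresholdBytes = largeThresholdBytes;
        // Fair ordering, so a waiting large document is not starved by small ones.
        this.permits = new Semaphore(slots, true);
    }

    /** Number of slots a document of the given size must hold while it is indexed. */
    int permitsFor(long docSizeBytes) {
        return docSizeBytes > largeThresholdBytes ? slots : 1;
    }

    /** Runs the indexing task while holding the slots required for its size. */
    void index(long docSizeBytes, Runnable indexTask) {
        int needed = permitsFor(docSizeBytes);
        permits.acquireUninterruptibly(needed);
        try {
            // In a real application this would wrap IndexWriter.addDocument.
            indexTask.run();
        } finally {
            permits.release(needed);
        }
    }

    /** Slots currently free; exposed only so the behaviour can be observed. */
    int availableSlots() {
        return permits.availablePermits();
    }
}
```

With new IndexingGate(20, 10L * 1024 * 1024), up to 20 small documents can be indexed concurrently, but a 100 MB document waits until all slots are free and blocks the others until it finishes. This bounds concurrency only; it does not reduce the memory a single large document needs.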

OutOfMemoryError indexing large documents

2014-11-25 Thread ryanb
Hello, We use vanilla Lucene 4.9.0 on a 64-bit Linux OS. We sometimes need to index large documents (100+ MB), but this results in extremely high memory usage, to the point of OutOfMemoryError even with 17GB of heap. We allow up to 20 documents to be indexed simultaneously, but the text to be anal

Re: Retrieve found terms

2014-11-25 Thread Michael Sokolov
Why don't you want to use a highlighter? That's what they're for. -Mike On 11/25/2014 09:12 AM, John Cecere wrote: I've done a bunch of searching, but I still can't seem to figure out how to do this. Given a WildcardQuery or PrefixQuery (something with a wildcard in it), is there a way to r

Retrieve found terms

2014-11-25 Thread John Cecere
I've done a bunch of searching, but I still can't seem to figure out how to do this. Given a WildcardQuery or PrefixQuery (something with a wildcard in it), is there a way to retrieve the terms in the index that matched in a document? For example, the search term for my WildcardQuery is 'arch*'
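
To make the question concrete, here is a plain-Java sketch of what "the terms that matched in a document" means for a prefix query. It does not use the Lucene API; the class PrefixTermMatcher and the sample terms are hypothetical, and it assumes the document's terms are already available as a collection of strings.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical helper: finds which of a document's terms a prefix query would match. */
final class PrefixTermMatcher {
    /** Returns the document's terms that start with the prefix, in sorted order. */
    static List<String> matchedTerms(Collection<String> docTerms, String prefix) {
        TreeSet<String> sorted = new TreeSet<>(docTerms);
        List<String> matches = new ArrayList<>();
        // In sorted order, all terms sharing a prefix are contiguous,
        // starting at the first term >= prefix.
        for (String term : sorted.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break; // past the last term with this prefix
            }
            matches.add(term);
        }
        return matches;
    }
}
```

For a document containing the made-up terms "arc", "arch", "architect", "archive" and "bar", the prefix "arch" yields "arch", "architect" and "archive". A sorted term dictionary supports the same seek-then-scan pattern.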

Re: hierarchical facets

2014-11-25 Thread Shai Erera
Yes, hierarchical faceting in Lucene is only supported by the taxonomy index, at least currently. Shai On Tue, Nov 25, 2014 at 3:46 PM, Vincent Sevel wrote: > hi, > I saw that SortedSetDocValuesFacetCounts does not support hierarchical > facets. > Is that to say that hierarchical facets are onl

hierarchical facets

2014-11-25 Thread Vincent Sevel
Hi, I saw that SortedSetDocValuesFacetCounts does not support hierarchical facets. Is that to say that hierarchical facets are only supported through the Taxonomy index? I am using Lucene 4.7.2. Regards, vince

Re: Case Insensitive Matching in Solr/Lucene

2014-11-25 Thread Apurv Verma
Hey Michael, Thanks for your reply. My use case is a little different. I would like to get the original values in facet queries but I would like to apply filter queries in a case insensitive fashion. For example I require facet_query to return Quick, The, brown, ... But I want filter queries of

Re: Case Insensitive Matching in Solr/Lucene

2014-11-25 Thread Michael Sokolov
The index size will not increase as quickly as you might think, and is not an issue in most cases. An alternative to two fields, though, is to index both upper- and lower-case tokens at the same position in a single field, and then to perform no case folding at query time. There is no standar
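
The single-field alternative described above can be illustrated with a toy positional index in plain Java. This is a sketch of the idea only, not Lucene code: the class DualCaseIndex and its methods are hypothetical, and tokenization is a simple whitespace split. In Lucene itself the equivalent would be a custom token filter that emits the lowercased token with a position increment of 0.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical toy index storing original and lowercased tokens at the same position. */
final class DualCaseIndex {
    // term -> docId -> positions at which the term occurs
    private final Map<String, Map<Integer, List<Integer>>> postings = new HashMap<>();

    void add(int docId, String text) {
        String[] tokens = text.trim().split("\\s+");
        for (int pos = 0; pos < tokens.length; pos++) {
            String original = tokens[pos];
            if (original.isEmpty()) {
                continue; // empty input produces one empty token
            }
            String lower = original.toLowerCase(Locale.ROOT);
            post(original, docId, pos);
            if (!lower.equals(original)) {
                post(lower, docId, pos); // same position as the original token
            }
        }
    }

    private void post(String term, int docId, int pos) {
        postings.computeIfAbsent(term, t -> new TreeMap<>())
                .computeIfAbsent(docId, d -> new ArrayList<>())
                .add(pos);
    }

    /** Exact lookup: no case folding is applied to the query term. */
    Set<Integer> docsMatching(String term) {
        Map<Integer, List<Integer>> hits = postings.get(term);
        return hits == null ? Collections.<Integer>emptySet() : hits.keySet();
    }

    List<Integer> positions(String term, int docId) {
        Map<Integer, List<Integer>> hits = postings.get(term);
        if (hits == null || !hits.containsKey(docId)) {
            return Collections.emptyList();
        }
        return hits.get(docId);
    }
}
```

After indexing "The Quick brown fox" as document 1 and "the quick dog" as document 2, the term "quick" matches both documents, while "Quick" matches only document 1. A client that wants case-insensitive matching sends the lowercased term; one that wants the original casing sends it unchanged.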

Re: Case Insensitive Matching in Solr/Lucene

2014-11-25 Thread Apurv Verma
Hi Ahmet, Thanks for your reply. Creating two separate fields, one containing the original value and the other the lowercased value, is a viable solution. But this leads to index bloat (~2x). I am looking for alternative solutions. -- Regards, Apurv Verma On Tue, Nov

Case Insensitive Matching in Solr/Lucene

2014-11-25 Thread Apurv Verma
Hey all, The standard solution to doing a case-insensitive match in Lucene is to use a Lowercase filter at index and query time. However, this does not preserve the content of the original document. For example if my inverted index is. Term Doc_1 Doc_2 - Quick |