Re: BlockTreeTermsReader consumes crazy amount of memory

2014-09-09 Thread Vitaly Funstein
Okay, created LUCENE-5931 for this. As it turns out, my original test actually does do deletes on the index so please disregard my question about segment merging.

Re: Arabic Stemmer problem

2014-09-09 Thread atawfik
Hi Suleman, It is not a bug; it is the intended behavior. In fact, your examples are correct. It is just that the everyday usage of these words has changed recently. For instance, "سيار" actually means something that moves or walks. Since people use mobile everywhere, the word now means mobile. That

Re: BlockTreeTermsReader consumes crazy amount of memory

2014-09-09 Thread vfunstein
I'm on 4.6.1. I'll file an issue for sure, but is there a workaround you could think of in the meantime? As you probably remember, the reason for doing this in the first place was to prevent the catastrophic heap exhaustion when SegmentReader instances are opened from scratch for every new Index

Arabic Stemmer problem

2014-09-09 Thread Suleman Mubarik
Hi, I am working on using the Arabic Stemmer (https://lucene.apache.org/core/3_6_0/api/all/org/apache/lucene/analysis/ar/ArabicStemmer.html). Among its suffixes is the character TEH_MARBUTA (\u0629). When this stemmer applies stemSuffix, it removes TEH_MARBUTA (ة), which changes some words, for example
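The suffix removal described in this message can be illustrated with a minimal, self-contained sketch. It is not Lucene's ArabicStemmer code: the class name TehMarbutaSketch, the helper stripTehMarbuta, and the minimum-length check are assumptions made for illustration, and the sample word سيارة is an assumed input chosen because its stripped form is the "سيار" discussed in the reply.

```java
public class TehMarbutaSketch {
    // ARABIC LETTER TEH MARBUTA, the code point named in the message.
    public static final char TEH_MARBUTA = '\u0629';

    // Hypothetical helper (an assumption, not Lucene's stemSuffix):
    // drops one trailing TEH MARBUTA when at least two characters remain.
    public static String stripTehMarbuta(String word) {
        int len = word.length();
        if (len > 2 && word.charAt(len - 1) == TEH_MARBUTA) {
            return word.substring(0, len - 1);
        }
        return word;
    }

    public static void main(String[] args) {
        // Assumed sample input: the five letters seen, yeh, alef, reh, teh marbuta.
        String word = "\u0633\u064A\u0627\u0631\u0629";
        System.out.println(word + " -> " + stripTehMarbuta(word));
    }
}
```

Because the final letter is dropped unconditionally, two words that differ only by a trailing TEH MARBUTA collapse to the same stem, which is the effect the message asks about.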

Re: source for jira lucene facet example

2014-09-09 Thread Michael McCandless
The source code is being tracked on this branch/issue: https://issues.apache.org/jira/browse/LUCENE-5376 But I haven't had any time to push that branch forward for quite a while ... I still think it's important that Lucene have a simple example server though. Mike McCandless http://blog.mikemcca

Re: BlockTreeTermsReader consumes crazy amount of memory

2014-09-09 Thread Michael McCandless
Hmm, which Lucene version are you using? We recently beefed up the checking in this code, so you ought to be hitting an exception in newer versions. But that being said, I think the bug is real: if you try to reopen from a newer NRT reader down to an older (commit point) reader then you can hit t

RE: How to properly correlate relevance in a search across multiple collections

2014-09-09 Thread Baldwin, David
I did notice the MultiSearch and MultiReader, given the claim on the Lucene feature page of "multiple-index searching with merged results" (see https://lucene.apache.org/core/). I am wondering if my original question about searching multiple indexes with merged results also includes pr

source for jira lucene facet example

2014-09-09 Thread Vincent Sevel
Hi, Does someone know if the source of the jira issues search example is available: http://jirasearch.mikemccandless.com/ thanks, vince

RE: KeywordAnalyzer still getting tokenized on spaces

2014-09-09 Thread Milind
I simplified the program to show this. I actually use a multiterm query parser and a join query across 2 Lucene Indexes. It's already complicated. I can understand the logic of parsing the query first (I need that in fact because I'm using different analyzers for different fields), but I don't un

RE: KeywordAnalyzer still getting tokenized on spaces

2014-09-09 Thread Uwe Schindler
Hi, the QueryParser does not analyze the whole query text with the analyzer. It first parses the query syntax and then passes through the analyzer only those parts that the query parser considers "tokens". If you want such an analyzer to be respected by the query parser you may need a

Re: KeywordAnalyzer still getting tokenized on spaces

2014-09-09 Thread atawfik
The result of QueryParser is confusing. The problem is that you assume the query parser uses the analyzer to parse your query. However, that is not the case. The query parser first parses the query string, then applies the analyzer. In other words, the query parser will split the query string usin
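The parse-then-analyze order described in this thread can be shown with a toy model. This is a sketch under stated assumptions, not Lucene code: ParseThenAnalyzeSketch, parse, and analyzeAsKeyword are hypothetical names, and the "parser" here only knows about whitespace and double quotes. It shows why a keyword-style analyzer never sees the space in an unquoted query.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ParseThenAnalyzeSketch {
    // Toy syntax: either a double-quoted phrase or a run of non-whitespace.
    private static final Pattern CHUNK = Pattern.compile("\"([^\"]*)\"|(\\S+)");

    // Toy stand-in for a keyword-style analyzer: the input stays one token.
    public static String analyzeAsKeyword(String text) {
        return text;
    }

    // Step 1: split the query by syntax. Step 2: analyze each chunk on its own.
    public static List<String> parse(String query) {
        List<String> terms = new ArrayList<>();
        Matcher m = CHUNK.matcher(query);
        while (m.find()) {
            String chunk = (m.group(1) != null) ? m.group(1) : m.group(2);
            terms.add(analyzeAsKeyword(chunk));
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(parse("hello world"));     // two separate terms
        System.out.println(parse("\"hello world\"")); // one term containing the space
    }
}
```

In this model the unquoted query is already split before the analyzer runs, so the analyzer is handed "hello" and "world" separately. Quoting the phrase keeps it as one chunk, and only then does the keyword-style analyzer return it as a single term.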

RE: How to properly correlate relevance in a search across multiple collections

2014-09-09 Thread Vincent Sevel
Hi, Does someone know if the source of the jira issues search example is available: http://jirasearch.mikemccandless.com/ thanks, vince