Re: Looking for a MappingCharFilter that accepts regular expressions

2009-12-14 Thread Paul Taylor
Paul Taylor wrote: Koji Sekiguchi wrote: Koji Sekiguchi wrote: Paul Taylor wrote: I want my search to treat 'No. 1' and 'No.1' the same, because in our context it's one token. I want 'No. 1' to become 'No.1'; I need to do this before tokenizing because the tokenizer would split one value into

Re: Looking for a MappingCharFilter that accepts regular expressions

2009-12-14 Thread Paul Taylor
Koji Sekiguchi wrote: Koji Sekiguchi wrote: Paul Taylor wrote: I want my search to treat 'No. 1' and 'No.1' the same, because in our context it's one token. I want 'No. 1' to become 'No.1'; I need to do this before tokenizing because the tokenizer would split one value into two terms and one i
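Both messages above concern mapping 'No. 1' to 'No.1' before tokenization. Lucene's MappingCharFilter only takes literal mappings, but the rewrite itself can be sketched with java.util.regex — a stand-alone illustration of the mapping, not the Lucene char-filter wiring (the pattern and class name here are my own):

```java
import java.util.regex.Pattern;

public class NormalizeNo {
    // Collapse whitespace between "No." and a following digit, so that
    // 'No. 1' and 'No.1' yield the same token downstream.
    private static final Pattern NO_SPACE = Pattern.compile("No\\.\\s+(?=\\d)");

    static String normalize(String s) {
        return NO_SPACE.matcher(s).replaceAll("No.");
    }

    public static void main(String[] args) {
        System.out.println(normalize("Symphony No. 1 in C")); // Symphony No.1 in C
    }
}
```

Note that because the replacement changes string lengths, term offsets shift relative to the original text — the same issue raised in the "Offset Problem" thread below.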

Re: Hopefully simple question constructing BooleanQuery

2009-12-14 Thread Jacob Rhoden
Exactly what I was trying to do. Thanks. On 15/12/2009, at 4:25 PM, TCK wrote: How about the following? BooleanQuery bq1 = new BooleanQuery(); bq1.add(new PrefixQuery(new Term("heading",word)),BooleanClause.Occur.SHOULD); bq1.add(new PrefixQuery(new Term("attribute",word)),BooleanClause.Occur.

Re: Hopefully simple question constructing BooleanQuery

2009-12-14 Thread TCK
How about the following? BooleanQuery bq1 = new BooleanQuery(); bq1.add(new PrefixQuery(new Term("heading",word)),BooleanClause.Occur.SHOULD); bq1.add(new PrefixQuery(new Term("attribute",word)),BooleanClause.Occur.SHOULD); BooleanQuery bq = new BooleanQuery(); bq.add(bq1, BooleanClause.Occur.MUS
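The nesting in TCK's reply expresses (heading:word* OR attribute:word*) AND (other clause): SHOULD clauses inside a sub-query act like OR, and adding that sub-query to the outer query with Occur.MUST acts like AND. A plain-Java sketch of that boolean shape, using Predicate stand-ins rather than the Lucene API (the field/term strings are hypothetical):

```java
import java.util.function.Predicate;

public class NestedBool {
    // Stand-ins for the two PrefixQuery clauses and an outer MUST clause.
    static boolean matches(String doc) {
        Predicate<String> heading   = d -> d.contains("heading:app");
        Predicate<String> attribute = d -> d.contains("attribute:app");
        Predicate<String> published = d -> d.contains("status:published");

        // Inner BooleanQuery: two SHOULD clauses behave like OR.
        Predicate<String> inner = heading.or(attribute);
        // Outer BooleanQuery: adding the inner query with Occur.MUST behaves like AND.
        return inner.and(published).test(doc);
    }

    public static void main(String[] args) {
        System.out.println(matches("heading:app status:published")); // true
        System.out.println(matches("attribute:app"));                // false
    }
}
```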

Hopefully simple question constructing BooleanQuery

2009-12-14 Thread Jacob Rhoden
Assume I have the following rather simple example that works fine: BooleanQuery bq = new BooleanQuery(); bq.add(new PrefixQuery(new Term("heading",word)),BooleanClause.Occur.SHOULD); bq.add(new PrefixQuery(new Term("attribute",word)),BooleanClause.Occur.SHOULD); Now I add the foll

Document category identification in query

2009-12-14 Thread Alex
Hi, I am trying to expand user queries to figure out potential document categories implied in the query. I wanted to know what was the best way to figure out the document category that is the most relevant to the query. Let me explain further: I have created categories that are applied to documen
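One simple baseline for this problem (my own sketch, not necessarily what Alex needs) is to keep a keyword set per category and pick the category whose set overlaps the query terms most:

```java
import java.util.*;

public class CategoryGuess {
    // Pick the category whose keyword set has the largest overlap
    // with the query's terms; null if nothing overlaps.
    static String bestCategory(Set<String> queryTerms, Map<String, Set<String>> categories) {
        String best = null;
        int bestScore = 0;
        for (Map.Entry<String, Set<String>> e : categories.entrySet()) {
            Set<String> overlap = new HashSet<>(e.getValue());
            overlap.retainAll(queryTerms);
            if (overlap.size() > bestScore) {
                bestScore = overlap.size();
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> cats = new HashMap<>();
        cats.put("sports", new HashSet<>(Arrays.asList("score", "team", "match")));
        cats.put("finance", new HashSet<>(Arrays.asList("stock", "price", "market")));
        Set<String> query = new HashSet<>(Arrays.asList("stock", "price", "today"));
        System.out.println(bestCategory(query, cats)); // finance
    }
}
```

Raw overlap counting is crude; weighting terms (e.g. by IDF) would be a natural refinement.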

RE: Lower/Uppercase problem when searching in a not-analyzed field

2009-12-14 Thread Jeff Plater
The issue is that you are using an analyzer on the search query and not at index time. The StandardAnalyzer that you are using at search time is lowercasing the query before searching against the index. You have a few options that I can think of: 1 - use a different analyzer at search time (o

Re: Lower/Uppercase problem when searching in a not-analyzed field

2009-12-14 Thread Savvas-Andreas Moysidis
Hi, my guess would also be that the StandardAnalyzer lowercases your terms while you have indexed them as they are without lowercasing. One idea would be to use the PerFieldAnalyzerWrapper and map a KeywordAnalyzer (which basically doesn't tokenise your stream at all) to any fields you want not an

Lower/Uppercase problem when searching in a not-analyzed field

2009-12-14 Thread Michel Nadeau
Hi! My Lucene 3.0.0 index contains a field "DOMAIN" that contains an Internet domain name - like * www.DomainName.com * www.domainname.com * www.DomainName.com/path/to/document/doc.html?a=2 This field is indexed like this - doc.add(new Field("DOMAIN", sValue, Field.Store.YES, Field.Index.NOT_A
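The mismatch can be reproduced without Lucene: the NOT_ANALYZED field keeps the original case, while StandardAnalyzer lowercases the query term, so the exact term comparison fails. Normalizing case consistently on both sides — one of the fixes suggested in the replies — restores the match. A minimal sketch:

```java
import java.util.Locale;

public class CaseMismatch {
    public static void main(String[] args) {
        String indexed = "www.DomainName.com"; // stored as-is (NOT_ANALYZED)
        // What StandardAnalyzer emits for the query term:
        String query = "www.DomainName.com".toLowerCase(Locale.ROOT);

        // Exact term comparison fails on case:
        System.out.println(indexed.equals(query)); // false

        // Fix: lowercase at index time too, so both sides agree:
        System.out.println(indexed.toLowerCase(Locale.ROOT).equals(query)); // true
    }
}
```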

RE: SnowballAnalyzer and StopAnalyzer.ENGLISH_STOP_WORDS_SET ?

2009-12-14 Thread Uwe Schindler
Can you open an issue? This is a problem: SnowballAnalyzer is missing the Set ctor. In 2.9.1 you can only use it with the deprecated array, and in 3.0 it breaks :-) But you can always copy the set into an array using size() and Iterable. - Uwe Schindler H.-H.-Meier-Allee 63, D-2821
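Until a Set-based constructor exists, copying the set into the String[] the deprecated ctor expects is one call to Set.toArray. A sketch with a plain HashSet standing in for StopAnalyzer.ENGLISH_STOP_WORDS_SET (whose actual element type may differ):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class StopSetToArray {
    public static void main(String[] args) {
        // Stand-in for StopAnalyzer.ENGLISH_STOP_WORDS_SET (hypothetical subset).
        Set<String> stopSet = new HashSet<>(Arrays.asList("a", "an", "the"));

        // Copy the set into the String[] the deprecated ctor expects.
        String[] stopWords = stopSet.toArray(new String[0]);

        System.out.println(stopWords.length); // 3
    }
}
```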

Re: SnowballAnalyzer and StopAnalyzer.ENGLISH_STOP_WORDS_SET ?

2009-12-14 Thread Erick Erickson
Hmmm, that does seem a little confusing... Have you tried using ENGLISH_STOP_WORDS_SET.toArray()? WARNING: I haven't tried this myself and don't have access to a test bed Erick On Mon, Dec 14, 2009 at 7:55 AM, Nick Burch wrote: > Hi All > > I'm upgrading my code from 2.4 to 2.9, and I've h

Re: How to do ranking in lucene?

2009-12-14 Thread Erick Erickson
I really don't understand the question. Ranking is what Lucene *does*. Here is a link explaining how Lucene ranks documents: http://lucene.apache.org/java/2_4_0/scoring.html If this isn't relevant, you need to explain what about the current ranking

NGramTokenizer stops working after about 1000 terms

2009-12-14 Thread Stefan Trcek
Hello, For a source code (git repo) search engine I chose to use an ngram analyzer for substring search (something like "git blame"). This worked fine except it didn't find some strings. I tracked it down to the analyzer. When the ngram analyzer yielded about 1000 terms it stopped yielding mor
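The ~1000-term cutoff is consistent with the tokenizer reading only a bounded prefix of its input (older NGramTokenizer reportedly consumed just the first 1024 characters of the stream). For comparison, a plain-Java character n-gram generator with no such cutoff — a sketch, not the Lucene tokenizer:

```java
import java.util.ArrayList;
import java.util.List;

public class CharNGrams {
    // Emit every character n-gram of length n over the whole input
    // (no fixed-buffer cutoff).
    static List<String> ngrams(String text, int n) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + n <= text.length(); i++) {
            out.add(text.substring(i, i + n));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(ngrams("lucene", 3)); // [luc, uce, cen, ene]
    }
}
```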

Re: heap memory issues when sorting by a string field

2009-12-14 Thread Toke Eskildsen
On Fri, 2009-12-11 at 14:53 +0100, Michael McCandless wrote: > How long does Lucene take to build the ords for the toplevel reader? > > You should be able to just time FieldCache.getStringIndex(topLevelReader). > > I think your 8.5 seconds for first Lucene search was with the > StringIndex compute

Re: Offset Problem

2009-12-14 Thread Weiwei Wang
got it, thanks, Koji On Mon, Dec 14, 2009 at 9:19 PM, Koji Sekiguchi wrote: > Weiwei Wang wrote: > >> The offset is incorrect for PatternReplaceCharFilter so the highlighting >> result is wrong. >> >> How to fix it? >> >> >> > As I noted in the comment of the source, if you produce a phrase from a

Re: SnowballAnalyzer and StopAnalyzer.ENGLISH_STOP_WORDS_SET ?

2009-12-14 Thread Weiwei Wang
you can construct your own analyser as SnowballAnalyzer does and as a result you can adapt to the new Lucene version(now 3.0.0) http://lucene.apache.org/java/3_0_0/api/contrib-snowball/org/apache/lucene/analysis/snowball/SnowballAnalyzer.html On Mon, Dec 14, 2009 at 8:55 PM, Nick Burch wrote: >

Re: Offset Problem

2009-12-14 Thread Koji Sekiguchi
Weiwei Wang wrote: The offset is incorrect for PatternReplaceCharFilter so the highlighting result is wrong. How to fix it? As I noted in the comment of the source, if you produce a phrase from a term and try to highlight a term in the produced phrase, the highlighted snippet will be undesira
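The offset drift Koji describes can be seen directly: when a char filter shortens the text, offsets computed on the filtered stream point too early in the original. A minimal illustration in plain Java (not Lucene's offset-correction machinery):

```java
public class OffsetDrift {
    public static void main(String[] args) {
        String original = "The No. 1 hit single";
        String filtered = original.replace("No. 1", "No.1"); // one char shorter

        int hitInFiltered = filtered.indexOf("hit");
        int hitInOriginal = original.indexOf("hit");

        System.out.println(hitInFiltered); // 9
        System.out.println(hitInOriginal); // 10

        // Highlighting the original with the filtered offset misses by
        // the length delta of the replacement:
        System.out.println(original.substring(hitInFiltered, hitInFiltered + 3)); // " hi"
    }
}
```

This is why char filters must map offsets back to the original text; when they don't (or can't, as in the phrase-from-term case Koji mentions), highlighted snippets come out shifted.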

SnowballAnalyzer and StopAnalyzer.ENGLISH_STOP_WORDS_SET ?

2009-12-14 Thread Nick Burch
Hi All I'm upgrading my code from 2.4 to 2.9, and I've hit an issue with deprecations. My old code was: new SnowballAnalyzer("English", StopAnalyzer.ENGLISH_STOP_WORDS); Looking at the JavaDocs, I'd expected that the new format would be: new SnowballAnalyzer(Version.LUCENE_CUR