));
IndexWriter writer = new IndexWriter(dir, iwc);
Anything suspicious here?
Thanks
Ilya Zavorin
-Original Message-
From: Steven A Rowe [mailto:sar...@syr.edu]
Sent: Monday, March 26, 2012 1:48 PM
To: java-user@lucene.apache.org
Subject: RE: can't find common
On 3/26/2012 at 12:21 PM, Ilya Zavorin wrote:
> I am not seeing anything suspicious. Here's what I see in the HEX:
>
> "n.e" from "pain.electricity": 6E-2E-0D-0A-0D-0A-65 (n-.-CR-LF-CR-LF-e)
> "e.H" from "sentence.He": 65-2E-0D-0A-48
I agree, standard DOS/Windows line endings.
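For anyone who wants to repeat the check, here is a minimal sketch in plain Java (no Lucene involved) that turns a dash-separated hex dump like the ones quoted above into readable form. The class and method names are made up for illustration, and it assumes every byte is ASCII.

```java
// Hypothetical helper, not part of Lucene: decodes a dash-separated hex dump
// and spells out carriage return (0x0D) and line feed (0x0A) by name.
class HexDumpDemo {

    static String describe(String hex) {
        StringBuilder sb = new StringBuilder();
        for (String part : hex.split("-")) {
            int b = Integer.parseInt(part, 16);   // e.g. "6E" -> 110
            if (sb.length() > 0) {
                sb.append('-');
            }
            if (b == 0x0D) {
                sb.append("CR");
            } else if (b == 0x0A) {
                sb.append("LF");
            } else {
                sb.append((char) b);              // assumes a printable ASCII byte
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe("6E-2E-0D-0A-0D-0A-65")); // n-.-CR-LF-CR-LF-e
        System.out.println(describe("65-2E-0D-0A-48"));       // e-.-CR-LF-H
    }
}
```

A CR-LF pair after each period is what a file saved with DOS/Windows line endings looks like.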
> I am pretty sure
anks,
Ilya
-Original Message-
From: Steven A Rowe [mailto:sar...@syr.edu]
Sent: Monday, March 26, 2012 11:41 AM
To: java-user@lucene.apache.org
Subject: RE: can't find common words -- using Lucene 3.4.0
Ilya,
StandardAnalyzer treats all forms of newline as whitespace,
orin [mailto:izavo...@caci.com]
Sent: Monday, March 26, 2012 11:21 AM
To: java-user@lucene.apache.org
Subject: RE: can't find common words -- using Lucene 3.4.0
Steve,
Thanks much for the link: very useful!
I looked at the index and found that it contains terms like
electricitythis -- from D
analyzers for respective foreign texts
Thanks,
Ilya
-Original Message-
From: Steven A Rowe [mailto:sar...@syr.edu]
Sent: Monday, March 26, 2012 10:59 AM
To: java-user@lucene.apache.org
Subject: RE: can't find common words -- using Lucene 3.4.0
Hi Ilya,
What analyzers are you using at index-time and query-time?
My guess is that you're using an analyzer that includes punctuation in the
tokens it emits, in which case your index will have things like "sentence." and
"sentence?" in it, so querying for "sentence" will not match.
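To make that guess concrete, here is a rough sketch in plain Java. It does not use any Lucene analyzer; the two hypothetical methods only imitate a tokenizer that keeps punctuation attached and one that drops it, so you can see why the lookup fails in the first case.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Illustration only: these methods are stand-ins, not Lucene APIs.
class PunctuationTokenDemo {

    // Splits on whitespace only, so punctuation stays glued to the token.
    static List<String> whitespaceTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // Splits on anything that is not a letter or digit, so punctuation is dropped.
    static List<String> wordTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "Is this a sentence? Yes, a sentence.";
        // Indexed terms include "sentence?" and "sentence.", so "sentence" is absent.
        System.out.println(whitespaceTokens(text).contains("sentence")); // false
        // Punctuation removed, so the plain term is present.
        System.out.println(wordTokens(text).contains("sentence"));       // true
    }
}
```

If the index was built the first way, a query for the bare word has nothing to match against.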
Luke can