To: java-user@lucene.apache.org
Subject: Tokenization / Analyzer question
I'm using lucene 2.9.1.
I'm indexing documents which correspond to an ID.
Each field in the ID document is made up of data from all subId's.
(It's a requirement that searches must work across all subId's within an
ID).
They will be indexed and stored in some format similar to:
subId0Value0 subId0
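As a rough sketch of the indexing side (field and variable names here are just placeholders, not my actual code), each subId's data goes in under the same field name so a search on that field covers the whole ID:

    import java.io.IOException;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;

    // One Document per ID; every subId contributes its value under the same
    // field name ("data"), so a query against "data" spans all subIds.
    static void indexId(IndexWriter writer, String id, String[] subIdValues)
            throws IOException {
        Document doc = new Document();
        doc.add(new Field("id", id, Field.Store.YES, Field.Index.NOT_ANALYZED));
        for (int i = 0; i < subIdValues.length; i++) {
            doc.add(new Field("data", subIdValues[i],
                              Field.Store.YES, Field.Index.ANALYZED));
        }
        writer.addDocument(doc);
    }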
> It does sound very strange to me, to default to a WildCardQuery! Suppose I
> am looking for "bold", I am getting hits for "old".
I know - but that's what the requirements dictate. A better example might be
a MAC or IP address, where someone might be searching for a string in the
middle - like,
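For that kind of substring match the query can be built directly as a WildcardQuery rather than going through the QueryParser (which by default rejects a leading wildcard) - roughly like this sketch, where the field name and value are only examples:

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.WildcardQuery;

    // Match any term containing the typed fragment, e.g. part of a MAC address.
    // A WildcardQuery built directly accepts a leading wildcard, but it has to
    // scan the whole term dictionary for the field, so it is slow on big indexes.
    String typed = "1a:2b";
    Query q = new WildcardQuery(new Term("macAddress", "*" + typed.toLowerCase() + "*"));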
Hi everyone,
I told you I'd be back with more questions! :-)
Here is my situation. In my application, the field to be searched is
selected via a drop-down box. I want my searches to basically be "contains"
searches - I take what the user typed in, put a wildcard character at the
beginning and end
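Put together, the "contains" flow looks roughly like this sketch (written against the 2.9-era search API mentioned elsewhere in this digest; the names are illustrative):

    import java.io.IOException;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.search.WildcardQuery;

    // "Contains" search: the field comes from the drop-down box, the text from
    // the search box. Lowercasing mirrors a LowerCaseFilter used at index time;
    // drop it if the field was indexed without lowercasing.
    static TopDocs containsSearch(IndexSearcher searcher, String field,
                                  String userText) throws IOException {
        String pattern = "*" + userText.toLowerCase() + "*";
        Query query = new WildcardQuery(new Term(field, pattern));
        return searcher.search(query, 100);   // top 100 hits
    }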
Thanks Jeff. :)
eads-on on how/where to
start.
Thanks in advance.
On Monday 22 August 2005 22:46, Dan Armbrust wrote:
> Cool - is there a daily build somewhere, or do I have to roll my own? I
> couldn't find a daily build or a 1.9 alpha, beta, etc. on the site.
You need to get it from SVN and then build it yourself.
> Any idea when 1.9 might be released, even
Daniel Naber wrote:
> Correct handling of multiple terms per position was only added to SVN, it's
> not part of Lucene 1.4.3.
Cool - is there a daily build somewhere, or do I have to roll my own? I
couldn't find a daily build or a 1.9 alpha, beta, etc. on the site.
Any idea when 1.9 might be released, even
On Monday 22 August 2005 21:54, Dan Armbrust wrote:
> The problem I am having now is that the QueryParser seems to ignore the
> positionIncrement values.
Correct handling of multiple terms per position was only added to SVN, it's
not part of Lucene 1.4.3.
Regards
Daniel
--
http://www.danieln
I have a custom Analyzer which performs normalization on all of the
terms as they pass through.
It does normalization like the following:
trees -> tree
Sometimes my normalizer returns multiple words for a normalization - for
example:
leaves -> leaf leave
The second and all subsequent terms are given a positionIncrement of 0, so
they end up at the same position as the first term.
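For reference, a filter that emits those extra forms usually looks something like this sketch, written against the old Token-based TokenStream API (the normalize() lookup here is just a stand-in for my real normalizer):

    import java.io.IOException;
    import java.util.LinkedList;
    import org.apache.lucene.analysis.Token;
    import org.apache.lucene.analysis.TokenFilter;
    import org.apache.lucene.analysis.TokenStream;

    // The first normalized form keeps the normal position increment; every
    // additional form is stacked on the same position with an increment of 0.
    public class NormalizingFilter extends TokenFilter {
        private final LinkedList pending = new LinkedList();

        public NormalizingFilter(TokenStream input) {
            super(input);
        }

        public Token next() throws IOException {
            if (!pending.isEmpty()) {
                return (Token) pending.removeFirst();   // extra forms of the previous term
            }
            Token token = input.next();
            if (token == null) {
                return null;
            }
            String[] forms = normalize(token.termText());   // e.g. "leaves" -> {"leaf", "leave"}
            Token first = new Token(forms[0], token.startOffset(), token.endOffset());
            for (int i = 1; i < forms.length; i++) {
                Token extra = new Token(forms[i], token.startOffset(), token.endOffset());
                extra.setPositionIncrement(0);   // same position as the first form
                pending.addLast(extra);
            }
            return first;
        }

        private String[] normalize(String term) {
            // placeholder: the real normalizer does the dictionary lookup
            return "leaves".equals(term)
                    ? new String[] { "leaf", "leave" }
                    : new String[] { term };
        }
    }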
It is my understanding that the StandardAnalyzer will remove underscores
- so "some_word" would be indexed as 'some' and 'word'.
I want to keep the underscores, so I was thinking of changing over to an
Analyzer that uses the WhitespaceTokenizer, LowerCaseFilter, and StopFilter.
What other tokenizing or filtering would I need?
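The Analyzer I had in mind would be roughly this sketch (the class name is made up; it just chains the stock tokenizer and filters named above):

    import java.io.Reader;
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.LowerCaseFilter;
    import org.apache.lucene.analysis.StopAnalyzer;
    import org.apache.lucene.analysis.StopFilter;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.WhitespaceTokenizer;

    // Splits only on whitespace, so "some_word" stays one term; then lowercases
    // and removes the standard English stop words.
    public class UnderscoreKeepingAnalyzer extends Analyzer {
        public TokenStream tokenStream(String fieldName, Reader reader) {
            TokenStream result = new WhitespaceTokenizer(reader);
            result = new LowerCaseFilter(result);
            result = new StopFilter(result, StopAnalyzer.ENGLISH_STOP_WORDS);
            return result;
        }
    }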