Hello,
I am searching for an exact phrase, say "Jane Doe", and there are two instances of it
in the text. My highlighter outputs only the first instance and not
the second one. Can someone please help me understand the issue and how to
fix it? Any help would be highly appreciated. Part of my code is
o be able to search stop words consider adding
> CharArraySet.EMPTY_SET to the StandardAnalyzer's initializer.
>
>
>
> -Original Message-
> From: Scott Selvia [mailto:ssel...@gmail.com]
> Sent: Wednesday, June 11, 2014 12:48 PM
> To: java-user@lucene.apache
's initializer.
-Original Message-
From: Scott Selvia [mailto:ssel...@gmail.com]
Sent: Wednesday, June 11, 2014 12:48 PM
To: java-user@lucene.apache.org
Subject: Exact Phrase Search returning in correct results
I’m having an issue searching for an exact phrase with Lucene 4.7. My use case
loaded the Declaration of Independence into
a Lucene search database. I search for “it becomes” and I get two hits; one
for “it, becomes” and another for a line that just has
“becomes” at the end of the line.
Expec
Hi Ahmet,
As per your suggestion, I have posted the request with an example on the
LUCENE-5205 Jira ticket.
Thanks,
Modassar
On Wed, Mar 5, 2014 at 8:44 PM, Ahmet Arslan wrote:
> Hi Modassar,
>
> Can you post your request (with an example if possible) to lucene-5205
> jira ticket too? If you don't ha
Hi Modassar,
Can you post your request (with an example if possible) to the LUCENE-5205 Jira
ticket too? If you don't have a Jira account, anyone can create one.
Thanks,
Ahmet
On Wednesday, March 5, 2014 9:40 AM, Modassar Ather
wrote:
Hi,
Phrases with stop words in them are not getting searc
Hi,
Phrases containing stop words are not matched, whereas a phrase
without them is matched, when using ComplexPhraseQueryParser/SpanQueryParser.
SpanQueryParser reference: https://issues.apache.org/jira/browse/LUCENE-5205
A similar search works fine with the classic parser, which uses PhraseQ
rms = termList.toArray(new Term[0]);
multiPhrasequery.add(firstTerm);
multiPhrasequery.add(secondTerm);
org.hibernate.Query hibQuery =
fullTextSession.createFullTextQuery(
multiPhrasequery, this.type);
You'll have to be more specific about what you mean by "fuzzy phrase
search".
Even the classic Lucene query parser supports "sloppy phrase search",
that is, variable spacing between terms.
LUCENE-2754 added support for all multi-term queries (which includes
Fuzz
Did you find any solution for this?
I am looking for a similar solution; please let me know if you found any useful
info regarding fuzzy phrase search in Lucene.
Thanks & Regards,
Harish B.N.
Lead Software Engineer
Thomson Reuters
Phone: +91-80-67193219
Mobile: +91-9845807294
ha
arity"
function in Lucene.
Regards,
d
--
View this message in context:
http://lucene.472066.n3.nabble.com/multiple-phrase-search-for-topic-tp3461423p3474768.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
r.SHOULD);
>
> **
> thanks for the carrot2 pointer.
>
> -d
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/multiple-phrase-search-for-topic-tp3461423p3468005.html
> Sent from the Lucene - Java
on.LUCENE_33));
Query query = queryParser.parse(searchString);
bQuery.add(query,BooleanClause.Occur.SHOULD);
**
thanks for the carrot2 pointer.
-d
--
View this message in context:
http://lucene.472066.n3.nabble.com/multiple-phrase-search-for-topic-tp3461423p3
My questions are :
>
> 1) is there anything wrong in this usage of the phrase/boolean query?
> 2) how I can guarantee to retrieve the most suitable news documents (i.e.
> document which contains a lot of the related phrases) in the top searched
> results? I utilized the BooleanClause.Occur.SHO
all of
the 10k phrases, but using the SHOULD feature I surmise the best results
will be those which contain at least a few of the phrases.
thanks in advance,
--d
--
View this message in context:
http://lucene.472066.n3.nabble.com/multiple-phrase-search-for-topic-tp3461423p3461423.html
Sent from
Hi Guys,
I am wondering how I can go about doing a fuzzy phrase search using
Lucene.NET 2.9.2. I've tried looking everywhere, but there don't
really seem to be any resources related to this.
I found this stackoverflow
link<http://stackoverflow.com/questions/2589086
Hi, I recently got a requirement to implement fuzzy phrase search. For example,
the indexed document contains the sentence "I like lucene very much", and when
I search for "I do like lucene very much" or "I like lucene much", I want
both to return that document. Can someone guide me on how to implement this
f
SpanFirstQuery is the clean option. Another option is to add a "start
token" to each title. Then, search for "startToken oil spill". This
will be faster than SpanFirstQuery. But it also requires doing
something weird to the field.
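
The "start token" idea above can be sketched in plain Java, without any Lucene classes. This is only an illustration under assumptions: the marker `_start_`, the class and method names, and the sample titles are made up here, and a real index would have to add the same marker through its analysis chain at both index time and query time.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Plain-Java illustration of the "start token" trick; it does not use Lucene.
class StartTokenSketch {
    // Hypothetical marker: any token that cannot occur in a real title would do.
    static final String START = "_start_";

    // Lowercase, split on whitespace, and prepend the marker token.
    static List<String> markedTokens(String text) {
        List<String> tokens = new ArrayList<>();
        tokens.add(START);
        for (String t : text.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // The query becomes the phrase "_start_ <terms>". Because the marker only
    // ever sits at position 0 of a title, the phrase can only match there.
    static boolean startsWithPhrase(String title, String query) {
        List<String> phrase = markedTokens(query);
        return Collections.indexOfSubList(markedTokens(title), phrase) >= 0;
    }

    public static void main(String[] args) {
        // Made-up titles, for illustration only.
        System.out.println(startsWithPhrase("Oil spill reaches coast", "oil spill"));
        System.out.println(startsWithPhrase("Cleanup after oil spill", "oil spill"));
    }
}
```

The design choice is the one described in the message: the "starts with" check turns into an ordinary phrase match, at the cost of storing an artificial token in the field.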
Lance
On Thu, Jun 17, 2010 at 3:19 PM, Michael McCandless
wrote:
SpanFirstQuery?
Mike
On Thu, Jun 17, 2010 at 3:23 PM, rakesh rakesh wrote:
> Hi,
>
> I have thousands of article titles in a Lucene index. For a query "Oil
> spill", I want to return all the article titles that start with "Oil spill". I do
> not want titles that contain this phrase but do not star
Hi,
I have thousands of article titles in a Lucene index. For a query "Oil
spill", I want to return all the article titles that start with "Oil spill". I do
not want titles that contain this phrase but do not start with it.
Can anyone help me?
Thanks in advance
Thanks
rakesh
, March 04, 2010 8:54 AM
To: java-user@lucene.apache.org
Subject: Re: Phrase search on NOT_ANALYZED content
I'm still struggling with your overall goal here, but...
It sounds like what you're looking for is an exact match
in some cases but not others? In which case you could
think about in
Message-
> From: java-user-return-45278-paul.b.murdoch=saic@lucene.apache.org
> [mailto:java-user-return-45278-paul.b.murdoch=saic@lucene.apache.org
> ] On Behalf Of Erick Erickson
> Sent: Wednesday, March 03, 2010 4:30 PM
> To: java-user@lucene.apache.org
> Subject
em.
Thanks,
Paul
-Original Message-
From: java-user-return-45278-paul.b.murdoch=saic@lucene.apache.org
[mailto:java-user-return-45278-paul.b.murdoch=saic@lucene.apache.org
] On Behalf Of Erick Erickson
Sent: Wednesday, March 03, 2010 4:30 PM
To: java-user@lucene.apache
s a better
way to accomplish your goal.
Best
Erick
On Wed, Mar 3, 2010 at 4:11 PM, Murdoch, Paul wrote:
> If I have indexed some content that contains some words and a single
> whitespace between each word as NOT_ANALYZED, is it possible to perform
> a phrase search on a portion of
If I have indexed some content that contains some words and a single
whitespace between each word as NOT_ANALYZED, is it possible to perform
a phrase search on a portion of that content? I'm indexing and
searching with the StandardAnalyzer 2.9. Using the KeywordAnalyzer
works, but I ha
ssage-
From: java-user-return-45156-paul.b.murdoch=saic@lucene.apache.org
[mailto:java-user-return-45156-paul.b.murdoch=saic@lucene.apache.org] On
Behalf Of Murdoch, Paul
Sent: Wednesday, February 24, 2010 5:11 PM
To: java-user@lucene.apache.org
Subject: RE: Phrase Search and NOT_ANA
:01 PM
To: java-user@lucene.apache.org
Subject: RE: Phrase Search and NOT_ANALYZED
Thanks,
I've been looking at that one too. I'm trying to make it happen with the
StandardAnalyzer. Unfortunately, I think I see some redesign for more
robustness in the future.
Cheers,
Paul
---
apache.org
[mailto:java-user-return-45154-paul.b.murdoch=saic@lucene.apache.org] On
Behalf Of Robert Muir
Sent: Wednesday, February 24, 2010 4:55 PM
To: java-user@lucene.apache.org
Subject: Re: Phrase Search and NOT_ANALYZED
check out KeywordAnalyzer!
On Wed, Feb 24, 2010 at 4:51 PM, Mur
ead.
>
> Thanks,
>
> Paul
>
>
> -Original Message-
> From: java-user-return-45149-paul.b.murdoch=saic@lucene.apache.org
> [mailto:java-user-return-45149-paul.b.murdoch=saic@lucene.apache.org
> ] On Behalf Of Erick Erickson
> Sent: Wednesday, February 24, 20
=saic@lucene.apache.org
] On Behalf Of Digy
Sent: Wednesday, February 24, 2010 4:45 PM
To: java-user@lucene.apache.org
Subject: RE: Phrase Search and NOT_ANALYZED
Since it is not analyzed, your text is stored as a single term in the
index
[something in the index].
But the query
name:"someth
aul.b.murdoch=saic@lucene.apache.org
] On Behalf Of Erick Erickson
Sent: Wednesday, February 24, 2010 4:23 PM
To: java-user@lucene.apache.org
Subject: Re: Phrase Search and NOT_ANALYZED
What does Luke's explain show you? That'll show you a lot about how
the query gets transformed
.@saic.com]
Sent: Wednesday, February 24, 2010 10:51 PM
To: java-user@lucene.apache.org
Subject: Phrase Search and NOT_ANALYZED
Hi,
I'm indexing a field using the StandardAnalyzer 2.9.
field = new Field(fieldName, fieldValue, Field.Store.YES,
Field.Index.NOT_ANALYZED);
Let's say fieldName
What does Luke's explain show you? That'll show you a lot about how
the query gets transformed..
My first guess is that stop words are messing you up
Erick
On Wed, Feb 24, 2010 at 3:51 PM, Murdoch, Paul wrote:
> Hi,
>
>
>
> I'm indexing a field using the StandardAnalyzer 2.9.
>
>
>
> fi
Hi,
I'm indexing a field using the StandardAnalyzer 2.9.
field = new Field(fieldName, fieldValue, Field.Store.YES,
Field.Index.NOT_ANALYZED);
Let's say fieldName is "name" and fieldValue is "something in the
index". When I perform the query...
name:"something in the index"
...
Hello,
You could use a PhraseQuery with the terms "cool", "gaming" and
"computer" and set the slop factor you reckon is right. Then you could assign a
boost to this query only, which will make its matches bubble up the list.
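
To make the role of the slop factor concrete, here is a small plain-Java sketch that does not use Lucene. It is a simplified two-term model, not Lucene's scorer: for the phrase "first second" it reports how far the second term sits from the position right after the first term. The class and method names are hypothetical, and the query is cut down to "cool gaming" because "computer" does not occur in any of the three strings from the question.

```java
import java.util.Arrays;
import java.util.List;

// Simplified model of phrase slop for a two-term phrase; not Lucene code.
class SlopSketch {
    // For the phrase "first second", an exact match has `second` at the
    // position right after `first`. Returns the smallest displacement from
    // that ideal over all occurrences, or -1 if either term is missing.
    static int requiredSlop(List<String> docTokens, String first, String second) {
        int best = -1;
        for (int i = 0; i < docTokens.size(); i++) {
            if (!docTokens.get(i).equals(first)) {
                continue;
            }
            for (int j = 0; j < docTokens.size(); j++) {
                if (!docTokens.get(j).equals(second)) {
                    continue;
                }
                int displacement = Math.abs(j - (i + 1));
                if (best < 0 || displacement < best) {
                    best = displacement;
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        String[] docs = {"cool gaming laptop", "cool gaming lappy", "gaming laptop cool"};
        for (String doc : docs) {
            List<String> tokens = Arrays.asList(doc.split(" "));
            System.out.println(doc + " -> " + requiredSlop(tokens, "cool", "gaming"));
        }
    }
}
```

In this model the first two strings need a slop of 0 and the third needs 3, so a phrase query with enough slop can match all three while still favoring the documents where the terms are adjacent and in order.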
I don't think you can get away without specifying a slop factor though (like
in the
On Fri, Jun 5, 2009 at 21:31, Abhi wrote:
> Say I have indexed the following strings:
>
> 1. "cool gaming laptop"
> 2. "cool gaming lappy"
> 3. "gaming laptop cool"
>
> Now when I search with a query say "cool gaming computer", I want string 1
> and 2 to appear on top (where search terms are closer
Say I have indexed the following strings:
1. "cool gaming laptop"
2. "cool gaming lappy"
3. "gaming laptop cool"
Now when I search with a query say "cool gaming computer", I want string 1
and 2 to appear on top (where search terms are closer to each other)
followed by 3.
I can use a Term query t
(since I'm inflating the query). Does this make sense?
Itamar.
-Original Message-
From: Daniel Noll [mailto:[EMAIL PROTECTED]
Sent: Thursday, March 20, 2008 12:44 AM
To: java-user@lucene.apache.org
Subject: Re: Contrib Highlighter and Phrase search
On Wednesday 19 March 2008 18:28:15 Ita
On Wednesday 19 March 2008 18:28:15 Itamar Syn-Hershko wrote:
> 1. Build a Radix tree (PATRICIA) and populate it with all search terms.
> Phrase queries will be considered as one big string, regardless of their
> spaces.
>
> 2. Iterate through your text ignoring spaces and punctuation marks, and for
>
t color.
This allows for fast and exact highlighting of large texts as well as
smaller ones. I would love to hear any comments on the above.
Itamar.
-Original Message-
From: Mark Miller [mailto:[EMAIL PROTECTED]
Sent: Tuesday, March 18, 2008 10:51 PM
To: java-user@lucene.apache.org
Subject: R
See https://issues.apache.org/jira/browse/LUCENE-794
Spencer Tickner wrote:
Hi List,
Thanks in advance for any help. I'm working with the contrib
highlighting class and am having issues when doing searches with a
phrase. I've been able to duplicate this behaviour in the
HighlighterTest class.
You're going to want to change your TokenFilter so that it emits the split
pieces as tokens immediately after the original token, with a
positionIncrement of 0. Don't buffer them up and wait for the entire
stream to finish first.
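
The token order described above can be sketched in plain Java without Lucene's TokenFilter API. The `Tok` class and the `tokenize` method are hypothetical stand-ins for a real token stream; the point is only the ordering and the increments: the whole word advances the position by 1, and each dot-separated piece follows it immediately with an increment of 0.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration of emitting split pieces at the same position as
// the original token; it does not use Lucene's TokenFilter API.
class DotSplitSketch {
    // A token text plus its position increment relative to the previous token.
    static final class Tok {
        final String text;
        final int posInc;

        Tok(String text, int posInc) {
            this.text = text;
            this.posInc = posInc;
        }

        @Override
        public String toString() {
            return text + "(+" + posInc + ")";
        }
    }

    static List<Tok> tokenize(String input) {
        List<Tok> out = new ArrayList<>();
        for (String word : input.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue;
            }
            // The whole word advances the position by one.
            out.add(new Tok(word, 1));
            if (word.contains(".")) {
                // Pieces are emitted right away, stacked on the same position.
                for (String piece : word.split("\\.")) {
                    if (!piece.isEmpty()) {
                        out.add(new Tok(piece, 0));
                    }
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [see(+1), foo.bar(+1), foo(+0), bar(+0), now(+1)]
        System.out.println(tokenize("see foo.bar now"));
    }
}
```

Because the pieces share a position with the word they came from, the words before and after it keep consistent relative positions, which is what position-based phrase matching relies on.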
it true order of the tokens in the tokenstream and the posit
Thanks, I'll give that a try.
Cheers,
Spencer
On Tue, Mar 18, 2008 at 1:50 PM, Mark Miller <[EMAIL PROTECTED]> wrote:
> The contrib Highlighter is not position sensitive. You can try out the
> patch I have been working on here if you are interested:
> https://issues.apache.org/jira/browse/LUCENE-
The contrib Highlighter is not position sensitive. You can try out the
patch I have been working on here if you are interested:
https://issues.apache.org/jira/browse/LUCENE-794
Spencer Tickner wrote:
Hi List,
Thanks in advance for any help. I'm working with the contrib
highlighting class and am
Hi List,
Thanks in advance for any help. I'm working with the contrib
highlighting class and am having issues when doing searches with a
phrase. I've been able to duplicate this behaviour in the
HighlighterTest class.
When calling the testGetBestFragmentsPhrase() method I get the correct:
John K
Hi, I have written a TokenFilter which breaks up words with internal dot
characters and adds the whole word plus the pieces as tokens in the stream. I
am using that TokenFilter with the StandardAnalyzer to index my documents. Then
I do searches using the StandardAnalyzer. Everything is working g
M, Spencer Tickner wrote:
>
> > Hi List,
> >
> > Thanks in advance for the help. I'm creating a simple searching test
> > based on Query Parser and from what I've read it should have no
> > problems with a Phrase Search. However I can't seem to get an
3:04 PM, Spencer Tickner wrote:
Hi List,
Thanks in advance for the help. I'm creating a simple searching test
based on Query Parser and from what I've read it should have no
problems with a Phrase Search. However I can't seem to get any results
back.
I'm doing a si
Hi List,
Thanks in advance for the help. I'm creating a simple searching test
based on Query Parser and from what I've read it should have no
problems with a Phrase Search. However I can't seem to get any results
back.
I'm doing a simple index using the StandardAnalyzer. Outp
OK, thanks. I have tried indexing the address field as UN_TOKENIZED and searching
with the above query, but it returns nothing. How can I specify "NOT tokenize"
in the query?
--Thanks,
On 6/18/07, Erick Erickson <[EMAIL PROTECTED]> wrote:
Phrase queries won't help you here
Your particular issue can be
: Another good old trick is to index field values (tokenized) with
: appended special starting and ending tokens, e.g. instead of "Hiran
: Magri" use "_start_ Hiran Magri _end_". Then you can query for fields
: that are exactly equal to a phrase, while still retaining the
: possibility to search b
Erick Erickson wrote:
Phrase queries won't help you here
Your particular issue can be addressed, but I'm not sure it's a
reasonable long-term solution
If you indexed your address field as UN_TOKENIZED, and
did NOT tokenize your query, it should give you what you want.
What's happening i
Phrase queries won't help you here
Your particular issue can be addressed, but I'm not sure it's a
reasonable long-term solution
If you indexed your address field as UN_TOKENIZED, and
did NOT tokenize your query, it should give you what you want.
What's happening is that StandardAnalyzer
Hello everyone,
I am a Lucene user and tried to implement a phrase query, but I am now running into some
logical problems in searching.
My index has 4 fields: Name, Address & City and 6 docs.
i.e 1. "Laxmilal Menaria", "Hiran Magri", "Udaipur",
2. "Mohan Sharma", "Hiran Magri Sec 10", "Udaipur"
I've proposed a simple improvement in issue
https://issues.apache.org/jira/browse/LUCENE-884
thanks Paul
Chris Hostetter wrote:
But as i said: if you have suggestions for clarifying the docs, please
submit them as a patch. just saying the docs need to be improved without
providing a specific
: queryparsersyntax page which is where I expect most novices (such as
: myself) start with lucene seems to indicate that wildcards can be used
: in, and this page is
: as far as one should need to go to understand basic query syntax, this
: page should be corrected.
if you have a suggestion for
Chris Hostetter wrote:
: > You can't use a wildcard within double quotes. The Lucene syntax
: > grammar does not look for such things.
: This is the bit I don't get (I have got round the problem), why can't
: you use wildcards within double quotes, this isn't mentioned anywhere in
: http://lucene
: > You can't use a wildcard within double quotes. The Lucene syntax
: > grammar does not look for such things.
: This is the bit I don't get (I have got round the problem), why can't
: you use wildcards within double quotes, this isn't mentioned anywhere in
: http://lucene.apache.org/java/docs/qu
I do not know enough about PhraseQuery to say how hard it would be to
add support for wildcards, but I am sure there is some method of doing
it -- it has just not been done. From what I can tell it would be easier
to stop using PhraseQuery and use SpanQuery's if you wanted to do this.
Maybe som
Mark Miller wrote:
You cannot use wildcards in quotes simply because the QueryParser
syntax does not look for such things...at the top level it is either
looking for a Wildcard token OR a Quoted token. There is good reason
for this: a phrase query does not support wildcards.
OK thanks for all t
You cannot use wildcards in quotes simply because the QueryParser syntax
does not look for such things...at the top level it is either looking
for a Wildcard token OR a Quoted token. There is good reason for this: a
phrase query does not support wildcards. The hack that I suggested
(looking for
Mark Miller wrote:
You can't use a wildcard within double quotes. The Lucene syntax
grammar does not look for such things.
This is the bit I don't get (I have got round the problem), why can't
you use wildcards within double quotes, this isn't mentioned anywhere in
http://lucene.apache.org/java
I think the KeywordAnalyzer bit is maybe a red herring; the problem
seems to be that you can't use * within double quotes. I made some
changes to my data and index to remove the space character
You can't use a wildcard within double quotes. The Lucene syntax grammar
does not look for such thin
@lucene.apache.org
Subject: Re: Problem using wildcardsearch in phrase search
I think the KeywordAnalyzer bit is maybe a red herring; the problem seems
to be that you can't use * within double quotes. I made some changes to
my data and index to remove the space character
If I fed 54:puid* to my code
I think the KeywordAnalyzer bit is maybe a red herring; the problem seems
to be that you can't use * within double quotes. I made some changes to
my data and index to remove the space character
If I fed 54:puid* to my code it generates a Prefix Query and works as
required
Search Query Is54:puid
Perhaps not like whitespaceanalyzer does in all cases, but this code
QueryParser qp = new QueryParser("field", new
WhitespaceAnalyzer());
Query q = qp.parse("Does this tokenize*");
System.out.println(q.toString());
produces
field:Does field:this field:token
See below
On 5/12/07, Mark Miller <[EMAIL PROTECTED]> wrote:
Paul Taylor wrote:
> I seem to be having problems using a * in a phrase term query
>
> This is my search String, it's not finding any matches
> 54:"MusicIP PUID*"
>
> If I match on a particular record it works ok
> 54:"MusicIP PU
This just keeps running around in my head...
I was wrong on one point...if you use the KeywordAnalyzer and you put
your search in quotes then you will not generate a phrase query because
a PhraseQuery is only generated if the analyzer produces more than one
token. The problem is that, instead
Well I am confused so I suppose I'll let someone else give it a shot.
Just in case though...if you are using the query: fieldname:"MusicIP Puid*"
Then you should not...you need to leave out the quotes...quotes create a
phrasequery, and a phrasequery will not match what is in your index.
This may
Mark Miller wrote:
Didn't you say you were using a phrasequery? If you are, things will
not work as expected. You need to leave the quotes out of your search
as a phrasequery will not match what you are putting in your index. If
you are not using a phrasequery then things should work as you wo
Paul Taylor wrote:
Mark Miller wrote:
"MusicIP PUID*" means to search for MusicIP within one of PUID*
Sorry, I don't understand, can you give me a further reference
...I am pretty sure that KeywordAnalyzer does not split on whitespace
like WhiteSpaceAnalyzer does...which means that MusicIP is
Mark Miller wrote:
"MusicIP PUID*" means to search for MusicIP within one of PUID*
Sorry, I don't understand, can you give me a further reference
...I am pretty sure that KeywordAnalyzer does not split on whitespace
like WhiteSpaceAnalyzer does...which means that MusicIP is never
within one of
Paul Taylor wrote:
I seem to be having problems using a * in a phrase term query
This is my search String, it's not finding any matches
54:"MusicIP PUID*"
If I match on a particular record it works ok
54:"MusicIP PUIDa39494bf-927e-1638-fb06-782ec55ac22d"
"MusicIP PUID*" means to search for Mu
Somewhere in the list, I remember one of the guys who know what
they're talking about mentions something about KeywordAnalyzer
being "subject to the meta-semantics of the QueryParser".
So try looking at query.toString() in your example. What I think you'll
find is that KeywordAnalyzer doesn't qui
I seem to be having problems using a * in a phrase term query
This is my search String, it's not finding any matches
54:"MusicIP PUID*"
If I match on a particular record it works ok
54:"MusicIP PUIDa39494bf-927e-1638-fb06-782ec55ac22d"
The problem appears to be the space character, because I hav
: Sorry for the confusion and thanks for taking the time to educate me. So, if
: I am just indexing literal values, what is the best way to do that (what
: analyzer)? Sounds like this approach, even though it works, is not the
: preferred method.
if you truly want just the literal values then
as separate values
> : > instead of trying to find one big value containing "my brown-cow red
> fox"
> : >
> : > : in the results if the case is identical to how it was added? (This
> : > seems to
> : > : be what I observe anyway. And whether I add as TOKENI
rom: Philip Brown <[EMAIL PROTECTED]>
: Reply-To: java-user@lucene.apache.org
: To: java-user@lucene.apache.org
: Subject: Re: Phrase search using quotes -- special Tokenizer
:
:
: Here's a little sample program (borrowed some code from Erick Erickson :)).
: Whether I add as TOKENIZED or UN_
> 1) whether case matters is determined entirely by your analyzer; if it
> produces different tokens for "Blue" and "BLUE" then case matters
> 2) use TOKENIZED or your Analyzer will be completely irrelevant
> 3) if you observe something working differently then you
okens indicate that you are going for a
phrase search. A phrase search is generated. A phrase search with stopwords
removed has interesting sloppy matching. A phrase search can also match out
of order given enough slop. This is normally fine behavior for most
applications I can think of. You need to con
: So, if I do as you suggest below (using PerFieldAnalyzerWrapper with
: StandardAnalyzer) then I still need to enclose in quotes the phrases
: (keywords with spaces) when I issue the search, and they are only returned
Yes, quotes will be necessary to tell the QueryParser "this
is one chunk of t
--
View this message in context:
http://www.nabble.com/Phrase-search-using-quotesspecial-Tokenizer-tf2200760.html#a6145591
Sent from the Lucene - Java Users forum at Nabble.com.
: Yeah, they are more complex than the "exactish" match -- basically, there are
: more fields involved -- combined sometimes with AND and sometimes with OR,
: and sometimes negated field values, sometimes groupings, etc. These other
: field values are all single words (no spaces), and a search mi
TermQuery out of it for the necessary field.
: >
: > ...that's it. that's all she wrote -- don't even look in
QueryParser's
: > general direction, at all.
: >
: >
: >
: > -Hoss
: >
: >
: >
want.
> : > b) use this Analyzer when you add the fields to your documents, even
> : > though you don't want *real* tokenization, and make the field type
> : > TOKENIZED so your analyzer gets used.
> : > c) when you get some text input to search on, pass it to the same
> : > Analyzer, take the Token you get back and manually
e text input to search on, pass it to the same
: > Analyzer, take the Token you get back and manually construct a
: > TermQuery out of it for the necessary field.
: >
: > ...that's it. that's all she wrote -- don't even look in QueryParser's
: > genera
ZED so your analyzer gets used.
: > c) when you get some text input to search on, pass it to the same
: > Analyzer, take the Token you get back and manually construct a
: > TermQuery out of it for the necessary field.
: >
: > ...that's it. that's all she wrote -- don't even
> c) when you get some text input to search on, pass it to the same
> Analyzer, take the Token you get back and manually construct a
> TermQuery out of it for the necessary field.
>
> ...that's it. that's all she wrote -- don't even look in QueryParser's
Yeah, what he said
On 9/3/06, Chris Hostetter <[EMAIL PROTECTED]> wrote:
I haven't really been following this thread, but it's gotten so long
i got interested.
from what i can tell skimming the discussion so far, it seems like the
biggest confusion is about the definition of a "phrase" a
I haven't really been following this thread, but it's gotten so long
i got interested.
from what i can tell skimming the discussion so far, it seems like the
biggest confusion is about the definition of a "phrase" and what analyzers
do with "quote" characters and what the QueryParser does with "q
> >> > - Mark
>> >>> >> >
>> >>> >> > On 9/1/06, Philip Brown <[EMAIL PROTECTED]> wrote:
>> >>> >> >>
>> >>> >> >>
>> >>> >> >> Well, I tried that, and it doesn't seem
>>> >> >> phrases. From
>>> http://lucene.apache.org/java/docs/api/index.html -->
>>> A
>>> >> >> Phrase is a group of words surrounded by double quotes such as
>>> "hello
>>> >> >> dolly". So, this should be easy, righ
> the
>>> >> >> NUM
>>> >> >> > token but nothing I'd worry about. maybe you want to use Unicode
>>> for
>>> >> >>
| "." ( ".")+ >
>> >> >
>> >> > // company names like AT&T and [EMAIL PROTECTED]
>> >> > | ("&"|"@") >
>> >> >
>> >> > // email addresses
>> >> > |
'" )+ >
>> >> >
>> >> > // acronyms: U.S.A., I.B.M., etc.
>> >> > // use a post-filter to remove dots
>> >> > | "." ( ".")+ >
>> >> >
>> >> > // company names like AT&T and [EMAIL PROTECTED]
>> >> > | ("&"|"@") >
>> >> >
>> >> > //
"_"|"-"|"/"|"."|",") >
| <#HAS_DIGIT: // at least one digit
(|)*
(|)*
>
| < #ALPHA: ()+>
| < #LETTER: // unicode letters
[
"\u0041"-"\u005a",
"\u0061"-"\u007a",
"\u00c0"-"
ode letters
[
"\u0041"-"\u005a",
"\u0061"-"\u007a",
"\u00c0"-"\u00d6",
"\u00d8"-"\u00f6",
"\u00f8"-"\u00ff",
"\u0100"-"\u1fff",
, etc.
>> >> > // use a post-filter to remove dots
>> >> > | "." ( ".")+ >
>> >> >
>> >> > // company names like AT&T and [EMAIL PROTECTED]
>> >> > | ("&"|"@") >
>> >> >
>> >> > // email addresses
>> >> > | (("."|"-"|"_") )* "@
|
>> > | ( )+
>> >| ( )+
>> >|( )+
>> >|( )+
>> > )
>> > >
>> > | <#P: ("_"|"-"|"/"|"."|",") >
>> > | &
s, etc.
>> > // every other segment must have at least one digit
>> > |
>> >|
>> > | ( )+
>> >| ( )+
>> >|( )+
>> >|( )+
>> >