Hello,
I have some questions about what kind of behavior is expected when passing
Version.LUCENE_24/29/30 to QueryParser and the StandardAnalyzer when parsing
a query. I know that passing the Version to the constructors makes Lucene
act like that version, with all features and bugs intact. The be
You could look into modifying the standard tokenizer lexer code to
handle punctuation (there is a patch in the issue tracker for the old
javacc grammar to handle punctuation), and there is also the GATE NLP
project, which has a fairly nice sentence splitter you might find
useful. Add a whol
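For illustration, here is a rough sketch of the simpler route (not the JavaCC patch): a small analyzer built on the 2.x-era CharTokenizer that treats whitespace plus the punctuation set mentioned later in this thread ("_", "-", "/", ".", ",") as token breaks. The class name and the exact punctuation list are just examples; adjust to taste.

import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.CharTokenizer;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.TokenStream;

public class PunctuationBreakAnalyzer extends Analyzer {
  public TokenStream tokenStream(String fieldName, Reader reader) {
    // Anything that is whitespace or one of the listed punctuation characters ends a token.
    TokenStream ts = new CharTokenizer(reader) {
      protected boolean isTokenChar(char c) {
        return !Character.isWhitespace(c) && "_-/.,".indexOf(c) == -1;
      }
    };
    return new LowerCaseFilter(ts);
  }
}

For example, "foo_bar, baz" comes out as the tokens [foo] [bar] [baz].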
I would like my searches to match "John Smith" when John Smith is in a
document, but not when the name is separated by punctuation. For example,
when I was using
StandardAnalyzer, "John. Smith" was matching, which is wrong for me. Right
now I am using WhitespaceAnalyzer but instead searching for "John Smith"
"J
On Fri, Jul 31, 2009 at 5:00 PM, wrote:
> Hi Ahmet,
>
> Thanks for the clarification and information! That was exactly what I was
> looking for.
>
> Jim
>
>
> AHMET ARSLAN wrote:
>>
>> > I guess that the obvious question is "Which characters are
>> > considered 'punctuation characters'?".
Hi Ahmet,
Thanks for the clarification and information! That was exactly what I was
looking for.
Jim
AHMET ARSLAN wrote:
>
> > I guess that the obvious question is "Which characters are
> > considered 'punctuation characters'?".
>
> Punctuation = ("_"|"-"|"/"|"."|",")
>
> > In part
> I guess that the obvious question is "Which characters are
> considered 'punctuation characters'?".
Punctuation = ("_"|"-"|"/"|"."|",")
> In particular, does the analyzer consider "=" (equal) and
> ":" (colon) to be punctuation characters?
":" is special character at QueryParser (if you are
Phil Whelan wrote:
> On Thu, Jul 30, 2009 at 7:12 PM, wrote:
> > I was wondering if there is a list of special characters for the standard
> > analyzer?
> >
> > What I mean by "special" is characters that the analyzer considers break
> > chara
On Thu, Jul 30, 2009 at 7:12 PM, wrote:
> I was wondering if there is a list of special characters for the standard
> analyzer?
>
> What I mean by "special" is characters that the analyzer considers break
> characters.
> For example, if I have something like "
Hi,
I was wondering if there is a list of special characters for the standard
analyzer?
What I mean by "special" is characters that the analyzer considers break
characters. For example, if I have something like "foo=something", apparently
the analyzer considers this as
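One quick way to answer this empirically is to print what StandardAnalyzer actually emits for a given string. A minimal sketch against the older 2.x TokenStream.next() API (removed in 3.0); the field name "f" is arbitrary:

import java.io.StringReader;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;

public class TokenDump {
  public static void main(String[] args) throws Exception {
    TokenStream ts = new StandardAnalyzer()
        .tokenStream("f", new StringReader("foo=something"));
    for (Token t = ts.next(); t != null; t = ts.next()) {
      System.out.println(t.termText());  // prints "foo" then "something"
    }
  }
}

Since '=' is not in the tokenizer's alphanumeric or punctuation classes, "foo=something" is split into two terms.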
On 25 Aug 2008, at 11:14, Kalani Ruwanpathirana wrote:
Hi,
Thanks, I tried WhitespaceAnalyzer too, but it seems case sensitive.
Then you simply add a LowerCaseFilter to the chain in the Analyzer:
public final class WhitespaceAnalyzer extends Analyzer {
  public TokenStream tokenStream(String fieldName, Reader reader) {
    return new LowerCaseFilter(new WhitespaceTokenizer(reader));
  }
}
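A rough usage sketch to go with it (2.x-era API; index path and field name are made up). The point is just that the same analyzer has to be used both when indexing and when parsing queries, otherwise lowercasing only happens on one side:

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;

public class SameAnalyzerBothSides {
  public static void main(String[] args) throws Exception {
    // the lowercasing WhitespaceAnalyzer from the message above (assumed to be in the same package)
    Analyzer analyzer = new WhitespaceAnalyzer();
    IndexWriter writer = new IndexWriter("/tmp/demo-index", analyzer, true);
    // ... writer.addDocument(...) for each document ...
    writer.close();
    Query q = new QueryParser("contents", analyzer).parse("\"john smith\"");
    System.out.println(q);  // contents:"john smith"
  }
}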
Hi,
Thanks, I tried WhitespaceAnalyzer too, but it seems to be case sensitive.
If I need to search for words like "correct?", "" (it escapes <, > and
a few other characters too) I need to index those kinds of words.
On Mon, Aug 25, 2008 at 1:15 PM, Karl Wettin <[EMAIL PROTECTED]> wrote:
>
> 25 aug 200
On 25 Aug 2008, at 09:19, Kalani Ruwanpathirana wrote:
Hi,
I am using StandardAnalyzer when creating the Lucene index. It indexes the
word "wo&rk" as it is, but does not index the word "wo*rk" in that manner.
Can I index such words (including * and ?) as they are? Otherwise I have no
way to ind
Hi,
I am using StandardAnalyzer when creating the Lucene index. It indexes the
word "wo&rk" as it is, but does not index the word "wo*rk" in that manner.
Can I index such words (including * and ?) as they are? Otherwise I have no
way to index and search for words like "wo*rk", you?, etc.
Thanks
--
K
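One way to get there, sketched against the 2.x-era API (index path and field name are made up): analyze the field with WhitespaceAnalyzer so only whitespace breaks tokens, and query with a TermQuery rather than QueryParser, which would otherwise treat * and ? as wildcard syntax.

import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;

public class VerbatimTokens {
  public static void main(String[] args) throws Exception {
    IndexWriter writer = new IndexWriter("/tmp/demo-index", new WhitespaceAnalyzer(), true);
    Document doc = new Document();
    doc.add(new Field("body", "please wo*rk now", Field.Store.YES, Field.Index.TOKENIZED));
    writer.addDocument(doc);
    writer.close();

    IndexSearcher searcher = new IndexSearcher("/tmp/demo-index");
    Hits hits = searcher.search(new TermQuery(new Term("body", "wo*rk")));
    System.out.println(hits.length());  // 1: the token was indexed exactly as typed
  }
}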
: Yes the version of Lucene and Java are exactly the same on the different
: machines.
: In fact we unjarred Lucene and jarred it with our jar and are running from
: the same NFS mounts on both the machines.
I didn't do an in-depth code read, but a quick skim of
StandardTokenizerImpl didn't turn up a
Hi Prashant,
What is the Unicode code point associated with the 3,4,5 character?
Steve
On 04/22/2008 at 4:45 PM, Prashant Malik wrote:
> Yes the version of Lucene and Java are exactly the same on
> the different
> machines.
> In fact we unjarred Lucene and jarred it with our jar and are
> running f
Yes the version of Lucene and Java are exactly the same on the different
machines.
In fact we unjarred Lucene and jarred it with our jar and are running from
the same NFS mounts on both the machines.
Also we have tried with Lucene 2.2.0 and 2.3.1, with the same result.
Also, about the actual string u
Hi Prashant,
On 04/22/2008 at 2:23 PM, Prashant Malik wrote:
> We have been observing the following problem while
> tokenizing using Lucene's StandardAnalyzer. The tokens that we get are
> different on different machines. I suspect it has something to do
> with the Locale settings on individu
Hi,
We have been observing the following problem while tokenizing using
Lucene's StandardAnalyzer. The tokens that we get are different on different
machines. I suspect it has something to do with the Locale settings on
individual machines?
For example,
the word 'César' is split as 'Cé
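If the machines differ, a common culprit (only a guess here, not something this thread has confirmed) is the platform default charset used when the document text is read, before it ever reaches StandardAnalyzer. Building the Reader with an explicit encoding removes that variable; the file and field names below are illustrative:

import java.io.FileInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

public class ExplicitCharset {
  public static void main(String[] args) throws Exception {
    // read the file as UTF-8 no matter what file.encoding says on this machine
    Reader reader = new InputStreamReader(new FileInputStream("doc.txt"), "UTF-8");
    Document doc = new Document();
    doc.add(new Field("body", reader));  // tokenized by the IndexWriter's analyzer
    // ... hand doc to an IndexWriter configured with StandardAnalyzer ...
  }
}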
This is certainly the case. StandardAnalyzer has a regex matcher that
looks for a possible company name involving an & or an @. The
QueryParser is escaping the '&' -- all of the effects described are
standard results of using the StandardAnalyzer. Any double '&&' will
break text, but 'sdfdf&dfs
I just tried a few things quickly via the Solr admin interface, and
everything seems fine.
I think you are probably confusing what the parser does vs what the
analyzer does.
Try your tests with an un-tokenized field to remove that effect.
-Yonik
On 7/13/07, Walt Stoneburner <[EMAIL PROTECTED]> wrote
In reading the documentation for escape characters, I'm having a
little trouble understanding what it wants me to do for certain
special cases.
http://lucene.apache.org/java/docs/queryparsersyntax.html#Escaping%20Special%20Characters
says: "Lucene supports escaping special characters that are par
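For the cases where escaping by hand gets fiddly, QueryParser also has a static escape() helper that backslash-escapes every character the parser treats as syntax. A small sketch (2.x-era API; the field name and analyzer choice are arbitrary):

import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;

public class EscapeDemo {
  public static void main(String[] args) throws Exception {
    String raw = "wo*rk (1+1):2";
    String escaped = QueryParser.escape(raw);  // wo\*rk \(1\+1\)\:2
    Query q = new QueryParser("body", new WhitespaceAnalyzer()).parse(escaped);
    System.out.println(q);
  }
}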
: Apologies and thanks all at the same time, everyone.
No apologies necessary; you're not the first person to be confused by
this, which is why I asked if you had any ideas on how we can improve the
docs -- people who know the APIs inside and out aren't in the best
position to understand how to
Apologies and thanks all at the same time, everyone.
Mike
On 4/12/07, Chris Hostetter <[EMAIL PROTECTED]> wrote:
: Michael Barbarelli wrote:
: > Can I instantiate a standard analyzer with an argument containing my
own
: > stop words? If so, how? Will they be appended to or
: Michael Barbarelli wrote:
: > Can I instantiate a standard analyzer with an argument containing my own
: > stop words? If so, how? Will they be appended to or override the built-in
I'm really surprised how often this question gets asked ... Michael (or
anyone else for that matter)
Michael Barbarelli wrote:
Can I instantiate a standard analyzer with an argument containing my own
stop words? If so, how? Will they be appended to or override the built-in
stop words?
You can do it with one of the alternate constructors, and they'll
override the built-in
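For reference, a minimal sketch of that alternate constructor (2.x-era API; the stop list here is made up). The words you pass in replace the default English list, so if you want both, concatenate the defaults with your own before passing them:

import org.apache.lucene.analysis.StopAnalyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;

public class CustomStops {
  public static void main(String[] args) {
    String[] myStopWords = {"the", "a", "an", "acme"};  // illustrative only
    StandardAnalyzer custom = new StandardAnalyzer(myStopWords);

    // to keep the built-in list and add to it instead of overriding:
    String[] defaults = StopAnalyzer.ENGLISH_STOP_WORDS;
    String[] combined = new String[defaults.length + myStopWords.length];
    System.arraycopy(defaults, 0, combined, 0, defaults.length);
    System.arraycopy(myStopWords, 0, combined, defaults.length, myStopWords.length);
    StandardAnalyzer both = new StandardAnalyzer(combined);
  }
}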
I know this is a relatively fundamental thing to arrange, but I'm having
trouble.
Can I instantiate a standard analyzer with an argument containing my own
stop words? If so, how? Will they be appended to or override the built-in
stop words?
Or, do I have to modify the analyzer class i
Thank you so much. I apologize for my ignorance.
Mark
On 7/7/06, Chris Hostetter <[EMAIL PROTECTED]> wrote:
: > But ParseException extends IOException, so I don't see a problem
there.
: I wish my compiler agreed with you:) Which it seems to do until you
: rebuild the files with javacc. I saw
: > But ParseException extends IOException, so I don't see a problem there.
: I wish my compiler agreed with you:) Which it seems to do until you
: rebuild the files with javacc. I saw at least two other posts about this
: problem on the web with no answer given...
: This guy also found the same
Daniel Naber wrote:
On Friday, 07 July 2006 at 16:20, Mark Miller wrote:
the javacc-generated StandardTokenizer next() method is declared to
throw a ParseException
final public org.apache.lucene.analysis.Token next() throws
ParseException, IOException {
unfortunately, org.apache.lucene.anal
On Friday, 07 July 2006 at 16:20, Mark Miller wrote:
> the javacc-generated StandardTokenizer next() method is declared to
> throw a ParseException
>
> final public org.apache.lucene.analysis.Token next() throws
> ParseException, IOException {
>
> unfortunately, org.apache.lucene.analysis.Token nex
I have added support for sentence/paragraph proximity search by modifying the
notspan query. In doing so I have changed the standard analyzer's javacc .jj
file. Here is my problem:
the javacc-generated StandardTokenizer next() method is declared to throw a
ParseException
final public
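As the replies quoted higher up in this thread note, the thing that makes this compile is that Lucene's hand-edited ParseException extends IOException, and regenerating with javacc overwrites that edit. A minimal sketch of what has to be re-applied after each javacc run (the real generated class has more constructors and fields; only the superclass matters here, and the package is whatever your .jj file declares):

package org.apache.lucene.analysis.standard;

import java.io.IOException;

// Re-applied by hand after running javacc: extending IOException keeps the generated
// next() signature ("throws ParseException, IOException") compatible with
// TokenStream.next(), which only declares IOException.
public class ParseException extends IOException {
  public ParseException() {
    super();
  }
  public ParseException(String message) {
    super(message);
  }
}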
Sent: Monday, September 26, 2005 3:07 PM
To: java-user@lucene.apache.org
Subject: RE: Problems in standard Analyzer
The problem is that in LIMO you can only use standard analyzers for your
queries. As you've already seen, some of them will change the key value to
something else or even r
change it for your needs (e.g. add an option for no analyzer).
Frank
-----Original Message-----
From: "Manish" [mailto:[EMAIL PROTECTED]
Sent: Monday, September 26, 2005 9:42 AM
To: java-user@lucene.apache.org
Subject: RE: Problems in standard Analyzer
Actually, in the index I can se
nk [mailto:[EMAIL PROTECTED]
> Sent: Monday, September 26, 2005 1:07 PM
> To: java-user@lucene.apache.org
> Subject: RE: Problems in standard Analyzer
>
> It should be possible to combine queries using different types of
> analyzers.
> The only problem I can see is if you
[mailto:[EMAIL PROTECTED]
Sent: Monday, September 26, 2005 1:07 PM
To: java-user@lucene.apache.org
Subject: RE: Problems in standard Analyzer
It should be possible to combine queries using different types of analyzers.
The only problem I can see is if you're using one single line for the
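Since Lucene 1.9 there is also PerFieldAnalyzerWrapper, which makes "different analyzers for different parts" straightforward as long as the parts are separate fields. A hedged sketch (field names made up): the key field bypasses StandardAnalyzer via KeywordAnalyzer while everything else keeps it, and the same wrapper is handed to both the IndexWriter and the QueryParser:

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.KeywordAnalyzer;
import org.apache.lucene.analysis.PerFieldAnalyzerWrapper;
import org.apache.lucene.analysis.standard.StandardAnalyzer;

public class PerFieldSetup {
  public static Analyzer build() {
    PerFieldAnalyzerWrapper wrapper = new PerFieldAnalyzerWrapper(new StandardAnalyzer());
    wrapper.addAnalyzer("key", new KeywordAnalyzer());  // MN12345, 123-09-34 stay one term
    return wrapper;  // use for both the IndexWriter and the QueryParser
  }
}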
To: java-user@lucene.apache.org
Subject: RE: Problems in standard Analyzer
I thought of not using any analyzer, but the problem is that I have other
queries that I am appending to this value with either OR or AND, so for that
part of the query I need StandardAnalyzer.
I think I should index that va
I thought of not using any analyzer, but the problem is that I have other
queries that I am appending to this value with either OR or AND, so for that
part of the query I need StandardAnalyzer.
I think I should index that value like normal text; then maybe it will
work.
-Original Message
Sent: Monday, September 26, 2005 5:46 AM
To: java-user@lucene.apache.org
Subject: Problems in standard Analyzer
Hi Mark and other Gurus,
I am indexing one value as a key field (rtf & txt indexing); the value is
like 12345 or 123-09-34, or it can be like MN12345.
The problem is, if the value is like 1
Hi Mark and other Gurus,
I am indexing one value as a key field (rtf & txt indexing); the value is
like 12345 or 123-09-34, or it can be like MN12345.
The problem is, if the value is like 12345 or 123-23-98, StandardAnalyzer is
able to search it, but if the value is like MN12345 the search will not re
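A rough sketch of the "index it like a keyword" idea against the 1.4-era API (field names and index path are made up): Field.Keyword stores the value as a single untokenized term, and a TermQuery matches it exactly without running the text through StandardAnalyzer at query time:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;

public class KeywordKeyField {
  public static void main(String[] args) throws Exception {
    IndexWriter writer = new IndexWriter("/tmp/demo-index", new StandardAnalyzer(), true);
    Document doc = new Document();
    doc.add(Field.Keyword("key", "MN12345"));            // indexed as-is, not analyzed
    doc.add(Field.Text("contents", "rtf and txt body")); // normal analyzed text
    writer.addDocument(doc);
    writer.close();

    IndexSearcher searcher = new IndexSearcher("/tmp/demo-index");
    Hits hits = searcher.search(new TermQuery(new Term("key", "MN12345")));
    System.out.println(hits.length());  // 1
    searcher.close();
  }
}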