Expected Behavior from QueryParser and Standard Analyzer with Version.LUCENE_*

2011-05-09 Thread Chris Currens
Hello, I have some questions about what kind of behavior is expected when passing Version.LUCENE_24/29/30 to QueryParser and the StandardAnalyzer when parsing a query. I know that passing the Version to the constructors makes Lucene act like that version, with all features and bugs intact. The be

Re: Whitespace/Standard Analyzer and punctuation

2009-09-30 Thread Karl Wettin
You could look into modifying the standard tokenizer lexer code to handle punctuation (there is a patch in the issue tracker for the old javacc grammar to handle punctuation), and there is also the Gate NLP project, which has a fairly nice sentence splitter you might find useful. Add a whol

Whitespace/Standard Analyzer and punctuation

2009-09-29 Thread Max Lynch
I would like my searches to match "John Smith" when John Smith is in a document, but not when the two names are separated by punctuation. For example, when I was using StandardAnalyzer, "John. Smith" was matching, which is wrong for me. Right now I am using WhitespaceAnalyzer but instead searching for "John Smith" "J

Re: Is there a list of "special" characters for standard analyzer?

2009-07-31 Thread Simon Willnauer
On Fri, Jul 31, 2009 at 5:00 PM, wrote: > Hi Ahmet, > > Thanks for the clarification and information!  That was exactly what I was > looking for. > > Jim > > > AHMET ARSLAN wrote: >> >> > I guess that the obvious question is "Which characters are >> > considered 'punctuation characters'?".

Re: Is there a list of "special" characters for standard analyzer?

2009-07-31 Thread ohaya
Hi Ahmet, Thanks for the clarification and information! That was exactly what I was looking for. Jim AHMET ARSLAN wrote: > > > I guess that the obvious question is "Which characters are > > considered 'punctuation characters'?". > > Punctuation = ("_"|"-"|"/"|"."|",") > > > In part

Re: Is there a list of "special" characters for standard analyzer?

2009-07-31 Thread AHMET ARSLAN
> I guess that the obvious question is "Which characters are > considered 'punctuation characters'?". Punctuation = ("_"|"-"|"/"|"."|",") > In particular, does the analyzer consider "=" (equal) and > ":" (colon) to be punctuation characters? ":" is a special character in QueryParser (if you are
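The Punctuation rule quoted above can be read as a plain membership test. What follows is a minimal sketch in plain Java that does not use any Lucene class; the names `PunctuationCheck` and `isPunctuation` are hypothetical and exist only for illustration. It mirrors the five characters in the quoted rule and says nothing about how the tokenizer treats any other character.

```java
final class PunctuationCheck {
    // The five characters listed in the quoted rule:
    // Punctuation = ("_"|"-"|"/"|"."|",")
    private static final String PUNCTUATION = "_-/.,";

    // Hypothetical helper: true only for characters in the quoted rule.
    static boolean isPunctuation(char c) {
        return PUNCTUATION.indexOf(c) >= 0;
    }

    public static void main(String[] args) {
        // '=' and ':' are the two characters asked about in the thread.
        for (char c : new char[] {'=', ':', '-', '.'}) {
            System.out.println("'" + c + "' in Punctuation rule: " + isPunctuation(c));
        }
    }
}
```

By this reading, neither "=" nor ":" belongs to the Punctuation rule, which matches the answer given in the reply.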

Re: Is there a list of "special" characters for standard analyzer?

2009-07-30 Thread ohaya
Phil Whelan wrote: > On Thu, Jul 30, 2009 at 7:12 PM, wrote: > > I was wondering if there is a list of special characters for the standard > > analyzer? > > > > What I mean by "special" is characters that the analyzer considers break > > chara

Re: Is there a list of "special" characters for standard analyzer?

2009-07-30 Thread Phil Whelan
On Thu, Jul 30, 2009 at 7:12 PM, wrote: > I was wondering if there is a list of special characters for the standard > analyzer? > > What I mean by "special" is characters that the analyzer considers break > characters. > For example, if I have something like "

Is there a list of "special" characters for standard analyzer?

2009-07-30 Thread ohaya
Hi, I was wondering if there is a list of special characters for the standard analyzer? What I mean by "special" is characters that the analyzer considers break characters. For example, if I have something like "foo=something", apparently the analyzer considers this as

Re: Standard Analyzer

2008-08-25 Thread Karl Wettin
On 25 Aug 2008 at 11:14, Kalani Ruwanpathirana wrote: Hi, Thanks, I tried WhitespaceAnalyzer too, but it seems case sensitive. Then you simply add a LowerCaseFilter to the chain in the Analyzer: public final class WhitespaceAnalyzer extends Analyzer { public TokenStream tokenStream(String fi
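The idea in this reply (split on whitespace, then lower-case each token) can be sketched without Lucene. The following is plain Java written under that assumption; it is not the truncated Lucene class quoted above, and `LowercaseWhitespaceSketch` and `tokenize` are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class LowercaseWhitespaceSketch {
    // Split on runs of whitespace only, then lower-case every token.
    // Characters such as '*' and '?' are kept as part of the token.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.trim().split("\\s+")) {
            if (!part.isEmpty()) {
                tokens.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [is, this, correct?, wo*rk]
        System.out.println(tokenize("Is this Correct? wo*rk"));
    }
}
```

Because only whitespace separates tokens, words like "wo*rk" and "correct?" survive intact, and lower-casing removes the case sensitivity mentioned in the question. The same normalization would have to be applied to the query text for the two sides to match.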

Re: Standard Analyzer

2008-08-25 Thread Kalani Ruwanpathirana
Hi, Thanks, I tried WhitespaceAnalyzer too, but it seems case sensitive. If I need to search for words like "correct?", "" (it escapes <, > and a few other characters too) I need to index those kinds of words. On Mon, Aug 25, 2008 at 1:15 PM, Karl Wettin <[EMAIL PROTECTED]> wrote: > > 25 aug 200

Re: Standard Analyzer

2008-08-25 Thread Karl Wettin
On 25 Aug 2008 at 09:19, Kalani Ruwanpathirana wrote: Hi, I am using StandardAnalyzer when creating the Lucene index. It indexes the word "wo&rk" as it is but does not index the word "wo*rk" in that manner. Can I index such words (including * and ?) as they are? Otherwise I have no way to ind

Re: Re: Standard Analyzer

2008-08-25 Thread tom
AUTOMATIC REPLY Tom Roberts is out of the office till 2nd September 2008. LUX reopens on 1st September 2008 - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]

Standard Analyzer

2008-08-25 Thread Kalani Ruwanpathirana
Hi, I am using StandardAnalyzer when creating the Lucene index. It indexes the word "wo&rk" as it is but does not index the word "wo*rk" in that manner. Can I index such words (including * and ?) as they are? Otherwise I have no way to index and search for words like "wo*rk", you?, etc. Thanks -- K

Re: Lucene standard analyzer internationalization

2008-04-22 Thread Chris Hostetter
: Yes the version of lucene and java are exactly the same on the different : machines. : In fact we unjarred lucene and jarred it with our jar and are running from the : same nfs mounts on both the machines I didn't do an in-depth code read, but a quick skim of StandardTokenizerImpl didn't turn up a

RE: Lucene standard analyzer internationalization

2008-04-22 Thread Steven A Rowe
Hi Prashant, What is the Unicode code point associated with the 3,4,5 character? Steve On 04/22/2008 at 4:45 PM, Prashant Malik wrote: > Yes the version of lucene and java are exactly the same on > the different > machines. > In fact we unjarred lucene and jarred it with our jar and are > running f

Re: Lucene standard analyzer internationalization

2008-04-22 Thread Prashant Malik
Yes the version of lucene and java are exactly the same on the different machines. In fact we unjarred lucene and jarred it with our jar and are running from the same nfs mounts on both the machines. Also we have tried with lucene 2.2.0 and 2.3.1, with the same result. Also about the actual string u

RE: Lucene standard analyzer internationalization

2008-04-22 Thread Steven A Rowe
Hi Prashant, On 04/22/2008 at 2:23 PM, Prashant Malik wrote: > We have been observing the following problem while > tokenizing using lucene's StandardAnalyzer. The tokens that we get are > different on different machines. I am suspecting it has something to do > with the Locale settings on individu

Lucene standard analyzer internationalization

2008-04-22 Thread Prashant Malik
Hi, We have been observing the following problem while tokenizing using lucene's StandardAnalyzer. The tokens that we get are different on different machines. I am suspecting it has something to do with the Locale settings on individual machines? For example the word 'César' is split as 'CÃ

Re: Standard Analyzer Escapes

2007-07-13 Thread Mark Miller
This is certainly the case. StandardAnalyzer has a regex matcher that looks for a possible company name involving an & or an @. The QueryParser is escaping the '&' -- all of the effects described are standard results of using the StandardAnalyzer. Any double '&&' will break text, but 'sdfdf&dfs

Re: Standard Analyzer Escapes

2007-07-13 Thread Yonik Seeley
I just tried some things fast via the Solr admin interface, and everything seems fine. I think you are probably confusing what the parser does vs what the analyzer does. Try your tests with an un-tokenized field to remove that effect. -Yonik On 7/13/07, Walt Stoneburner <[EMAIL PROTECTED]> wrote

Standard Analyzer Escapes

2007-07-13 Thread Walt Stoneburner
In reading the documentation for escape characters, I'm having a little trouble understanding what it wants me to do for certain special cases. http://lucene.apache.org/java/docs/queryparsersyntax.html#Escaping%20Special%20Characters says: "Lucene supports escaping special characters that are par
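Backslash escaping of the kind the documentation describes can be sketched in a few lines. This is plain Java and does not call Lucene; Lucene itself provides a static `QueryParser.escape(String)` helper for the same job. The names `QueryEscapeSketch` and `escape` are hypothetical, and the character list in `SPECIAL` is an assumption taken from the query parser syntax page, so it should be checked against the page linked above for the Lucene version in use.

```java
final class QueryEscapeSketch {
    // Assumed set of query-syntax characters: + - & | ! ( ) { } [ ] ^ " ~ * ? : \
    // (an assumption based on the query parser syntax page, not read from Lucene code).
    private static final String SPECIAL = "+-&|!(){}[]^\"~*?:\\";

    // Prefix every special character with a backslash; leave the rest alone.
    static String escape(String term) {
        StringBuilder sb = new StringBuilder(term.length() * 2);
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("wo*rk")); // prints wo\*rk
        System.out.println(escape("AT&T"));  // prints AT\&T
    }
}
```

Escaping only controls how the query parser reads the string. Whether the escaped character survives into the search term still depends on the analyzer applied to the field, which is the distinction the replies in this thread draw.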

Re: custom stop word list for standard analyzer

2007-04-13 Thread Chris Hostetter
: Apologies and thanks all at the same time, everyone. No apologies necessary, you're not the first person to be confused by this, which is why I asked if you had any ideas on how we can improve the docs -- people who know the APIs inside and out aren't in the best position to understand how to

Re: custom stop word list for standard analyzer

2007-04-13 Thread Michael Barbarelli
Apologies and thanks all at the same time, everyone. Mike On 4/12/07, Chris Hostetter <[EMAIL PROTECTED]> wrote: : Michael Barbarelli wrote: : > Can I instantiate a standard analyzer with an argument containing my own : > stop words? If so, how? Will they be appended to or

Re: custom stop word list for standard analyzer

2007-04-12 Thread Chris Hostetter
: Michael Barbarelli wrote: : > Can I instantiate a standard analyzer with an argument containing my own : > stop words? If so, how? Will they be appended to or override the built-in I'm really surprised how often this question gets asked ... Michael (or anyone else for that matter)

Re: custom stop word list for standard analyzer

2007-04-12 Thread Paul Cowan
Michael Barbarelli wrote: Can I instantiate a standard analyzer with an argument containing my own stop words? If so, how? Will they be appended to or override the built-in stop words? You can do it with one of the alternate constructors, and they'll override the built-in
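The difference between overriding and appending stop words can be shown with a small stand-alone sketch. This is plain Java, not Lucene's constructor or filter; `StopWordOverrideSketch`, `filter`, and the three-word `DEFAULT_STOP_WORDS` set are hypothetical stand-ins, and the default list here is not Lucene's actual built-in list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class StopWordOverrideSketch {
    // Hypothetical stand-in for an analyzer's built-in stop words.
    static final Set<String> DEFAULT_STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "the", "and"));

    // Keep every token that is not in the given stop set.
    static List<String> filter(List<String> tokens, Set<String> stopWords) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (!stopWords.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("the", "quick", "fox", "and", "hound");
        Set<String> custom = new HashSet<>(Arrays.asList("quick"));

        // Built-in list only: prints [quick, fox, hound]
        System.out.println(filter(tokens, DEFAULT_STOP_WORDS));

        // Custom list overrides the built-in one: prints [the, fox, and, hound]
        System.out.println(filter(tokens, custom));

        // To append instead, merge the two sets first: prints [fox, hound]
        Set<String> merged = new HashSet<>(DEFAULT_STOP_WORDS);
        merged.addAll(custom);
        System.out.println(filter(tokens, merged));
    }
}
```

With override semantics, as described in the reply, a caller who wants both the built-in and the custom words has to pass the merged set.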

custom stop word list for standard analyzer

2007-04-12 Thread Michael Barbarelli
I know this is a relatively fundamental thing to arrange, but I'm having trouble. Can I instantiate a standard analyzer with an argument containing my own stop words? If so, how? Will they be appended to or override the built-in stop words? Or, do I have to modify the analyzer class i

Re: Modifying the standard analyzer

2006-07-07 Thread Mark Miller
Thank you so much. I apologize for my ignorance. Mark On 7/7/06, Chris Hostetter <[EMAIL PROTECTED]> wrote: : > But ParseException extends IOException, so I don't see a problem there. : I wish my compiler agreed with you:) Which it seems to do until you : rebuild the files with javacc. I saw

Re: Modifying the standard analyzer

2006-07-07 Thread Chris Hostetter
: > But ParseException extends IOException, so I don't see a problem there. : I wish my compiler agreed with you:) Which it seems to do until you : rebuild the files with javacc. I saw at least two other posts about this : problem on the web with no answer given... : This guy also found the same

Re: Modifying the standard analyzer

2006-07-07 Thread Mark Miller
Daniel Naber wrote: On Friday 07 July 2006 16:20, Mark Miller wrote: the javacc generated StandardTokenizer next() method is declared to throw a ParseException final public org.apache.lucene.analysis.Token next() throws ParseException, IOException { unfortunately, org.apache.lucene.anal

Re: Modifying the standard analyzer

2006-07-07 Thread Daniel Naber
On Friday 07 July 2006 16:20, Mark Miller wrote: > the javacc generated StandardTokenizer next() method is declared to > throw a ParseException > >   final public org.apache.lucene.analysis.Token next() throws > ParseException, IOException { > > unfortunately, org.apache.lucene.analysis.Token nex

Modifying the standard analyzer

2006-07-07 Thread Mark Miller
I have added support for sentence/paragraph proximity search by modifying the notspan query. In doing so I have changed the standard analyzer javacc .jj file. Here is my problem: the javacc generated StandardTokenizer next() method is declared to throw a ParseException final public

RE: Problems in standard Analyzer

2005-09-26 Thread M å n i s h
ECTED] Sent: Monday, September 26, 2005 3:07 PM To: java-user@lucene.apache.org Subject: RE: Problems in standard Analyzer The problem is that in limo you can only use standard analyzers for your queries. As you've already seen some of them will change the key value to something else or even r

RE: Problems in standard Analyzer

2005-09-26 Thread Kunemann Frank
nge it for your needs (e.g. add an option for no analyzer). Frank -Original Message- From: "M å n i s h " [mailto:[EMAIL PROTECTED] Sent: Monday, September 26, 2005 9:42 AM To: java-user@lucene.apache.org Subject: RE: Problems in standard Analyzer Actually in Index I can se

Re: Problems in standard Analyzer

2005-09-26 Thread Anand Kishore
nk [mailto:[EMAIL PROTECTED] > Sent: Monday, September 26, 2005 1:07 PM > To: java-user@lucene.apache.org > Subject: RE: Problems in standard Analyzer > > It should be possible to combine queries using different types of > analyzers. > The only problem I can see is if you'

RE: Problems in standard Analyzer

2005-09-26 Thread M å n i s h
[mailto:[EMAIL PROTECTED] Sent: Monday, September 26, 2005 1:07 PM To: java-user@lucene.apache.org Subject: RE: Problems in standard Analyzer It should be possible to combine queries using different types of analyzers. The only problem I can see is if you're using one single line for the

RE: Problems in standard Analyzer

2005-09-26 Thread Kunemann Frank
:05 AM To: java-user@lucene.apache.org Subject: RE: Problems in standard Analyzer I thought of not using any Analyzer, but the problem is I have other queries that I am appending to this value with either OR or AND, so for that part of the query I need Standard Analyzer. I think I should index that va

RE: Problems in standard Analyzer

2005-09-26 Thread M å n i s h
I thought of not using any Analyzer, but the problem is I have other queries that I am appending to this value with either OR or AND, so for that part of the query I need Standard Analyzer. I think I should index that value like normal text, then maybe it will work. -Original Message

RE: Problems in standard Analyzer

2005-09-25 Thread Kunemann Frank
TED] Sent: Monday, September 26, 2005 5:46 AM To: java-user@lucene.apache.org Subject: Problems in standard Analyzer Hi Mark and other Gurus, I am indexing one value as a key field (rtf & txt indexing); the value is like 12345 or 123-09-34, or it can be like MN12345. The problem is that if the value is like 1

Problems in standard Analyzer

2005-09-25 Thread M å n i s h
Hi Mark and other Gurus, I am indexing one value as a key field (rtf & txt indexing); the value is like 12345 or 123-09-34, or it can be like MN12345. The problem is that if the value is like 12345 or 123-23-98, Standard Analyzer is able to search it, but if the value is like MN12345 search will not re