lucene 3.0.3 | phrase query problem

2011-02-10 Thread Ranjit Kumar
Hi Anshum, Thanks for your reply. Yes, I agree with you. Right now I am using StandardAnalyzer, which removes stop words, lowercases the text, and does not index the most common English words. Searching an index created by StandardAnalyzer gives the result as discussed

RE: Question about Case Sensitive?!

2011-02-10 Thread Uwe Schindler
Hi Gong Li, You can create your own Analyzer that does not add LowerCaseFilter to the filter chain. To achieve this, copy the analyzer's source code from the Lucene sources, rename the class (e.g. org.yourpackage.NoLowercasingStandardAnalyzer) and remove LowerCaseFilter from the tokenStream() and reuseableTokenStre
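
To illustrate why dropping the lowercasing step makes search case sensitive, here is a minimal plain-Java sketch. It does not use Lucene's Analyzer or TokenStream API; the class CaseChainSketch and its methods are hypothetical names, and the tokenization rule (split on anything that is not a letter or digit) is an assumption made for the example. The same chain must be applied at index time and at query time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical model of an analysis chain with an optional lowercasing step.
// This is NOT Lucene code; it only mimics the effect of LowerCaseFilter.
public class CaseChainSketch {

    // Split on non-alphanumeric characters; lowercase only if requested.
    static List<String> analyze(String text, boolean lowercase) {
        List<String> terms = new ArrayList<>();
        for (String token : text.split("[^\\p{Alnum}]+")) {
            if (token.isEmpty()) {
                continue; // split() can yield a leading empty string
            }
            terms.add(lowercase ? token.toLowerCase(Locale.ROOT) : token);
        }
        return terms;
    }

    // True if every analyzed query term occurs among the analyzed document terms.
    static boolean matches(String indexedText, String query, boolean lowercase) {
        List<String> queryTerms = analyze(query, lowercase);
        return !queryTerms.isEmpty()
                && analyze(indexedText, lowercase).containsAll(queryTerms);
    }

    public static void main(String[] args) {
        String doc = "Apache Lucene in Action";
        System.out.println(matches(doc, "lucene", true));   // lowercasing chain
        System.out.println(matches(doc, "lucene", false));  // case-preserving chain
        System.out.println(matches(doc, "Lucene", false));  // case-preserving chain
    }
}
```

With the lowercasing step, "lucene" and "Lucene" collapse to the same term; without it they stay distinct, which is the behaviour a case-sensitive search needs.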

escaping of queries with solr

2011-02-10 Thread Todd Nine
Hi guys, We're migrating from Lucene to Solr. We have a lot of existing code that created queries in memory with Lucene. Below is an example of such a query. BooleanQuery query = new BooleanQuery(); BooleanQuery inputTerms = new BooleanQuery();

Running a string through a simple Tokenizer, and then additional Tokenizers (vs. TokenFilters)

2011-02-10 Thread Tavi Nathanson
Hey everyone, I'm trying to do the following: 1. Run a string through a simple tokenizer (i.e. WhitespaceTokenizer) 2. Run the resultant tokens through my current tokenizer as well as StandardTokenizer, in order to isolate the tokens that are different between them. (Background: I want to do

Question about Case Sensitive?!

2011-02-10 Thread Gong Li
Hi, I use StandardAnalyzer, QueryParser, and Highlighter in my program, but they lowercase the keywords. Now I need to search the keywords case-sensitively. Is there any way to achieve this while still using StandardAnalyzer and QueryParser? Or is there some other way? Thanks.

Re: lucene 3.0.3 | phrase query problem

2011-02-10 Thread Anshum
Hi Ranjit, That would be because all stop words (space, comma, stop word set, etc.) would be treated in a similar fashion and escaped while indexing, subject to the analyzer you use while indexing your content. Hope that explains the issue. -- Anshum Gupta http://ai-cafe.blogspot.com On Thu, Feb 1
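
The effect described above can be reproduced with a small plain-Java sketch. It does not call Lucene; PhraseAdjacencySketch, its tiny stop list, and the split-on-punctuation rule are assumptions that only approximate what an analyzer such as StandardAnalyzer does. It also ignores the position holes a real analyzer may leave where stop words were removed. The point is that once the dot is discarded, "sql" and "server" become adjacent terms, so a phrase query matches.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for an analyzer that drops punctuation and stop words.
public class PhraseAdjacencySketch {

    // Tiny illustrative stop list; NOT Lucene's actual stop word set.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("i", "am", "is", "the", "a"));

    // Split on non-alphanumeric characters, lowercase, and drop stop words.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.split("[^\\p{Alnum}]+")) {
            String term = raw.toLowerCase(Locale.ROOT);
            if (!term.isEmpty() && !STOP_WORDS.contains(term)) {
                tokens.add(term);
            }
        }
        return tokens;
    }

    // True if the analyzed phrase occurs as consecutive terms in the analyzed text.
    static boolean phraseMatches(String text, String phrase) {
        List<String> query = analyze(phrase);
        return !query.isEmpty()
                && Collections.indexOfSubList(analyze(text), query) >= 0;
    }

    public static void main(String[] args) {
        String searchString = "i am using sql. server setting is easy task.";
        System.out.println(analyze(searchString));
        System.out.println(phraseMatches(searchString, "Sql Server"));
    }
}
```

To keep the two sentences apart, the punctuation would have to survive analysis in some form, for example as a position gap between sentences.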

lucene 3.0.3 | phrase query problem

2011-02-10 Thread Ranjit Kumar
searchString = "i am using sql. server setting is easy task."; When I search for the phrase query "Sql Server" in the above string, it returns a match, which is not correct, because in the above string "sql" and "server" are separated by a dot (.). Both PhraseQuery and SpanQuery give the same result. Hi,

Re: How to implement a proximity search using LINES as slop

2011-02-10 Thread Doron Cohen
IIUC what you are trying to achieve, I think the following could help, without setting all words in a line to be in the same position: At indexing, set a position increment of N (e.g. 100) at line start tokens. This would set a position gap of N between the last token of line x and the first token of line x+
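
A minimal plain-Java sketch of the position assignment described above, assuming whitespace tokenization and lines shorter than N tokens. LineGapSketch and assignPositions are hypothetical names; in Lucene the increment would be set on the token stream rather than computed like this. For simplicity the map records only the first occurrence of each token.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of "position increment of N at line start tokens".
public class LineGapSketch {

    // Assign a position to each token: +1 within a line, +lineGap at the
    // first token of every line after the first.
    static Map<String, Integer> assignPositions(String text, int lineGap) {
        Map<String, Integer> positions = new LinkedHashMap<>();
        int position = -1; // so the very first token lands on position 0
        for (String line : text.split("\n")) {
            boolean lineStart = true;
            for (String token : line.trim().split("\\s+")) {
                if (token.isEmpty()) {
                    continue; // blank line
                }
                int increment = (lineStart && position >= 0) ? lineGap : 1;
                position += increment;
                positions.putIfAbsent(token, position);
                lineStart = false;
            }
        }
        return positions;
    }

    public static void main(String[] args) {
        String text = "alpha beta gamma\ndelta epsilon\nzeta";
        Map<String, Integer> pos = assignPositions(text, 100);
        System.out.println(pos);
        // Tokens on the same line are a few positions apart; tokens on
        // neighbouring lines are at least lineGap positions apart.
        System.out.println(pos.get("delta") - pos.get("gamma"));
    }
}
```

With such gaps, the position distance between two terms roughly encodes how many lines separate them, so a slop value chosen as a multiple of N can approximate a "within k lines" constraint.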

Re: index size doubling / optimization (Lucene 3.0.3)

2011-02-10 Thread Michael McCandless
IndexWriter.setInfoStream -- when you set that, it produces lots of verbose output detailing what IW is doing to the index... Mike On Wed, Feb 9, 2011 at 8:06 PM, Phil Herold wrote: > I didn't have any errors or exceptions. Sorry to be dense, but what exactly > is the "infoStream output" you're

Re: HighFreqTerms patch

2011-02-10 Thread Pablo Mendes
You're right. My bad! I was looking at 2.9.3. :( I guess I owe somebody a beer. :) On Thu, Feb 10, 2011 at 12:16 AM, Michael McCandless < luc...@mikemccandless.com> wrote: > Sorry, I mean "let you specify numTerms". > > Mike > > On Wed, Feb 9, 2011 at 6:16 PM, Michael McCandless > wrote: > > Hmm

Re: [Lucene] custom Query, and Stop Words

2011-02-10 Thread sol myr
Thanks so much - I used STOP_WORDS_SET and it works fine (luckily, punctuation and case are not a problem in our case). Thanks ! --- On Wed, 2/9/11, Ian Lea wrote: From: Ian Lea Subject: Re: [Lucene] custom Query, and Stop Words To: java-user@lucene.apache.org Date: Wednesday, February 9, 2011