IndexWriter.ramSizeInBytes() no longer returns to 0 after commit()?

2011-08-22 Thread Trejkaz
Hi all. We are using IndexWriter with no limits set and managing the commits ourselves, mainly so that we can ensure they happen at the same time as other (non-Lucene) commits. After upgrading from 3.0 to 3.3, we are seeing a change in ramSizeInBytes() behaviour where it is no longer resetting to zero after commit() …

Searching behaviour with content containing decimal points

2011-08-22 Thread SBS
I have content such as "E71.0" and when I enter a search query of "E71" I would like it to match that document. At the moment, though, it only matches that document if I enter "E71*" or "E71.0". What's the trick to getting such a query to match this document? I am using StandardAnalyzer and QueryParser …
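The mismatch above can be sketched without Lucene: assuming the analyzer keeps "e71.0" as a single indexed token, an exact term lookup for "e71" misses it, while a prefix match ("E71*") hits it. A minimal plain-Python illustration with hypothetical helper names (not Lucene API):

```python
def term_match(index_tokens, term):
    """Exact term match, like a single-term query."""
    return term in index_tokens

def prefix_match(index_tokens, prefix):
    """Prefix match, like a trailing-* wildcard query."""
    return any(t.startswith(prefix) for t in index_tokens)

tokens = ["e71.0"]  # assumption: the analyzer kept the code as one token
print(term_match(tokens, "e71"))    # False: "e71" != "e71.0"
print(prefix_match(tokens, "e71"))  # True: "e71*" reaches "e71.0"
```

One common alternative to asking users for wildcards is a custom analyzer that also splits such codes on '.' at both index and query time, so "e71" becomes a real indexed token.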

SpanNearQuery vs. PhraseQuery.setSlop

2011-08-22 Thread ikoelliker
Hello, we are using phrase queries with a slop value to perform Near and Within style searches, and the issue we are encountering is as follows: since the slop value in the PhraseQuery is the edit distance, a message with the terms 'thank' and 'you' will be found with a query of field:"thank you …
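The edit-distance reading of slop can be sketched in plain Python. This is a simplified stand-in, assuming each query term occurs exactly once in the document; it is not Lucene's actual sloppy-phrase scoring code:

```python
def phrase_slop(doc_tokens, query_terms):
    """Total positional drift needed to line the query terms up in
    order -- a simplified model of PhraseQuery's sloppy matching.
    Assumes each query term occurs exactly once in the document."""
    positions = [doc_tokens.index(t) for t in query_terms]
    base = positions[0]
    # Term i is expected at base + i; sum how far each term is from that.
    return sum(abs((p - base) - i) for i, p in enumerate(positions))

print(phrase_slop(["thank", "you"], ["thank", "you"]))                  # 0
print(phrase_slop(["thank", "very", "much", "you"], ["thank", "you"]))  # 2
print(phrase_slop(["you", "thank"], ["thank", "you"]))                  # 2
```

Note that a transposed pair costs 2 moves under this model, which is why a slop of 2 also matches the reversed order; SpanNearQuery with inOrder=true is the usual way to require the terms in order.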

Re: SSD Experience

2011-08-22 Thread Karl Wettin
On 22 Aug 2011, at 18:49, Rich Cariens wrote: > I found a Lucene SSD performance benchmark doc but > the wiki engine is refusing to let me view the attachment (I get "You > are not allowed to d…

SSD Experience

2011-08-22 Thread Rich Cariens
(Cross-posted from solr-users) Ahoy ahoy! Does anyone have any experiences or stories they can share about how SSDs impacted search performance for better or worse? I found a Lucene SSD performance benchmark doc

heads up: re-index 3.x branch Lucene/Solr indices

2011-08-22 Thread Simon Willnauer
I just reverted a previous commit related to CompoundFile in the 3.x stable branch. If you are using the unreleased 3.x branch, you need to reindex. See here for details: https://issues.apache.org/jira/browse/LUCENE-3218 If you are using a released version of Lucene/Solr then you can ignore this message.

propagate payload byte[] until collector collect?

2011-08-22 Thread ac
Hi, what is the correct way to propagate the payload byte array at collector collecting time? Currently we are using a ThreadLocal object inside our Similarity subclass (during payload scoring) to keep a reference to the current payload. Then we retrieve the payload byte array at collecting time …
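The thread-local hand-off the poster describes can be sketched like this (plain Python with threading.local; the function names are illustrative, not Lucene API):

```python
import threading

_current = threading.local()

def score_with_payload(payload: bytes) -> None:
    # Called during scoring, e.g. from inside a Similarity subclass:
    # stash the payload for whoever runs later on this same thread.
    _current.payload = payload

def collect() -> bytes:
    # Called later on the same thread, e.g. from a Collector:
    # pick up whatever the scoring side stashed (None if nothing was).
    return getattr(_current, "payload", None)

score_with_payload(b"\x01\x02")
print(collect())  # b'\x01\x02' on this thread; None on any other thread
```

The obvious caveat is that this only works while scoring and collecting for a given hit happen on the same thread, which is exactly what the ThreadLocal approach in the email relies on.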

Re: Issue with StandardAnalyzer which splits single word with _ (Lucene Version: 3.0)

2011-08-22 Thread govind bhardwaj
Hi Erick, Thanks for your reply. I verified Srinivas' query by changing the Lucene version (in the constructor of StandardAnalyzer) to LUCENE_30 and found that the parsed query indeed changes to xyz abc (input query was 'xyz_abc'), while that does not happen in the case of LUCENE_33 and the parsed query remains …

Re: Issue with StandardAnalyzer which splits single word with _ (Lucene Version: 3.0)

2011-08-22 Thread Erick Erickson
No, that's expected. StandardAnalyzer breaks on '_' as far as I know. NOTE: the behavior changed a bit as of Solr 3.1. To get the old StandardAnalyzer behavior, I believe you need ClassicAnalyzer... More than you ever want to know about breaking lines (3.1+) http://unicode.org/reports/tr29/#Word_

Re: Tokenize a dictionary of phrases

2011-08-22 Thread Erick Erickson
Hmmm, would it work for your case to use Synonyms? If you set expand=false and in your synonyms file have: quick brown => quickbrown it might do what you want. Best, Erick On Sun, Aug 21, 2011 at 3:53 PM, Xiyang Chen wrote: > Hi, > > I have a dictionary of multi-word phrases and I'd like to …
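The effect of a non-expanding rule like quick brown => quickbrown can be sketched as a greedy multi-token replacement over the token stream (an illustration of the idea only, not Lucene/Solr's SynonymFilter):

```python
def collapse_phrases(tokens, phrase_map):
    """Greedy left-to-right replacement of known multi-word phrases
    with a single token, the effect of a synonym rule such as
    'quick brown => quickbrown' with expand=false."""
    out, i = [], 0
    while i < len(tokens):
        for phrase, replacement in phrase_map.items():
            n = len(phrase)
            if tuple(tokens[i:i + n]) == phrase:
                out.append(replacement)  # swap the whole phrase for one token
                i += n
                break
        else:
            out.append(tokens[i])        # no phrase starts here; keep token
            i += 1
    return out

print(collapse_phrases("the quick brown fox".split(),
                       {("quick", "brown"): "quickbrown"}))
# ['the', 'quickbrown', 'fox']
```

Applying the same mapping at both index and query time is what makes the collapsed token searchable as a single term.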

Re: Analysis

2011-08-22 Thread Graham Sugden
Caveat to the below is that I am very new to Lucene. (That said, following the below strategy, after a couple of days' work I have a set of per-field analyzers for various languages, using various custom filters, caching of initial analysis; and capable of outputting stemmed, reversed, diacri…

Re: Issue with StandardAnalyzer which splits single word with _ (Lucene Version: 3.0)

2011-08-22 Thread govind bhardwaj
Hi Srinivas, It works for the latest Lucene version 3.3.0 (in fact for versions after 3.0.0). StandardAnalyzer just splits the text, ignoring a set of STOP_WORDS like "is", "in", etc. In the class definition of StandardAnalyzer in the Lucene 3.3.0 API, it is clearly stated: "As of 3.1, StandardToke…

Re: Analysis

2011-08-22 Thread Mihai Caraman
http://snowball.tartarus.org/ for stemming 2011/8/22 Saar Carmi > Hi > Where can I find a guide for building analyzers, filters and tokenizers? > > Saar >

Analysis

2011-08-22 Thread Saar Carmi
Hi Where can I find a guide for building analyzers, filters and tokenizers? Saar

Issue with StandardAnalyzer which splits single word with _ (Lucene Version: 3.0)

2011-08-22 Thread srinu . hello
Hello All, I observed some unexpected behavior using StandardAnalyzer to parse the query. Here is a demonstration. I am passing the query as (key:xyz_abc) && (text:blabla), expecting the parsed query to be +key:xyz_abc +text:blabla. The actual result is +key:"xyz abc" +text:blabla. As p…
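The behaviour difference this thread pins down (the classic pre-3.1 tokenizer breaks on '_', while the UAX#29-based 3.1+ StandardTokenizer keeps it word-internal) can be approximated with two regexes. This is a rough illustration of the contrast, not the real tokenizers:

```python
import re

def classic_tokens(text):
    # Pre-3.1 style: '_' is not a word character, so it breaks words.
    return re.findall(r"[A-Za-z0-9]+", text.lower())

def uax29_tokens(text):
    # 3.1+ style: '_' joins words (ExtendNumLet in UAX #29).
    return re.findall(r"[A-Za-z0-9_]+", text.lower())

print(classic_tokens("xyz_abc"))  # ['xyz', 'abc'] -> parsed as +key:"xyz abc"
print(uax29_tokens("xyz_abc"))    # ['xyz_abc']    -> parsed as +key:xyz_abc
```

This is why the same input yields a phrase query under LUCENE_30 but a single term under LUCENE_33, and why ClassicAnalyzer is the drop-in for the old behaviour.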