RE: lucene 3.0.3 | phrase query problem

2011-02-09 Thread Zhang, Lisheng
Hi, I think using Field.Index.NOT_ANALYZED means the StandardAnalyzer is bypassed, so "sql. server" is indexed as a single term. You can use Luke to see how this field is indexed. In this case we can only match the whole term (and even the case must be identical). If we use the StandardAnalyzer to analyze "sql. server" ...
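A toy model of the difference described above, in plain Java with no Lucene dependency. Both methods are hypothetical stand-ins: notAnalyzedTerms assumes a NOT_ANALYZED field keeps the whole value as one case-sensitive term, and analyzedTerms only approximates StandardAnalyzer by lowercasing and splitting on non-alphanumerics (the real analyzer uses a fuller grammar and also drops English stop words). Luke is still the way to see what was actually indexed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Toy model only: this is NOT Lucene's StandardAnalyzer, just an approximation
// used to show why NOT_ANALYZED and analyzed fields yield different terms.
class ToyAnalysis {

    // Assumed NOT_ANALYZED behaviour: the entire value is one case-sensitive term.
    static List<String> notAnalyzedTerms(String value) {
        return Arrays.asList(value);
    }

    // Assumed approximation of an analyzed field: lowercase, then split on
    // anything that is not a letter or digit, dropping empty pieces.
    static List<String> analyzedTerms(String value) {
        List<String> terms = new ArrayList<String>();
        for (String piece : value.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!piece.isEmpty()) {
                terms.add(piece);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        String text = "sql. server";
        System.out.println(notAnalyzedTerms(text)); // [sql. server]
        System.out.println(analyzedTerms(text));    // [sql, server]
        // The query text "Sql Server" is not the stored single term.
        System.out.println(notAnalyzedTerms(text).contains("Sql Server")); // false
    }
}
```

In this model the analyzed form of "Sql Server" and of "sql. server" is the same term list, while the NOT_ANALYZED form only matches the exact original string.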

lucene 3.0.3 | phrase query problem

2011-02-09 Thread Ranjit Kumar
Hi, I am using SpanQuery and SpanNearQuery to run a phrase query like "Sql Server". In the text file I am searching, the text appears as (sql. server), meaning 'sql dot server', which is not the same span as "Sql Server". When I search for the phrase "Sql Server", it returns a result for (sql. ser...
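A plausible explanation for the hit, sketched with a toy analyzer (an assumption, not Lucene's code): if the field is analyzed, the dot is discarded and "sql" and "server" land at consecutive positions, so an in-order, zero-slop phrase or span query sees exactly the same sequence as "Sql Server". The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Toy illustration (not Lucene code): an analyzer that lowercases and drops
// punctuation gives "(sql. server)" the same term sequence as "Sql Server".
class ToyPhraseMatch {

    // Assumed approximation of analysis: lowercase, split on non-alphanumerics.
    // The index in the returned list plays the role of the token position.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<String>();
        for (String piece : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    // True if the phrase's tokens occur at consecutive positions in the
    // document, i.e. an exact phrase match (slop 0, in order).
    static boolean phraseMatches(String document, String phrase) {
        List<String> doc = tokenize(document);
        List<String> query = tokenize(phrase);
        if (query.isEmpty()) {
            return false;
        }
        for (int start = 0; start + query.size() <= doc.size(); start++) {
            if (doc.subList(start, start + query.size()).equals(query)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("(sql. server)"));                      // [sql, server]
        System.out.println(phraseMatches("(sql. server)", "Sql Server"));   // true
        System.out.println(phraseMatches("sql 2008 server", "Sql Server")); // false
    }
}
```

The last document fails only because another token sits between the two terms; punctuation alone leaves no gap in this model.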

Re: index size doubling / optimization (Lucene 3.0.3)

2011-02-09 Thread Phil Herold
I didn't have any errors or exceptions. Sorry to be dense, but what exactly is the "infoStream output" you're asking about?
> This is not expected.
>
> Did the last IW exit "gracefully"? If so, it should delete the old
> segments after swapping in the optimized one.
> Can you post infoStre...

Re: HighFreqTerms patch

2011-02-09 Thread Michael McCandless
Sorry, I mean "let you specify numTerms".
Mike
On Wed, Feb 9, 2011 at 6:16 PM, Michael McCandless wrote:
> Hmm, which version of Lucene are you using? Newer versions let you
> specify a field...
>
> Mike
>
> On Wed, Feb 9, 2011 at 12:06 PM, Pablo Mendes wrote:
>> Guys,
>> this is tiny and prob...

Re: HighFreqTerms patch

2011-02-09 Thread Michael McCandless
Hmm, which version of Lucene are you using? Newer versions let you specify a field...
Mike
On Wed, Feb 9, 2011 at 12:06 PM, Pablo Mendes wrote:
> Guys,
> this is tiny and probably not relevant. But I'll bet a beer that at least a
> dozen people had to dirtymod this class while they could have r...

Re: index size doubling / optimization (Lucene 3.0.3)

2011-02-09 Thread Michael McCandless
This is not expected.
Did the last IW (IndexWriter) exit "gracefully"? If so, it should delete the old segments after swapping in the optimized one.
Can you post the infoStream output after running optimize?
Mike
On Wed, Feb 9, 2011 at 1:58 PM, Phil Herold wrote:
> I know that the size of a Lucene index can do...

index size doubling / optimization (Lucene 3.0.3)

2011-02-09 Thread Phil Herold
I know that the size of a Lucene index can double while optimization is underway, but it's supposed to eventually settle back down to the original size, correct? We have a Lucene index of 100K documents that is normally about 12 GB in size. It is split across 10 sub-indexes which we sear...

HighFreqTerms patch

2011-02-09 Thread Pablo Mendes
Guys,
this is tiny and probably not relevant. But I'll bet a beer that at least a dozen people have had to hack this class when they could simply have run it from the command line. A 15-minute time saver that took 15 minutes to create. I guess it's a tie.
Best,
Pablo
--- HighFreqTerms.java
+++ ExtractStopwords.java
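The patch body is cut off here, so its contents are unknown. As a rough, hypothetical illustration of the idea being discussed (a "most frequent terms" tool that takes numTerms from the command line), here is a stdlib-only sketch. It is not HighFreqTerms: that class works on term statistics in a Lucene index, whereas this toy counts term occurrences in plain text. The class name TopTerms and its argument order are assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-alone sketch, NOT Lucene's HighFreqTerms: counts term
// occurrences in plain text and returns the numTerms most frequent ones.
class TopTerms {

    static List<String> topTerms(String text, int numTerms) {
        final Map<String, Integer> counts = new HashMap<String, Integer>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                Integer c = counts.get(t);
                counts.put(t, c == null ? 1 : c + 1);
            }
        }
        List<String> terms = new ArrayList<String>(counts.keySet());
        // Highest count first; ties broken alphabetically so output is deterministic.
        Collections.sort(terms, new Comparator<String>() {
            public int compare(String a, String b) {
                int byCount = counts.get(b).compareTo(counts.get(a));
                return byCount != 0 ? byCount : a.compareTo(b);
            }
        });
        return terms.subList(0, Math.min(numTerms, terms.size()));
    }

    // Assumed usage: java TopTerms <numTerms> <text words...>
    public static void main(String[] args) {
        int numTerms = args.length > 0 ? Integer.parseInt(args[0]) : 10;
        StringBuilder text = new StringBuilder();
        for (int i = 1; i < args.length; i++) {
            text.append(args[i]).append(' ');
        }
        System.out.println(topTerms(text.toString(), numTerms));
    }
}
```

For example, java TopTerms 2 the cat and the hat and the bat should print [the, and].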

Re: HA Configuration / Best Practices

2011-02-09 Thread Ian Lea
One way, not necessarily typical or best practice but known to work, is to designate one of your WS layer machines, or another server, as the master indexer. Run all index updates on that server and copy the indexes out to the other server(s) using rsync. That is normally quick since it only takes chang...

Re: Lucene Questions about query and highlighter~^^

2011-02-09 Thread Ian Lea
http://lucene.apache.org/java/3_0_3/queryparsersyntax.html should have answers to your query formulation questions. You might also consider building the queries yourself programmatically and using Span queries. They are good for specifying slop, order and so on. There is useful info on...
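To make "slop and order" concrete, here is a simplified model of a two-clause span-near match over single terms, written as plain Java. Treat the semantics as an assumption: the terms match when at most slop positions lie between them and, if inOrder is set, the first term precedes the second. With Lucene 3.0.x on the classpath, the corresponding query would be built roughly as new SpanNearQuery(new SpanQuery[] { new SpanTermQuery(new Term("body", "sql")), new SpanTermQuery(new Term("body", "server")) }, slop, inOrder), where the field name "body" is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified model (an assumption, not Lucene source) of a two-clause
// span-near match over single terms, using list indexes as positions.
class ToySpanNear {

    static boolean near(List<String> tokens, String first, String second,
                        int slop, boolean inOrder) {
        for (int a : positionsOf(tokens, first)) {
            for (int b : positionsOf(tokens, second)) {
                if (a == b) continue;           // one token cannot serve as both clauses here
                if (inOrder && b < a) continue; // second term must follow the first
                int gap = Math.abs(b - a) - 1;  // positions strictly between the two terms
                if (gap <= slop) return true;
            }
        }
        return false;
    }

    static List<Integer> positionsOf(List<String> tokens, String term) {
        List<Integer> positions = new ArrayList<Integer>();
        for (int i = 0; i < tokens.size(); i++) {
            if (tokens.get(i).equals(term)) positions.add(i);
        }
        return positions;
    }

    public static void main(String[] args) {
        List<String> doc = Arrays.asList("server", "setup", "for", "sql");
        System.out.println(near(doc, "sql", "server", 2, true));  // false: wrong order
        System.out.println(near(doc, "sql", "server", 2, false)); // true: 2 tokens in between
        System.out.println(near(doc, "sql", "server", 1, false)); // false: gap exceeds slop
    }
}
```

In "server setup for sql" the pair has two tokens between it and appears in reverse order, so it matches unordered with slop 2, but not in order and not with slop 1.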

Re: [Lucene] custom Query, and Stop Words

2011-02-09 Thread Ian Lea
Have you considered using stemming instead? It sounds like that might make most of your problems go away and achieve the same result. I'm not aware of a utility method to remove stop words from a string, but there are ways of passing data through analyzers/tokenizers and grabbing the output. Standard...
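Passing text through an analysis chain and collecting what comes out is the general way to strip stop words from a string. In Lucene 3.0, as far as I know, that means obtaining a TokenStream from analyzer.tokenStream(field, new StringReader(text)), looping over incrementToken() and reading each TermAttribute's term(). The stdlib-only sketch below imitates that pipeline (tokenize, lowercase, drop stop words) without Lucene; the class name is hypothetical and the stop list is a small illustrative sample, not Lucene's default set.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy analysis chain (not Lucene): tokenizer -> lowercase -> stop-word filter.
class ToyStopFilter {

    // Illustrative sample only; not Lucene's English stop-word list.
    static final Set<String> STOP_WORDS =
        new HashSet<String>(Arrays.asList("a", "an", "and", "of", "the", "to"));

    // Splits on non-alphanumerics, lowercases, and keeps only non-stop terms.
    static List<String> analyze(String text, Set<String> stopWords) {
        List<String> out = new ArrayList<String>();
        for (String token : text.split("[^\\p{L}\\p{N}]+")) {
            String term = token.toLowerCase(Locale.ROOT);
            if (!term.isEmpty() && !stopWords.contains(term)) {
                out.add(term);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(analyze("The Art of Java Programming", STOP_WORDS));
        // prints [art, java, programming]
    }
}
```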

[Lucene] custom Query, and Stop Words

2011-02-09 Thread sol myr
Hi, I'm building my own BooleanQuery (rather than using QueryParser). That's because I need different defaults for my users: if a user types "java program", I need to run the query "+java* +program*" (namely an AND search, with prefix matching so as to hit "programS", "programMER"). So naively I split the...
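The rewrite described here (every word becomes a required prefix clause) can be sketched independently of Lucene. The hypothetical helper below renders the result in query-parser notation so it is easy to check; in Lucene 3.0.x each clause would instead be added to the BooleanQuery as bq.add(new PrefixQuery(new Term(field, word)), BooleanClause.Occur.MUST). Hand-built PrefixQuery terms are not analyzed, so the sketch lowercases words on the assumption that the field was indexed with a lowercasing analyzer.

```java
import java.util.Locale;

// Hypothetical helper: turns free text into the query-parser notation of an
// AND-of-prefixes query, e.g. "java program" -> "+java* +program*".
class PrefixAndQuery {

    static String toPrefixAnd(String userInput) {
        StringBuilder sb = new StringBuilder();
        for (String word : userInput.trim().split("\\s+")) {
            if (word.isEmpty()) continue;        // only happens for blank input
            if (sb.length() > 0) sb.append(' ');
            // Lowercased because prefix terms bypass analysis (assumed lowercase index).
            sb.append('+').append(word.toLowerCase(Locale.ROOT)).append('*');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toPrefixAnd("java program")); // +java* +program*
    }
}
```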

RE: How to implement a proximity search using LINES as slop

2011-02-09 Thread Pierre GOSSE
I've just read about payloads, and I'm not sure it would be easy to use that feature to calculate distances for SpanQueries. I guess you would have to build your own SpanQuery, using payloads instead of term positions. But it would indeed be much lighter on index size, if it were possible. I ho...
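As a rough model of the idea in this thread (measuring proximity in lines instead of token positions), the sketch below tags each token with its line number, which is the kind of value that would have to be carried in a payload, and checks whether two terms occur within maxLines lines of each other. It is plain Java over an in-memory token list, not a SpanQuery; doing this inside Lucene would still need the custom query mentioned above. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Toy model of "proximity measured in lines": every token carries the line it
// came from. Not Lucene code; no index or payload API is used here.
class LineProximity {

    static final class Token {
        final String term;
        final int line;
        Token(String term, int line) { this.term = term; this.line = line; }
    }

    // Splits text into lowercase tokens, recording the 0-based line of each one.
    static List<Token> tokenize(String text) {
        List<Token> tokens = new ArrayList<Token>();
        String[] lines = text.split("\n", -1);
        for (int line = 0; line < lines.length; line++) {
            for (String w : lines[line].toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
                if (!w.isEmpty()) tokens.add(new Token(w, line));
            }
        }
        return tokens;
    }

    // True if some occurrence of a and some occurrence of b are at most maxLines lines apart.
    static boolean withinLines(List<Token> tokens, String a, String b, int maxLines) {
        for (Token x : tokens) {
            if (!x.term.equals(a)) continue;
            for (Token y : tokens) {
                if (y.term.equals(b) && Math.abs(x.line - y.line) <= maxLines) return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Token> t = tokenize("first line mentions sql\nsecond line\nthird line mentions server");
        System.out.println(withinLines(t, "sql", "server", 1)); // false: 2 lines apart
        System.out.println(withinLines(t, "sql", "server", 2)); // true
    }
}
```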