RE: Different replicas return different scores

2010-02-09 Thread Yuval Feinstein
Thanks for these directions, Ian. We are running Lucene 2.9.1 on CentOS 5 64-bit machines. We do use the compound file format, and will look into replacing it with the simple files, although I believe this will create too many files. We will also consider the rsync option. Thanks again, -- Yuval

Re: Flex & Docs/AndPositionsEnum

2010-02-09 Thread Marvin Humphrey
On Tue, Feb 09, 2010 at 03:47:19PM -0500, Michael McCandless wrote: > Interesting... and segment merging just does its own private > concatenation/mapping-around-deletes of the doc/positions? I think the answer is yes, but I'm not sure I understand the question completely since I'm not sure why y

Re: Flex & Docs/AndPositionsEnum

2010-02-09 Thread Michael McCandless
On Tue, Feb 9, 2010 at 1:12 PM, Marvin Humphrey wrote: > On Tue, Feb 09, 2010 at 11:51:31AM -0500, Michael McCandless wrote: > >> You should (when possible/reasonable) instead use >> ReaderUtil.gatherSubReaders, then iterate through those sub readers >> asking each for its flex fields. > >> But if

Re: Synonym map

2010-02-09 Thread Simon Willnauer
Maybe I'm missing something, but what is wrong with SynonymTokenFilter in contrib/wordnet? simon On Tue, Feb 9, 2010 at 5:03 PM, Ian Lea wrote: > Lucene in Action second edition has Synonym stuff that I think will > work with lucene 3.0. > > Source code available from http://www.manning.com/hatcher3/

Re: Flex & Docs/AndPositionsEnum

2010-02-09 Thread Marvin Humphrey
On Tue, Feb 09, 2010 at 11:51:31AM -0500, Michael McCandless wrote: > You should (when possible/reasonable) instead use > ReaderUtil.gatherSubReaders, then iterate through those sub readers > asking each for its flex fields. > > But if this is only for testing purposes, and Multi*Enum is more > c

Re: fast Result Count

2010-02-09 Thread Ian Lea
Write a simple Collector (read the javadocs) that has a collect(int doc) method that does nothing except increment a counter. Use it via one of the search methods that takes a Collector. btw TopDocCollector won't load them all in memory, but obviously it will keep track of the top scoring docs.
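The counting idea above can be sketched in plain Java. This is only an illustration under stated assumptions: `DocCollector`, `CountingCollector`, `search` and `countHits` are hypothetical stand-ins written for this example and are not Lucene classes. With the real `Collector` in Lucene 2.9/3.0 you would extend the abstract class and also implement `setScorer`, `setNextReader` and `acceptsDocsOutOfOrder`; only `collect(int doc)` needs to do any work.

```java
// Stand-in sketch: none of these types come from Lucene.
final class CountingCollectorDemo {

    /** Hypothetical minimal callback, modeled loosely on Collector.collect(int doc). */
    interface DocCollector {
        void collect(int doc);
    }

    /** Does nothing per hit except increment a counter; no documents are kept. */
    static final class CountingCollector implements DocCollector {
        private int count;

        @Override
        public void collect(int doc) {
            count++;
        }

        int getCount() {
            return count;
        }
    }

    /** Toy "search": calls the collector once for every doc whose text contains the term. */
    static void search(String[] docs, String term, DocCollector collector) {
        for (int doc = 0; doc < docs.length; doc++) {
            if (docs[doc].contains(term)) {
                collector.collect(doc);
            }
        }
    }

    /** Counts matching documents without storing any of them. */
    static int countHits(String[] docs, String term) {
        CountingCollector collector = new CountingCollector();
        search(docs, term, collector);
        return collector.getCount();
    }

    public static void main(String[] args) {
        String[] docs = {"lucene in action", "fast result count", "lucene collector"};
        System.out.println(countHits(docs, "lucene")); // prints 2
    }
}
```

The collector holds a single int regardless of how many documents match, which is the point of the suggestion.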

Re: fast Result Count

2010-02-09 Thread Erick Erickson
I'm not sure what you mean by "loading them all into memory". I'm pretty sure that the numHits you specify just limits the number of documents kept in the internal ScoreDocs, and getTotalHits can easily be much greater than numHits. But that would be trivial to test (you shouldn't take my word for
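A rough plain-Java sketch of the point about numHits: a top-N collector can count every hit while keeping only a bounded number of entries. `TopNCollector` and `run` are hypothetical names for this example and are not Lucene's TopDocCollector; the min-heap of scores is an assumption about how such a collector might be built, not a description of Lucene's internals.

```java
import java.util.PriorityQueue;

// Stand-in sketch: not Lucene's TopDocCollector.
final class TopNDemo {

    static final class TopNCollector {
        private final int numHits;
        // Min-heap of the best scores seen so far; its size never exceeds numHits.
        private final PriorityQueue<Float> best = new PriorityQueue<>();
        private int totalHits;

        TopNCollector(int numHits) {
            this.numHits = numHits;
        }

        void collect(int doc, float score) {
            totalHits++; // every hit is counted, whether or not it is kept
            if (best.size() < numHits) {
                best.add(score);
            } else if (!best.isEmpty() && score > best.peek()) {
                best.poll(); // evict the current lowest score
                best.add(score);
            }
        }

        int getTotalHits() {
            return totalHits;
        }

        int keptCount() {
            return best.size();
        }
    }

    /** Feeds `matches` hits into a collector and returns {totalHits, keptCount}. */
    static int[] run(int matches, int numHits) {
        TopNCollector collector = new TopNCollector(numHits);
        for (int doc = 0; doc < matches; doc++) {
            collector.collect(doc, doc * 0.01f);
        }
        return new int[] {collector.getTotalHits(), collector.keptCount()};
    }

    public static void main(String[] args) {
        int[] result = run(100, 10);
        System.out.println("totalHits=" + result[0] + " kept=" + result[1]); // totalHits=100 kept=10
    }
}
```

Memory use is bounded by numHits, while the total count still reflects every matching document.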

Re: Creating a Query that matches Documents without a specific Field set?

2010-02-09 Thread Ahmet Arslan
> is there any way I can search for Documents that have a > specific Field not set? Yes. If you are using QueryParser: *:* -specificField:[* TO *] > I was hoping that a simple TermQuery where the term value > was set to be an empty String would help me out but I was proven > wrong. org.apache
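The query `*:* -specificField:[* TO *]` reads as "every document, minus the documents that have any value in specificField". The leading `*:*` matters because a query made up only of negative clauses matches nothing in Lucene. The set logic can be sketched in plain Java; this is an in-memory illustration, not Lucene code, and `docsMissingField` and the map-based documents are hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Stand-in sketch: documents are plain maps from field name to value.
final class MissingFieldDemo {

    /** Returns the ids of documents that do not have the given field set. */
    static List<Integer> docsMissingField(List<Map<String, String>> docs, String field) {
        BitSet result = new BitSet(docs.size());
        result.set(0, docs.size()); // "*:*": start from every document
        for (int doc = 0; doc < docs.size(); doc++) {
            if (docs.get(doc).containsKey(field)) {
                result.clear(doc); // "-field:[* TO *]": drop docs that have the field
            }
        }
        List<Integer> ids = new ArrayList<>();
        for (int doc = result.nextSetBit(0); doc >= 0; doc = result.nextSetBit(doc + 1)) {
            ids.add(doc);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            Map<String, String> doc = new HashMap<>();
            doc.put("title", "doc " + i);
            docs.add(doc);
        }
        docs.get(1).put("category", "new"); // only doc 1 has the newly introduced field
        System.out.println(docsMissingField(docs, "category")); // prints [0, 2]
    }
}
```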

Creating a Query that matches Documents without a specific Field set?

2010-02-09 Thread Benjamin Pasero
Hi, is there any way I can search for Documents that have a specific Field not set? The use case is obvious: Consider you introduce a new field to your documents but don't want to migrate all other documents, how would you be able to write a Query that covers both old and new documents? I was hop

Re: Flex & Docs/AndPositionsEnum

2010-02-09 Thread Michael McCandless
On Tue, Feb 9, 2010 at 11:35 AM, Renaud Delbru wrote: >> This particular patch doesn't change the Codecs API -- it "only" >> factors out the Multi* APIs from MultiReader.  Likely you won't need >> to change your codec... but try applying the patch and see :) >> > > Ok, good news ;o). Flex is sti

fast Result Count

2010-02-09 Thread Klaus Teller
Hi Guys, Is there a way to speed up counting documents that satisfy a search query other than by using TopDocCollector.getTotalHits()? For instance, if there are 100 documents satisfying my search query, how can I count them without loading them all in memory? Thanks, Klaus.

Re: Flex & Docs/AndPositionsEnum

2010-02-09 Thread Renaud Delbru
On 09/02/10 16:04, Michael McCandless wrote: On Tue, Feb 9, 2010 at 9:08 AM, Renaud Delbru wrote: So, does it mean that the codec interface is likely to change ? Do I need to be prepared to change again all my code ;o) ? This particular patch doesn't change the Codecs API -- it "only

Re: Different replicas return different scores

2010-02-09 Thread Ian Lea
Since the update commands may run in different order on different shards you might get different sets of segments because merges happen to be triggered at different points in the different batches of updates. But you shouldn't have different numbers of deleted docs if you have really been applying

Re: Flex & Docs/AndPositionsEnum

2010-02-09 Thread Michael McCandless
On Tue, Feb 9, 2010 at 9:08 AM, Renaud Delbru wrote: > Hi Michael, > > On 09/02/10 13:35, Michael McCandless wrote: >> >> It's great that you're testing the flex APIs... things are still "in >> flux" as you've seen.  There's another big patch pending on >> LUCENE-2111... >> > > So, does it mean th

Re: Synonym map

2010-02-09 Thread Ian Lea
Lucene in Action second edition has Synonym stuff that I think will work with lucene 3.0. Source code available from http://www.manning.com/hatcher3/ -- Ian. On Tue, Feb 9, 2010 at 2:03 PM, Marc Schwarz wrote: > Hi, > > i try to implement synonyms, but i didn't exactly know how to do it > (lu

Different replicas return different scores

2010-02-09 Thread Yuval Feinstein
We are running a large sharded Lucene-based application. Our configuration supports near real-time updates, by incrementally updating documents (using delete then add) on the shards. Every shard is replicated to several machines in order to improve performance. We replicate the shard by sending the

Re: Flex & Docs/AndPositionsEnum

2010-02-09 Thread Renaud Delbru
Hi Michael, On 09/02/10 13:35, Michael McCandless wrote: It's great that you're testing the flex APIs... things are still "in flux" as you've seen. There's another big patch pending on LUCENE-2111... So, does it mean that the codec interface is likely to change ? Do I need to be prepared t

Synonym map

2010-02-09 Thread Marc Schwarz
Hi, I am trying to implement synonyms, but I don't know exactly how to do it (lucene 3.0). Is there anybody out there who has some small code snippets or a good link? Thanks & Greetings, Marc

Re: Flex & Docs/AndPositionsEnum

2010-02-09 Thread Michael McCandless
Renaud, It's great that you're testing the flex APIs... things are still "in flux" as you've seen. There's another big patch pending on LUCENE-2111... Out of curiosity... in what circumstances do you see a Multi*Enum appearing? Lucene's core always searches "by segment". Are you doing somethin

RE: Flex & Docs/AndPositionsEnum

2010-02-09 Thread Uwe Schindler
Hi Renaud, > On 09/02/10 12:16, Uwe Schindler wrote: > > In flex the correct way to add additional posting data to these > classes would be the usage of custom attributes, registered in the > attributes() AttributeSource. > > > Ok, I have changed my code to use the AttributeSource interface. >

Re: Flex & Docs/AndPositionsEnum

2010-02-09 Thread Renaud Delbru
Hi Uwe, On 09/02/10 12:16, Uwe Schindler wrote: In flex the correct way to add additional posting data to these classes would be the usage of custom attributes, registered in the attributes() AttributeSource. Ok, I have changed my code to use the AttributeSource interface. Due to some l

RE: Flex & Docs/AndPositionsEnum

2010-02-09 Thread Uwe Schindler
Hi Renaud, In flex the correct way to add additional posting data to these classes would be the usage of custom attributes, registered in the attributes() AttributeSource. Due to some limitations, there is currently no working support in MultiReaders to have a "view" on the underlying Enums, b

Flex & Docs/AndPositionsEnum

2010-02-09 Thread Renaud Delbru
Hi Michael, I have updated my lucene-1458, and I discovered there were big modifications in the StandardCodec interface. I updated my own codecs to this new interface, but I encountered a problem. My codecs are creating DocsAndPositionsEnum subclasses that allow access to more information than si

Re: Lucene fields not analyzed

2010-02-09 Thread Rohit Banga
moreover, search for Mr. Arun Kumar also matches other names because Mr. matches. i am ready to use Mr. as a stop word in an analyzer. Rohit Banga On Tue, Feb 9, 2010 at 2:42 PM, Rohit Banga wrote: > i'll try using Luke. > > how i want to use Lucene? > > there is a sentence that may contain th

Re: Lucene fields not analyzed

2010-02-09 Thread Rohit Banga
i'll try using Luke. how i want to use Lucene? there is a sentence that may contain the names of some people from among those in a list. the names may be incomplete or may have spelling mistakes. so i created a lucene index, with each person as a document. eg. Mr. Arun Kumar with a hit highli

RE: Lucene fields not analyzed

2010-02-09 Thread Uwe Schindler
If you don't get it working that way, then you have to ask yourself the question: Why do you want it indexed that way? Is it because you don't want to find all people in that field when you add only "Mr." to a search query? It looks like you use StandardAnalyzer, and in this case, I would add "mr", no

Re: Lucene fields not analyzed

2010-02-09 Thread Mark Harwood
Use Luke. It can show you the index contents and your parsed query and should show what is breaking down here. On 9 Feb 2010, at 08:03, Rohit Banga wrote: > let us assume this is the only field that is relevant (others are stored and > not indexed). > i tried termquery and it does not work. > i

Re: Lucene fields not analyzed

2010-02-09 Thread Rohit Banga
let us assume this is the only field that is relevant (others are stored and not indexed). i tried termquery and it does not work. i also tried keyword analyzer and still could not make it work. @Mark i cannot escape the spaces in my query as i am using Lucene to identify occurrences of names among