deleteDocuments by Term[] for ALL terms

2007-11-25 Thread Antony Bowesman
Hi, I'm using IndexReader.deleteDocuments(Term) to delete documents in batches. I need the deleted count, so I cannot use IndexWriter.deleteDocuments(). What I want to do is delete documents based on more than one term, but not like IndexWriter.deleteDocuments(Term[]) which deletes all docum

Re: How to delete old index

2007-11-25 Thread Cool Coder
I tried your suggestion, but it still did not delete the old index files. However, I closed the reader before closing the writer and reopened the reader after closing the writer, which deleted all the old index files. reader.close(); writer.close(); reader.open(); - RB Michael McCandless <[EMAIL PROTEC

Re: How to delete old index

2007-11-25 Thread Michael McCandless
"Cool Coder" <[EMAIL PROTECTED]> wrote: > Yes. Because I cannot close IndexReader in the live system. And also I > am running on Windows server. > At the end of index writing, I close writer and also reopen reader OK, I'm glad we have it explained! > writer.close() > reader.close(); >

Re: How to delete old index

2007-11-25 Thread Cool Coder
>Ahh, OK. Are you leaving your old IndexReader open against the index while your new IndexWriter is creating the new index? Are you running on Windows? Yes. Because I cannot close IndexReader in the live system. And also I am running on Windows server. At the end of index writing, I

Re: Index: mixing the structure of persistence

2007-11-25 Thread Erick Erickson
As I understand it, Lucene does a fair amount of caching of terms in memory without you having to specify anything. But it's hard to see how your question relates. Remember that Lucene is finding *all* matching docs. So searching in a RAMDirectory and then searching in the file doesn't really seem po

Re: How to delete old index

2007-11-25 Thread Michael McCandless
Ahh, OK. Are you leaving your old IndexReader open against the index while your new IndexWriter is creating the new index? Are you running on Windows? If so, then this behavior makes sense: the old IndexReader will prevent deletion of all files it is using (which is all files in the index whe

Re: How to delete old index

2007-11-25 Thread Cool Coder
>What do you mean by "you can see two index"? I can see two sets of Lucene index files with the same size, and the time stamp difference is 4 hrs. E.g. at start up, Lucene generated the index file _8w.cfs (also some more files) with size 4 MB and time stamp November 24, 2007, 2:57:48 PM

Re: LIA example problem

2007-11-25 Thread Grant Ingersoll
LIA is based on Lucene 1.4.3. The Field.Keyword, etc. methods have been removed in favor of just using constructors. -Grant On Nov 25, 2007, at 10:41 AM, Liaqat Ali wrote: Hello, I'm studying Lucene In Action. In chapter 2 the first example is generating errors in this part of the code.

LIA example problem

2007-11-25 Thread Liaqat Ali
Hello, I'm studying Lucene In Action. In chapter 2 the first example is generating errors in this part of the code. doc.add(Field.Keyword("id", keywords[i])); doc.add(Field.UnIndexed("country", unindexed[i])); doc.add(Field.UnStored("contents", unstored[i])); doc.add(Field.Text("cit

Re: Why exactly are fuzzy queries so slow?

2007-11-25 Thread Mathieu Lecarme
Well, javadoc: "prefixLength - length of common (non-fuzzy) prefix". So, this is some kind of "wildcard fuzzy" but not real fuzzy anymore. I understand the optimization, but right now I can hardly imagine a reasonable use case. Who cares whether the Levenshtein distance is at the beginning, middle

Re: Why exactly are fuzzy queries so slow?

2007-11-25 Thread markharw00d
For "fuzzy" you're going to pay one way or another. You can use ngram analyzers on indexed content and queries which will add IO costs ("files" becomes "fi","fil", "file","il","ile","iles" in both your query and index) or you can use some form of query-time edit distance comparison on "files" a
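The n-gram expansion mentioned above can be sketched in plain Java. This is only an illustration of the idea, not Lucene's n-gram analyzer: the class name `NGramSketch`, the method `ngrams`, and the 2 to 4 character range are assumptions chosen to reproduce the "files" example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: expands a word into character n-grams.
class NGramSketch {

    /** Returns every substring of word whose length is between minN and maxN inclusive. */
    static List<String> ngrams(String word, int minN, int maxN) {
        List<String> grams = new ArrayList<>();
        for (int start = 0; start < word.length(); start++) {
            // Grow the gram from minN up to maxN, stopping at the end of the word.
            for (int n = minN; n <= maxN && start + n <= word.length(); n++) {
                grams.add(word.substring(start, start + n));
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        // prints [fi, fil, file, il, ile, iles, le, les, es]
        System.out.println(ngrams("files", 2, 4));
    }
}
```

A five-letter word already yields nine grams here, which gives a rough feel for the extra index and query IO the message refers to.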

Re: Why exactly are fuzzy queries so slow?

2007-11-25 Thread Timo Nentwig
On Saturday 24 November 2007 18:28:48 markharw00d wrote: > term. You can limit the number of edit distance comparisons conducted by > setting the minimum prefix length. This is a property of the QueryParser Well, javadoc: "prefixLength - length of common (non-fuzzy) prefix". So, this is some kind
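To illustrate why a minimum prefix length reduces the number of edit-distance comparisons, here is a minimal plain-Java sketch. It is not Lucene's FuzzyQuery implementation: the class `PrefixFuzzySketch`, its methods, the sample term list, and the use of a simple maximum edit count (rather than Lucene's similarity score) are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Lucene code: prefix-filtered fuzzy matching over a term list.
class PrefixFuzzySketch {

    // Number of edit-distance computations performed by the last fuzzyMatches call.
    static int comparisons = 0;

    /** Classic two-row Levenshtein distance between a and b. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp; // reuse the two rows
        }
        return prev[b.length()];
    }

    /** Only terms sharing the first prefixLength characters of query are compared at all. */
    static List<String> fuzzyMatches(List<String> terms, String query, int maxEdits, int prefixLength) {
        comparisons = 0;
        String prefix = query.substring(0, Math.min(prefixLength, query.length()));
        List<String> hits = new ArrayList<>();
        for (String term : terms) {
            if (!term.startsWith(prefix)) continue; // cheap check, no edit distance needed
            comparisons++;
            if (editDistance(term, query) <= maxEdits) hits.add(term);
        }
        return hits;
    }
}
```

With the terms file, files, filed, tiles, miles and piles and the query "files", a prefix length of 0 compares all six terms, while a prefix length of 2 compares only the three starting with "fi". The saving comes at the cost raised in the reply: "tiles" is one edit away but is never considered.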

Re: How to delete old index

2007-11-25 Thread Michael McCandless
"Cool Coder" <[EMAIL PROTECTED]> wrote: > Hi, > I used have index refreshed in every 4 hr. However after each > refresh, I can see two index. I am not sure how can I delete old > index. What do you mean by "you can see two index"? > On starting of indexing process, I creat

Re: Why exactly are fuzzy queries so slow?

2007-11-25 Thread Timo Nentwig
On Saturday 24 November 2007 18:48:18 Mathieu Lecarme wrote: > fuzzy are simply not indexed. > If you want to search quickly with fuzzy search, you should index word > and their ngrams, it's the "do you mean" pattern. Replacing fuzzy with "did you mean" is indeed my favourite option however so fa

Re: Why exactly are fuzzy queries so slow?

2007-11-25 Thread Timo Nentwig
On Saturday 24 November 2007 18:28:48 markharw00d wrote: > The added IO is one factor. Another is the CPU load from doing many > edit-distance comparisons between index terms and the provided search You mean FuzzyQuery.rewrite(). Are you sure this is a CPU and not an IO issue (reading the terms f