RE: Sorting in Lucene

2006-03-13 Thread Chris Hostetter
: Btw, this is the statement the sort field is added to the document. : : doc.add(Field.UnIndexed("_s" + sortField, sortableData )); Um ... only indexed fields are sortable ... are you sure you are sorting on the field you think you are? It's possible that since you are trying to sort on an

RE: Sorting in Lucene

2006-03-13 Thread Bob Cheung
I'm pretty sure. The other characters sort according to the ASCII sequence; it's only the slash that sorts before the space. That's why I wonder whether the slash is treated differently. Btw, this is the statement the sort field is added to the document. doc.add(Field.UnIndexed("_s" + sortFi

still question...

2006-03-13 Thread Aditya Liviandi
How do I write an index into a ByteArrayOutputStream instead of a directory?

Re: Can Lucene load more than 2GB into RAM memory?

2006-03-13 Thread Doug Cutting
RAMDirectory is indeed currently limited to 2GB. This would not be too hard to fix. Please file a bug report. Better yet, attach a patch. I assume you're running a 64-bit JVM. If so, then MMapDirectory might also work well for you. Doug z shalev wrote: this is in continuation of a pr
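Doug notes the 2GB limit "would not be too hard to fix". One common way to lift such a limit is to address bytes with a `long` position spread over fixed-size `byte[]` chunks, so no single array or `int` offset has to cover the whole file. This is a minimal sketch under the assumption that the limit comes from `int`-sized offsets; we have not checked that against RAMDirectory's source, and `ChunkedBytes` is a hypothetical class, not Lucene API.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical in-memory byte store addressed by long positions
 * (an assumption about how a RAMDirectory fix could look, not Lucene code).
 * Data lives in fixed-size chunks, so total size is bounded by heap, not by int.
 */
final class ChunkedBytes {
    private final int chunkSize;
    private final List<byte[]> chunks = new ArrayList<>();
    private long length = 0;

    ChunkedBytes(int chunkSize) {
        if (chunkSize <= 0) throw new IllegalArgumentException("chunkSize must be > 0");
        this.chunkSize = chunkSize;
    }

    long length() { return length; }

    /** Appends one byte, allocating a new chunk whenever the last one is full. */
    void append(byte b) {
        int offset = (int) (length % chunkSize);
        if (offset == 0) chunks.add(new byte[chunkSize]);
        chunks.get(chunks.size() - 1)[offset] = b;
        length++;
    }

    /** Reads the byte at a long position by splitting it into chunk index and offset. */
    byte get(long pos) {
        if (pos < 0 || pos >= length) throw new IndexOutOfBoundsException("pos=" + pos);
        int chunk = (int) (pos / chunkSize);   // chunk number
        int offset = (int) (pos % chunkSize);  // always < chunkSize
        return chunks.get(chunk)[offset];
    }
}
```

A real chunk size would be far larger than the 4 bytes used in a quick test; small chunks only make the boundary logic easy to exercise.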

question...

2006-03-13 Thread Aditya Liviandi
Hi all, If I want to embed the index files into another file (say of extension *.luc, so now all the index files are flattened inside this new file), can I still use the index without having to extract out the index files to a temp folder? aditya
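The preview has no answer to Aditya's question. One way to use files flattened into a single container without extracting them is to store a small table of (name, offset, length) entries up front and read each entry as a slice, broadly the idea behind Lucene's compound (.cfs) file. Hooking this into Lucene would mean writing a custom Directory; the `Packer` class below is a hypothetical, stdlib-only sketch of the container layout only.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Map;

/**
 * Hypothetical single-file container (not Lucene's format): a header with each
 * entry's name, offset and length, followed by the concatenated file bodies.
 */
final class Packer {

    static byte[] pack(Map<String, byte[]> files) {
        int headerSize = 4; // entry count
        long bodySize = 0;
        for (Map.Entry<String, byte[]> e : files.entrySet()) {
            // name length + name bytes + offset + length
            headerSize += 4 + e.getKey().getBytes(StandardCharsets.UTF_8).length + 8 + 8;
            bodySize += e.getValue().length;
        }
        ByteBuffer out = ByteBuffer.allocate(Math.toIntExact(headerSize + bodySize));
        out.putInt(files.size());
        long offset = headerSize; // bodies start right after the header
        for (Map.Entry<String, byte[]> e : files.entrySet()) {
            byte[] name = e.getKey().getBytes(StandardCharsets.UTF_8);
            out.putInt(name.length).put(name).putLong(offset).putLong(e.getValue().length);
            offset += e.getValue().length;
        }
        for (Map.Entry<String, byte[]> e : files.entrySet()) {
            out.put(e.getValue()); // same iteration order as the header
        }
        return out.array();
    }

    /** Returns a read-only view of one entry without copying it out, or null if absent. */
    static ByteBuffer open(byte[] container, String wanted) {
        ByteBuffer in = ByteBuffer.wrap(container);
        int count = in.getInt();
        for (int i = 0; i < count; i++) {
            byte[] name = new byte[in.getInt()];
            in.get(name);
            long offset = in.getLong();
            long length = in.getLong();
            if (new String(name, StandardCharsets.UTF_8).equals(wanted)) {
                return ByteBuffer.wrap(container, (int) offset, (int) length).slice().asReadOnlyBuffer();
            }
        }
        return null;
    }
}
```

With the container on disk instead of in memory, the same header could be read through a RandomAccessFile, seeking to each entry's offset on demand.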

Re: Sorting in Lucene

2006-03-13 Thread Yonik Seeley
On 3/13/06, Bob Cheung <[EMAIL PROTECTED]> wrote: > I am curious why the character "/" sorts before the space. > > For example, > > Apple/banana is good for you. > > Sorts before > > Apple banana is good for you Are you sure that the field is untokenized, and that you are sorting in the correct di
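In plain Java `String` order (UTF-16 code units), a space (U+0020) comes before a slash (U+002F), so "Apple banana..." should sort first in ascending order. The slash-first result Bob reports is what a descending sort produces, which fits Yonik's question about sort direction. The sketch below assumes a string sort on an untokenized field behaves like `String.compareTo` on the indexed value; `SortDemo` is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/**
 * Sorts titles by natural String order, which we assume matches a string
 * sort on an untokenized field. reverse=true mimics a descending sort.
 */
final class SortDemo {
    static List<String> sorted(List<String> titles, boolean reverse) {
        // ' ' (U+0020) < '/' (U+002F), so ascending order puts the space first
        Comparator<String> order = reverse ? Comparator.reverseOrder() : Comparator.naturalOrder();
        List<String> copy = new ArrayList<>(titles);
        copy.sort(order);
        return copy;
    }
}
```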

Sorting in Lucene

2006-03-13 Thread Bob Cheung
I am curious why the character "/" sorts before the space. For example, "Apple/banana is good for you." sorts before "Apple banana is good for you". Is there something I can do to make it sort correctly? Regards, Bob

Re: IndexSearcher and IndexWriter in conjunction

2006-03-13 Thread Chris Hostetter
: The trick is that once segment files are written, they are never : modified (except for the "segments" file itself). New documents are : added to new segments, not existing segments. When segments are : merged, a new bigger segment is created. This way, the view of the : index for a specific

Re: Keeping RAMDirectory and filesystem index in sync

2006-03-13 Thread Chris Hostetter
: The Searching process then would have to re-open its RAMDirectory. the key to all of this being that there are constructors for RAMDirectory that make it very easy to load in the contents of an FSDirectory. : Or you check the version of the fs-based index from time to time, to see : when it h
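The polling approach described here (check the on-disk index version from time to time, reload the RAM copy when it changes) can be sketched independently of Lucene. `IndexRefresher` is hypothetical; in practice the version supplier might wrap something like `IndexReader.getCurrentVersion(dir)` and the loader might build a new RAMDirectory from the FSDirectory, but treat both as assumptions to check against your Lucene version.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/**
 * Hypothetical helper: reloads an in-memory copy only when the on-disk
 * version changes. Call refreshIfChanged() periodically, e.g. from a timer.
 */
final class IndexRefresher<T> {
    private final LongSupplier currentVersion; // reads the on-disk index version
    private final Supplier<T> loader;          // builds a fresh in-memory copy
    private volatile T current;
    private long loadedVersion;

    IndexRefresher(LongSupplier currentVersion, Supplier<T> loader) {
        this.currentVersion = currentVersion;
        this.loader = loader;
        this.loadedVersion = currentVersion.getAsLong();
        this.current = loader.get();
    }

    /** Reloads when the version differs; returns true if a reload happened. */
    synchronized boolean refreshIfChanged() {
        long v = currentVersion.getAsLong();
        if (v == loadedVersion) return false;
        current = loader.get(); // searches keep using the old copy until this assignment
        loadedVersion = v;
        return true;
    }

    /** The copy that new searches should use. */
    T current() { return current; }
}
```

If the index changes again between reading the version and loading, the next poll simply reloads once more, which is harmless.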

Re: Setting the COMMIT lock timeout.

2006-03-13 Thread Daniel Naber
On Monday 13 March 2006 22:24, Bill Janssen wrote: > The default value isn't magic. The appropriate value is > context-specific. I've got some people using Lucene on machines with > slow disks, and we need to be able to increase the WRITE_LOCK_TIMEOUT > to prevent entirely random lossage. Here's

Re: Setting the COMMIT lock timeout.

2006-03-13 Thread Bill Janssen
Daniel Naber ponders: > Seems these have been forgotten. They can easily be added, but I still > wonder what the use case is to set these values? The default value isn't magic. The appropriate value is context-specific. I've got some people using Lucene on machines with slow disks, and we need
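Bill's point is that a lock timeout is context-specific. A lock acquired by polling a non-blocking attempt can take the timeout and poll interval as parameters instead of constants. `TimedLock` is a hypothetical sketch, not IndexWriter's API; a file-based attempt such as `File.createNewFile()` would need its IOException handled before it fits the `BooleanSupplier` used here.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

/**
 * Hypothetical polling lock acquisition with a caller-chosen timeout,
 * so slow disks can be given more time.
 */
final class TimedLock {
    static boolean obtain(BooleanSupplier tryObtain, long timeoutMs, long pollMs) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (true) {
            if (tryObtain.getAsBoolean()) return true;
            if (System.nanoTime() - deadline >= 0) return false; // timed out
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for callers
                return false;
            }
        }
    }
}
```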

Re: Setting the COMMIT lock timeout.

2006-03-13 Thread Daniel Naber
On Monday 13 March 2006 15:50, Jim Bedford-roberts wrote: > I note that this can't be set from system properties anymore > (CHANGES.txt, changes in run time behaviour 7), but am unable to find > the replacement setter method promised for IndexWriter. Seems these have been forgotten. They can easil

Re: Basic question on lucene query processing

2006-03-13 Thread Yonik Seeley
On 3/13/06, Kelly Vista <[EMAIL PROTECTED]> wrote: > Just a note: strikes me that an alternative way to do things is to first > identify a set of documents that have the term in them first This is what Lucene does. Lucene is based on an inverted index, so for any given term you can quickly find th
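Yonik's description of an inverted index (for any term, look up the documents that contain it) can be illustrated with a toy in-memory version. This is not Lucene's on-disk format; `TinyInvertedIndex` is hypothetical. The point is that only documents on the posting lists are ever considered, rather than scanning every document.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

/** Toy inverted index: each term maps to the sorted ids of documents containing it. */
final class TinyInvertedIndex {
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();

    void add(int docId, String text) {
        for (String term : text.toLowerCase().split("\\W+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
    }

    /** Documents containing the term: one lookup, no scan over all documents. */
    SortedSet<Integer> docsWith(String term) {
        SortedSet<Integer> docs = postings.get(term.toLowerCase());
        return docs == null ? Collections.emptySortedSet() : Collections.unmodifiableSortedSet(docs);
    }

    /** Documents containing every term: intersect the posting lists. */
    SortedSet<Integer> docsWithAll(String... terms) {
        SortedSet<Integer> result = null;
        for (String term : terms) {
            if (result == null) result = new TreeSet<>(docsWith(term));
            else result.retainAll(docsWith(term));
        }
        return result == null ? new TreeSet<>() : result;
    }
}
```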

Basic question on lucene query processing

2006-03-13 Thread Kelly Vista
Hi - I have a basic question on the way queries are processed in Lucene. I understand that Lucene uses a variation of the vector space model in terms of how it determines document similarity. In particular, I think it computes some sort of normalized TF-IDF score for some query against the co
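Kelly asks about Lucene's TF-IDF variant. The factors below are modeled on DefaultSimilarity as we recall it from the Lucene 1.x javadoc (tf = sqrt(freq), idf = 1 + ln(numDocs / (docFreq + 1)), lengthNorm = 1 / sqrt(numTerms), coord = overlap / maxOverlap); verify them against the Similarity javadoc for your version. `SimilaritySketch` is hypothetical, and `termScore` deliberately omits boosts, queryNorm and the lossy norm encoding.

```java
/**
 * Scoring factors modeled on Lucene's DefaultSimilarity (assumed from
 * the 1.x javadoc, not copied from Lucene source).
 */
final class SimilaritySketch {
    static double tf(int freq) { return Math.sqrt(freq); }

    static double idf(int docFreq, int numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    static double lengthNorm(int numTerms) { return 1.0 / Math.sqrt(numTerms); }

    static double coord(int overlap, int maxOverlap) { return overlap / (double) maxOverlap; }

    /** Simplified single-term contribution: tf * idf^2 * lengthNorm (boosts, queryNorm omitted). */
    static double termScore(int freq, int docFreq, int numDocs, int fieldLength) {
        double idf = idf(docFreq, numDocs);
        return tf(freq) * idf * idf * lengthNorm(fieldLength);
    }
}
```

Rare terms get a larger idf, and shorter fields get a larger lengthNorm, so a match in a short field on a rare term scores highest.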

Looking for Lucene consultant (UK based)

2006-03-13 Thread Robert Watkins
We, John Wiley & Sons (http://www3.interscience.wiley.com/), are looking for a Lucene expert to assist with our migration from Verity to Lucene (up to six weeks work, starting this coming Monday, 20 March). The candidate must be based in the UK, preferably in or close to London, as we would like h

Re: IndexSearcher and IndexWriter in conjunction

2006-03-13 Thread Yonik Seeley
On 3/13/06, Nikhil Goel <[EMAIL PROTECTED]> wrote: > Can someone please explain how does IndexSearcher and IndexWriter works in > conjuction. The trick is that once segment files are written, they are never modified (except for the "segments" file itself). New documents are added to new segments,
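Yonik's explanation (segments are write-once, new documents go to new segments, merges create a new bigger segment) can be modeled with a toy class. The writer never edits a published segment list; it publishes a new one, so a reader that captured the old list keeps a stable point-in-time view. `SegmentedIndex` is hypothetical, the volatile list stands in for the "segments" file, and the sketch ignores when old segment files may actually be deleted.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Toy model of write-once segments with copy-on-write publication (not Lucene code). */
final class SegmentedIndex {
    private volatile List<String> segments = Collections.emptyList();

    /** What a searcher captures when it is opened. */
    List<String> openSnapshot() { return segments; }

    /** New documents go into a brand-new segment; existing segments are untouched. */
    synchronized void addSegment(String name) {
        List<String> next = new ArrayList<>(segments);
        next.add(name);
        segments = Collections.unmodifiableList(next);
    }

    /** Merges all current segments into one new, bigger segment and publishes it. */
    synchronized void merge(String mergedName) {
        segments = Collections.singletonList(mergedName);
    }
}
```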

Re: IndexSearcher and IndexWriter in conjunction

2006-03-13 Thread Nikhil Goel
Hi Patrick, thanks for writing back, but my question is: do we really need to write something new to achieve what I want? Going through the Lucene tutorials, I don't think there is a need to do such a thing: http://blog.danbartels.com/archive/2004/09/09/186.aspx Indexing and searching are

Re: IndexSearcher and IndexWriter in conjunction

2006-03-13 Thread Patrick Kimber
Hi Nikhil We are using the index accessor contribution. For more information see: http://www.nabble.com/Fwd%3A-Contribution%3A-LuceneIndexAccessor-t17416.html#a47049 This should help you to co-ordinate the IndexSearcher and IndexWriter. Patrick On 13/03/06, Nikhil Goel <[EMAIL PROTECTED]> wrote:

Re: Failure recovery

2006-03-13 Thread Yonik Seeley
On 3/13/06, Chuck Williams <[EMAIL PROTECTED]> wrote: > Is there a way to determine whether or not an index that was left locked > due to some improper system shutdown needs repair? Depends what you mean by "repair". If there was a crash during index modification, I think the index should normall

IndexSearcher and IndexWriter in conjunction

2006-03-13 Thread Nikhil Goel
Hi, Can someone please explain how IndexSearcher and IndexWriter work in conjunction? As far as I know, after reading all the posts in the newsgroup, everything works fine if we have one IndexWriter thread and multiple IndexSearcher threads. But my doubt here is, looking at IndexSearcher cl

Re: is there a way to find duplicate documents in the index?

2006-03-13 Thread Yonik Seeley
On 3/13/06, emerson cargnin <[EMAIL PROTECTED]> wrote: > I notice some duplicated entries in my index, my just looking at it, > and I suspect there might be more than those I found out. Is there a > way to detect duplicate documents in an index? > > Emerson Cargnin If there is a field with a uniqu
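Yonik's reply starts from a field with a unique value per document (the preview is cut off). Assuming such a key field exists, duplicates are simply documents that share a key value. The sketch below groups document numbers by key; the `Map<Integer, String>` input stands in for key values read from the index, and `DuplicateFinder` is hypothetical. Within Lucene, a similar check could plausibly be done by enumerating the key field's terms and looking for a document frequency above one, but that is our assumption, not the truncated reply.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Groups document numbers by key-field value and keeps only keys seen more than once. */
final class DuplicateFinder {
    static Map<String, List<Integer>> duplicates(Map<Integer, String> keyByDoc) {
        Map<String, List<Integer>> docsByKey = new TreeMap<>();
        for (Map.Entry<Integer, String> e : keyByDoc.entrySet()) {
            docsByKey.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(e.getKey());
        }
        docsByKey.values().removeIf(docs -> docs.size() < 2); // drop unique keys
        return docsByKey;
    }
}
```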

RE: Failure recovery

2006-03-13 Thread Pierre Luc Dupont
Hi Chuck, I suggest using a status file to indicate your index status. I use this and it works very well. -Original Message- From: Chuck Williams [mailto:[EMAIL PROTECTED] Sent: 2006-03-13 02:22 To: java-user@lucene.apache.org Subject: Failure recovery Is there a way to determine wh
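Pierre's status-file idea can be sketched as: write DIRTY before modifying the index and CLEAN after a successful close; finding anything other than CLEAN at startup means the previous session may not have finished. The file name, the two states and `IndexStatus` are hypothetical choices, not a Lucene feature.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical status-file guard kept next to an index. */
final class IndexStatus {
    private final Path statusFile;

    IndexStatus(Path statusFile) { this.statusFile = statusFile; }

    void markDirty() { write("DIRTY"); }   // call before opening a writer

    void markClean() { write("CLEAN"); }   // call after the writer closed successfully

    /** True when a previous session may have stopped mid-update. */
    boolean needsCheck() {
        if (!Files.exists(statusFile)) return false; // no status yet: nothing modified under this scheme
        try {
            String state = new String(Files.readAllBytes(statusFile), StandardCharsets.UTF_8).trim();
            return !state.equals("CLEAN"); // DIRTY, empty or partial content: be conservative
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private void write(String state) {
        try {
            Files.write(statusFile, state.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Writing the file is not atomic, but a crash mid-write leaves something other than CLEAN, which errs on the side of checking the index.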

RE: 100,000 indexes and what to do

2006-03-13 Thread John Powers
How does the information change in each of these customers' documents? I would think that if they were very dynamic, updates to a single index would not be great for you. But if the updates were only occasional, then given Lucene's performance, a single index would be just fine.

Setting the COMMIT lock timeout.

2006-03-13 Thread Jim Bedford-roberts
I'm confused about how to set the COMMIT lock timeout since the version 1.9.1 release. I note that this can't be set from system properties anymore (CHANGES.txt, changes in run time behaviour 7), but am unable to find the replacement setter method promised for IndexWriter. Can anyone point

Re: Throughput doesn't increase when using more concurrent threads

2006-03-13 Thread Peter Keegan
Chris, My apologies - this error was apparently caused by a file format mismatch (probably line endings). Thanks, Peter On 3/13/06, Peter Keegan <[EMAIL PROTECTED]> wrote: > > Chris, > > Should this patch work against the current code base? I'm getting this > error: > > D:\lucene-1.9>patch -b -p0
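Peter traced the patch failure to a file format mismatch, probably line endings. Normalizing a patch to LF before applying it is one way to rule that out, assuming the target sources also use LF. `LineEndings` is a hypothetical helper; tools such as dos2unix do the same job.

```java
/** Converts Windows (CRLF) and old Mac (CR) line endings to Unix LF. */
final class LineEndings {
    static String toLf(String text) {
        // Replace CRLF first so its CR is not turned into an extra newline.
        return text.replace("\r\n", "\n").replace('\r', '\n');
    }
}
```

For a file on disk, read it with `Files.readAllBytes`, convert, and write it back with `Files.write` before running `patch`.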

Re: Throughput doesn't increase when using more concurrent threads

2006-03-13 Thread Peter Keegan
Chris, Should this patch work against the current code base? I'm getting this error: D:\lucene-1.9>patch -b -p0 -i nio-lucene-1.9.patch patching file src/java/org/apache/lucene/index/CompoundFileReader.java patching file src/java/org/apache/lucene/index/FieldsReader.java missing header for unifie

RE: Keeping RAMDirectory and filesystem index in sync

2006-03-13 Thread Satuluri, Venu_Madhav
Thanks, Jens. Seems like this would be pretty complicated. It seems the best way would be not to have a separate daemon for indexing modified documents, but just have the reindexing part in the backend itself (it would know when any documents were modified), but since it would involve some code

Re: Keeping RAMDirectory and filesystem index in sync

2006-03-13 Thread Jens Kraemer
On Mon, Mar 13, 2006 at 06:23:10PM +0530, Satuluri, Venu_Madhav wrote: > Hi, > > Is there an elegant way to keep RAMDirectory and my file-system based > index in sync? I have a java class that is periodically started up by > crond that checks for modified documents and then reindexes them onto > t

Keeping RAMDirectory and filesystem index in sync

2006-03-13 Thread Satuluri, Venu_Madhav
Hi, Is there an elegant way to keep RAMDirectory and my file-system based index in sync? I have a java class that is periodically started up by crond that checks for modified documents and then reindexes them onto the filesystem. However, for searching I want to use RAMDirectory (for the performan

RE: Search for synonyms - implementation for review

2006-03-13 Thread Ziv Gome
Hi Mark, thanks for your response. Here are my thoughts on your suggestion: I believe it would be a good idea to merge similar query expansion code. I also agree that the situation of fuzzy query is similar to the synonym query use-case, in the sense of having a root term and some related, de-boo
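Ziv describes both synonym and fuzzy expansion as a root term plus related, de-boosted terms. That shared shape can be sketched as a map from term to boost: the root keeps full weight and each related term gets a lower, configurable boost. The map-based thesaurus, the 0.5 boost and `SynonymExpander` are assumptions for illustration; each pair could then become a boosted optional clause in a BooleanQuery.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical expansion of a root term into (term -> boost) pairs. */
final class SynonymExpander {
    private final Map<String, List<String>> synonyms; // hypothetical thesaurus
    private final float synonymBoost;                  // weight for related terms, < 1.0

    SynonymExpander(Map<String, List<String>> synonyms, float synonymBoost) {
        this.synonyms = synonyms;
        this.synonymBoost = synonymBoost;
    }

    Map<String, Float> expand(String root) {
        Map<String, Float> weighted = new LinkedHashMap<>();
        weighted.put(root, 1.0f); // the root term keeps full weight
        for (String syn : synonyms.getOrDefault(root, Collections.emptyList())) {
            weighted.putIfAbsent(syn, synonymBoost); // never demote the root itself
        }
        return weighted;
    }
}
```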

is there a way to find duplicate documents in the index?

2006-03-13 Thread emerson cargnin
I notice some duplicated entries in my index just by looking at it, and I suspect there might be more than those I found. Is there a way to detect duplicate documents in an index? Emerson Cargnin

Failure recovery

2006-03-13 Thread Chuck Williams
Is there a way to determine whether or not an index that was left locked due to some improper system shutdown needs repair? My code does the following as part of starting up and creating an IndexWriter for an existing index that was created in a prior session: > if (IndexReader.isLocked(i