Actually, you have to mark the field with Field.Store.YES in order to see
that field when you retrieve the doc at search time.
You'll then be able to retrieve the string value.
Mike
On Thu, Apr 2, 2009 at 10:45 AM, David Seltzer wrote:
> Hi All,
> I have a document with a field called "TextTranscr
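Mike's point is that only stored fields come back with a retrieved document; a field indexed with Field.Store.NO is searchable but returns null at retrieval time. A generic sketch of that distinction (MiniDoc is a made-up stand-in, not Lucene's Document):

```java
import java.util.HashMap;
import java.util.Map;

// Made-up mini-document: indexed terms are searchable,
// but only explicitly stored values can be read back.
class MiniDoc {
    private final Map<String, String> stored = new HashMap<>();
    private final Map<String, String> indexedOnly = new HashMap<>();

    void add(String name, String value, boolean store) {
        if (store) stored.put(name, value);       // like Field.Store.YES
        else indexedOnly.put(name, value);        // like Field.Store.NO: searchable, not retrievable
    }

    // Mirrors doc.get(name) at search time: unstored fields come back null.
    String get(String name) {
        return stored.get(name);
    }
}

public class StoredFieldDemo {
    public static void main(String[] args) {
        MiniDoc doc = new MiniDoc();
        doc.add("TextTranscript", "hello world", false);
        doc.add("Title", "My Transcript", true);
        System.out.println(doc.get("Title"));          // My Transcript
        System.out.println(doc.get("TextTranscript")); // null
    }
}
```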
On Thu, Apr 2, 2009 at 2:26 PM, John Wang wrote:
> Hi Michael:
> Thanks for looking into this.
>
> Approach 2 depends on how fast the delete set can check a given id;
> approach 1 doesn't. After replacing my delete set with a
> simple bitset, approach 2 gets a 25-30% imp
>>> erickerick...@gmail.com 4/2/2009 10:24:42 AM >>>
>This seems really odd, especially with an index that size. The
>first question is usually "Do you open an IndexReader for
>each query?"
I'm using the Zend_Search_Lucene implementation so I'm really not sure
how it handles the IndexReader.
Hi Michael:
Thanks for looking into this.
Approach 2 depends on how fast the delete set can check a given id;
approach 1 doesn't. After replacing my delete set with a
simple bitset, approach 2 gets a 25-30% improvement.
I understand if the delete set is small, appr
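John's 25-30% win from a bitset comes down to the membership test: a BitSet check is a plain array index, while a HashSet&lt;Integer&gt; check boxes the id and probes a hash table. A generic sketch of the trade-off, not the original code:

```java
import java.util.BitSet;
import java.util.HashSet;
import java.util.Set;

public class DeleteSetDemo {
    // Marks every `step`-th id deleted in both structures, then checks that
    // the bitset and the hash set agree on every membership query.
    static int countDeleted(int maxDoc, int step) {
        Set<Integer> deletedSet = new HashSet<>(); // boxes + hashes each id
        BitSet deletedBits = new BitSet(maxDoc);   // one bit per doc id
        for (int id = 0; id < maxDoc; id += step) {
            deletedSet.add(id);
            deletedBits.set(id);
        }
        int hits = 0;
        for (int id = 0; id < maxDoc; id++) {
            boolean del = deletedBits.get(id);     // O(1) array index, no boxing
            if (del != deletedSet.contains(id)) throw new AssertionError();
            if (del) hits++;
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(countDeleted(1_000_000, 7)); // 142858
    }
}
```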
Dear Shashi,
It should work now.
A temporary failure: our apologies.
thanks,
Glen
2009/4/2 Shashi Kant :
> Hi all, I have been trying to get the latest version of LuSQL from the
> NRC.ca website but get 404s on the download links. I have written to the
> webmaster, but does anyone have the jar handy
Hi all, I have been trying to get the latest version of LuSQL from the
NRC.ca website but get 404s on the download links. I have written to the
webmaster, but does anyone have the jar handy? Could I download it from
somewhere else, or could you email it to me?
thanks,
Shashi
On Thursday 02 April 2009 15:36:44 David Seltzer wrote:
> Hi all,
>
>
>
> I'm trying to figure out how to use SpanNearQuery.getSpans(IndexReader)
> when working with a result set from a query.
>
>
>
> Maybe I have a fundamental misunderstanding of what an IndexReader is -
> I'm under the i
Try setting the minimum prefix length for fuzzy queries (I think there is a
setting on QueryParser, or you may need to subclass).
A prefix length of zero does edit-distance comparisons against all unique terms, e.g.
from "aardvark" to ""
Prefix length of one would cut this search space down to just
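The advice works because a non-zero fuzzy prefix means only terms sharing the query's first n characters are even candidates for the expensive edit-distance check; with prefix 0, every unique term in the index is compared. A generic sketch of the pruning (the term list is made up for illustration):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class FuzzyPrefixDemo {
    // With prefix length n, only terms sharing the query's first n chars
    // survive to the (expensive) edit-distance comparison.
    static List<String> candidates(List<String> terms, String query, int prefixLen) {
        String prefix = query.substring(0, prefixLen);
        return terms.stream()
                    .filter(t -> t.startsWith(prefix))
                    .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("aardvark", "apple", "banana", "aback", "cherry");
        // Prefix 0: every unique term is a candidate.
        System.out.println(candidates(terms, "aardvark", 0).size()); // 5
        // Prefix 1: only terms starting with 'a' survive.
        System.out.println(candidates(terms, "aardvark", 1)); // [aardvark, apple, aback]
    }
}
```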
I think I have looked at constant score queries; however, the relevance is of
value to the users, so we left it as is. :(
Erick's idea of stripping wildcard terms that have fewer than an acceptable
number of characters is a good one, and I might try it once I get the time.
Thanks,
M
Matt Schraeder wrote:
I've got a simple Lucene index and search built for testing purposes.
So far everything seems great. Most searches take 0.02 seconds or less.
Searches with 4-5 terms take 0.25 seconds or less. However, once I
began playing with fuzzy searches everything seemed to really sl
This seems really odd, especially with an index that size. The
first question is usually "Do you open an IndexReader for
each query?" If you do, be aware that opening a reader/searcher
is expensive, and the first few queries through the system are
slow as the caches are built up.
The second questi
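Erick's advice generalizes beyond Lucene: open the expensive object once and share it across queries rather than reopening it per request. A toy sketch of the pattern (CountingSearcher is a made-up stand-in, not Lucene's IndexSearcher):

```java
// Made-up stand-in for an expensive-to-open searcher.
class CountingSearcher {
    static int opens = 0;
    CountingSearcher() { opens++; }             // pretend this is the costly open
    int search(String q) { return q.length(); } // dummy "hit count"
}

public class SharedSearcherDemo {
    // One shared instance, opened once and reused for every query.
    private static final CountingSearcher SHARED = new CountingSearcher();

    static int query(String q) {
        return SHARED.search(q);
    }

    public static void main(String[] args) {
        query("foo");
        query("bar");
        query("baz");
        // Three queries, but the expensive open happened exactly once.
        System.out.println(CountingSearcher.opens); // 1
    }
}
```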
I've got a simple Lucene index and search built for testing purposes.
So far everything seems great. Most searches take 0.02 seconds or less.
Searches with 4-5 terms take 0.25 seconds or less. However, once I
began playing with fuzzy searches everything seemed to really slow down.
A fuzzy search
You might try a constant score wildcard query (similar to a filter) - I
think you'd have to grab it from solr's codebase until 2.9 comes out
though. No clause limit, and reportedly *much* faster on large indexes.
--
- Mark
http://www.lucidimagination.com
Lebiram wrote:
Hi Erick
The query
On Wed, Apr 1, 2009 at 5:20 PM, Dan OConnor wrote:
> All:
>
> We are using java lucene 2.3.2 to index a fairly large number of documents
> (roughly 400,000 per day). We have divided the time history into various
> depths.
>
> Our first stage covers 8 days and our next stage covers 22. The index
I didn't code it, so I'm speaking at least second hand.
It's a valid question whether having larger clauses is useful to
the user. Having a 1024-term OR clause isn't narrowing things much.
Plus, I think, it was a number that says, in effect, "you should know
that this is getting to be an expensiv
Hi All,
I have a document with a field called "TextTranscript". It's created
using the following command:
myDoc.add(new Field("TextTranscript", sTranscriptBody, Field.Store.NO,
Field.Index.TOKENIZED));
I'm then trying to retrieve the TokenStream by pulling the field.
Field fTextTranscript = lucDo
Hi Erick
The query was basically test data, in anticipation of searches on all indices
(4 indexes) with 12 million docs that should yield very small results.
Obviously that query does not happen in real life, but it did break the system.
If some user thought of just inputting random words then the
Hi all,
I'm trying to figure out how to use SpanNearQuery.getSpans(IndexReader)
when working with a result set from a query.
Maybe I have a fundamental misunderstanding of what an IndexReader is -
I'm under the impression that it's a mechanism for sequentially
accessing the documents in an
Ah, I get it now. Given that you bumped your max clause count up, it makes
sense. I'm pretty sure that the wildcard expansion is the root of your
memory problems. The folks on the list helped me out a lot understanding
what wildcards were about, see the thread titled "I just don't get wildcards
at all" i
Hi Bon,
Can you give me the link to what you read about substring matching?
Thanks a lot
On 4/2/09, Bon wrote:
>
> Hi Matt,
>
> Thanks for your answer,
> I'm new to lucene, so I don't know what should I know about that.
> I find a reference about discuss searching substring and it work goo
Hi
From the 2.4 javadocs for IndexWriter:
setDefaultWriteLockTimeout(long writeLockTimeout)
Sets the default (for any instance of IndexWriter) maximum time to
wait for a write lock (in milliseconds).
Lucene waits for the max specified time, retrying every 1000 millisecs
by default, then g
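The javadoc quoted above names the knob; a one-line sketch of raising the default for all subsequently created writers (the 5000 ms value is an arbitrary example, not a recommendation):

```java
// Raise the default write-lock timeout (1000 ms in 2.4) to 5000 ms for all
// IndexWriter instances created after this call; value chosen arbitrarily.
IndexWriter.setDefaultWriteLockTimeout(5000);
```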
Hello
My code looks like this:
Directory dir = null;
try {
    dir = FSDirectory.getDirectory("/path/to/dictionary");
    SpellChecker spell = new SpellChecker(dir); // exception thrown here
    // ...
    dir.close();
} catch (IOException ex) {
    // log error
} finally {
    if (dir != null) {
        tr
Hey,
Lucene is deployed on my Tomcat server, and when I send parallel calls from
my client to add, delete or update documents, some operations are
unsuccessful. The following exception is thrown:
org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out:
simplefsl...@d:\testIndex\wr
On Wed, Apr 1, 2009 at 6:37 PM, John Wang wrote:
> a code snippet is worth 1000 words :)
Hear, hear!
OK, now I understand the difference.
With approach 1, for each of N UIDs you use a TermDocs to find the
postings for that UID, and retrieve the one docID corresponding to
that UID. You retrieve
Hi Erick,
I did a search just as the JVM started, so I'm thinking that the JVM was busy
with some startup work and that this search required more memory than was
available at that time.
Had I done this search a while after the JVM started, the query would have
succeeded.
I then pump in se