Re: IllegalStateEx thrown when calling close

2008-10-30 Thread Jed Wesley-Smith
ahh, yes, sorry, the ability to read is occasionally handy... [wipes egg off forehead] cheers, jed. Michael McCandless wrote: Actually, yes in 2.3.2: IndexReader.unlock has existed for a long time. In 2.4.0, we moved this to IndexWriter.unlock. Mike Jed Wesley-Smith wrote: not in 2.3.2 t

Re: IllegalStateEx thrown when calling close

2008-10-30 Thread Jed Wesley-Smith
Thanks Mike! Michael McCandless wrote: OK I'll add that (what IW does on setting an OOME) to the javadocs. Mike Jed Wesley-Smith wrote: Mike, regarding this paragraph: "To work around this, on catching an OOME on any of IndexWriter's methods, you should 1) forcibly remove the write lock (I

Re: Read all the data from an index

2008-10-30 Thread Erick Erickson
I'm not sure what *could* be easier than calling IndexSearcher.doc() in a loop over doc IDs 0 to maxDoc - 1. Of course you'll have to pay some attention to whether you get a document back or not, and I'm not quite sure whether you'd have to worry about getting deleted documents. But I don't think either of

Re: Lucene Payload

2008-10-30 Thread Anshul jain
I want to give more weight to some terms in the document. For example, the title of a book should be given more weight than its contents. We are testing over a wide variety of Lucene queries: with quotes, without quotes, phrase, span, etc. As our system will be expecting a larger number of queries that contain

RE: Read all the data from an index

2008-10-30 Thread Dragon Fly
I'll double check but I believe all the fields in my index are stored. Should I just loop using indexSearcher.doc() or is there a faster way? Thanks. > Date: Thu, 30 Oct 2008 16:09:47 -0400 > From: [EMAIL PROTECTED] > To: java-user@lucene.apache.org > Subject: Re: Read all the data from an index

Re: Read all the data from an index

2008-10-30 Thread Erick Erickson
Well, that's trickier than you might think. You can easily get all the STORED data just by getting doc IDs 0 through maxDoc() - 1. But reconstructing the data from data that is NOT stored is more difficult. Luke tries, but it may be a lossy process. Best Erick On Thu, Oct 30, 2008 at 3:24 PM, Dragon Fly <[EM

Read all the data from an index

2008-10-30 Thread Dragon Fly
Hi, I have an old index that was built a few months ago. The data that I used to build the index has been deleted from the database. I'd like to read all the data from the old index to build a new index. Which Lucene API calls should I use to read all the data from the old index? Thank you i

Re: performance boost through multithreaded query processing?

2008-10-30 Thread Chris Hostetter
: We improved the performance through caching the bitsets of the single : fuzzy query/wildcard query. : Within our logs we can see that combined queries within a BooleanQuery : are processed sequentially. So our question are: Does it make sense for : you to parallelize the processing of the qu

Re: Lucene Payload

2008-10-30 Thread Grant Ingersoll
Not directly, I don't think. Mark Miller contributed some highlighting code that converts phrase queries to SpanNearQueries, I believe, but this isn't general purpose. We probably need a QueryParser that produces SpanQueries instead of regular Queries, I suppose, but they aren't always

Re: OutOfMemory Problems Lucene 2.4 / Tomcat

2008-10-30 Thread Todd Benge
I tried the term divisor index prior to the posting and didn't see much difference in the memory usage. I don't think we can turn off field norms because we use boosting to influence some content to the front of the results. Will definitely spend some time with Solr, Terracotta, and possibly hado

Re: Lucene Payload

2008-10-30 Thread Anshul jain
Thanks, Grant, for the presentation; it was very useful. Can payloads work for queries other than Term queries and Span queries? Or is there any function to convert a Query into a span query? Thanks On Thu, Oct 23, 2008 at 4:08 PM, Grant Ingersoll <[EMAIL PROTECTED]> wrote: > You can search the archives fo

Re: Luke is coming .. not there yet.

2008-10-30 Thread Andrzej Bialecki
mark harwood wrote: Regretfully, I'm a terrible Swing programmer I know you've raised this before, - I wasn't prompting you to do the work :) I did make some promising in-roads into a GWT web-based version which was Apache-license friendly but ultimately I didn't want to bring in a build-time

Re: Luke is coming .. not there yet.

2008-10-30 Thread Andy Triana
whichever is chosen. Just a huge thank you for making this tool available! Great tool! //andy On Thu, Oct 30, 2008 at 4:06 AM, Andrzej Bialecki <[EMAIL PROTECTED]> wrote: > Hi all, > > Many people ask me when the next version of Luke becomes available. It's > almost ready, and the release shoul

Re: Document marked as deleted

2008-10-30 Thread Mark Miller
John G wrote: I have an index with a particular document marked as deleted. If I use the search method that returns TopDocs and that deleted document satisfies the search criteria, will it be included in the returned TopDocs object even though it has been marked as deleted? Thanks in advance. J

Document marked as deleted

2008-10-30 Thread John G
I have an index with a particular document marked as deleted. If I use the search method that returns TopDocs and that deleted document satisfies the search criteria, will it be included in the returned TopDocs object even though it has been marked as deleted? Thanks in advance. John G. -- View

ApacheCon Reminder

2008-10-30 Thread Grant Ingersoll
For those attending ApacheCon in New Orleans next week, the Lucene Search and Machine Learning Birds of a Feather (BOF) will be held Wednesday night. Please indicate your interest at: http://wiki.apache.org/apachecon/BirdsOfaFeatherUs08 Also, note there are a number of Lucene/Solr/Mahout tal

Re: Luke is coming .. not there yet.

2008-10-30 Thread mark harwood
>>Regretfully, I'm a terrible Swing programmer I know you've raised this before, - I wasn't prompting you to do the work :) I did make some promising in-roads into a GWT web-based version which was Apache-license friendly but ultimately I didn't want to bring in a build-time dependency on the 1

Re: Luke is coming .. not there yet.

2008-10-30 Thread Andrzej Bialecki
mark harwood wrote: I'd like to ask the Lucene user community what version of Lucene would be preferable A Swing-based one, managed in Lucene/contrib and released with every Lucene build . ;) I agree, this would be ideal. Regretfully, I'm a terrible Swing programmer, so unless someone el

Re: Luke is coming .. not there yet.

2008-10-30 Thread Andrzej Bialecki
Andrzej Bialecki wrote: 1) Luke 2.4 release. This has the advantage of being an official stable [...] 2) Luke 2.9-dev snapshot. This has the advantage that you get the [...] Of course I meant Lucene 2.4 and Lucene 2.9-dev ... sorry for the confusion. -- Best regards, Andrzej Bialecki

Re: Luke is coming .. not there yet.

2008-10-30 Thread mark harwood
>>I'd like to ask the Lucene user community what version of Lucene would be >>preferable A Swing-based one, managed in Lucene/contrib and released with every Lucene build . ;) - Original Message From: Andrzej Bialecki <[EMAIL PROTECTED]> To: java-user@lucene.apache.org Sent: Thur

Re: OutOfMemory Problems Lucene 2.4 / Tomcat

2008-10-30 Thread mark harwood
One issue with the existing field cache implementation is that it uses int arrays to reference into the list of unique terms where short or even byte arrays may suffice for fields with smaller numbers of unique terms. How many unique terms do you have? I posted some code that measures the potent

Luke is coming .. not there yet.

2008-10-30 Thread Andrzej Bialecki
Hi all, Many people ask me when the next version of Luke becomes available. It's almost ready, and the release should happen in about a week, depending on the situation in my daily job. I'd like to ask the Lucene user community what version of Lucene would be preferable to include in this Lu

Re: OutOfMemory Problems Lucene 2.4 / Tomcat

2008-10-30 Thread Mark Miller
Michael's got some great points (he's the Lucene master), especially possibly turning off norms if you can, but for an index like that I'd recommend Solr. Solr sharding can be scaled to billions (min a billion or two anyway) with few limitations (of course there are a few). Plus it has further

Re: IllegalStateEx thrown when calling close

2008-10-30 Thread Michael McCandless
Actually, yes in 2.3.2: IndexReader.unlock has existed for a long time. In 2.4.0, we moved this to IndexWriter.unlock. Mike Jed Wesley-Smith wrote: not in 2.3.2 though. cheers, jed. Michael McCandless wrote: Or you can use IndexReader.unlock. Mike Jed Wesley-Smith wrote: Michael McCa

Re: OutOfMemory Problems Lucene 2.4 / Tomcat

2008-10-30 Thread Michael McCandless
The terms index (*.tii), which is loaded entirely into RAM, can consume an unexpectedly large amount of memory when there is an unusually high number of terms. If you are not using compound file format, can you look at the size of *.tii? If this is what is affecting you, one simple wor

Re: IllegalStateEx thrown when calling close

2008-10-30 Thread Michael McCandless
OK I'll add that (what IW does on setting an OOME) to the javadocs. Mike Jed Wesley-Smith wrote: Mike, regarding this paragraph: "To work around this, on catching an OOME on any of IndexWriter's methods, you should 1) forcibly remove the write lock (IndexWriter.unlock static method) and then

Re: Querying wildcard

2008-10-30 Thread Anshum
Hi Aditi, In that case I would suggest that you index the domain name separately as well, i.e. index the following fields: email address, domain name; instead of just email address. When I said reverse the tokens, you could reverse the tokens while indexing (just flipping the text string while in