Re: lucene index file randomly crash and need to reindex

2010-01-12 Thread zhang99
What is the longest you have ever kept an index file without needing to reindex? I notice that even big open source projects like Liferay suffer from this. Thanks for the tips. -- View this message in context: http://old.nabble.com/lucene-index-file-randomly-crash-and-need-to-reindex-tp27139147p27139613.html

Re: how to follow intranet: configuration in nutch website

2010-01-12 Thread jyzhou817
Thanks. --- On Wed, 13/1/10, Otis Gospodnetic wrote: Zhou, Your question will get more attention if you send it to nutch-u...

Is it possible to do a PhraseQuery using XML Query Parser?

2010-01-12 Thread syedfa
Dear fellow Java developers: Is it possible to do a PhraseQuery when using the XML Query Parser? I checked the documentation for the XML Query Parser, and it has tags for a multitude of queries, with PhraseQuery absent from the list. Is it possible to do a PhraseQuery using the XMLQueryParser,

Re: lucene index file randomly crash and need to reindex

2010-01-12 Thread Otis Gospodnetic
Hi, Use the latest version of Lucene, obey Lucene's locks, write with 1 IndexWriter, avoid NFS... Otis -- Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch

Re: how to follow intranet: configuration in nutch website

2010-01-12 Thread Otis Gospodnetic
Zhou, Your question will get more attention if you send it to the nutch-u...@lucene.apache.org list instead. This list is for Lucene Java. Otis

how to follow intranet: configuration in nutch website

2010-01-12 Thread jyzhou817
Hi, I am trying to follow the instructions from http://lucene.apache.org/nutch/tutorial8.html . Intranet: Configuration To configure things for intranet crawling you must: 1. Create a directory with a flat file of root urls. For example, to crawl the nutch site you might start with a file named

lucene index file randomly crash and need to reindex

2010-01-12 Thread zhang99
How do you all deal with the issue of occasionally needing to reindex? What do you recommend to minimize this? -- View this message in context: http://old.nabble.com/lucene-index-file-randomly-crash-and-need-to-reindex-tp27139147p27139147.html

Supported way to get segment from IndexWriter?

2010-01-12 Thread Chris Hostetter
A conversation with someone earlier today got me thinking about cranking out a patch for SOLR-1559 (in which the goal is to allow for rules to determine the input to optimize(maxNumSegments) instead of requiring a fixed integer value as input) when I realized that I wasn't certain what "approve

Re: Is there any difference in a document between one added field with a number of terms and a field added a number of times ?

2010-01-12 Thread Paul Taylor
Thanks Felipe, but you are missing the point. Artist really doesn't come into it; my problem is confined to the alias field. Forget about artist, it's just detailed to give the complete scenario. Paul Felipe wrote: You could change the boost of the field artist to be bigger than the field alias.

Re: Is there any difference in a document between one added field with a number of terms and a field added a number of times ?

2010-01-12 Thread Felipe
You could change the boost of the field artist to be bigger than the field alias. field.setBoost(artistBoost); 2010/1/12 Paul Taylor > Been doing some analysis with Luke (BTW doesn't work with StandardAnalyzer > since Version field introduced) and discovered a problem with field length > bo

Is there any difference in a document between one added field with a number of terms and a field added a number of times ?

2010-01-12 Thread Paul Taylor
Been doing some analysis with Luke (BTW doesn't work with StandardAnalyzer since Version field introduced) and discovered a problem with field length boosting for me. I have a document that represents a recording artist (e.g. Madonna, The Beatles, etc.). It contains an artist and an alias field

NYC Search in the Cloud meetup: Jan 20

2010-01-12 Thread Otis Gospodnetic
Hello, If "Search Engine Integration, Deployment and Scaling in the Cloud" sounds interesting to you, and you are going to be in or near New York next Wednesday (Jan 20) evening: http://www.meetup.com/NYC-Search-and-Discovery/calendar/12238220/ Sorry for dupes to those of you subscribed to mul

RE: Exception invoking MultiPhraseQuery

2010-01-12 Thread Woolf, Ross
Thanks, I'll try that. As for the stack trace "com.sun.jdi.InvocationException occurred invoking method" is the total of the error I get. And I only see this when I select "mpq" in the Variables window and that is displayed instead of showing the mpq object. I've tried catching the exception

Re: Index corruption using Lucene 2.4.1 - thread safety issue?

2010-01-12 Thread Michael McCandless
Is it possible that you're not closing the old IW on the RAMDir before deleting files / re-using it? Or, any other possible way that two open writers could accidentally share the same RAMDir? Do you override the LockFactory of the RAMDir? EG with ConcurrentMergeScheduler, it can continue to writ

Re: Exception invoking MultiPhraseQuery

2010-01-12 Thread Erick Erickson
I'd try running it outside of Eclipse, and/or checking each and every one of the many configuration options in Eclipse to see if you have an old jar that Eclipse is using, from jars you've made accessible via the "java build path" window to projects referenced to. Alternately, you can look for al

SF Bay Area Lucene Meetup Jan. 21st

2010-01-12 Thread Grant Ingersoll
There will be a San Francisco/Bay Area meetup on Jan. 21st at 7:15 PM at the "Hacker Dojo" (don't ask me...) location. RSVP and all the details are at http://www.meetup.com/SFBay-Lucene-Solr-Meetup/ Hope to see you there, Grant

Exception invoking MultiPhraseQuery

2010-01-12 Thread Woolf, Ross
I can't invoke MultiPhraseQuery. It produces the error: com.sun.jdi.InvocationException occurred invoking method Here is the code: MultiPhraseQuery mpq = new MultiPhraseQuery(); In the eclipse debugger when I try to inspect mpq after instantiating it shows the error. I'm on Lucene 2.9.1 with J

Re: NOT_ANALYSED_NO_NORMS should get max field length boost

2010-01-12 Thread Paul Taylor
On Tue, Jan 12, 2010 at 7:53 AM, Paul Taylor wrote: Lucene in Action says you can possibly use NOT_ANALYSED_NO_NORMS when indexing fields that aren't tokenized, but later says norms are used to boost fields with fewer terms or a single term, so matches based

[JOB] Java/Lucene/Nutch developer in Zurich, Switzerland

2010-01-12 Thread Michael Wechner
Dear Developers, We are looking for Java/Lucene/Nutch developers with over 2-3 years of experience for a project we are currently working on. The location is Zurich, Switzerland, onsite, and the job is as employee or contractor. Please reply to me privately with your contact details and experienc

Re: 2 directory providers

2010-01-12 Thread anshum.gu...@naukri.com
Hi Sourabh, If you are talking about using multiple directory implementations, then yes you may have multiple of those without any issues.

Re: Index corruption using Lucene 2.4.1 - thread safety issue?

2010-01-12 Thread Frank Geary
Thanks for the reply Mike. Your questions were good ones because I realize now I should have probably used "Corrupt IndexReader" as the subject for this thread. Here are my answers: The number stays the same until the corrupted IndexReader is reopened (if nothing changes in the IndexReader - and

2 directory providers

2010-01-12 Thread Mittal, Sourabh
Hi, Is it possible to have 2 directory providers in an application, such as RAM as well as File Directory? Regards, Sourabh

Re: Implementing filtering based on multiple fields

2010-01-12 Thread Lucifer Hammer
Why not just add custom terms onto the end of each query for each user? i.e. When user X queries for "bananas", and has previously set their domains to search in cnn, and yahoo, then why not append the following onto the search query: "fullText:bananas AND (domain:cnn OR domain:yahoo)" Off the
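The suggestion above, appending per-user domain clauses to the query string, could be sketched roughly as follows. This is only an illustration: the class name DomainFilterQuery and the method build are hypothetical, the field names fullText and domain are taken from the example in the message, and no escaping of user input is done here (in a real application the user's terms would normally be escaped, for instance with Lucene's QueryParser.escape, before being spliced into a query string).

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: not part of Lucene, just plain string assembly.
public class DomainFilterQuery {

    /**
     * Wraps the user's terms in a fullText clause and, if the user has
     * configured any domains, ANDs on an OR-group of domain clauses.
     */
    public static String build(String userTerms, List<String> domains) {
        String base = "fullText:" + userTerms;
        if (domains.isEmpty()) {
            return base; // no restriction configured for this user
        }
        String domainClause = domains.stream()
                .map(d -> "domain:" + d)
                .collect(Collectors.joining(" OR ", "(", ")"));
        return base + " AND " + domainClause;
    }

    public static void main(String[] args) {
        // prints: fullText:bananas AND (domain:cnn OR domain:yahoo)
        System.out.println(build("bananas", Arrays.asList("cnn", "yahoo")));
    }
}
```

The resulting string would then be handed to whatever query parser the application already uses.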

Re: NOT_ANALYSED_NO_NORMS should get max field length boost

2010-01-12 Thread Paul Taylor
Erick Erickson wrote: Are you saying that you index the *same* field differently in different documents? Or do you index the field in question in the same way in all documents? Same way in all documents I ask because I'm having a hard time following the logic here. A field that is NOT analyzed

Re: Lucene computes an automatic boost based on the number of tokens in the field (shorter fields have a higher boost) ?

2010-01-12 Thread Erick Erickson
I'd *strongly* recommend getting a copy of Luke, opening your index with it and playing around. The "explain" tab will show you a *lot* about how scoring works.. Erick On Tue, Jan 12, 2010 at 8:16 AM, Paul Taylor wrote: > Benjamin Heilbrunn wrote: > >> This is because matches in short field

Re: NOT_ANALYSED_NO_NORMS should get max field length boost

2010-01-12 Thread Erick Erickson
Are you saying that you index the *same* field differently in different documents? Or do you index the field in question in the same way in all documents? I ask because I'm having a hard time following the logic here. A field that is NOT analyzed is an all-or-none match, i.e. looking for "paul" in

Re: Lucene computes an automatic boost based on the number of tokens in the field (shorter fields have a higher boost) ?

2010-01-12 Thread Paul Taylor
Benjamin Heilbrunn wrote: This is because matches in short fields (few terms) are typically more significant than matches in long fields (many terms). Imagine the case with two fields named "title" and "content" representing the title and the content of books. If you match three search terms in a

Re: Lucene computes an automatic boost based on the number of tokens in the field (shorter fields have a higher boost) ?

2010-01-12 Thread Benjamin Heilbrunn
This is because matches in short fields (few terms) are typically more significant than matches in long fields (many terms). Imagine the case with two fields named "title" and "content" representing the title and the content of books. If you match three search terms in a five-term title this is a b
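To put a rough number on "shorter fields have a higher boost": assuming the index uses Lucene's DefaultSimilarity (an assumption, since the thread does not say which Similarity is in use), the length norm for a field is 1/sqrt(numTerms). The sketch below just evaluates that formula in plain Java; the class name LengthNormSketch is hypothetical and nothing here calls Lucene itself. The norm actually stored in the index is also multiplied by any boosts and squeezed into a single byte, so the values Luke shows will only approximate these.

```java
// Hypothetical stand-alone sketch of the 1/sqrt(numTerms) length norm.
public class LengthNormSketch {

    /** Length norm for a field containing numTerms tokens. */
    public static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    public static void main(String[] args) {
        // A single-term field, a five-term title, and two longer bodies.
        int[] fieldLengths = {1, 5, 100, 10000};
        for (int n : fieldLengths) {
            System.out.println(n + " terms -> norm " + lengthNorm(n));
        }
    }
}
```

Under this formula a one-term field gets a norm of 1.0, a 100-term field gets 0.1, and a 10000-term field gets 0.01, so a match in the short field is weighted ten to a hundred times more heavily, all else being equal.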

NOT_ANALYSED_NO_NORMS should get max field length boost

2010-01-12 Thread Paul Taylor
Lucene in Action says you can possibly use NOT_ANALYSED_NO_NORMS when indexing fields that aren't tokenized, but later says norms are used to boost fields with fewer terms or a single term, so matches based on these single term fields would miss out on this boost. Is there a way to use NOT_ANALYSED_NO_NORM

Lucene computes an automatic boost based on the number of tokens in the field (shorter fields have a higher boost) ?

2010-01-12 Thread Paul Taylor
Why is this, and how much is this (in plain English), please? Thanks, Paul - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org

Re: Index corruption using Lucene 2.4.1 - thread safety issue?

2010-01-12 Thread Michael McCandless
Not good! Can you post the ###'s from the exception? How far out of bounds is the access? Your usage sounds fine. Reopen during commit is fine. Are you sure the exception comes from the reader on the RAMDir and not your main dir? How do you periodically move your in-RAM changes to the disk in