Re[2]: strange behavior 4 query term boost

2006-09-27 Thread charliecmo
Found the reason; it is a bug, IMHO. The example should be: A: term1^5 term2^6 term3^7 B: term1^5E-4 term2^6E-4 term3^7E-4 C: term1^0.0005 term2^0.0006 term3^0.0007 A and C are supposed to return the same ranking. B is different, since B will be parsed as: term1^5 E-4 term2^6 E-4 term3^7 E-4 The parser takes
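
A minimal sketch of one way to sidestep the parse described above. It assumes the query string is assembled in Java by concatenating float boosts; the thread does not say how the query was built. Java's Float.toString switches to exponent notation for small values, so 0.0005f prints as 5.0E-4. The class and method names (BoostFormat, plainBoost, boosted) are hypothetical and are not part of Lucene.

```java
import java.math.BigDecimal;

public class BoostFormat {
    // Render a boost in plain decimal notation, so that a query parser which
    // does not understand exponent notation never sees a string like "5.0E-4".
    static String plainBoost(float boost) {
        return new BigDecimal(Float.toString(boost))
                .stripTrailingZeros()
                .toPlainString();
    }

    // Build a single boosted clause such as "term1^0.0005".
    static String boosted(String term, float boost) {
        return term + "^" + plainBoost(boost);
    }

    public static void main(String[] args) {
        // Naive concatenation yields exponent notation for small floats.
        System.out.println("term1^" + 0.0005f);
        // The helper keeps the boost in plain decimal form.
        System.out.println(boosted("term1", 0.0005f));
    }
}
```

With plain decimal boosts the query takes form C from the message rather than form B.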

Re: Can Lucene Index 50MB xml file

2006-09-27 Thread aslam bari
I am using Lucene 2.1 in Slide. Do you have any idea where I should change it and what value I should give it? Bhavin Pandya <[EMAIL PROTECTED]> wrote: Hi aslam, Here is the method you can set for very long content. IndexWriter's setMaxFieldLength. - Bhavin pandya - Or

Re: lock file of lucene

2006-09-27 Thread jacky
Hi, yes, but there may be several programs writing to the index. If one program is writing into the index, the index database may be corrupted when another program unlocks it. Best Regards. jacky - Original Message - From: "Bhavin Pandya" <[EMAIL PROTECTED]> To: Sent:

Re: Caused by: java.io.IOException: The handle is invalid

2006-09-27 Thread Michael McCandless
Van Nguyen wrote: I'm running this on Windows 2003 server (NTFS). The Java VM version is 1.5.0_06. This exception is not consistent, but it is not intermittent either. It does not throw it at any particular point while rebuilding the index, but it will throw this exception at some point (it co

Re: strange behavior 4 query term boost

2006-09-27 Thread Chris Hostetter
I assume you mean that the set of matches is the same, but the scores (and possibly the order) are different, correct? The IndexSearcher.explain methods should help make the reason clear -- compare the output for each query when looking at the same docIds. I suspect what you'll find is that with

strange behavior 4 query term boost

2006-09-27 Thread charliecmo
Hello, I don't understand why the following two queries give totally different results. term1^5 term2^6 term3^7 term1^0.0005 term2^0.0006 term3^0.0007 Can anyone explain? Thanks. (lucene2.0, using TopDocs) -- Thanks, Charlie --

Re: Writing - Searching synchronization

2006-09-27 Thread Erick Erickson
Yes. New additions to an index are NOT searchable until the IndexSearcher/IndexReader is closed and reopened, just as you are observing. I think of it as the IndexSearcher taking a "snapshot" of the index when it is instantiated and operating on that snapshot exclusively thereafter, regardless of
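
A toy model of the "snapshot" idea described above, in plain Java. It does not use Lucene, and the Index and Searcher classes are hypothetical stand-ins. It only illustrates why documents added after a searcher is opened stay invisible until a new searcher is created.

```java
import java.util.ArrayList;
import java.util.List;

public class SnapshotDemo {
    // Hypothetical stand-in for an index that keeps receiving documents.
    static class Index {
        final List<String> docs = new ArrayList<String>();
    }

    // Hypothetical stand-in for a searcher: it copies the index contents
    // once, when it is opened, and only ever looks at that copy.
    static class Searcher {
        private final List<String> view;

        Searcher(Index index) {
            this.view = new ArrayList<String>(index.docs);
        }

        boolean contains(String doc) {
            return view.contains(doc);
        }
    }

    public static void main(String[] args) {
        Index index = new Index();
        index.docs.add("doc1");
        Searcher searcher = new Searcher(index);
        index.docs.add("doc2"); // added after the searcher was opened

        System.out.println(searcher.contains("doc2"));            // old snapshot
        System.out.println(new Searcher(index).contains("doc2")); // reopened
    }
}
```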

Re: lock file of lucene

2006-09-27 Thread Michael McCandless
Bhavin Pandya wrote: > Before you open the IndexWriter object, you can check whether the lock file > exists and, if it does, unlock it. > Use IndexReader.isLocked and IndexReader.unlock. Also, you could use a try / finally and always close the IndexWriter in the finally clause, which sh
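
A rough sketch of the try / finally pattern suggested above, in plain Java. LockedWriter is a hypothetical stand-in that creates a lock file on open and removes it on close. It is not Lucene's IndexWriter, whose actual locking behaviour is not shown here. The point is only that close() in the finally clause runs even when a write fails.

```java
import java.io.File;
import java.io.IOException;

public class LockDemo {
    // Hypothetical writer that holds a lock file for as long as it is open.
    static class LockedWriter {
        private final File lock;

        LockedWriter(File lock) throws IOException {
            if (!lock.createNewFile()) {
                throw new IOException("Lock already held: " + lock);
            }
            this.lock = lock;
        }

        void add(String doc) throws IOException {
            if (doc == null) {
                throw new IOException("Cannot write a null document");
            }
            // A real writer would add the document to the index here.
        }

        void close() {
            lock.delete(); // release the lock
        }
    }

    // Returns true if every document was written. The lock is released
    // in the finally clause whether or not a write failed.
    static boolean writeAll(File lock, String[] docs) {
        LockedWriter writer = null;
        try {
            writer = new LockedWriter(lock);
            for (String doc : docs) {
                writer.add(doc);
            }
            return true;
        } catch (IOException e) {
            return false;
        } finally {
            if (writer != null) {
                writer.close();
            }
        }
    }
}
```

This covers errors raised inside the JVM. It does not help when the process is killed outright, which is the case the isLocked / unlock suggestion addresses.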

Writing - Searching synchronization

2006-09-27 Thread Luis Rodrigo Aguado
Hi all, I have a problem with a lucene-based application I am trying to build. The application is mainly search oriented, and the core of the index is built in a batch process before starting the system. In the application initialization an IndexSearcher object is built, to perform all the se

Re: Very high fieldNorm for a field resulting in bad results

2006-09-27 Thread Chris Hostetter
: 1. Can I do away with index-time boosting for fields & tweak : query-time boosting for them ? I understand that doc level boosting is : very useful while indexing. : But for fields, both index-boost & query-boost are multiples which lead : to the score, so would it be safe to say that I can repla

Re: Splitting the index

2006-09-27 Thread Erick Erickson
I'd ask for more details. You say that you've narrowed it down to Lucene doing the searching. But which part of the search? Here're two places people have run into problems before (sorry if you already know this...). 1> Iterating through the entire returned set with Hits.doc(#). 2> opening and

Re: Splitting the index

2006-09-27 Thread Erik Hatcher
Lots of possible issues, but we need more information to troubleshoot this properly. How big is your index (number of documents)? What is the total file system size of the index? Is your index optimized? How often do you update the index? How are you managing IndexSearcher instances after the inde

Splitting the index

2006-09-27 Thread Rob Young
Hi, I'm using Lucene to search a product database (CDs, DVDs, games and now books). Recently that index has increased in size to over a million items (added books). I have been performance testing our search server and the throughput of requests has dropped significantly, profiling the server i

Re: Lucene In Action Book vs Lucene 2.0

2006-09-27 Thread Steven Rowe
http://svn.apache.org/repos/asf/lucene/java/tags/lucene_2_0_0/CHANGES.txt Otis Gospodnetic wrote: > CHANGES.txt is your best source for that answer. > > KEGan <[EMAIL PROTECTED]> wrote: > > What about the internal of Lucene? Are there any major changes in there? -

Re: Lucene In Action Book vs Lucene 2.0

2006-09-27 Thread Otis Gospodnetic
Hi, Internals have changed some, but I don't think the changes are substantial. CHANGES.txt is your best source for that answer. LIA2 some time in early 2007, I hope, but don't hold me to that estimate. Otis - Original Message From: KEGan <[EMAIL PROTECTED]> To: java-user@lucene.apac

Re: Multiple Terms, Delete From Index

2006-09-27 Thread Otis Gospodnetic
Josh, Yes, it will. I thought that is what you wanted, but I see now that you are looking for docs that match both. Search and delete by id from Hits as Erik mentioned. Otis - Original Message From: Josh Joy <[EMAIL PROTECTED]> To: java-user@lucene.apache.org Sent: Wednesday, Septemb

Re: term OR term OR term OR .... query question

2006-09-27 Thread Grant Ingersoll
See below. Also, there is new Scoring documentation available via the website (http://lucene.apache.org/java/docs/scoring.html) that covers scoring in some detail. On Sep 26, 2006, at 5:23 PM, Vladimir Olenin wrote: Hi. I have a question regarding Lucene scoring algorithm. Providing I

Re: does anyone know of a 'smart' categorizing text pattern finder?

2006-09-27 Thread Grant Ingersoll
You might also look at some other NLP tools, such as OpenNLP, which you can train for your collection, or if you are interested in buying, there are many products on the market that do similar things. On Sep 26, 2006, at 9:36 AM, Otis Gospodnetic wrote: Look at LingPipe from Alias-i.com. Loo

Re: Can Lucene Index 50MB xml file

2006-09-27 Thread Bhavin Pandya
Hi aslam, Here is the method you can set for very long content. IndexWriter's setMaxFieldLength. - Bhavin pandya - Original Message - From: "aslam bari" <[EMAIL PROTECTED]> To: Sent: Wednesday, September 27, 2006 2:29 PM Subject: Can Lucene Index 50MB xml file Dear all, I want t

Can Lucene Index 50MB xml file

2006-09-27 Thread aslam bari
Dear all, I want to confirm whether Lucene can index a 50MB XML file, or whether I have to change the source code to make it work. I think there is some limit on the number of tokens in Lucene, so it is not indexing the whole document. Any views? Where should I change the code?

Re: cache persistent Hits

2006-09-27 Thread Shane Perry
As I am always looking for ways to improve a search's response time, if I were to use the MultiReader as suggested, would it still be possible to determine which index a hit came from? Currently I use the MultiSearcher.subSearcher() method to determine this information. After taking a, albei

Re: Multiple Terms, Delete From Index

2006-09-27 Thread Erik Hatcher
Iterate through all Hits for "city:city1 AND state:state1" and delete them by document ID. Erik On Sep 26, 2006, at 10:04 PM, Josh Joy wrote: Hi All, I need to delete from the index where 2 terms are matching, rather than just one term. For example, IndexReader reader = IndexReade

Re: lock file of lucene

2006-09-27 Thread Bhavin Pandya
Hi jacky, Before you open the IndexWriter object, you can check whether the lock file exists and, if it does, unlock it. Use IndexReader.isLocked and IndexReader.unlock. - Bhavin pandya - Original Message - From: "jacky" <[EMAIL PROTECTED]> To: Sent: Wednesday, September

Re: searching for the part of a term.

2006-09-27 Thread heritrix . lucene
Hi, Thanks for your reply. : Since the overhead in first is the speed of the system, i think adopting : second method will be better. Since my index size is around 10GB, the second method is also taking a lot of time for queries like "am". One more thing that I found in http://www.gossame

lock file of lucene

2006-09-27 Thread jacky
Hi, When writing into an index, Lucene creates a write lock file. So, if there is an error during writing, the lock file will not be deleted. Also, the JVM will not be closed for some time, so the program will have no chance to get the lock on this index. Is there any method to avoid th