Re: A small doubt related to write.lock

2008-01-30 Thread Doron Cohen
Hi Ajay, IndexReader.unlock() is a brute force call to be used by applications/users knowing that a lock can be safely removed. finalize() on the other hand is a method that Java will call when garbage collecting a no-longer-referenced object. So it is often a place for cleanup code. However the pr

A small doubt related to write.lock

2008-01-30 Thread ajay_garg
Hi all. I would be obliged if someone could explain the difference between the IndexReader.unlock() and IndexWriter.finalize() methods. Thanks Ajay Garg -- View this message in context: http://www.nabble.com/A-small-doubt-related-to-write.lock-tp15199037p15199037.html Sent from the

Re: Retain the index

2008-01-30 Thread anjana m
With true: I am finding a serious problem when I need a new index, please help. But how can I keep changing the true/false option? :( Please help me... :( Exception in thread "main" java.io.IOException: Cannot delete _17.cfs at org.apache.lucene.store.FSDirectory.create(FSDirectory.java:144)

Re: Query in Lucene 2.3.0

2008-01-30 Thread ajay_garg
@Yonik So you mean to say that if two threads have the same instance of an IndexWriter passed to both of them, and both these threads run on two different CPUs, then they can write to the index at the same time? Yonik Seeley wrote: > > On Jan 30, 2008 10:59 PM, ajay_garg > <[EMAIL PROTECTED

Re: contrib/benchmark Quality

2008-01-30 Thread Doron Cohen
Hi Grant, I initially thought of doing so, but after working on the Million Queries Track where running the 10,000 queries could take more than a day (depending on the settings) and where indexing was done once and took a few days I felt that tighter control is needed than that provided by the b

Re: Multiple searchers (Was: CachingWrapperFilter: why cache per IndexReader?)

2008-01-30 Thread Otis Gospodnetic
Not me, but it looks useful and something I could actually use (exactly to look at synchronization bottlenecks in situations where many threads are sharing a single IndexSearcher). Unfortunately, it looks like it works only with IBM's JVM: "Any platform running an IBM®-supplied Java™ SDK or JRE

Re: Spell checking street names

2008-01-30 Thread Otis Gospodnetic
Hmmm, "untokenized n-gram spell checker"... does that really make sense? lucene as 2-grams: lu uc ce en ne. But all as a single token? No, I don't think that will work with the Lucene spellchecker. As for a non-tokenizing Analyzer: KeywordAnalyzer. Otis -- Sematext -- http://sematext.com/
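
The 2-gram split mentioned above can be sketched in plain Java. This is only an illustration of how a word breaks into overlapping character n-grams; the class and method names (BigramSketch, ngrams) are made up for this example and are not part of the Lucene spellchecker.

```java
import java.util.ArrayList;
import java.util.List;

class BigramSketch {
    // Split a word into overlapping character n-grams of length n.
    // Hypothetical helper for illustration only, not a Lucene API.
    static List<String> ngrams(String word, int n) {
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= word.length(); i++) {
            grams.add(word.substring(i, i + n));
        }
        return grams;
    }

    public static void main(String[] args) {
        // Prints: lu uc ce en ne
        System.out.println(String.join(" ", ngrams("lucene", 2)));
    }
}
```

Each gram is a separate term here, which is the point of the reply: an n-gram index presumably needs the grams as individual tokens, so keeping them all in one untokenized token would defeat the purpose.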

Re: Query in Lucene 2.3.0

2008-01-30 Thread Yonik Seeley
On Jan 30, 2008 10:59 PM, ajay_garg <[EMAIL PROTECTED]> wrote: > > Thanks Mike for your directions. > > Yes, I am in fact using a single computer for my application, and you're > saying that in this case, multiple threads with a single IndexWriter will > give better performance. Hmmm. I just wonder

Re: Query in Lucene 2.3.0

2008-01-30 Thread ajay_garg
Thanks Mike for your directions. Yes, I am in fact using a single computer for my application, and you're saying that in this case, multiple threads with a single IndexWriter will give better performance. Hmmm. I just wonder that since each IndexWriter has a single write.lock, this means that sitt

appending field to an existing index

2008-01-30 Thread John Wang
Hi all: We have a large index and it is difficult to reindex. We want to add another field to the index without reindexing, e.g. just create a new inverted index, dictionary files etc. How feasible is it to add this to lucene? Thanks -John

Re: Escape character and Special character

2008-01-30 Thread Daniel Naber
On Wednesday, 30 January 2008, Joshua W Hui wrote: > Thanks for the information. Does it also apply to fuzzy search? I think so. > Also, a simple question... how can I find out which release the fix will > go in? Currently, it only has a patch. It's not yet assigned to any version (it says "Fix

contrib/benchmark Quality

2008-01-30 Thread Grant Ingersoll
Has anyone thought about integrating the contrib/benchmark Quality stuff into the "algorithm" framework that's used for timings, etc.? For instance, I would like to write an algorithm file where my rounds consist of doing various runs with different similarities all on the same index. It

Re: Escape character and Special character

2008-01-30 Thread Joshua W Hui
Thanks for the information. Does it also apply to fuzzy search? Also, a simple question... how can I find out which release the fix will go in? Currently, it only has a patch. Joshua Daniel Naber

Re: Escape character and Special character

2008-01-30 Thread Daniel Naber
On Wednesday, 30 January 2008, Joshua W Hui wrote: > When I tried to do a Lucene search using an escape character with other > special characters, like the following: > > SUBJECT:Yahoo\!~0.5 > SUBJECT:Yahoo\!* > > It seems the parser totally ignores the escape character, and becomes It's a known bug, s

Escape character and Special character

2008-01-30 Thread Joshua W Hui
Hi, When I tried to do a Lucene search using an escape character with other special characters, like the following: SUBJECT:Yahoo\!~0.5 SUBJECT:Yahoo\!* It seems the parser totally ignores the escape character, and the query becomes SUBJECT:Yahoo!~0.5 SUBJECT:Yahoo!* which gives me a syntax exception. Any re
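
For readers unsure how the escaped query strings above are formed, here is a minimal hand-rolled sketch. It only shows backslash-escaping of a term before the fuzzy or wildcard suffix is appended. The class name, the method name, and the exact set of characters treated as special are assumptions made for this illustration, and the sketch does not work around the parser behaviour reported in this thread.

```java
class EscapeSketch {
    // Characters treated as special in this sketch (an assumption for
    // illustration; check the query syntax docs for your Lucene version).
    static final String SPECIAL = "+-!(){}[]^\"~*?:\\";

    // Put a backslash in front of every special character in the term.
    static String escape(String term) {
        StringBuilder sb = new StringBuilder();
        for (char c : term.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Escape the literal term first, then append the unescaped operator.
        // Prints: SUBJECT:Yahoo\!~0.5
        System.out.println("SUBJECT:" + escape("Yahoo!") + "~0.5");
        // Prints: SUBJECT:Yahoo\!*
        System.out.println("SUBJECT:" + escape("Yahoo!") + "*");
    }
}
```

The operator (~0.5 or *) is appended after escaping so that it keeps its special meaning while the "!" in the term is treated as a literal.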

Re: document boost

2008-01-30 Thread Yonik Seeley
Hi Mike, I think this issue probably belongs in the Solr lists since it looks like you're indexing through it. I did a really quick test re-adding a Solr example document but adding a document boost of 10... the fieldNorm increased by a factor of 10 as expected (explain below). 5.651948 = (MATCH

Re: document boost

2008-01-30 Thread Mark Miller
If you look at DocumentsWriter at line 715 you will see the docBoost get set to the docBoost you specified. At 1376 you will see boost get assigned docBoost. Then at 1509 you see how the doc boost is multiplied by the field boost: * boost *= field.getBoost(); * So now you have the default fiel

RE: Lucene, HTML and Hebrew

2008-01-30 Thread Itamar Syn-Hershko
OK, I've been processing things for a while. I came up with an idea that I want your advice on -- is there a way I could stem the Hebrew words in my analyzer yet keep a note of some sort of the original term which was assembled by this stem, WITHOUT affecting frequency/proximity data? This is I gu

Re: document boost

2008-01-30 Thread Mike Grafton
Thanks for your help, Mark. We can start by posting our SOLR config files, although I'm not sure if that will be helpful (we don't see much in there regarding boosts). See attached. How SOLR actually configures and interfaces with Lucene is a bit of an unknown to us, so I'm not sure we can get d

Re: document boost

2008-01-30 Thread Mark Miller
I would say you definitely misconfigured something. Doubling your doc boost will double your fieldNorm approximately (I think the precision isn't perfect). I don't know what you're doing wrong in such a small test, but your fieldNorm should *not* be exploding like that. Can you post some code? - Mar

document boost

2008-01-30 Thread Mike Grafton
Hello folks, We're trying to use Lucene's scoring to do a fairly basic thing: give a document (in this case, we index "articles") a boost based on an integer value that we know at index-time. We want the document boost to affect the final document score linearly. We thought that assigning a doc

Spell checking street names

2008-01-30 Thread Max Metral
I'm using Lucene to spell check street names. Right now, I'm using Double Metaphone on the street name (we have a sophisticated regex to parse out the NAME as opposed to the unit, number, street type, or suffix). I think that Double Metaphone is probably overkill/wrong, and a spell checking appro

Re: Luke for Lucene 2.3?

2008-01-30 Thread markharw00d
I also read something about web-based Luke, but can't find it in the contrib in 2.3, is it part of Lucene 2.3? How do I use it? See here: http://www.mail-archive.com/[EMAIL PROTECTED]/msg13287.html I think we decided to hold off until after the Lucene 2.3 release before adding to contrib

Re: Query in Lucene 2.3.0

2008-01-30 Thread Michael McCandless
If you have a single IndexWriter, then the buffer is flushed at 16 MB regardless of how many threads are adding to that buffer. If you are using multiple IndexWriters, writing to separate directories and then merging at the end, then each one uses 16 MB. But this isn't recommended for a s
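
The point about one shared buffer can be illustrated with a toy model. The sketch below is not Lucene code: SharedBufferSketch is a hypothetical stand-in for a writer whose buffer flushes at a fixed threshold, and the sizes are arbitrary units rather than megabytes. It only shows that the number of flushes depends on the total amount added, not on how many threads do the adding.

```java
class SharedBufferSketch {
    private final int flushThreshold; // stand-in for the 16 MB limit
    private int buffered = 0;
    private int flushes = 0;

    SharedBufferSketch(int flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    // All threads add to the same buffer; synchronized keeps the counters consistent.
    synchronized void add(int size) {
        buffered += size;
        if (buffered >= flushThreshold) {
            buffered = 0; // "flush" the shared buffer
            flushes++;
        }
    }

    synchronized int flushes() {
        return flushes;
    }

    // Start the given number of threads, each adding addsPerThread items of size 1.
    static int run(int threads, int addsPerThread, int threshold) {
        SharedBufferSketch buffer = new SharedBufferSketch(threshold);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < addsPerThread; i++) {
                    buffer.add(1);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return buffer.flushes();
    }

    public static void main(String[] args) {
        // 4 threads x 8 adds and 1 thread x 32 adds both total 32 units,
        // so both runs flush the 16-unit buffer the same number of times.
        System.out.println(run(4, 8, 16));
        System.out.println(run(1, 32, 16));
    }
}
```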