Re: Index corruption using Lucene 2.4.1 - thread safety issue?

2010-01-12 Thread Michael McCandless
Not good!  Can you post the ###'s from the exception?  How far out of
bounds is the access?

Your usage sounds fine.  Reopen during commit is fine.

Are you sure the exception comes from the reader on the RAMDir and not
your main dir?

How do you periodically move your in-RAM changes to the disk index?

Have you run CheckIndex on your index?  Also, try running with asserts
enabled... it may catch whatever is happening sooner.

Do you hold open a single IW on the RAMDir, and re-use it?  Or open
and then close?  What about the disk dir?

Mike

On Mon, Jan 11, 2010 at 1:42 PM, Frank Geary  wrote:
>
> Hi,
>
> I'm using Lucene 2.4.1 and am seeing occasional index corruption.  It shows
> up when I call MultiSearcher.search().  MultiSearcher.search() throws the
> following exception:
>
> ArrayIndexOutOfBoundsException.  The error is: Array index out of range: ###
> where ### is a number representing an index into the deletedDocs BitVector
> in SegmentTermDocs.  The stack trace is as follows:
>
> 2010-01-09 20:40:00,561 [pool-1-thread-4] ERROR -
> org.apache.lucene.util.BitVector.get(BitVector.java:91)
> 2010-01-09 20:40:00,561 [pool-1-thread-4] ERROR -
> org.apache.lucene.index.SegmentTermDocs.next(SegmentTermDocs.java:125)
> 2010-01-09 20:40:00,561 [pool-1-thread-4] ERROR -
> org.apache.lucene.index.MultiSegmentReader$MultiTermDocs.next(MultiSegmentReader.java:554)
> 2010-01-09 20:40:00,561 [pool-1-thread-4] ERROR -
> org.apache.lucene.search.FieldCacheImpl$10.createValue(FieldCacheImpl.java:384)
> 2010-01-09 20:40:00,561 [pool-1-thread-4] ERROR -
> org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:71)
> 2010-01-09 20:40:00,561 [pool-1-thread-4] ERROR -
> org.apache.lucene.search.FieldCacheImpl.getStringIndex(FieldCacheImpl.java:351)
> 2010-01-09 20:40:00,561 [pool-1-thread-4] ERROR -
> org.apache.lucene.search.FieldSortedHitQueue.comparatorString(FieldSortedHitQueue.java:415)
> 2010-01-09 20:40:00,561 [pool-1-thread-4] ERROR -
> org.apache.lucene.search.FieldSortedHitQueue$1.createValue(FieldSortedHitQueue.java:206)
> 2010-01-09 20:40:00,561 [pool-1-thread-4] ERROR -
> org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:71)
> 2010-01-09 20:40:00,561 [pool-1-thread-4] ERROR -
> org.apache.lucene.search.FieldSortedHitQueue.getCachedComparator(FieldSortedHitQueue.java:167)
> 2010-01-09 20:40:00,561 [pool-1-thread-4] ERROR -
> org.apache.lucene.search.FieldSortedHitQueue.<init>(FieldSortedHitQueue.java:55)
> 2010-01-09 20:40:00,561 [pool-1-thread-4] ERROR -
> org.apache.lucene.search.TopFieldDocCollector.<init>(TopFieldDocCollector.java:43)
> 2010-01-09 20:40:00,561 [pool-1-thread-4] ERROR -
> org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:122)
> 2010-01-09 20:40:00,561 [pool-1-thread-4] ERROR -
> org.apache.lucene.search.MultiSearcher.search(MultiSearcher.java:232)
> 2010-01-09 20:40:00,561 [pool-1-thread-4] ERROR -
> org.apache.lucene.search.Searcher.search(Searcher.java:86)
> 2010-01-09 20:40:00,561 [pool-1-thread-4] ERROR -
> com.acquiremedia.opens.index.searcher.HybridIndexSearcher.search(HybridIndexSearcher.java:311)
> .
> .
> .
>
> That makes sense, but I'm trying to understand what could be causing the
> corruption.
>
> Here's what I'm doing:
>
> 1) I have an IndexWriter created using a RAMDirectory.
>
> 2) I have a single thread processing index adds and index deletes.  This
> thread is rather simple and calls IndexWriter.addDocument() followed by
> IndexWriter.commit() or IndexWriter.deleteDocuments() followed by
> IndexWriter.commit().  The commits are done at this point because we want
> the documents available for searching immediately.
>
> 3) I also have 4 search threads running at the same time as the index write
> thread.  Each time a search thread executes a search it calls
> IndexReader.reopen()  on the existing IndexReader for the index created
> using the RAMDirectory above, gets an existing index reader for another
> on-disk index and then calls MultiSearcher.search() (this brings together
> the RamDirectory index and an on-disk index) to execute the search.
>
> It generally works fine, but every few days or so I get the above
> ArrayIndexOutOfBoundsException.   My suspicion is that when the
> IndexWriter.commit() call happens due to a delete at the exact same time as
> the IndexReader.reopen() call happens, the IndexReader has a deletedDocs
> BitVector which reflects the delete, but something else internal to the
> IndexReader is not reflecting the delete.
>
> So, I implemented a semaphore mechanism to prevent IndexWriter.commit() from
> happening at the same time as IndexReader.reopen().  I only implemented the
> semaphores in the delete case because my guess was that an add would be
> unlikely to affect the deletedDocs BitVector.  Unfortunately, the problem
> continues to happen.
>
> I believe I read somewhere that a similar thread safety issue had been fixed
> in Lucene 2.4, but I suspect there may still be an issue in 2.4.1.
>
> Does anyone have
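For reference, the reopen idiom described in step 3 looks roughly like this
against the 2.4 API (a sketch only; ramReader and diskReader are illustrative
names, and per the javadocs the old reader must be closed whenever reopen()
returns a new instance):

  IndexReader newReader = ramReader.reopen();
  if (newReader != ramReader) {
    ramReader.close();  // reopen() returned a new reader; release the old one
    ramReader = newReader;
  }
  Searcher searcher = new MultiSearcher(new Searchable[] {
      new IndexSearcher(ramReader), new IndexSearcher(diskReader) });

Failing to close the old reader on every swap leaks it, which is one thing
worth ruling out in a setup like this.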

Lucene computes an automatic boost based on the number of tokens in the field (shorter fields have a higher boost)?

2010-01-12 Thread Paul Taylor



Why is this, and how much is it (in plain English), please?

thanks Paul






NOT_ANALYSED_NO_NORMS should get max field length boost

2010-01-12 Thread Paul Taylor
Lucene in Action says you can possibly use NOT_ANALYSED_NO_NORMS when 
indexing fields that aren't tokenized, but later says norms are used to 
boost fields with fewer / single terms, so matches based on these single 
term fields would miss out on this boost. Is there a way to use 
NOT_ANALYSED_NO_NORMS on these fields which will mean they end up with 
the best boost (1.0 as default), while documents that are analysed 
with norms receive a lower boost (<1.0) if they contain more than one 
token?


I'm not using Document or Field boosting, so it seems a bit silly for me to 
store all these norms just to say this field contains a single token and 
therefore should get an additional boost.


Perhaps I'm misunderstanding this, and this would work as required.


thanks Paul




Re: Lucene computes an automatic boost based on the number of tokens in the field (shorter fields have a higher boost)?

2010-01-12 Thread Benjamin Heilbrunn
This is because matches in short fields (few terms) are typically more
significant than matches in long fields (many terms).

Imagine the case of two fields named "title" and "content"
representing the title and the content of books.
If you match three search terms in a five-term title, this is a better
hit than if you match those three search terms in the content of the
book.

The length normalization factor is calculated by your Similarity
implementation in the method
public float lengthNorm(String fieldName, int numTokens)

Does that help you?
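For illustration, DefaultSimilarity implements this as 1/sqrt(numTokens), and
it can be overridden; a minimal sketch, assuming the 2.x Similarity API quoted
above:

  public class FlatLengthSimilarity extends DefaultSimilarity {
    // DefaultSimilarity returns (float) (1.0 / Math.sqrt(numTokens));
    // a constant removes the short-field advantage entirely.
    public float lengthNorm(String fieldName, int numTokens) {
      return 1.0f;
    }
  }

Note that norms are baked in at index time, so a custom Similarity has to be
set on the IndexWriter (writer.setSimilarity(...)) before the documents are
indexed, not just on the searcher.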




Re: Lucene computes an automatic boost based on the number of tokens in the field (shorter fields have a higher boost)?

2010-01-12 Thread Paul Taylor

Yes, thanks, it does; I was just getting it. Is it based purely on matching 
a field with fewer terms, rather than on the percentage of terms in a field 
that matched?
i.e. if you match three search terms in a five-term field, would this score 
better than if you match four search terms in a six-term field?


Do you know the answer to my second post?
i.e. what does the default lengthNorm return for a single-term field 
(compared to having no norms, where the value is assumed to be 1.0)?


Paul




Re: NOT_ANALYSED_NO_NORMS should get max field length boost

2010-01-12 Thread Erick Erickson
Are you saying that you index the *same* field differently in different
documents? Or do you index the field in question in the same way in
all documents?

I ask because I'm having a hard time following the logic here. A
field that is NOT analyzed is an all-or-none match, i.e.
looking for "paul" in an unanalyzed field "paul taylor" will
not match, so boosting is pretty irrelevant on that field.

If you're analyzing the same field for some documents and not
analyzing it for other documents, I don't know what happens, but
it's probably bad.

Could you boost your *other* fields by less than one to achieve
the same end?

If none of this is relevant, could you explain your use case a little
more?

HTH
Erick
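To make the all-or-none point concrete, a sketch using the 2.4 field
constants (field and value names here are illustrative):

  Document doc = new Document();
  doc.add(new Field("name", "paul taylor",
                    Field.Store.YES, Field.Index.NOT_ANALYZED_NO_NORMS));
  // The whole value is indexed as a single term, so only the exact
  // value matches:
  Query miss = new TermQuery(new Term("name", "paul"));         // no hits
  Query hit  = new TermQuery(new Term("name", "paul taylor"));  // matches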



Re: Lucene computes an automatic boost based on the number of tokens in the field (shorter fields have a higher boost)?

2010-01-12 Thread Erick Erickson
I'd *strongly* recommend getting a copy of Luke, opening your index
with it and playing around. The "explain" tab will show you a *lot*
about how scoring works.

Erick



Re: NOT_ANALYSED_NO_NORMS should get max field length boost

2010-01-12 Thread Paul Taylor

Erick Erickson wrote:

Are you saying that you index the *same* field differently in different
documents? Or do you index the field in question in the same way in
all documents?

Same way in all documents


I ask because I'm having a hard time following the logic here. A
field that is NOT analyzed is an all-or-none match, i.e.
looking for "paul" in an unanalyzed field "paul taylor" will
not match, so boosting is pretty irrelevant on that field.


But if it does match, won't this affect the end score?

i.e. consider Doc 1 has an unanalysed field containing fred and Doc 2 an 
analysed field containing tom sawyer,

and this search: unanalysedfield:fred OR analysedfield:tom

If it had norms, Doc 1 would come first, but without norms they would be 
joint first, right?




Could you boost your *other* fields by less than one to achieve
the same end?

What are the values? What value is given to a single-term field boosted by norms?

Paul





Re: Implementing filtering based on multiple fields

2010-01-12 Thread Lucifer Hammer
Why not just add custom terms onto the end of each query for each user?
i.e. when user X queries for "bananas", and has previously set their
domains to search to cnn and yahoo, then why not append the following onto
the search query: "fullText:bananas AND (domain:cnn OR domain:yahoo)"

Off the top of my head, there are a few caveats:

1) if the domain list is large, you'll have to deal with the maxbooleans
setting
2) parsing the query can be slow, however, there's a tradeoff between
managing thousands of indexes vs a slight performance hit (Or, you can put
the query together without parsing - depends on how you handle the user's
query terms)

This seems like too simple an approach, I'm sure I'm not understanding
something...

LH
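A sketch of assembling such a query programmatically, as suggested above
(the field names are this thread's examples; the "maxbooleans" setting
referred to is BooleanQuery.setMaxClauseCount):

  // Wrap the user's parsed query with their allowed domains.
  public static Query restrictToDomains(Query userQuery, String[] domains) {
    BooleanQuery domainClause = new BooleanQuery();
    for (int i = 0; i < domains.length; i++) {
      domainClause.add(new TermQuery(new Term("domain", domains[i])),
                       BooleanClause.Occur.SHOULD);
    }
    BooleanQuery query = new BooleanQuery();
    query.add(userQuery, BooleanClause.Occur.MUST);     // e.g. fullText:bananas
    query.add(domainClause, BooleanClause.Occur.MUST);  // (domain:cnn OR domain:yahoo)
    return query;
  }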
On Fri, Jan 8, 2010 at 5:16 AM, Yaniv Ben Yosef  wrote:

> Thanks Otis, that's very helpful.
>
> On Fri, Jan 8, 2010 at 2:08 AM, Otis Gospodnetic <
> otis_gospodne...@yahoo.com
>  > wrote:
>
> > Ah, well, masking it didn't help.  Yes, ignore Bixo, Nutch, and Droids
> > then.
> > Consider DataImportHandler from Solr or wait a bit for Lucene Connectors
> > Framework to materialize.  Or use LuSql, or DbSight, or Sematext's
> Database
> > Indexer.
> >
> > Yes, I was suggesting a separate index for each user.  That's what Simpy
> > uses and has some 200K indices on 1 box and I think dozens of QPS
> > without any caching, if I remember correctly.  Load is under 1.0.
> >
> > Otis
> > --
> > Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch
> >
> >
> >
> > - Original Message 
> > > From: Yaniv Ben Yosef 
> > > To: java-user@lucene.apache.org
> > > Sent: Thu, January 7, 2010 6:55:18 PM
> > > Subject: Re: Implementing filtering based on multiple fields
> > >
> > > Thanks Otis.
> > >
> > > If I understand correctly - Bixo, Nutch and Droids are technologies to
> > use
> > > for crawling the web and building an index. My project is actually
> about
> > > indexing a large database, where you can think of every row as a web
> > page,
> > > and a particular column is the equivalent of a web site. (I didn't
> > mention
> > > that in the previous post because I didn't want to complicate my
> > question,
> > > and it seems equivalent to Google CSE given that Lucene can use
> virtually
> > > any input for indexing, AFAIK)
> > > Therefore I'm not sure if the frameworks you've mentioned are
> applicable
> > to
> > > my project as they seem to be related to web page indexing, but perhaps
> > I'm
> > > missing something.
> > > Also, what did you mean about isolating users and their data/indices.
> Did
> > > you mean that I should create a separate index per user?
> > >
> > > Thanks again!
> > >
> > > On Fri, Jan 8, 2010 at 12:35 AM, Otis Gospodnetic <
> > > otis_gospodne...@yahoo.com> wrote:
> > >
> > > > For something like CSE, I think you want to isolate users and their
> > > > data/indices.
> > > >
> > > > I'd look at Bixo or Nutch or Droids ==> Lucene or Solr
> > > >
> > > > Otis
> > > > --
> > > > Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch
> > > >
> > > >
> > > >
> > > > - Original Message 
> > > > > From: Yaniv Ben Yosef
> > > > > To: java-user@lucene.apache.org
> > > > > Sent: Thu, January 7, 2010 3:54:22 PM
> > > > > Subject: Implementing filtering based on multiple fields
> > > > >
> > > > > Hi,
> > > > >
> > > > > I'm very new to Lucene. In fact, I'm at the beginning of an
> > evaluation
> > > > > phase, trying to figure whether Lucene is the right fit for my
> needs.
> > > > > The project I'm involved in requires something similar to the
> Google
> > > > Custom
> > > > > Search Engine (CSE). In CSE, each user can
> > > > > define a set (could be a large set) of websites, and limit the
> search
> > to
> > > > > only those websites. So for example, I can create a CSE that
> searches
> > all
> > > > > web pages on cnn.com, msnbc.com and nytimes.com only.
> > > > > I am trying to understand whether and how I can do something
> similar
> > in
> > > > > Lucene.
> > > > >
> > > > > The FAQ hints about this possibility
> > > > > here,
> > > > > but it mentions a class that no longer exists in 3.0 (QueryFilter),
> > and
> > > > is
> > > > > very laconic about the suggested options. Also I'm not sure how
> well
> > it
> > > > will
> > > > > perform in my use case (or even if it fits at all).
> > > > > I thought about creating a separate index for each user or CSE.
> > However,
> > > > my
> > > > > system should be able to handle tens of thousands of concurrent
> > users. I
> > > > > haven't done any analysis yet on how this will affect CPU, RAM, I/O
> > and
> > > > > storage size, but was wondering if any of you experienced Lucene
> > > > > users/developers think it's a good direction.
> > > > > If that's not a good idea, what would be a good strategy here?
> > > > >
> > > > > Any help will be much appreciated,
> > > > > Yaniv
> > > >
> > > >

2 directory providers

2010-01-12 Thread Mittal, Sourabh
Hi,

Is it possible to have 2 directory providers in an application, such as a RAM 
directory as well as a file directory?

Regards,
Sourabh



Re: Index corruption using Lucene 2.4.1 - thread safety issue?

2010-01-12 Thread Frank Geary

Thanks for the reply Mike.  Your questions were good ones because I realize
now I should have probably used "Corrupt IndexReader" as the subject for
this thread.  Here are my answers:

The number stays the same until the corrupted IndexReader is reopened (if
nothing changes in the IndexReader - and thus we get the same IndexReader
back from reopen - the problem persists).  Then the next time the problem
occurs, after we've gotten at least one non-problematic IndexReader and thus
a few successful searches, the number is different.  For what it's worth my
latest ### was 176.  My assumption has always been that it is most likely to
be 1 beyond the end of the BitVector because each commit() is only changing
the index by adding or deleting one document.  But I don't know for sure.

Yes, I had always expected that reopen during commit would be fine, and my
semaphore code seems to confirm that (unless adds have something very subtle
to do with it).  Any other theories would be very much appreciated.

At first I wasn't sure whether it was the RAMDirectory or the main dir.  But
since I now have many examples where the RAMDirectory IndexReader has
changed and the main dir IndexReader has not, then I feel that the
RAMDirectory IndexReader is the problem.  We reopen a new IndexReader every
time we do a search on a RAMDirectory.  For the main dir, we rarely reopen
an IndexReader.  We only reopen the main dir IndexReader when a RAM
Directory is gotten rid of, which happens once the RAM directory indexes
1 documents.  But here's a typical example where the main dir
IndexReader stays the same throughout the problem:

1) both the RAMDirectory IndexReader and the main dir IndexReader are set
after the last time we got rid of the RAMDirectory which was full with 1
documents.
2) then we receive about 1077 new documents and add them to the RAMDirectory
as well as to the main dir indexes (with a number of deletes scattered
throughout as well).  The RAM directory is commited after every add or
delete and the main dir is not committed at all.
3) then a search comes in, we reopen the RAM Directory IndexReader, use the
existing main dir IndexReader (which does not reflect any index changes
since step 1) and do the search and get ArrayIndexOutOfBounds so the search
fails (in this case the ### was 208)
4) then we simply go on and get another 4000 adds and deletes in the RAM
directory and main directory.  Again the RAM directory is commited after
every add or delete and the main dir is not committed at all.
5) then the next search comes in, reopens the RAMDirectory IndexReader, uses
the existing main dir IndexReader (which does not reflect any index changes
since step 1) and the search works fine!
6) during all that time the main dir IndexReader never got reopened, and the
RAMDirectory IndexReader only got reopened twice - once it was bad and the
next it was OK.

To answer your question about when we periodically move our in-RAM changes
to the disk index (my apologies if this is redundant):
We never move them.  They are duplicated in both the RAM directory and
on-disk directory as they come in.  Then when the RAM directory determines
that it has 1 documents, we begin writing to a second RAM Directory,
reopen and warmup the on-disk IndexReader and once that new on-disk
IndexReader is ready, we throw away the RAM directory which had 1
documents (to do this we actually clean out the RAM directory by deleting
all the files it contains and we create a new IndexWriter using that same
RAMDirectory object).  Then we continue writing to that second RAMDirectory
and the on-disk index until the second RAM directory has 1 documents. 
And repeat.

We have NOT run CheckIndex on the RAM directory mainly because this mostly
happens in our production environment which has very high traffic and I'd
have to write some special code to set aside a bad RAMDirectory index, etc
to run the CheckIndex on it.  Just not an easy thing to do.

I can look into enabling asserts.  That may help.

Finally, for the RAM directory, we must create a new IndexWriter every time we
"clean out" the RAM directory after we reach the 1 documents limit.  We
clean out the RAM directory by deleting all the files it contains and we
then create a new IndexWriter using that same RAMDirectory object.  If there
was a reopen for an IndexWriter, we'd use that instead.  For the on-disk
directory, the IndexWriter never changes and we reopen the IndexReader each
time a RAM directory reaches the 1 document limit.

Any further thoughts or ideas?  Thanks again for your help Mike!

Frank
 


Re: 2 directory providers

2010-01-12 Thread anshum.gu...@naukri.com
Hi Sourabh,
If you are talking about using multiple directory implementations, then yes you 
may have multiple of those without any issues.

Sent from BlackBerry® on Airtel
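For illustration, a minimal sketch of combining the two with the 2.4-era API
(the index path is hypothetical):

  Directory ramDir = new RAMDirectory();
  Directory fsDir = FSDirectory.getDirectory("/path/to/index");  // hypothetical path
  // Each directory gets its own IndexWriter for updates; for searching,
  // the two can be combined:
  Searcher searcher = new MultiSearcher(new Searchable[] {
      new IndexSearcher(ramDir), new IndexSearcher(fsDir) });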




[JOB] Java/Lucene/Nutch developer in Zurich, Switzerland

2010-01-12 Thread Michael Wechner

Dear Developers

We are looking for Java/Lucene/Nutch developers with over 2-3 years of 
experience for a project we are currently working on.

The location is onsite in Zurich, Switzerland, and the job is as an employee or 
contractor.


Please reply to me privately with your contact details and experience with 
Lucene/Nutch.


Thanks

Michael

michael.wech...@wyona.com, +41 44 272 91 61




Re: NOT_ANALYSED_NO_NORMS should get max field length boost

2010-01-12 Thread Paul Taylor


FYI, looking at DefaultSimilarity, the lengthNorm is 1/sqrt(numTerms), 
so for one term it would equal 1, i.e. the same as not having norms. So 
AFAIK there is no difference after all if not using Document or Field 
boosting.


Paul




Exception invoking MultiPhraseQuery

2010-01-12 Thread Woolf, Ross
I can't invoke MultiPhraseQuery.  It produces the error:
com.sun.jdi.InvocationException occurred invoking method

Here is the code:
MultiPhraseQuery mpq = new MultiPhraseQuery();

In the Eclipse debugger, when I try to inspect mpq after instantiating it, it 
shows the error.

I'm on Lucene 2.9.1 with Java 1.5 on Windows XP.  Is MultiPhraseQuery bad in 
2.9.1?  Does anyone know how I can find out why it is having the invocation 
exception?

Thanks





SF Bay Area Lucene Meetup Jan. 21st

2010-01-12 Thread Grant Ingersoll
There will be a San Francisco/Bay Area meetup on Jan. 21st at 7:15 PM at the 
"Hacker Dojo" (don't ask me...) location.  

RSVP and all the details are at http://www.meetup.com/SFBay-Lucene-Solr-Meetup/

Hope to see you there,
Grant



Re: Exception invoking MultiPhraseQuery

2010-01-12 Thread Erick Erickson
I'd try running it outside of Eclipse, and/or checking each and every one
of the many configuration options in Eclipse to see if you have an old
jar that Eclipse is using, from jars you've made accessible via the
"java build path" window to referenced projects.

Alternately, you can look for all the Lucene jars on your machine and
delete (or move) any old ones.

And if none of this helps, can you post the entire stack trace?

HTH
Erick



Re: Index corruption using Lucene 2.4.1 - thread safety issue?

2010-01-12 Thread Michael McCandless
Is it possible that you're not closing the old IW on the RAMDir before
deleting files / re-using it?  Or, any other possible way that two
open writers could accidentally share the same RAMDir?  Do you
override the LockFactory of the RAMDir?

EG with ConcurrentMergeScheduler, it can continue to write files into
the RAMDir.  It's remotely possible (though I think rather unlikely)
that this could lead to the corruption you're seeing.

If you can turn on setInfoStream for all writers you create and
capture & post all output leading to the exception, that could give us
a clue...

You should be able to use IW's deleteAll method to remove all docs
without closing/reopening the writer (oh -- but this is only available
as of 2.9).

You shouldn't have to remove files yourself -- just open a new IW with
create=true?

Mike
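For reference, the create=true idiom is just the following (a sketch against
the 2.4 API; ramDir and analyzer are illustrative names):

  // Re-initialize the index inside the existing RAMDirectory instead of
  // deleting its files by hand (2.9 also offers IndexWriter.deleteAll()).
  IndexWriter writer = new IndexWriter(ramDir, analyzer, true /* create */,
                                       IndexWriter.MaxFieldLength.UNLIMITED);
  writer.setInfoStream(System.out);  // log merge/commit activity, as suggested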


RE: Exception invoking MultiPhraseQuery

2010-01-12 Thread Woolf, Ross
Thanks, I'll try that.  As for the stack trace,
"com.sun.jdi.InvocationException occurred invoking method"

is the whole of the error I get.  I only see this when I select "mpq" in 
the Variables window, and it is displayed instead of showing the mpq object.  
I've tried catching the exception, but it is not catchable; I only get the 
invocation exception displayed and can't inspect mpq.






NYC Search in the Cloud meetup: Jan 20

2010-01-12 Thread Otis Gospodnetic
Hello,

If "Search Engine Integration, Deployment and Scaling in the Cloud" sounds 
interesting to you, and you are going to be in or near New York next Wednesday 
(Jan 20) evening:

http://www.meetup.com/NYC-Search-and-Discovery/calendar/12238220/

Sorry for dupes to those of you subscribed to multiple @lucene lists.

Otis
--
Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch





Is there any difference in a document between one added field with a number of terms and a field added a number of times?

2010-01-12 Thread Paul Taylor
Been doing some analysis with Luke (which, BTW, doesn't work with 
StandardAnalyzer since the Version field was introduced) and discovered a 
problem with field length boosting for me.


I have a document that represents a recording artist (i.e. Madonna, The 
Beatles, etcetera); it contains an artist and an alias field. The alias 
field contains other names that the artist may be known as, and so 
there can be multiple aliases for an artist.


PseudoCode:
(
doc.addField(ArtistIndexField.ARTIST, rs.getString("name"));
for (String alias : aliases.get(artistId)) {
 doc.addField(ArtistIndexField.ALIAS, alias);
}
)

I'm finding that when I search for the artist by the alias field, if 
the value matches an alias in two different documents, the document with 
the fewest aliases gets the best score, because the boost of the alias 
is split between the aliases on the other doc. If I use 
ANALYSED_NO_NORMS then both documents return the same score.


The trouble is I don't want to disable norms, because I want a match on a 
single field containing fewer terms to score better than one with more 
terms.


Full example:

http://musicbrainz.org/search/textsearch.html?query=minihamuzu&type=artist&limit=25&adv=on&handlearguments=1
returns two results; the second result only has a score of 8 because it 
has more aliases than the first result, even though the alias it matched 
on was an exact single-term match.

http://musicbrainz.org/show/artist/aliases.html?artistid=174327

but if I remove norms then the following query (which is currently working)

http://musicbrainz.org/search/textsearch.html?query=%22the+beatles%22&type=artist&limit=25&adv=on&handlearguments=1

would stop working, in that searching for 'The beatles' would no longer 
rate artist 'The Beatles' better than 'The Beatles Revival Band'.


So is there any way to recognise that repeated calls to addField() are 
not creating a single field with many terms, but many fields with few terms?


thanks Paul
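One possibility, since lengthNorm receives the field name: keep norms in
general but neutralize the length factor only for the alias field. A sketch
(the field name here stands in for ArtistIndexField.ALIAS):

  public class AliasSimilarity extends DefaultSimilarity {
    public float lengthNorm(String fieldName, int numTokens) {
      if ("alias".equals(fieldName)) {
        return 1.0f;  // don't penalize a document for having more aliases
      }
      return super.lengthNorm(fieldName, numTokens);  // default: 1/sqrt(numTokens)
    }
  }

Since norms are computed at index time, this would have to be set via
IndexWriter.setSimilarity before the documents are indexed.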







Re: Is there any difference in a document between one added field with a number of terms and a field added a number of times?

2010-01-12 Thread Felipe
You could change the boost of the field artist to be bigger than the field
alias.
field.setBoost(artistBoost);




-- 
Felipe Lobo
www.jusbrasil.com.br


Re: Is there any difference in a document between one added field with a number of terms and a field added a number of times?

2010-01-12 Thread Paul Taylor
Thanks Felipe, but you are missing the point. Artist really doesn't come 
into it; my problem is confined to the alias field. Forget about artist, 
it's just detailed to give the complete scenario.


Paul







Supported way to get segment from IndexWriter?

2010-01-12 Thread Chris Hostetter


A conversation with someone earlier today got me thinking about cranking 
out a patch for SOLR-1559 (in which the goal is to allow for rules to 
determine the input to optimize(maxNumSegments) instead of requiring a fixed 
integer value as input) when I realized that I wasn't certain what 
"approved" methods there might be for determining the current number of 
segments from an IndexWriter.


I see IndexWriter.getSegmentCount() but it's package protected (with a 
comment that it exists for tests).  So my best guess using only public 
APIs would be something like...


  int numCurrentSegments = -1;
  IndexReader r = writer.getReader();
  try {
    IndexReader[] tmp = r.getSequentialSubReaders();
    numCurrentSegments = (null == tmp) ? 1 : tmp.length;
  } finally {
    r.close();
  }

Is there a better way?

(My main concern about this approach is that my intuition (which seems 
supported by the javadocs) is that getReader might be a little 
expensive/excessive just to count the segments.)


-Hoss





lucene index file randomly crash and need to reindex

2010-01-12 Thread zhang99

How do you all deal with the issue of occasionally needing to reindex? What
recommendations do you suggest to minimize this?





how to follow intranet: configuration in nutch website

2010-01-12 Thread jyzhou817
Hi,

I am trying to follow the instructions from
http://lucene.apache.org/nutch/tutorial8.html:

Intranet: Configuration
To configure things for intranet crawling you must:
1. Create a directory with a flat file of root urls.  For example, to
crawl the nutch site you might start with a file named
urls/nutch containing the url of just the Nutch home
page.  All other Nutch pages should be reachable from this page.  The
urls/nutch file would thus contain:
http://lucene.apache.org/nutch/

I do not understand this. Can anyone help me out?

Thanks.
zhou


  New Email addresses available on Yahoo!
Get the Email name you've always wanted on the new @ymail and @rocketmail. 
Hurry before someone else does!
http://mail.promotions.yahoo.com/newdomains/sg/

Re: how to follow intranet: configuration in nutch website

2010-01-12 Thread Otis Gospodnetic
Zhou,

Your question will get more attention if you send it to 
nutch-u...@lucene.apache.org list instead.  This list is for Lucene Java.

 Otis
--
Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch








Re: lucene index file randomly crash and need to reindex

2010-01-12 Thread Otis Gospodnetic
Hi,

Use the latest version of Lucene, obey Lucene's locks, write with 1 
IndexWriter, avoid NFS...

 Otis
--
Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch








Is it possible to do a PhraseQuery using XML Query Parser?

2010-01-12 Thread syedfa

Dear fellow Java developers:

Is it possible to do a PhraseQuery when using the XML Query Parser?  I
checked the documentation for the XML Query Parser, and it has tags for a
multitude of queries, with PhraseQuery absent from the list.  Is it possible
to do a PhraseQuery using the XMLQueryParser, and if not, is there a way to
implement this functionality?  Just wondering.

Thanks in advance to all who reply.

Sincerely,
Fayyaz







Re: how to follow intranet: configuration in nutch website

2010-01-12 Thread jyzhou817
Thanks.


Re: lucene index file randomly crash and need to reindex

2010-01-12 Thread zhang99

What is the longest time you have ever kept an index file without being
required to reindex? I notice even big open source projects like Liferay
suffer from this.
Thanks for the tips.

