Re: Very high fieldNorm for a field resulting in bad results

2006-09-28 Thread Mek
It depends on your goal. Index-time field boosts are a way to express things like "this document's title is worth twice as much as the title of most documents". Query-time boosts are a way to express "I care about matches on this clause of my query twice as much as I do about matches to other claus

Re[4]: how to enhance speed of sorted search

2006-09-28 Thread Yura Smolsky
Hello, Chris. CH> : I am thinking this should be faster CH> The ConstantScoreQuery wrapped around the QueryFilter might in fact be CH> faster than the raw query -- have you tried it to see? I will try once PrefixFilter is added to Lucene, b/c I have some technical issues and I use PyLucene

Re: Splitting the index

2006-09-28 Thread Rob Young
On Wednesday 27 September 2006 18:51, Erik Hatcher wrote:
> Lots of possible issues, but we need more information to troubleshoot
> this properly.
> How big is your index, number of documents?
CDs 137,390
DVDs 41,049
Games 3,360
Books 648,941
Total 830,740
> total fi

Indexing a single product in multiple categories.

2006-09-28 Thread Stuart Grimshaw
We have an existing Lucene-based search, and a recent change to the way we organise our products has caused a bit of a problem for search results. Our products are arranged into subcategories, categories & stores. A product can only be in 1 subcat or cat, but a cat can be in multiple stores. We

'categorized-term' web index

2006-09-28 Thread Vladimir Olenin
Hi, I wonder if anyone knows:
- is there a place I can get already crawled internet web pages in an archive (10 - 100Gb of data)
- is there a place I can get an already created Lucene index for these pages
- is there such a thing as a 'categorized-terms' index, meaning each page is processed by an

Indexing large index with Lucene

2006-09-28 Thread Eric Louvard
I have been using Lucene for several years. We have had to index more and more documents. I'm now trying to optimise the indexing process for more than 1,000,000 documents, and I can see that performance decreases as the index grows. I would like to know if someone has already studied this

Re: 'categorized-term' web index

2006-09-28 Thread Steven Rowe
Vladimir Olenin wrote: > - is there a place I can get already crawled internet web pages in an > archive (10 - 100Gb of data) I don't know the sizes of the corpora mentioned on the Lucene Wiki's Resources page, but it's a good place to start:

Re: Indexing large index with Lucene

2006-09-28 Thread Erick Erickson
Two things come to mind... First, you can freely write to an index while searching it; the search is always available. I'm pretty sure this includes deleting/re-adding documents. However, you won't be able to search on the changes in your index until you close/reopen the *searcher*. Second, depen
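
The point about needing to close/reopen the searcher can be pictured with a toy model in plain Java. This is only a sketch of point-in-time snapshot behaviour and does not use Lucene; `ToyIndex`, `ToySearcher`, `openSearcher` and `count` are hypothetical names invented for the illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: these classes are hypothetical and are NOT Lucene APIs.
class SnapshotSearchDemo {

    // A writable collection of documents.
    static class ToyIndex {
        private final List<String> docs = new ArrayList<>();

        void add(String doc) {
            docs.add(doc);
        }

        // A searcher sees a copy of the documents as they were when it was opened.
        ToySearcher openSearcher() {
            return new ToySearcher(new ArrayList<>(docs));
        }
    }

    // A read-only, point-in-time view of the index.
    static class ToySearcher {
        private final List<String> snapshot;

        ToySearcher(List<String> snapshot) {
            this.snapshot = snapshot;
        }

        // Counts the documents in the snapshot that contain the given word.
        int count(String word) {
            int hits = 0;
            for (String doc : snapshot) {
                if (doc.contains(word)) {
                    hits++;
                }
            }
            return hits;
        }
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.add("lucene in action");

        ToySearcher searcher = index.openSearcher();
        index.add("lucene memory index");   // written while the searcher is open

        System.out.println("old searcher hits: " + searcher.count("lucene"));
        searcher = index.openSearcher();    // "reopen" to pick up the change
        System.out.println("new searcher hits: " + searcher.count("lucene"));
    }
}
```

In this model writes never block the open searcher, but the searcher only reflects them after it has been reopened, which is the trade-off discussed in the replies to this thread.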

Re: Field boosting in MemoryIndex

2006-09-28 Thread Wolfgang Hoschek
Hi, I am playing with MemoryIndex for a situation in which I have a large number of small, ephemeral documents that I need to fire queries at. It appears to be at least 5x faster than RAMDirectory for my usage, which is large enough to be interesting. However MemoryIndex does not seem to sup

Re: highlight using a MemoryIndex

2006-09-28 Thread Wolfgang Hoschek
Document.get(FIELD_NAME) will always be null because MemoryIndex does not store the original untokenized fulltext(s). Those full texts are thrown away immediately after tokenization (i.e. on addField()), keeping only the field names and associated (indexed) tokenized terms. The latter a

Re: Re[2]: strange behavior 4 query term boost

2006-09-28 Thread Mike Klaas
On 9/27/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote: Found the reason, it is a bug IMHO. The example should be:
A: term1^5 term2^6 term3^7
B: term1^5E-4 term2^6E-4 term3^7E-4
C: term1^0.0006 term2^0.0006 term3^0.0007
A & C are supposed to return the same rank; B is different. Since B will be parsed

MoreLikeThis does not retrieve all terms when using like()

2006-09-28 Thread Hadas Cohen
Ever since I started using Lucene, I have found answers to all possible questions in the archive. But I need help with these ones. 1. I am using the MoreLikeThis class, and cannot figure out why not all terms are retrieved when using like() to generate queries. I extract the terms from a documen

NPE thrown in invertDocument

2006-09-28 Thread Ryan Heinen
Hello, I am creating an index using a RAMDirectory, and am running across a situation where, when I call IndexWriter.addDocument, it throws a NullPointerException. I'll provide the stack trace first, and then give you any details that may help in resolving this problem. Exception in thread

Re: Indexing large index with Lucene

2006-09-28 Thread Chris Lu
I like the approach in your second point. But I have doubts about the first point. For a production-level index, which is usually pretty big, frequently closing/reopening the searcher may not be fast enough, especially when you want to cache sorting. It's better to keep the searchers open. But when the indexing proce

Problem searching

2006-09-28 Thread James O'Rourke
I have weird results when I create documents with input such as burmilla2.jpg. Here is some example code:
Document doc = new Document()
doc.add(new Field("combined", "jorourke" + " " + "burmilla2.jpg" , Field.Store.YES, Field.Index.TOKENIZED))
indexWriter.addDocument(doc)
QueryParser pars

Re: Indexing large index with Lucene

2006-09-28 Thread Erick Erickson
I'll gladly defer to your experience on this one. I have to admit that my applications don't care much about updating, so I'm not very familiar with the behavior there. Erick On 9/28/06, Chris Lu <[EMAIL PROTECTED]> wrote: I like the approach in your second point. But I have doubt on the first

Re: Problem searching

2006-09-28 Thread Erick Erickson
Sure, what analyzer are you using? Both for indexing and parsing the query? I'll bet that you're using a tokenizer (in one or both) that breaks your input stream up in ways you're not expecting. Get a copy of Luke (google lucene luke) if you haven't already and look at what's in your index. You c

Re: Problem searching

2006-09-28 Thread James O'Rourke
Wow, thanks, just found a significant bug - thought I was using StandardAnalyzer throughout but was not!! ouch! Nasty typo. Thanks again. James On Sep 28, 2006, at 5:57 PM, Erick Erickson wrote: Sure, what analyzer are you using? Both for indexing and parsing the query? I'll bet that you
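
The kind of index/query analyzer mismatch diagnosed in this thread can be illustrated with a small, self-contained Java sketch. It does not call Lucene: the two methods are toy stand-ins, loosely modelled on a whitespace-splitting tokenizer and a letters-only lowercasing tokenizer, and the thread does not say which analyzer was actually used by mistake. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration only: these tokenizers are simplified stand-ins, NOT Lucene classes.
class TokenizerMismatchDemo {

    // Splits on runs of whitespace and keeps every other character as-is.
    static List<String> whitespaceTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.trim().split("\\s+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    // Breaks the text at every non-letter character and lowercases the pieces.
    static List<String> letterTokens(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Assumption for the demo: one tokenizer at index time, another at query time.
        List<String> indexed = letterTokens("jorourke burmilla2.jpg");
        List<String> queried = whitespaceTokens("burmilla2.jpg");

        System.out.println("indexed terms: " + indexed);
        System.out.println("query terms:   " + queried);
        System.out.println("all query terms in index: " + indexed.containsAll(queried));
    }
}
```

With these toy rules the digit and the dot in burmilla2.jpg act as break points at index time, so the single query term produced by the whitespace rule never matches an indexed term. Using the same analysis on both sides removes the mismatch.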