date:20060928

Re: Very high fieldNorm for a field resulting in bad results

2006-09-28 Thread Mek

It depends on your goal. Index-time field boosts are a way to express things like "this document's title is worth twice as much as the title of most documents". Query-time boosts are a way to express "I care about matches on this clause of my query twice as much as I do about matches to other claus
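The distinction between the two kinds of boost can be made concrete with a toy model. This is only a sketch: it is not Lucene's scoring formula, and `BoostSketch`, `score`, and the simple multiplication are assumptions chosen to show where each boost enters.

```java
import java.util.Map;

public class BoostSketch {
    // Toy score: every query clause that matches a field of the document
    // contributes (query-time boost) * (index-time boost of that field).
    // Assumption: this multiplicative model is illustrative only.
    static double score(Map<String, Double> indexBoosts, Map<String, Double> queryBoosts) {
        double total = 0.0;
        for (Map.Entry<String, Double> clause : queryBoosts.entrySet()) {
            Double indexBoost = indexBoosts.get(clause.getKey());
            if (indexBoost != null) { // the document has this field
                total += clause.getValue() * indexBoost;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Index-time: stored with the document, fixed until it is reindexed.
        Map<String, Double> ordinaryDoc = Map.of("title", 1.0, "body", 1.0);
        Map<String, Double> boostedDoc = Map.of("title", 2.0, "body", 1.0);

        // Query-time: chosen per query, applies to every document searched.
        Map<String, Double> plainQuery = Map.of("title", 1.0, "body", 1.0);
        Map<String, Double> titleHeavyQuery = Map.of("title", 2.0, "body", 1.0);

        System.out.println(score(ordinaryDoc, plainQuery));
        System.out.println(score(boostedDoc, plainQuery));
        System.out.println(score(ordinaryDoc, titleHeavyQuery));
        System.out.println(score(boostedDoc, titleHeavyQuery));
    }
}
```

In this toy model the index-time boost favours one particular document in every search, while the query-time boost favours title matches in every document for one particular search.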

Re[4]: how to enhance speed of sorted search

2006-09-28 Thread Yura Smolsky

Hello, Chris.
CH> : I am thinking should be this faster
CH> The ConstantScoreQuery wrapped around the QueryFilter might in fact be
CH> faster than the raw query -- have you tried it to see?
I will try when PrefixFilter is added to Lucene. b/c I have some technical issues and I use PyLucene

Re: Splitting the index

2006-09-28 Thread Rob Young

On Wednesday 27 September 2006 18:51, Erik Hatcher wrote:
> Lots of possible issues, but we need more information to troubleshoot
> this properly.
> How big is your index, number of documents?
CDs 137,390
DVDs 41,049
Games 3,360
Books 648,941
Total 830,740
> total fi

Indexing a single product in multiple categories.

2006-09-28 Thread Stuart Grimshaw

We have an existing lucene based search, and a recent change to the way we organise our products has caused a bit of a problem for search results. Our products are arranged into subcategories, categories & stores. A product can only be in 1 subcat or cat, but a cat can be in multiple stores. We

'categorized-term' web index

2006-09-28 Thread Vladimir Olenin

Hi, I wonder if anyone knows.
- is there a place I can get already crawled internet web pages in an archive (10 - 100Gb of data)
- is there a place I can get an already created Lucene index for these pages
- is there such a thing as a 'categorized-terms' index, meaning each page is processed by an

Indexing large index with Lucene

2006-09-28 Thread Eric Louvard

I have been using Lucene for several years. We have had to index more and more documents. I'm now trying to optimise the indexing process with more than 1,000,000 documents, and I can see that performance decreases as the index grows. I would like to know if someone has already studied this

Re: 'categorized-term' web index

2006-09-28 Thread Steven Rowe

Vladimir Olenin wrote:
> - is there a place I can get already crawled internet web pages in an
> archive (10 - 100Gb of data)
I don't know the sizes of the corpora mentioned on the Lucene Wiki's Resources page, but it's a good place to start:

Re: Indexing large index with Lucene

2006-09-28 Thread Erick Erickson

Two things come to mind... First, you can freely write to an index while searching it; the search is always available. I'm pretty sure this includes deleting/re-adding documents. However, you won't be able to search on the changes in your index until you close/reopen the *searcher*. Second, depen
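The first point, that a searcher keeps seeing the index as it was when the searcher was opened, can be sketched without Lucene. The classes below (`ToyIndex`, `ToySearcher`) are hypothetical stand-ins, and copying the document list is an assumption made to keep the sketch short; it is not how Lucene implements point-in-time views.

```java
import java.util.ArrayList;
import java.util.List;

public class SnapshotSearch {
    // Hypothetical stand-in for an index that accepts writes at any time.
    static class ToyIndex {
        private final List<String> docs = new ArrayList<>();

        void add(String doc) {
            docs.add(doc);
        }

        // Opening a searcher freezes the current contents for that searcher.
        ToySearcher openSearcher() {
            return new ToySearcher(new ArrayList<>(docs));
        }
    }

    // Hypothetical stand-in for a searcher: it only ever sees its snapshot.
    static class ToySearcher {
        private final List<String> snapshot;

        ToySearcher(List<String> snapshot) {
            this.snapshot = snapshot;
        }

        boolean contains(String doc) {
            return snapshot.contains(doc);
        }
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.add("doc-1");
        ToySearcher searcher = index.openSearcher();

        index.add("doc-2"); // written while the searcher is still open
        System.out.println(searcher.contains("doc-2")); // false: old snapshot

        searcher = index.openSearcher(); // the "close/reopen" step
        System.out.println(searcher.contains("doc-2")); // true: new snapshot
    }
}
```

The old searcher stays usable throughout, which is why searching remains available while the index is being written.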

Re: Field boosting in MemoryIndex

2006-09-28 Thread Wolfgang Hoschek

Hi, I am playing with MemoryIndex for a situation in which I have a large number of small, ephemeral documents that I need to fire queries at. It appears to be at least 5x faster than RAMDirectory for my usage, which is large enough to be interesting. However MemoryIndex does not seem to sup

Re: highlight using a MemoryIndex

2006-09-28 Thread Wolfgang Hoschek

Document.get(FIELD_NAME) will always be null because MemoryIndex does not store the original untokenized fulltext(s). Those full texts are thrown away immediately after tokenization (i.e. on addField()), keeping only the field names and associated (indexed) tokenized terms. The latter a

Re: Re[2]: strange behavior 4 query term boost

2006-09-28 Thread Mike Klaas

On 9/27/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
Found the reason, it is a bug IMHO. The example should be:
A: term1^5 term2^6 term3^7
B: term1^5E-4 term2^6E-4 term3^7E-4
C: term1^0.0006 term2^0.0006 term3^0.0007
A & C are supposed to return the same rank; B is different. Since B will be parsed
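As a reference point for the notation in this example, plain Java reads the exponent form and the decimal form of a boost as the same float. The sketch below shows only that; `BoostNotation` and `parseBoost` are hypothetical names, and since the message is cut off before it says how B is parsed, nothing here claims to reproduce what Lucene's QueryParser does with such input.

```java
public class BoostNotation {
    // Hypothetical helper: reads a boost value the way the Java runtime does.
    static float parseBoost(String text) {
        return Float.parseFloat(text);
    }

    public static void main(String[] args) {
        // Each pair writes the same number in exponent and decimal notation.
        String[][] pairs = {
            {"5E-4", "0.0005"},
            {"6E-4", "0.0006"},
            {"7E-4", "0.0007"},
        };
        for (String[] pair : pairs) {
            boolean same = parseBoost(pair[0]) == parseBoost(pair[1]);
            System.out.println(pair[0] + " equals " + pair[1] + ": " + same);
        }
    }
}
```

If a query parser treated the two notations differently, the difference would come from its own grammar rather than from the numeric values.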

MoreLikeThis does not retrieve all terms when using like()

2006-09-28 Thread Hadas Cohen

Ever since I started using Lucene, I have found answers to all possible questions in the archive. But I need help with these. 1. I am using the MoreLikeThis class and cannot figure out why not all terms are retrieved when using like() to generate queries. I extract the terms from a documen

NPE thrown in invertDocument

2006-09-28 Thread Ryan Heinen

Hello, I am creating an index using a RAMDirectory, and am running across a situation where when I call IndexSearcher.addDocument it throws a NullPointerException. I'll provide the stack trace first, and then give you any details that may help in resolving this problem. Exception in thread

Re: Indexing large index with Lucene

2006-09-28 Thread Chris Lu

I like the approach in your second point, but I have doubts about the first. For a production-level index, which is usually pretty big, frequently closing and reopening the searcher may not be fast enough, especially when you want to cache sorting. It's better to keep the searchers open. But when the indexing proce

Problem searching

2006-09-28 Thread James O'Rourke

I have weird results when I create documents with input such as burmilla2.jpg. Here is some example code:
Document doc = new Document()
doc.add(new Field("combined", "jorourke" + " " + "burmilla2.jpg", Field.Store.YES, Field.Index.TOKENIZED))
indexWriter.addDocument(doc)
QueryParser pars

Re: Indexing large index with Lucene

2006-09-28 Thread Erick Erickson

I'll gladly defer to your experience on this one. I have to admit that my applications don't care much about updating, so I'm not very familiar with the behavior there.
Erick
On 9/28/06, Chris Lu <[EMAIL PROTECTED]> wrote:
I like the approach in your second point. But I have doubt on the first

Re: Problem searching

2006-09-28 Thread Erick Erickson

Sure, what analyzer are you using? Both for indexing and parsing the query? I'll bet that you're using a tokenizer (in one or both) that breaks your input stream up in ways you're not expecting. Get a copy of Luke (google "lucene luke") if you haven't already and look at what's in your index. You c

Re: Problem searching

2006-09-28 Thread James O'Rourke

Wow, thanks, just found a significant bug - thought I was using StandardAnalyzer throughout but was not!! Ouch! Nasty typo. Thanks again.
James
On Sep 28, 2006, at 5:57 PM, Erick Erickson wrote:
Sure, what analyzer are you using? Both for indexing and parsing the query? I'll bet that you
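The cause found in this thread, a different analyzer at index time and at query time, can be illustrated without Lucene. The two tokenizers below are hypothetical simplifications (ASCII only, and not the behaviour of StandardAnalyzer or any other real Lucene analyzer); they only show why a term produced by one tokenizer may never match terms produced by another.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class AnalyzerMismatch {
    // Hypothetical analyzer A: lowercases and splits on whitespace only.
    static List<String> whitespaceTokens(String text) {
        String trimmed = text.trim().toLowerCase(Locale.ROOT);
        if (trimmed.isEmpty()) {
            return new ArrayList<>();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    // Hypothetical analyzer B: lowercases and splits on anything that is
    // not an ASCII letter, so digits and dots break tokens apart.
    static List<String> letterTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    // A query matches when every query token was also indexed.
    static boolean matches(List<String> indexed, List<String> query) {
        return indexed.containsAll(query);
    }

    public static void main(String[] args) {
        String fieldValue = "jorourke burmilla2.jpg";
        String queryText = "burmilla2.jpg";

        // Same analyzer on both sides: the tokens line up.
        System.out.println(matches(whitespaceTokens(fieldValue), whitespaceTokens(queryText)));

        // Different analyzers: the query asks for "burmilla" and "jpg",
        // but the index only holds "jorourke" and "burmilla2.jpg".
        System.out.println(matches(whitespaceTokens(fieldValue), letterTokens(queryText)));
    }
}
```

Looking at the indexed terms, as suggested with Luke, is the real-world equivalent of printing the token lists here.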

18 matches
