RE: Storing large text or binary source documents in the index and memory usage

2006-01-20 Thread Chris Hostetter
: otherwise I would have done so already. My real question is question number one, which did not receive a reply: is there a formula that can tell me if what is happening is reasonable and to be expected, or am I doing something ... I've never played with the binary fields much, nor have I ever t

RE: Storing large text or binary source documents in the index and memory usage

2006-01-20 Thread George Washington
Thank you, ipowers, for your reply. Perhaps I did not make myself clear enough. As I explained in my original posting, I want to store large documents in the Lucene index. Storing them elsewhere is not an option; otherwise I would have done so already. My real question is question number one, whi

Re: Beginner Querying multiple fields pls help

2006-01-20 Thread Chris Hostetter
1) Instead of building up a string and giving it to QueryParser, I would suggest you look at the MultiFieldQueryParser. 2) It sounds like what you are interested in regarding the category field is not to restrict your search to a particular handful of categoryIds, but to a large set of categories
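
A minimal sketch of the MultiFieldQueryParser suggestion above, assuming Lucene 1.9-era APIs (exact parse() signatures vary by version); the field names and query string are only illustrative:

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.queryParser.MultiFieldQueryParser;
    import org.apache.lucene.search.Query;

    public class MultiFieldExample {
        public static void main(String[] args) throws Exception {
            // Parse the user's words once against several fields instead of
            // concatenating "field:term" clauses into one query string.
            String[] fields = {"itemtitle", "itemdescription"};
            Query q = MultiFieldQueryParser.parse("mountain bike", fields,
                                                  new StandardAnalyzer());
            System.out.println(q);
        }
    }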

Re: Document similarity

2006-01-20 Thread Aleksey Serba
Yonik, Klaus, thanks for your quick responses. Let me rephrase: I can't compare the currently processed document with all documents in my collection using the angle between documents in term-vector space, because of performance issues. As far as I can see, I can avoid unnecessary operations. At first, I ca

RE: Storing large text or binary source documents in the index and memory usage

2006-01-20 Thread John Powers
Are these super large files supposed to be searchable? Can the binary files be stored somewhere else and just pointed to? Can the text files be broken up?
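
A minimal sketch of the "point to the file instead of storing it" idea raised here, assuming Lucene 1.9-era APIs; the field names and the path parameter are only illustrative:

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;

    public class PointerFieldExample {
        public static Document makeDoc(String extractedText, String pathToOriginal) {
            Document doc = new Document();
            // Index the extracted text so the document stays searchable...
            doc.add(new Field("contents", extractedText,
                              Field.Store.NO, Field.Index.TOKENIZED));
            // ...but store only a reference to the large original file,
            // keeping the bytes themselves out of the index.
            doc.add(new Field("sourcePath", pathToOriginal,
                              Field.Store.YES, Field.Index.NO));
            return doc;
        }
    }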

Re: Document similarity

2006-01-20 Thread Yonik Seeley
If you didn't want to store term vectors you could also run the document fields through the analyzer yourself and collect the Tokens (you should still have the fields you just indexed... no need to retrieve it again). -Yonik
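
A minimal sketch of what Yonik describes: re-running a field's text through the analyzer and collecting the tokens yourself. It assumes Lucene 1.9-era analysis APIs; the analyzer choice and method names are only an example:

    import java.io.StringReader;
    import java.util.ArrayList;
    import java.util.List;
    import org.apache.lucene.analysis.Token;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;

    public class CollectTokens {
        public static List<String> collectTerms(String fieldName, String text)
                throws Exception {
            TokenStream stream = new StandardAnalyzer()
                    .tokenStream(fieldName, new StringReader(text));
            List<String> terms = new ArrayList<String>();
            // Pull tokens until the stream is exhausted.
            for (Token tok = stream.next(); tok != null; tok = stream.next()) {
                terms.add(tok.termText());
            }
            stream.close();
            return terms;
        }
    }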

RE: Beginner Querying multiple fields pls help

2006-01-20 Thread Ashley Rajaratnam
Hi Joshi, thanks for the reply! I had already done that before but failed to put it in the code in the original post: if (BooleanQuery.GetMaxClauseCount() < MAX_CLAUSE_COUNT) BooleanQuery.SetMaxClauseCount(MAX_CLAUSE_COUNT); I'm using Lucene 1.9; that fixes the probl

AW: Document similarity

2006-01-20 Thread Klaus
>In my case, I need to filter similar documents in search results and therefore determine document similarity during the indexing process using term vectors. Obviously, I can't compare the currently indexing document with all documents in my collection. Yes you can. Right after indexing the new docum
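
A minimal sketch of reading back a freshly indexed document's term vector, in the spirit of Klaus's reply; it assumes Lucene 1.9-era APIs, that the field was indexed with term vectors enabled, and an illustrative field name and document number:

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.TermFreqVector;

    public class ReadTermVector {
        public static void dump(String indexDir, int docNum) throws Exception {
            IndexReader reader = IndexReader.open(indexDir);
            TermFreqVector tfv = reader.getTermFreqVector(docNum, "contents");
            if (tfv != null) {  // null if the field was indexed without term vectors
                String[] terms = tfv.getTerms();
                int[] freqs = tfv.getTermFrequencies();
                for (int i = 0; i < terms.length; i++) {
                    System.out.println(terms[i] + " -> " + freqs[i]);
                }
            }
            reader.close();
        }
    }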

Re: Beginner Querying multiple fields pls help

2006-01-20 Thread Hemant Joshi
You have to set the bq.setMaxClauseCount value, as the default number of clauses BooleanQuery supports is 1024. I am guessing you have categoryIDs between 1-3, which means more than 1024 clauses. -Hemant
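
A minimal sketch of raising the clause limit Hemant mentions; the 1024 default is from the thread, while the new limit here is an arbitrary example:

    import org.apache.lucene.search.BooleanQuery;

    public class ClauseLimit {
        public static void main(String[] args) {
            System.out.println(BooleanQuery.getMaxClauseCount()); // 1024 by default
            // Raise the limit before building or running the query with many clauses.
            BooleanQuery.setMaxClauseCount(10240);
        }
    }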

Beginner Querying multiple fields pls help

2006-01-20 Thread Ashley Rajaratnam
Hi, Please forgive me if this comes across as being naïve, however I've bashed my head against it for a while and can't come up with a solution. Overview: I have the following basic document structure: … Document doc = new Document(); doc.Add(Field.Text("itemtitle", iteminf.itemtitle))
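
For reference, a minimal Java (Lucene 1.9-era) sketch of the kind of document structure described above; the itemtitle field name follows the post, while the categoryid field name and the rest are assumptions for illustration:

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;

    public class ItemDocExample {
        public static Document build(String title, String[] categoryIds) {
            Document doc = new Document();
            doc.add(new Field("itemtitle", title,
                              Field.Store.YES, Field.Index.TOKENIZED));
            // One categoryid field per category, so an item can belong to many.
            for (int i = 0; i < categoryIds.length; i++) {
                doc.add(new Field("categoryid", categoryIds[i],
                                  Field.Store.YES, Field.Index.UN_TOKENIZED));
            }
            return doc;
        }
    }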

Document similarity

2006-01-20 Thread Aleksey Serba
Hello Lucene people! First of all, I would like to thank all of the community participants (developers, users, Erik and Otis for the "Lucene in Action" book) for their great work. As far as I understand it, the two most popular approaches concerning document similarity are: 1. "cosine metrics" using te
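
A minimal, Lucene-independent sketch of the "cosine metrics" idea: cosine similarity between two term-frequency maps (term -> count). This is plain Java; nothing here is a Lucene API.

    import java.util.Map;

    public class Cosine {
        public static double similarity(Map<String, Integer> a, Map<String, Integer> b) {
            double dot = 0.0, normA = 0.0, normB = 0.0;
            for (Map.Entry<String, Integer> e : a.entrySet()) {
                double fa = e.getValue().doubleValue();
                normA += fa * fa;
                Integer fb = b.get(e.getKey());
                if (fb != null) {
                    dot += fa * fb.doubleValue(); // shared terms contribute to the dot product
                }
            }
            for (Integer fb : b.values()) {
                normB += fb.doubleValue() * fb.doubleValue();
            }
            if (normA == 0.0 || normB == 0.0) {
                return 0.0; // an empty vector is similar to nothing
            }
            return dot / (Math.sqrt(normA) * Math.sqrt(normB));
        }
    }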

Creating searcher object for read operations:

2006-01-20 Thread Ravi
Hi, I want to create a searcher object for read-only operations. I read that we can open any number of read-only connections and work with them. But when can they be closed? If we continually open them, is there any problem with that? I want to use a single searcher object which
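
A minimal sketch of the "one shared read-only searcher" idea in the question, assuming Lucene 1.9-era APIs; the holder class and index path are only illustrative:

    import org.apache.lucene.search.IndexSearcher;

    public class SearcherHolder {
        private static IndexSearcher searcher;

        // Lazily open one searcher and hand the same instance to every reader.
        public static synchronized IndexSearcher get(String indexDir) throws Exception {
            if (searcher == null) {
                searcher = new IndexSearcher(indexDir);
            }
            return searcher;
        }

        // Close it only when no searches are in flight any more, e.g. at shutdown
        // or after swapping in a new searcher that sees recent index changes.
        public static synchronized void close() throws Exception {
            if (searcher != null) {
                searcher.close();
                searcher = null;
            }
        }
    }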

AW: Use the lucene for searching in the Semantic Web.

2006-01-20 Thread Klaus
>The feature vector may be bigger than the object-predicate pairs. In my application, each document may be annotated with several concepts to say this document contains an instance of a class. How do you do that? I have to reengineer the ontology in my application, but I'm not sure how to express