Re: About the search efficiency based on document's length

2007-09-20 Thread Karl Wettin
On 21 Sep 2007, at 08:23, Jarvis wrote: There is a question about the document’s length and search efficiency. Two ways to index some html pages (ignore some information): one is both store and index the html content in lucene dictionary, the other is just index the content. For the first met

About the search efficiency based on document's length

2007-09-20 Thread Jarvis
Hi everyone, There is a question about the document’s length and search efficiency. Think of this situation: Two ways to index some html pages (ignore some information): one is both store and index the html content in lucene dictionary, the other is just index the content. For the first method i

Re: Multiple Indices vs Single Index

2007-09-20 Thread Nikhil Chhaochharia
Thanks Grant and Chris for the replies. I am looking at a single index because the 40 index system has started having performance issues at high load. My daily traffic is increasing at a steady pace and about 40% of the traffic is concentrated in a 2 hour period and searches start slowing down

Re: Question regarding proximity search

2007-09-20 Thread Chris Hostetter
: I checked the lucene converted syntax (using Query.toString()) in both case and found the second one actually not converting to proximity query.
I don't think you understood what I was trying to say... using parens with a "~" character after it is not currently, and has never been (to my kn

Re: Question regarding proximity search

2007-09-20 Thread Sonu SR
Thanks, Hoss, for the reply. I am using Lucene 2.1. I checked the Lucene converted syntax (using Query.toString()) in both cases and found that the second one is actually not converted to a proximity query. "cat dog"~6 is converted to ABST:"cat dog"~4 and (cat dog)~6 is converted to +ABST:cat +ABST:dog. Tha

Re: Multiple Indices vs Single Index

2007-09-20 Thread Chris Hostetter
: I was wondering if it will be better to just have 1 large index with all the 40 indices combined. I do not need to do dual-queries and my total index size (if I create a single index) is about 3.4GB. It will increase to maximum of 5-6 GB. I am running this on a dedicated machine w

Re: Question regarding proximity search

2007-09-20 Thread Chris Hostetter
: Is the query "cat dog"~6 same as (cat dog)~6 ? I think both case will search for "cat" and "dog" within 6 words each other. But I am getting different number of results for the above queries. The second one may be the higher. Please clarify this.
I don't believe (cat dog)~6 is eve

Re: highlighting and fragments

2007-09-20 Thread Mark Miller
Lucene's storing functionality is just a simple storage mechanism. You can certainly and easily use your own storage mechanism. When you get your user created id back from Lucene due to a hit, just pass that id to your storage system to get the original text and then feed that to the Highlighte

highlighting and fragments

2007-09-20 Thread Michael J. Prichard
Hello Folks, I wanted to stay away from storing text in the indexes in order to keep them smaller. I have a requirement now though to provide highlighting and, more so, fragments of the content so they will be displayed on the UI. Do you all prefer to store the text in the index to make this

Re: Multiple Indices vs Single Index

2007-09-20 Thread Grant Ingersoll
If the current version is working well, what is the reason to move? Is it just to make management of the indices easier? On Sep 20, 2007, at 12:07 PM, Nikhil Chhaochharia wrote: OK, thanks. I actually have both systems implemented. The multi-index one is being used currently and it works

Re: Multiple Indices vs Single Index

2007-09-20 Thread Nikhil Chhaochharia
OK, thanks. I actually have both systems implemented. The multi-index one is being used currently and it works well. I have deployed the single index solution a few times during off-peak hours and the response time has been almost the same as the multi-index solution. I tried to simulate some

Re: thread safe shared IndexSearcher

2007-09-20 Thread Jay Yu
Mark Miller wrote: Good luck Jay. Keep in mind, pretty much all LuceneIndexAccessor does is sync Readers with Writers and allow multiple threads to share the same instances of them -- nothing more. The code just forces Readers to refresh when Writers are used to change the index. There really

Re: thread safe shared IndexSearcher

2007-09-20 Thread Mark Miller
Good luck Jay. Keep in mind, pretty much all LuceneIndexAccessor does is sync Readers with Writers and allow multiple threads to share the same instances of them -- nothing more. The code just forces Readers to refresh when Writers are used to change the index. There really isn't any functional

Re: Multiple Indices vs Single Index

2007-09-20 Thread Grant Ingersoll
OK, I thought you meant your index would have in it the name of the second index and would thus do a two-stage retrieval. At any rate, if you are saying your combined index with all the stored fields is ~3.4 GB I would think it would fit reasonably on the machine you have and perform reason

Re: thread safe shared IndexSearcher

2007-09-20 Thread Jay Yu
Mark, Thanks for sharing your valuable experience and thoughts. Frankly, our system already has most of the functionality LuceneIndexAccessor offers. The only thing I am looking for is to sync the searchers' close. That's why I am a little worried about the way the accessor handles the searcher sync. I w

Re: Multiple Indices vs Single Index

2007-09-20 Thread Nikhil Chhaochharia
I am sorry, it seems that I was not clear about what my problem is. I will try to describe it again. My data is divided into 40 categories and at one time only one category can be searched. The GUI for the system will ask the user to select the category from a drop-down. Currently, I have a s

Re: Multiple Indices vs Single Index

2007-09-20 Thread Grant Ingersoll
If I understand correctly, you want to do a two-stage retrieval, right? That is, look up in the initial index (3.4 GB) and then do a second search on the sub-index? Presumably, you have to manage the Searchers, etc. for each of the sub-indexes as well as the big index. This means you have

Re: Oracle-Lucene integration (OJVMDirectory and Lucene Domain Index) - LONG

2007-09-20 Thread Marcelo Ochoa
Hi Chris: First, sorry for the delay :( I have some preliminary performance tests using Oracle 11g running in a VMware virtual machine with 400 MB SGA (virtual machine using 812 MB RAM for Oracle Enterprise Linux 4.0). This virtual machine is hosted on modest hardware, a Pentium IV 2.18 GHz wit

Question regarding proximity search

2007-09-20 Thread Sonu SR
Hi, I have a question about proximity search. Is the query "cat dog"~6 the same as (cat dog)~6 ? I think both cases will search for "cat" and "dog" within 6 words of each other. But I am getting a different number of results for the above queries. The second one appears to return more. Please clarify this. Thanks, Son
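
The replies in this thread report that (cat dog)~6 is parsed as +ABST:cat +ABST:dog, i.e. both terms are required but no distance constraint applies, while "cat dog"~6 is a proximity query. The following is a minimal, library-free sketch of why the second form can return more results. It does not call Lucene; the function names and sample documents are hypothetical, and the distance check is a simplification that ignores term order and Lucene's actual slop arithmetic.

```python
def positions(tokens, term):
    """Return every position at which `term` occurs in `tokens`."""
    return [i for i, t in enumerate(tokens) if t == term]


def matches_all_terms(tokens, terms):
    """Boolean AND: every term must occur somewhere in the document."""
    return all(term in tokens for term in terms)


def matches_within(tokens, first, second, max_distance):
    """Simplified proximity check (an assumption, not Lucene's slop rule):
    some occurrence of `first` and some occurrence of `second` lie at most
    `max_distance` positions apart, in either order."""
    return any(
        abs(i - j) <= max_distance
        for i in positions(tokens, first)
        for j in positions(tokens, second)
    )


# Hypothetical sample documents, already tokenized.
near = "the cat chased the dog".split()
far = ["cat"] + ["filler"] * 10 + ["dog"]

# Both documents contain both terms, so the AND form matches both.
and_hits = [matches_all_terms(doc, ["cat", "dog"]) for doc in (near, far)]

# Only the first document has the terms close together.
proximity_hits = [matches_within(doc, "cat", "dog", 6) for doc in (near, far)]
```

Under these assumptions the AND form matches both documents and the proximity form matches only the first, which is consistent with the larger result count observed for (cat dog)~6.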

Lucene multiple indexes

2007-09-20 Thread Dino Korah
Hi People, I was trying to get lucene to work for a mail indexing solution. Scenario: Traffic into the index method is on average 250 mails and their attachments per minute. This volume has made me think of a solution that will split the index on domain names of the owner of the message. S

Multiple Indices vs Single Index

2007-09-20 Thread Nikhil Chhaochharia
Hi, I have about 40 indices which range in size from 10MB to 700MB. There are quite a few stored fields. To get an idea of the document size, I have about 400k documents in the 700MB index. Depending on the query, I choose the index which needs to be searched. Each query hits only one index

Re: a query for a special AND?

2007-09-20 Thread Paul Elschot
On Thursday 20 September 2007 09:19, Mohammad Norouzi wrote:
> well, you mean we should separate documents just like relational tables in databases ?
Quite the contrary, it's called _de_normalization. This means that the documents in lucene normally contain more information than is present in a

Re: a query for a special AND?

2007-09-20 Thread Mohammad Norouzi
well, you mean we should separate documents just like relational tables in databases ? if yes, how to make the relationship between those documents
thank you so much Paul
On 9/20/07, Paul Elschot <[EMAIL PROTECTED]> wrote:
> On Thursday 20 September 2007 07:29, Mohammad Norouzi wrote:
> > Sorry

Re: a query for a special AND?

2007-09-20 Thread Paul Elschot
On Thursday 20 September 2007 07:29, Mohammad Norouzi wrote:
> Sorry Paul I just hurried in replying ;)
> I read the documents of Lucene about query syntax and I figured out the what is the difference
> but my problem is different, this is preoccupied my mind and I am under pressure to solve th