On 21 Sep 2007, at 08:23, Jarvis wrote:
There is a question about the document’s length and search efficiency.
Two ways to index some HTML pages (ignore some information): one is both
store and index the HTML content in lucene dictionary, the other is just
index the content. For the first met
Hi everyone,
I have a question about document length and search efficiency.
Think of this situation:
There are two ways to index some HTML pages (ignoring some information): one is
to both store and index the HTML content in the Lucene dictionary, the other is
to just index the content. For the first method i
Thanks Grant and Chris for the replies.
I am looking at a single index because the 40 index system has started having
performance issues at high load. My daily traffic is increasing at a steady
pace and about 40% of the traffic is concentrated in a 2 hour period and
searches start slowing down
: I checked the Lucene converted syntax (using Query.toString()) in both cases
: and found that the second one is not actually converted to a proximity query.
I don't think you understood what I was trying to say...
using parens with a "~" character after it is not currently, and has never
been (to my kn
Thanks, Hoss, for the reply. I am using Lucene 2.1.
I checked the Lucene converted syntax (using Query.toString()) in both cases
and found that the second one is not actually converted to a proximity query.
"cat dog"~6 is converted to ABST:"cat dog"~4 and
(cat dog)~6 is converted to +ABST:cat +ABST:dog.
Tha
: I was wondering if it will be better to just have 1 large index with all
: the 40 indices combined. I do not need to do dual-queries and my total
: index size (if I create a single index) is about 3.4GB. It will
: increase to maximum of 5-6 GB. I am running this on a dedicated machine
: w
: Is the query "cat dog"~6 the same as (cat dog)~6 ?
: I think both cases will search for "cat" and "dog" within 6 words of each other.
: But I am getting a different number of results for the above queries. The
: second one may return the higher number. Please clarify this.
i don't believe:(cat dog)~6 is eve
Lucene's storing functionality is just a simple storage mechanism. You
can certainly and easily use your own storage mechanism. When you get
your user created id back from Lucene due to a hit, just pass that id to
your storage system to get the original text and then feed that to the
Highlighte
Hello Folks,
I wanted to stay away from storing text in the indexes in order to keep
them smaller. I have a requirement now though to provide highlighting
and, more so, fragments of the content so they will be displayed on the UI.
Do you all prefer to store the text in the index to make this
If the current version is working well, what is the reason to move?
Is it just to make management of the indices easier?
On Sep 20, 2007, at 12:07 PM, Nikhil Chhaochharia wrote:
OK, thanks.
I actually have both systems implemented. The multi-index one is
being used currently and it works
OK, thanks.
I actually have both systems implemented. The multi-index one is being used
currently and it works well. I have deployed the single index solution a few
times during off-peak hours and the response time has been almost the same as
the multi-index solution. I tried to simulate some
Mark Miller wrote:
Good luck Jay. Keep in mind, pretty much all LuceneIndexAccessor does is
sync Readers with Writers and allow multiple threads to share the same
instances of them -- nothing more. The code just forces Readers to
refresh when Writers are used to change the index. There really
Good luck Jay. Keep in mind, pretty much all LuceneIndexAccessor does is
sync Readers with Writers and allow multiple threads to share the same
instances of them -- nothing more. The code just forces Readers to
refresh when Writers are used to change the index. There really isn't
any functional
OK, I thought you meant your index would have in it the name of the
second index and would thus do a two-stage retrieval.
At any rate, if you are saying your combined index with all the
stored fields is ~3.4 GB I would think it would fit reasonably on the
machine you have and perform reason
Mark,
Thanks for sharing your valuable experience and thoughts.
Frankly, our system already has most of the functionality
LuceneIndexAccessor offers. The only thing I am looking for is to sync
the closing of the searchers. That's why I am a little worried about the way
the accessor handles searcher sync.
I w
I am sorry, it seems that I was not clear with what my problem is. I will try
to describe it again.
My data is divided into 40 categories and at one time only one category can be
searched. The GUI for the system will ask the user to select the category from
a drop-down. Currently, I have a s
If I understand correctly, you want to do a two stage retrieval
right? That is, look up in the initial index (3.4 GB) and then do a
second search on the sub index? Presumably, you have to manage the
Searchers, etc. for each of the sub-indexes as well as the big
index. This means you have
Hi Chris:
First, sorry for the delay :(
I have some preliminary performance tests using Oracle 11g running
in a VMware virtual machine with a 400 MB SGA (virtual machine using
812 MB RAM for Oracle Enterprise Linux 4.0). This virtual machine is
hosted on modest hardware, a Pentium IV 2.18 GHz wit
Hi,
I have a question about proximity search.
Is the query "cat dog"~6 the same as (cat dog)~6 ?
I think both cases will search for "cat" and "dog" within 6 words of each other.
But I am getting a different number of results for the above queries. The
second one may return the higher number. Please clarify this.
Thanks,
Son
Hi People,
I was trying to get lucene to work for a mail indexing solution.
Scenario:
Traffic into the index method is on average 250 mails and their attachments
per minute. This volume has made me think of a solution that will split the
index on domain names of the owner of the message. S
Hi,
I have about 40 indices which range in size from 10MB to 700MB. There are
quite a few stored fields. To get an idea of the document size, I have about
400k documents in the 700MB index.
Depending on the query, I choose the index which needs to be searched. Each
query hits only one index
On Thursday 20 September 2007 09:19, Mohammad Norouzi wrote:
> Well, you mean we should separate documents just like relational tables in
> databases?
Quite the contrary, it's called _de_normalization. This means that the
documents in lucene normally contain more information than is present
in a
Well, you mean we should separate documents just like relational tables in
databases?
If yes, how do we express the relationship between those documents?
Thank you so much, Paul.
On 9/20/07, Paul Elschot <[EMAIL PROTECTED]> wrote:
>
> On Thursday 20 September 2007 07:29, Mohammad Norouzi wrote:
> > Sorry
On Thursday 20 September 2007 07:29, Mohammad Norouzi wrote:
> Sorry Paul I just hurried in replying ;)
> I read the Lucene documentation about query syntax and I figured out what
> the difference is,
> but my problem is different; it has preoccupied my mind and I am under
> pressure to solve th