J.J. Larrea wrote:
> I concur with your thoughts that there is room for such
> utility classes, and that those would increase the use of
> programmatic queries. I say this as a developer who also
> "lazed out" and opted to simply construct a string and let
> the QP do all the work (but who t
Erick Erickson wrote:
...
> It seems to me that you can always do something like:
> BooleanQuery bq;
> QueryParser qp1 = new QueryParser("field1", analyzer);
> Query q1 = qp1.parse("search term or clause"); bq.add(q1, ...);
> QueryParser qp2 = new QueryParser("field2", analyzer);
> Query q
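Filling out that sketch: a minimal, self-contained version of the per-field pattern (the field names, analyzer choice, and clause requirements below are placeholders, not from the original mail) might look like this:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;

public class PerFieldQueryExample {
    public static Query build(String terms1, String terms2) throws Exception {
        StandardAnalyzer analyzer = new StandardAnalyzer();

        // One QueryParser per field, so each clause is analyzed the same way
        // that field's contents were analyzed at index time.
        QueryParser qp1 = new QueryParser("field1", analyzer);
        Query q1 = qp1.parse(terms1);
        QueryParser qp2 = new QueryParser("field2", analyzer);
        Query q2 = qp2.parse(terms2);

        // Combine the per-field queries programmatically.
        BooleanQuery bq = new BooleanQuery();
        bq.add(q1, BooleanClause.Occur.MUST);    // field1 clause required
        bq.add(q2, BooleanClause.Occur.SHOULD);  // field2 clause optional
        return bq;
    }
}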
Hi,
I'm trying to run 20mil+ queries against an index containing 2mil
documents, and it has been quite slow. I've been reading about
MemoryIndex, but it is only a single-document index. As I have quite a
bit of RAM (~20gb), is there a way I could store the index in RAM or
any other way that makes
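One possibility (not from the thread itself) is to copy the on-disk index into a RAMDirectory so the millions of queries avoid disk I/O; a rough sketch, with the index path as a placeholder:

import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.RAMDirectory;

public class RamSearchExample {
    public static IndexSearcher openInRam(String indexPath) throws Exception {
        // Copy the existing on-disk index into RAM (needs enough heap to hold it all).
        FSDirectory diskDir = FSDirectory.getDirectory(indexPath, false);
        RAMDirectory ramDir = new RAMDirectory(diskDir);
        // Reuse this one searcher for all queries instead of reopening per query.
        return new IndexSearcher(ramDir);
    }
}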
Hi,
As I use SpanQuery purely for its slop support, I was wondering how to
make SpanQuery more efficient. Since I don't need any span
information, is there a way to disable the span computation and
other unneeded overhead?
Thanks.
Michael
---
Hi Otis,
Thanks for that. I found out that it's a memory usage problem rather
than one on Lucene's part.
Thanks.
Michael
On 5/22/06, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
Hi Michael,
I don't see any responses to your problem. It's early, so you may get some,
but this sounds like a cas
On Mon, 2006-05-22 at 23:42 +0200, Hannes Carl Meyer wrote:
> I'm indexing ~1 documents per day but since I'm getting a lot of
> real duplicates (100% the same document content) I want to check the
> content before indexing...
> My idea is to create a checksum of the documents content an
OK, got it. Thanks.
On 5/23/06, Erik Hatcher <[EMAIL PROTECTED]> wrote:
On May 21, 2006, at 10:56 PM, Zhenjian YU wrote:
> I didn't dig the source code of lucence deep enough, but I noticed
> that the
> IndexSearcher uses an IndexReader, while the cost of initializing
> IndexReader is a bit hi
I have created a method that can delete duplicate docs. Basically,
during indexing, each doc is associated with an id (a term field defined by
you) that is indexed. Then, call the method to delete duplicates
whenever you update the index.
I haven't contributed back to Lucene community yet because our
you have two choices that I can think of:
1- before adding a document, check if it doesn't exist in the index. You can do
this by querying on a unique field if you have one.
2- you can index all your documents, and once the indexing is done you can
dedupe. (Lucene has built-in methods that can hel
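A rough illustration of option 1 under the 1.9 API (the "uid" field name, index path, and analyzer are placeholders): delete any existing doc carrying the same unique key, then add the new one:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;

public class DedupeAdd {
    /** Deletes any document with the same unique key, then adds the new one. */
    public static void addOrReplace(String indexPath, Document doc, String key) throws Exception {
        // Remove older copies keyed on the "uid" field.
        IndexReader reader = IndexReader.open(indexPath);
        reader.deleteDocuments(new Term("uid", key));
        reader.close();

        // Add the fresh document (the writer is opened after the reader closes).
        IndexWriter writer = new IndexWriter(indexPath, new StandardAnalyzer(), false);
        doc.add(new Field("uid", key, Field.Store.YES, Field.Index.UN_TOKENIZED));
        writer.addDocument(doc);
        writer.close();
    }
}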
Marcus Falck wrote:
There is however one LARGE problem that we have run into. All search results should be displayed sorted with the newest document at top. We tried to accomplish this using Lucene's sort capabilities but quickly ran into large performance bottlenecks. So I figured since the default
Tom Emerson wrote:
Thanks for the clarification. What then is the difference between a
MultiSearcher and using an IndexSearcher on a MultiReader?
The results should be identical. A MultiSearcher permits use of
ParallelMultiSearcher and RemoteSearchable, for parallel and/or
distributed operat
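For reference, a sketch of the two equivalent setups (the index paths are placeholders):

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.MultiReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MultiSearcher;
import org.apache.lucene.search.Searchable;

public class TwoWaysToSearchTwoIndexes {
    public static void open() throws Exception {
        IndexReader r1 = IndexReader.open("/index/one");
        IndexReader r2 = IndexReader.open("/index/two");

        // Option A: a single IndexSearcher over a MultiReader.
        IndexSearcher viaReader =
                new IndexSearcher(new MultiReader(new IndexReader[] { r1, r2 }));

        // Option B: a MultiSearcher over individual searchers
        // (swap in ParallelMultiSearcher to search the indexes in parallel).
        MultiSearcher viaSearcher = new MultiSearcher(
                new Searchable[] { new IndexSearcher(r1), new IndexSearcher(r2) });
    }
}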
On Mon, 2006-05-22 at 23:42 +0200, Hannes Carl Meyer wrote:
>
> I'm indexing ~1 documents per day but since I'm getting a lot of
> real duplicates (100% the same document content) I want to check the
> content before indexing...
>
> My idea is to create a checksum of the documents content a
: Not quite. The user is presented with a list of (UI) fields, and each
: field already knows whether it's an "OR" "AND" etc.
: So, there is no query String as such.
: For this reason, it seems to make more sense to build the query up
: programmatically - as my field meta data can drive this.
: How
Hi All,
I'm indexing ~1 documents per day but since I'm getting a lot of
real duplicates (100% the same document content) I want to check the
content before indexing...
My idea is to create a checksum of the documents content and store it
within document inside the index, before indexing
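A minimal sketch of that idea (the MD5 choice and the field names are assumptions, not from the original mail): hash the content, store the hash as an untokenized field, and skip the add when the hash is already present:

import java.security.MessageDigest;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;

public class ChecksumDedupe {
    static String md5(String content) throws Exception {
        byte[] digest = MessageDigest.getInstance("MD5").digest(content.getBytes("UTF-8"));
        StringBuffer hex = new StringBuffer();
        for (int i = 0; i < digest.length; i++) {
            // Two lowercase hex chars per byte.
            hex.append(Integer.toHexString((digest[i] & 0xff) | 0x100).substring(1));
        }
        return hex.toString();
    }

    /** Adds the document only if no doc with the same content checksum exists. */
    static void addIfNew(IndexWriter writer, IndexSearcher searcher, String content) throws Exception {
        String sum = md5(content);
        if (searcher.search(new TermQuery(new Term("checksum", sum))).length() > 0) {
            return; // exact duplicate content, skip it
        }
        Document doc = new Document();
        doc.add(new Field("checksum", sum, Field.Store.YES, Field.Index.UN_TOKENIZED));
        doc.add(new Field("content", content, Field.Store.NO, Field.Index.TOKENIZED));
        writer.addDocument(doc);
    }
}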
There's a long screed that I'm leaving at the bottom because I put effort
into it and I like to rant. But here's, perhaps, an approach.
Maybe I'm mis-interpreting what you're trying to do. I'm assuming that you
have several search fields (I'm not exactly sure what "driven by meta-data"
means in th
I'm pretty new to lucene and was wondering if there are any resources on
how to do incremental updates in lucene.
Thanks!
Van Nguyen
Wynne Systems, Inc.
19800 MacArthur Blvd., Suite 900
Irvine, CA 92612-2421
949.224.6300 ext 223
949.225.6540 (fax)
866.901.9284 (toll-free)
www.wynnesystems.com
: Can anyone clarify this behavior, i.e., why does search not find
: recently added documents unless I close and re-open it?
this is by design .. an IndexReader (and hence an IndexSearcher) maintains a
consistent view of the index as of the moment it was opened by hanging on
to the open filehandles a
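A rough sketch of the usual workaround in the 1.9 API: keep one searcher, and re-open it only when the index version on disk has changed (the path handling here is a placeholder):

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.IndexSearcher;

public class SearcherHolder {
    private final String indexPath;
    private IndexReader reader;
    private IndexSearcher searcher;

    public SearcherHolder(String indexPath) throws Exception {
        this.indexPath = indexPath;
        this.reader = IndexReader.open(indexPath);
        this.searcher = new IndexSearcher(reader);
    }

    /** Returns a searcher that sees documents committed since the last open. */
    public synchronized IndexSearcher getSearcher() throws Exception {
        if (IndexReader.getCurrentVersion(indexPath) != reader.getVersion()) {
            reader.close();                       // release the old filehandles
            reader = IndexReader.open(indexPath); // picks up the new segments
            searcher = new IndexSearcher(reader);
        }
        return searcher;
    }
}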
On May 21, 2006, at 10:56 PM, Zhenjian YU wrote:
I didn't dig the source code of Lucene deep enough, but I noticed
that the
IndexSearcher uses an IndexReader, while the cost of initializing
IndexReader is a bit high.
The key is the IndexReader.
My application is a webapp, so I think it ma
I am using 1.9.1 (Java).
I am trying to add documents to an existing index that may or may not
exist. I use a RAMDirectory to build a temp index that is later merged.
Before adding a new document, I search the existing index (using unique
key) to see if it is there. If not, I add it.
In reading t
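For what it's worth, a rough sketch of that build-in-RAM-then-merge workflow (the paths and analyzer are placeholders):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

public class RamThenMerge {
    /** Builds a small index in RAM, then merges it into the main on-disk index. */
    static void indexBatch(Document[] docs, String mainIndexPath) throws Exception {
        RAMDirectory ramDir = new RAMDirectory();
        IndexWriter ramWriter = new IndexWriter(ramDir, new StandardAnalyzer(), true);
        for (int i = 0; i < docs.length; i++) {
            ramWriter.addDocument(docs[i]);
        }
        ramWriter.close();

        // Merge the temporary RAM index into the existing index (create = false).
        IndexWriter mainWriter = new IndexWriter(mainIndexPath, new StandardAnalyzer(), false);
        mainWriter.addIndexes(new Directory[] { ramDir });
        mainWriter.close();
    }
}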
At 10:15 AM +0100 5/22/06, Irving, Dave wrote:
>- Is there maybe some room for more utility classes in Lucene which make
>this easier? E.g: When building up a document, we don't have to worry
>about running content through an analyser - but unless we use
>QueryParser, there doesn't seem to be corre
On May 22, 2006, at 8:44 AM, Irving, Dave wrote:
So, right now, if I'm being lazy, the easiest thing to do is construct a
query string based on the meta data, and then run that through the query
parser. This just doesn't -- feel right -- from a design perspective
though :o)
How about build
> You need to parse a query string without using query parser and
> construct the query and still want an analyzer applied on the outcome
search
Not quite. The user is presented with a list of (UI) fields, and each
field already knows whether it's an "OR" "AND" etc.
So, there is no query String as
On 5/22/06, Dragon Fly <[EMAIL PROTECTED]> wrote:
The search results of my Lucene application are always sorted
alphabetically.
Therefore, score and relevance are not needed. With that said, is there
anything that I can "disable" to:
(a) Improve the search performance
(b) Reduce the size of the
If I understand correctly, is it that you don't want to make use of the query
parser?
You need to parse a query string without using the query parser and construct
the query, and still want an analyzer applied to the resulting search.
On 5/22/06, Irving, Dave <[EMAIL PROTECTED]> wrote:
Hi Otis,
Thanks
Hi Otis,
Thanks for your reply.
Yeah, I'm aware of PerFieldAnalyzerWrapper - and I think it could help in
the solution - but not on its own.
Here's what I mean:
When we build a document Field, we supply either a String or a Reader.
The framework takes care of running the contents through an Analyse
Dave,
You said you are new to Lucene and you didn't mention this class explicitly, so
you may not be aware of it yet: PerFieldAnalyzerWrapper.
It sounds like this may be what you are after.
Otis
- Original Message
From: "Irving, Dave" <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
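A minimal usage sketch of PerFieldAnalyzerWrapper (the field names and analyzer choices are only examples):

import org.apache.lucene.analysis.KeywordAnalyzer;
import org.apache.lucene.analysis.PerFieldAnalyzerWrapper;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;

public class PerFieldAnalyzerExample {
    public static Query parse(String queryText) throws Exception {
        // Default analyzer for most fields, with per-field overrides.
        PerFieldAnalyzerWrapper analyzer =
                new PerFieldAnalyzerWrapper(new StandardAnalyzer());
        analyzer.addAnalyzer("id", new KeywordAnalyzer()); // keep IDs untokenized

        // The same wrapper can be passed to IndexWriter for indexing and to
        // QueryParser for searching, so both sides treat each field consistently.
        QueryParser parser = new QueryParser("body", analyzer);
        return parser.parse(queryText);
    }
}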
Hi,
The search results of my Lucene application are always sorted
alphabetically.
Therefore, score and relevance are not needed. With that said, is there
anything that I can "disable" to:
(a) Improve the search performance
(b) Reduce the size of the index
(c) Shorten the indexing time
Thank
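One concrete piece of this (not a full answer to a, b, and c): if results are always shown in alphabetical order, you can hand Lucene a Sort on an untokenized field so it orders by that field rather than by relevance; the "title" field name is a placeholder:

import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;

public class AlphabeticalSearch {
    public static Hits search(IndexSearcher searcher, Query query) throws Exception {
        // Order by an untokenized "title" field instead of by score.
        Sort byTitle = new Sort(new SortField("title"));
        return searcher.search(query, byTitle);
    }
}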
Hi Michael,
I don't see any responses to your problem. It's early, so you may get some,
but this sounds like a case for JIRA.
Also, please try to write and attach (to your JIRA case) a unit test that
demonstrates the problem, something we can run and debug. Without that we
may not be able t
Uh, another "it depends" answer.
Some people prefer one aggregate field, others do not.
If you care about field normalization (shorter fields with matches in them
scoring higher than longer fields with an equal number of matches in them), I'd
say keep them separate.
If you want to boost individual fie
The usual answer: it depends :)
Over on http://www.simpy.com I have similar functionality (groups), and I have
them as separate indices.
If you want to be able to reindex individual groups separately, you'll want
them in separate indices.
If groups in aggregate will get very large, perhaps keepin
Your out of memory error is likely due to a MySQL bug outlined here:
http://bugs.mysql.com/bug.php?id=7698
Thanks for the article. My query executed in no time without any errors !!!
The MySQL drivers are horrible at dealing with large result sets - that
article gives you the workaround to
I think if you dig a little into what Lucene does when asked to sort, then
you will find the information you are looking for.
Here is some help.
Lucene uses TopFieldDocCollector for sorting purposes (look at the implementation
of IndexSearcher).
So your HitCollector will extend this TopFieldDocColle
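A rough sketch of that suggestion; the TopFieldDocCollector constructor is assumed here to take (reader, sort, numHits), so treat the exact signature as an assumption and check the javadocs for your Lucene version:

import java.io.IOException;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.TopFieldDocCollector;

// Keeps the sorted top-N behaviour while doing extra per-hit bookkeeping.
public class CountingSortedCollector extends TopFieldDocCollector {
    private int seen = 0;

    public CountingSortedCollector(IndexReader reader, Sort sort, int numHits)
            throws IOException {
        super(reader, sort, numHits);
    }

    public void collect(int doc, float score) {
        seen++;                    // custom per-hit work goes here
        super.collect(doc, score); // preserve the sorted top-N collection
    }

    public int getSeen() {
        return seen;
    }
}

It would then be passed to searcher.search(query, collector) in place of the search(query, sort) call.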
Hi Jelda,
Is there any way by which I can achieve sorting of search results along
with overriding the collect method of the HitCollector in this case?
I have been using
srch.search(query,sort);
If I replace it with srch.search(query, new HitCollector(){ impl of the
collect method to collect c
Hi Mike,
Yes, you are right, when we run optimize(), it creates one large
segment file and makes searching faster. But the issue is that our index
keeps growing every minute as we download documents and add them to the index, so
we cannot call optimize so often. The indexing seemed to be fine till w
Hello Harini,
When you are finished indexing the documents, are you running the
optimize() method on the IndexWriter before closing it? This should
reduce the number of segments and make searching faster. Just a
thought.
--Mike
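A bare-bones version of that step, with the path and analyzer as placeholders:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;

public class OptimizeIndex {
    public static void optimize(String indexPath) throws Exception {
        // Open the existing index (create = false), merge its segments, close.
        IndexWriter writer = new IndexWriter(indexPath, new StandardAnalyzer(), false);
        writer.optimize();
        writer.close();
    }
}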
On 5/22/06, Harini Raghavan <[EMAIL PROTECTED]> wrote:
Hi All,
If I work with groups, what's the best option? Use a separate Lucene
index for every group, or is a single index better?
For example:
I'm working with groups of people, and the action to add or delete is at
group level but the search is on all groups.
What do you think is the best implementa
Hi,
I'm very new to Lucene - so sorry if my question seems pretty dumb.
In the application I'm writing, I've been "struggling with myself" over
whether I should be building up queries programmatically, or using the
Query Parser.
My searchable fields are driven by meta-data, and I only want to suppo
: Score/Relevance is not important. I need the Yes/No logic along with the
: info on what caused the match. Could you maybe explain the intersect/union of
: the bitsets and the interrogating to know
: what matched?
let's say hypothetically the logical "query" you want is "(A OR B) AND (C
OR D)" where A, B, C