RE: Searching API: QueryParser vs Programatic queries

2006-05-22 Thread Irving, Dave
J.J. Larrea wrote: > I concur with your thoughts that there is room for such > utility classes, and that those would increase the use of > programmatic queries. I say this as a developer who also > "lazed out" and opted to simply construct a string and let > the QP do all the work (but who t

FW: Searching API: QueryParser vs Programatic queries

2006-05-22 Thread Irving, Dave
Erick Erickson wrote: ... > It seems to me that you can always do something like: > BooleanQuery bq; > QueryParser qp1 = new QueryParser("field1", "", analyzer); > Query q1 = qp1.parse("search term or clause"); bq.add(q1,,,); > QueryParser qp2 = new QueryParser("field2", "", analyzer); > Query q

Running 20mil queries against an index

2006-05-22 Thread Michael Chan
Hi, I'm trying to run 20mil+ queries against an index containing 2mil documents, and it has been quite slow. I've been reading about MemoryIndex, but it is only a single-document index. As I have quite a bit of RAM (~20gb), is there a way I could store the index in RAM or any other way that makes

Making SpanQuery more effiicent

2006-05-22 Thread Michael Chan
Hi, As I use SpanQuery purely for its slop, I was wondering how to make SpanQuery more efficient. Since I don't need any span information, is there a way to disable the span computation and other unneeded overhead? Thanks. Michael

Re: SpanScorer Out Of Bounds

2006-05-22 Thread Michael Chan
Hi Otis, Thanks for that. I found out that it's a memory usage problem rather than one on Lucene's part. Thanks. Michael On 5/22/06, Otis Gospodnetic <[EMAIL PROTECTED]> wrote: Hi Michael, I don't see any responses to your problem. It's early, so you may get some, but this sounds like a cas

Re: Checking for duplicates inside index

2006-05-22 Thread Ken Krugler
On Mon, 2006-05-22 at 23:42 +0200, Hannes Carl Meyer wrote: > I'm indexing ~1 documents per day but since I'm getting a lot of real duplicates (100% the same document content) I want to check the content before indexing... > My idea is to create a checksum of the document's content an

Re: does anybody have the experience to do some pooling upon lucene?

2006-05-22 Thread Zhenjian YU
OK, got it. Thanks. On 5/23/06, Erik Hatcher <[EMAIL PROTECTED]> wrote: On May 21, 2006, at 10:56 PM, Zhenjian YU wrote: > I didn't dig into the Lucene source code deeply enough, but I noticed > that the > IndexSearcher uses an IndexReader, while the cost of initializing > IndexReader is a bit hi

RE: Checking for duplicates inside index

2006-05-22 Thread Eugene Tuan
I have created a method that can delete duplicate docs. Basically, during indexing, a doc is associated with an id (a term field that you define) that is indexed. Then, call the method to delete duplicates whenever you update the index. I haven't contributed it back to the Lucene community yet because our

RE: Checking for duplicates inside index

2006-05-22 Thread Omar Didi
You have two choices that I can think of: 1- Before adding a document, check that it doesn't already exist in the index. You can do this by querying on a unique field, if you have one. 2- You can index all your documents, and once the indexing is done you can dedupe. (Lucene has built in methods that can hel

Re: Changing the scoring (newest doc date first)

2006-05-22 Thread Doug Cutting
Marcus Falck wrote: There is however one LARGE problem that we have run into. All search results should be displayed sorted with the newest document at top. We tried to accomplish this using Lucene's sort capabilities but quickly ran into large performance bottlenecks. So I figured since the default

Re: How are results merged from a multisearcher?

2006-05-22 Thread Doug Cutting
Tom Emerson wrote: Thanks for the clarification. What then is the difference between a MultiSearcher and using an IndexSearcher on a MultiReader? The results should be identical. A MultiSearcher permits use of ParallelMultiSearcher and RemoteSearchable, for parallel and/or distributed operat

Re: Checking for duplicates inside index

2006-05-22 Thread karl wettin
On Mon, 2006-05-22 at 23:42 +0200, Hannes Carl Meyer wrote: > > I'm indexing ~1 documents per day but since I'm getting a lot of > real duplicates (100% the same document content) I want to check the > content before indexing... > > My idea is to create a checksum of the document's content a

RE: Searching API: QueryParser vs Programatic queries

2006-05-22 Thread Chris Hostetter
: Not quite. The user is presented with a list of (UI) fields, and each : field already knows whether it's an "OR", "AND", etc. : So, there is no query String as such. : For this reason, it seems to make more sense to build the query up : programmatically - as my field meta data can drive this. : How

Checking for duplicates inside index

2006-05-22 Thread Hannes Carl Meyer
Hi All, I'm indexing ~1 documents per day but since I'm getting a lot of real duplicates (100% the same document content) I want to check the content before indexing... My idea is to create a checksum of the document's content and store it within the document inside the index, before indexing
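The checksum idea described in this post can be sketched in plain Java. This is only a minimal illustration, not anything posted in the thread: the class name `ContentChecksum`, the choice of SHA-1, and the in-memory `Set` (standing in for a lookup against a checksum field stored in the index) are all assumptions made for the example.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Set;

// Hypothetical helper: computes a content checksum that could be stored
// alongside each document and checked before indexing a new one.
class ContentChecksum {

    // Returns the SHA-1 digest of the content as a lowercase hex string.
    static String sha1Hex(String content) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            byte[] digest = md.digest(content.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is required to be present on every Java platform.
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }

    // Records the checksum in 'seen' and reports whether the content is new.
    // Assumption: 'seen' stands in for a query against the index.
    static boolean isNew(Set<String> seen, String content) {
        return seen.add(sha1Hex(content));
    }
}
```

A caller would skip indexing whenever `isNew` returns false. Because the checksum only detects byte-identical content, it fits the "100% the same document content" case described above and nothing looser.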

Re: Searching API: QueryParser vs Programatic queries

2006-05-22 Thread Erick Erickson
There's a long screed that I'm leaving at the bottom because I put effort into it and I like to rant. But here's, perhaps, an approach. Maybe I'm mis-interpreting what you're trying to do. I'm assuming that you have several search fields (I'm not exactly sure what "driven by meta-data" means in th

incremental updates

2006-05-22 Thread Van Nguyen
I'm pretty new to Lucene and was wondering if there are any resources on how to do incremental updates in Lucene. Thanks! Van Nguyen

Re: Searching missing documents after doing an addIndexes

2006-05-22 Thread Chris Hostetter
: Can anyone clarify this behavior, i.e., why does search not find : recently added documents unless I close and re-open it? This is by design: an IndexReader (and hence an IndexSearcher) maintains a consistent view of the index as of the moment it was opened, by hanging on to the open filehandles a

Re: does anybody have the experience to do some pooling upon lucene?

2006-05-22 Thread Erik Hatcher
On May 21, 2006, at 10:56 PM, Zhenjian YU wrote: I didn't dig into the Lucene source code deeply enough, but I noticed that the IndexSearcher uses an IndexReader, while the cost of initializing IndexReader is a bit high. The key is the IndexReader. My application is a webapp, so I think it ma

Searching missing documents after doing an addIndexes

2006-05-22 Thread Jim Wilson
I am using 1.9.1(java). I am trying to add documents to an existing index that may or may not exist. I use a RAMDirectory to build a temp index that is later merged. Before adding a new document, I search the existing index (using unique key) to see if it is there. If not, I add it. In reading t

Re: Searching API: QueryParser vs Programatic queries

2006-05-22 Thread J.J. Larrea
At 10:15 AM +0100 5/22/06, Irving, Dave wrote: >- Is there maybe some room for more utility classes in Lucene which make >this easier? E.g: When building up a document, we don't have to worry >about running content through an analyser - but unless we use >QueryParser, there doesn't seem to be corre

Re: Searching API: QueryParser vs Programatic queries

2006-05-22 Thread Marvin Humphrey
On May 22, 2006, at 8:44 AM, Irving, Dave wrote: So, right now, if I'm being lazy, the easiest thing to do is construct a query string based on the meta data, and then run that through the query parser. This just doesn't -- feel right -- from a design perspective though :o) How about build

RE: Searching API: QueryParser vs Programatic queries

2006-05-22 Thread Irving, Dave
> You need to parse a query string without using query parser and > construct the query and still want an analyzer applied on the outcome search Not quite. The user is presented with a list of (UI) fields, and each field already knows whether it's an "OR", "AND", etc. So, there is no query String as

Re: Performance ...

2006-05-22 Thread Yonik Seeley
On 5/22/06, Dragon Fly <[EMAIL PROTECTED]> wrote: The search results of my Lucene application are always sorted alphabetically. Therefore, score and relevance are not needed. With that said, is there anything that I can "disable" to: (a) Improve the search performance (b) Reduce the size of the

Re: Searching API: QueryParser vs Programatic queries

2006-05-22 Thread Raghavendra Prabhu
If I understand correctly, is it that you don't want to make use of the query parser? You need to parse a query string without using query parser and construct the query and still want an analyzer applied on the outcome search. On 5/22/06, Irving, Dave <[EMAIL PROTECTED]> wrote: Hi Otis, Thanks

RE: Searching API: QueryParser vs Programatic queries

2006-05-22 Thread Irving, Dave
Hi Otis, Thanks for your reply. Yeah, I'm aware of PerFieldAnalyzerWrapper - and I think it could help in the solution - but not on its own. Here's what I mean: When we build a document Field, we supply either a String or a Reader. The framework takes care of running the contents through an Analyse

Re: Searching API: QueryParser vs Programatic queries

2006-05-22 Thread Otis Gospodnetic
Dave, You said you are new to Lucene and you didn't mention this class explicitly, so you may not be aware of it yet: PerFieldAnalyzerWrapper. It sounds like this may be what you are after. Otis - Original Message From: "Irving, Dave" <[EMAIL PROTECTED]> To: java-user@lucene.apache.org

Performance ...

2006-05-22 Thread Dragon Fly
Hi, The search results of my Lucene application are always sorted alphabetically. Therefore, score and relevance are not needed. With that said, is there anything that I can "disable" to: (a) Improve the search performance (b) Reduce the size of the index (c) Shorten the indexing time Thank

Re: SpanScorer Out Of Bounds

2006-05-22 Thread Otis Gospodnetic
Hi Michael, I don't see any responses to your problem. It's early, so you may get some, but this sounds like a case for JIRA. Also, please try to write and attach (to your JIRA case) a unit test that demonstrates the problem, something we can run to debug this. Without that we may not be able t

Re: should I avoid create many Fields for a Document?

2006-05-22 Thread Otis Gospodnetic
Uh, another "it depends" answer. Some people prefer one aggregate field, others do not. If you care about field normalization (shorter fields with matches in them scoring higher than longer fields with an equal number of matches in them), I'd say keep them separate. If you want to boost individual fie

Re: What is more efficient?

2006-05-22 Thread Otis Gospodnetic
The usual answer: it depends :) Over on http://www.simpy.com I have similar functionality (groups), and I have them as separate indices. If you want to be able to reindex individual groups separately, you'll want them in separate indices. If groups in aggregate will get very large, perhaps keepin

Re: OutOfMemory and IOException Access Denied errors

2006-05-22 Thread Dan Armbrust
Your out of memory error is likely due to a mysql bug outlined here: http://bugs.mysql.com/bug.php?id=7698 Thanks for the article. My query executed in no time without any errors !!! The MySQL drivers are horrible at dealing with large result sets - that article gives you the workaround to

RE: Aggregating category hits

2006-05-22 Thread Ramana Jelda
I think, if you dig a little into what Lucene does when asked to sort, you will get the information you are looking for. Here is some help. Lucene uses TopFieldDocCollector for sorting purposes (look at the implementation of IndexSearcher). So your HitCollector will extend this TopFieldDocColle

Re: Aggregating category hits

2006-05-22 Thread Kapil Chhabra
Hi Jelda, Is there any way by which I can achieve sorting of search results along with overriding the collect method of the HitCollector in this case? I have been using srch.search(query,sort); If I replace it with srch.search(query, new HitCollector(){ impl of the collect method to collect c

Re: indexing in lucene 1.9.1

2006-05-22 Thread Harini Raghavan
Hi Mike, Yes, you are right: when we run optimize(), it creates one large segment file and makes searching faster. But the issue is that our index keeps growing every minute as we download documents and add them to the index, so we cannot call optimize so often. The indexing seemed to be fine till w

Re: indexing in lucene 1.9.1

2006-05-22 Thread Mike Richmond
Hello Harini, When you are finished indexing the documents are you running the optimize() method on the IndexWriter before closing it? This should reduce the number of segments and make searching faster. Just a thought. --Mike On 5/22/06, Harini Raghavan <[EMAIL PROTECTED]> wrote: Hi All,

What is more efficient?

2006-05-22 Thread Dan Wiggin
If I work with groups, what's the best option? Use a separate Lucene index for every group, or is a single index better? For example: I'm working with groups of people, and add and delete actions happen at the group level, but the search is across all groups. What do you think is the best implementa

Searching API: QueryParser vs Programatic queries

2006-05-22 Thread Irving, Dave
Hi, I'm very new to Lucene - so sorry if my question seems pretty dumb. In the application I'm writing, I've been "struggling with myself" over whether I should be building up queries programmatically, or using the Query Parser. My searchable fields are driven by meta-data, and I only want to suppo

Re: Need some Advice on Searching

2006-05-22 Thread Chris Hostetter
: Score/Relevance is not important. I need the Yes/No logic along with the "what : caused the match" info. Could you maybe explain how to intersect/union the : bitsets and interrogate them to know : what matched? Let's say hypothetically the logical "query" you want is "(A OR B) AND (C OR D)" where A, B, C
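To make the bitset idea concrete, here is a minimal sketch using only `java.util.BitSet`. It assumes that each of A, B, C and D has already been reduced to a BitSet with one bit per matching document number; how those bitsets are obtained from Lucene is outside the sketch, and the class and method names (`BitSetMatch`, `matches`, `causes`) are hypothetical.

```java
import java.util.BitSet;

// Hypothetical helper for evaluating "(A OR B) AND (C OR D)" over
// per-clause bitsets, where bit n is set if document n matches the clause.
class BitSetMatch {

    // Union without modifying the inputs.
    static BitSet union(BitSet x, BitSet y) {
        BitSet result = (BitSet) x.clone();
        result.or(y);
        return result;
    }

    // Intersection without modifying the inputs.
    static BitSet intersect(BitSet x, BitSet y) {
        BitSet result = (BitSet) x.clone();
        result.and(y);
        return result;
    }

    // Yes/No answer for every document: (A OR B) AND (C OR D).
    static BitSet matches(BitSet a, BitSet b, BitSet c, BitSet d) {
        return intersect(union(a, b), union(c, d));
    }

    // "What caused the match": the letters of the clauses whose bit is set
    // for this document, in the order A, B, C, D.
    static String causes(int doc, BitSet a, BitSet b, BitSet c, BitSet d) {
        StringBuilder sb = new StringBuilder();
        if (a.get(doc)) sb.append('A');
        if (b.get(doc)) sb.append('B');
        if (c.get(doc)) sb.append('C');
        if (d.get(doc)) sb.append('D');
        return sb.toString();
    }
}
```

Keeping the per-clause bitsets around after computing the combined result is what makes the interrogation step cheap: for any matching document, each clause is a single bit lookup.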