Re: Removing search results that fall within a time range

2006-05-23 Thread Chris Hostetter
A pretty big variable here in trying to find a "clever" solution to your problem is: how many results do you want? Do you need all of them for some sort of downstream processing, or are you only interested in the first M? ... how big is M? Assuming M is something manageable, I would try writing a

RE: Removing search results that fall within a time range

2006-05-23 Thread Benjamin Stein
> -Original Message- > From: karl wettin [mailto:[EMAIL PROTECTED] > Sent: Tuesday, May 23, 2006 6:44 PM > To: java-user@lucene.apache.org > Subject: Re: Removing search results that fall within a time range > > On Tue, 2006-05-23 at 17:38 -0400, Benjamin Stein wrote: > > I have a requ

Re: updating the data in Lucene

2006-05-23 Thread Daniel Noll
Erik Hatcher wrote: On May 23, 2006, at 8:15 AM, Alberto Marqués wrote: I have a question about like updating the data in Lucene. Supposing that I have indexed a directory if I want to refresh index (to return to index single files that has been modified). In order to maintain the data

Re: Per-Field Similarity

2006-05-23 Thread Marvin Humphrey
On Tue, May 23, 2006 at 02:49:40PM -0700, Chris Hostetter wrote: > > : Is it possible to have an IndexWriter apply different Similarity > : models to different Fields? > > As far as i know, the only way Similarity comes into play when using an > IndexWriter is lengthNorm, and it is passed the fie

RE: sorting issues

2006-05-23 Thread Van Nguyen
I was expecting it to be sorted alphabetically by a field I think I may have figured out my own question. I was tokenizing the field I wanted to sort. Changed it so that it's not tokenizing that field and I'm getting the results that I was expecting. Thanks, Van Nguyen Wynne Systems, Inc.

Re: Removing search results that fall within a time range

2006-05-23 Thread karl wettin
On Tue, 2006-05-23 at 17:38 -0400, Benjamin Stein wrote: > I have a requirement to only return one result for all documents whose > timestamps fall within N seconds of one another. (where timestamp is a > field and N is an integer). > > For example, Document A is timestamped "12:00:00" and Documen

Re: sorting issues

2006-05-23 Thread karl wettin
On Tue, 2006-05-23 at 15:42 -0700, Van Nguyen wrote: > > Does anyone have any sorting issues in lucene? When lucene is > returning results from my query, I get results similar to this: > > E.D. BULLARD > E.D. BULLARD > MINE SAFETY APPL MSA > NORTH SAFETY PRODUCT > NORTH SAFETY PRODUCT > MINE SA

sorting issues

2006-05-23 Thread Van Nguyen
Does anyone have any sorting issues in lucene? When lucene is returning results from my query, I get results similar to this: E.D. BULLARD E.D. BULLARD MINE SAFETY APPL MSA NORTH SAFETY PRODUCT NORTH SAFETY PRODUCT MINE SAFETY APPL MSA MINE SAFETY APPL MSA NORTH SAFETY PRODUCT ... Van This co

Re: Per-Field Similarity

2006-05-23 Thread karl wettin
On Tue, 2006-05-23 at 15:03 -0700, Chris Hostetter wrote: > > Why wouldn't you just provide your own Similarity instance that looked > at the fieldName passed to the lengthNorm method? Perhaps one reason could be.. hmm. that it would make it one similarity per field and IndexWriter and he really

Re: Per-Field Similarity

2006-05-23 Thread karl wettin
On Tue, 2006-05-23 at 14:49 -0700, Chris Hostetter wrote: > i've definitely wished more then once that they took in a field name > as a parameter. +1 for starting a branch with non-depricated radically reconstructed fields after release of 2.0. I'd be happy to document all design discussions wit

Re: Per-Field Similarity

2006-05-23 Thread Chris Hostetter
: Refactor : : : class DocumentWriter { : private final void writeNorms(String segment) throws IOException { : for(int n = 0; n < fieldInfos.size(); n++){ : FieldInfo fi = fieldInfos.fieldInfo(n); : if(fi.isIndexed && !fi.omitNorms){ : float norm = fieldBoosts[n] * simila

Re: Per-Field Similarity

2006-05-23 Thread karl wettin
On Tue, 2006-05-23 at 14:29 -0700, Marvin Humphrey wrote: > Greets, > > Is it possible to have an IndexWriter apply different Similarity > models to different Fields? You only want to apply this to the norms? Are up for an ad-hoc solution? Refactor : class DocumentWriter { private final voi

Re: Per-Field Similarity

2006-05-23 Thread Chris Hostetter
: Is it possible to have an IndexWriter apply different Similarity : models to different Fields? As far as i know, the only way Similarity comes into play when using an IndexWriter is lengthNorm, and it is passed the fieldName so it's easy to make it's behavior field specific (SimilarityDelegator
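The field-specific behavior described above can be sketched in plain Java. This is only an illustration of dispatching on the field name passed to a lengthNorm-style method; `PerFieldLengthNorm` and `register` are hypothetical names, not Lucene classes, and the fallback of 1/sqrt(numTokens) is an assumption chosen for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntToDoubleFunction;

// Hypothetical stand-in, NOT a Lucene class: shows per-field dispatch only.
final class PerFieldLengthNorm {
    // Field name -> norm function for that field.
    private final Map<String, IntToDoubleFunction> byField = new HashMap<>();

    // Assumed fallback for fields with no registered function.
    private final IntToDoubleFunction fallback = n -> 1.0 / Math.sqrt(n);

    void register(String fieldName, IntToDoubleFunction norm) {
        byField.put(fieldName, norm);
    }

    // Picks the norm function by field name, then applies it to the token count.
    float lengthNorm(String fieldName, int numTokens) {
        return (float) byField.getOrDefault(fieldName, fallback).applyAsDouble(numTokens);
    }
}
```

In a real index the same dispatch would presumably sit inside a custom Similarity's lengthNorm method, as suggested in the thread.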

Removing search results that fall within a time range

2006-05-23 Thread Benjamin Stein
I have a requirement to only return one result for all documents whose timestamps fall within N seconds of one another. (where timestamp is a field and N is an integer). For example, Document A is timestamped "12:00:00" and Document B has timestamp "12:00:30", Document B should be discarded. On t
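One possible reading of this requirement, sketched in plain Java: walk the results in ascending timestamp order and keep a result only if it is more than N seconds after the last kept one. The class and method names are hypothetical, timestamps are assumed to be epoch seconds, and the original message is cut off, so the poster's exact rule may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene.
final class TimeWindowFilter {
    /**
     * Keeps a timestamp only if it is more than windowSeconds after the
     * last kept timestamp. Input must be sorted ascending (assumption).
     */
    static List<Long> collapse(List<Long> sortedTimestamps, long windowSeconds) {
        List<Long> kept = new ArrayList<>();
        for (long ts : sortedTimestamps) {
            if (kept.isEmpty() || ts - kept.get(kept.size() - 1) > windowSeconds) {
                kept.add(ts);
            }
        }
        return kept;
    }
}
```

With a 60-second window, timestamps 0, 30 and 90 collapse to 0 and 90: the second falls inside the window of the first and is discarded.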

Re: Problem using SpellChecker with run time strings

2006-05-23 Thread karl wettin
On Tue, 2006-05-23 at 18:45 +0200, karl wettin wrote: > On Tue, 2006-05-23 at 10:50 -0500, James Maes wrote: > > > It seems to be related to Strings and when they are created. > > the bug where the per instance float for maximum score (accuracy) is > modified instead of using it local in the met

Per-Field Similarity

2006-05-23 Thread Marvin Humphrey
Greets, Is it possible to have an IndexWriter apply different Similarity models to different Fields? Marvin Humphrey Rectangular Research http://www.rectangular.com/ - To unsubscribe, e-mail: [EMAIL PROTECTED] For additiona

Re: How are results merged from a multisearcher?

2006-05-23 Thread Tom Emerson
Doug, Thanks much for the clarification. That helps put everything in the right frame for me! -tree On 5/22/06, Doug Cutting <[EMAIL PROTECTED]> wrote: Tom Emerson wrote: > Thanks for the clarification. What then is the difference between a > MultiSearcher and using an IndexSearcher on a M

Re: Web services for querying and return of results

2006-05-23 Thread Peter A. Daly
Marc, To me it sounds like Lucene may certainly be a good tool for what you describe. In terms of Hit caching. Depending on how exactly you are using it, lucene can be insanely fast, making hit caching not necessary. I believe it also does internal caching to some degree as long as you are usin

Re: Web services for querying and return of results

2006-05-23 Thread Chris Hostetter
The usage you describe sounds perfectly suited for Solr ... without even needing heavy customizations or custom plugins... : Hi Erik, many thanks for your response - a typical search application : that will consume the web service will typically want to display 25 : results per page. Most users

Re: Web services for querying and return of results

2006-05-23 Thread Marc Dauncey
Hi Erik, many thanks for your response - a typical search application that will consume the web service will typically want to display 25 results per page. Most users will only be interested in the first few pages, but there are certain searches with users that will want to examine many pages o

Re: Making SpanQuery more effiicent

2006-05-23 Thread Chris Hostetter
: Unfortunately, I want to have subqueries inside my query (e.g. (t1 AND : t2) NEAR (t3 OR t4)), and PhraseQuery seems to allow only Terms inside : it. In that case, you aren't just using SpanQuery for the use of slop -- you are using the Span information, you just don't realize it (that's how al

Re: ORing complementary queries gives no results

2006-05-23 Thread Chris Hostetter
I suspect the final query structure isn't what you think it is ... take a look at the toString on your query. in general, there is no way to just do a "NOT foo" type query ... prohibiting things only makes sense in the context of selecting something else ... i'm guessing the query structure you a
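A toy model of the point about prohibited clauses, in plain Java sets rather than Lucene queries. It assumes the result is "documents selected by the positive clauses, minus documents matched by prohibited clauses"; `BooleanClauseModel` is a hypothetical name and this is not how Lucene evaluates queries internally.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model only: document IDs as integers, clauses as precomputed sets.
final class BooleanClauseModel {
    /** Docs selected by positive clauses, minus docs matched by prohibited clauses. */
    static Set<Integer> evaluate(Set<Integer> selected, Set<Integer> prohibited) {
        Set<Integer> result = new HashSet<>(selected);
        result.removeAll(prohibited);
        return result;
    }
}
```

Under this model a query made only of prohibited clauses starts from an empty selection, so removing documents from it still leaves nothing.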

Re: Web services for querying and return of results

2006-05-23 Thread Erik Hatcher
On May 23, 2006, at 1:41 PM, Marc Dauncey wrote: Has anyone used this as a delivery mechanism for Lucene query results? A quick search on Google reveals a Lucene Web Service project on SourceForge, but what i want to know is whether people on the list know of any big drawbacks, specificall

Web services for querying and return of results

2006-05-23 Thread Marc Dauncey
Has anyone used this as a delivery mechanism for Lucene query results? A quick search on Google reveals a Lucene Web Service project on SourceForge, but what i want to know is whether people on the list know of any big drawbacks, specifically, how well could I expect this to perform, as compared

Re: Performance ...

2006-05-23 Thread Dragon Fly
I'll give it a try, thanks. From: "Yonik Seeley" <[EMAIL PROTECTED]> Reply-To: java-user@lucene.apache.org To: java-user@lucene.apache.org Subject: Re: Performance ... Date: Mon, 22 May 2006 11:40:46 -0400 On 5/22/06, Dragon Fly <[EMAIL PROTECTED]> wrote: The search results of my Lucene applic

Re: incremental updates

2006-05-23 Thread karl wettin
On Mon, 2006-05-22 at 13:07 -0700, Van Nguyen wrote: > I'm pretty new to lucene and was wondering if there are any resources on > how to do incremental updates in lucene. What do you mean by incremental updates? You add data to your corpus by using the IndexWriter. --

Re: Problem using SpellChecker with run time strings

2006-05-23 Thread karl wettin
On Tue, 2006-05-23 at 10:50 -0500, James Maes wrote: > It seems to be related to Strings and when they are created. Try to create a new instance of SpellChecker for each suggestion. Will it work? Then you have hit the bug where the per instance float for maximum score (accuracy) is modified ins

Problem using SpellChecker with run time strings

2006-05-23 Thread James Maes
Here is the problem: We have implemented the Lucene engine within our application server, which is built on top of Tomcat. We've had no problems creating the indexes or searching them. The problems we are having are all related to the SpellChecker part of the system. It seems to be related to St

Re: Checking for duplicates inside index

2006-05-23 Thread Yonik Seeley
On 5/23/06, Jimmy the Geek <[EMAIL PROTECTED]> wrote: Or any other suggestions on good ways to prevent duplicates? I am indexing with a field that has a unique ID, so it should be fairly straightforward... Solr does this efficiently: http://www.mail-archive.com/java-user@lucene.apache.org/msg05

Re: [Spam:5.0] Re: similar ArrayIndexOutOfBoundsException on searching and optimizing

2006-05-23 Thread Patrick Kimber
Hi Adam Thanks for your help. Patrick On 23/05/06, Adam Constabaris <[EMAIL PROTECTED]> wrote: Patrick Kimber wrote: > Hi Adam > > We are getting the same error. Did you manage to work out what was > causing the problem? > > Thanks > Patrick I can't say anything definitive about this, but I

Scoring

2006-05-23 Thread Malcolm Clark
Hi experts, I'm currently indexing the New INEX collection using lucene and pondering this question. When searching how do I retrieve the score based on a section or paragraph etc, and not the document score, when the documents are indexed in multi-fields (XML). Can anyone point me in the correc

Re: An arguable bug in Lucene 1.9.1

2006-05-23 Thread Erick Erickson
Hold the presses. I can't get my junit test to show this as a problem. So I'm exploring further. It may be some weird interaction with my index. I'll post more later. Sorry for the spam. Erick

Re: Analyzer question

2006-05-23 Thread AsifTheManRahman
Thanks Jeff. :) -- View this message in context: http://www.nabble.com/Analyzer+question-t1650271.html#a4524125 Sent from the Lucene - Java Users forum at Nabble.com.

Re: [Spam:5.0] Re: similar ArrayIndexOutOfBoundsException on searching and optimizing

2006-05-23 Thread Adam Constabaris
Patrick Kimber wrote: Hi Adam We are getting the same error. Did you manage to work out what was causing the problem? Thanks Patrick I can't say anything definitive about this, but I think it was due to a corrupted index; on the hunch that the index creation/update threads were reliably pu

RE: Checking for duplicates inside index

2006-05-23 Thread Jimmy the Geek
Any chance I could get my hands on code to "de-dup". I have a current method I think is quite sub-optimal, as I am searching the index for a dup on every insert Not a very good method I think... Or any other suggestions on good ways to prevent duplicates? I am indexing with a field that has a
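One way to avoid a search per insert, sketched under assumptions: collapse each batch by unique ID in memory before it reaches the index. The record layout (ID in position 0) and the class name are hypothetical, and this only removes duplicates within a batch; documents already in the index would still need a separate delete-by-ID step.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Lucene or Solr.
final class BatchDeduplicator {
    /** Returns one record per ID, keeping the last record seen for each ID. */
    static List<String[]> dedupe(List<String[]> records) {
        Map<String, String[]> latest = new LinkedHashMap<>();
        for (String[] record : records) {
            latest.put(record[0], record); // record[0] is the unique ID (assumed layout)
        }
        return new ArrayList<>(latest.values());
    }
}
```

The output order follows the first appearance of each ID, while the content is that of the most recent record with that ID.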

An arguable bug in Lucene 1.9.1

2006-05-23 Thread Erick Erickson
I'm constructing a BooleanQuery across several fields with SpanNearQuerys. In the degenerate case of spanning *one* term, AND adding a non-span clause, I get an exception thrown. Of course you can argue that a span query over one term is silly and shouldn't be done, but I thought I'd mention this.

Re: updating the data in Lucene

2006-05-23 Thread Erik Hatcher
On May 23, 2006, at 8:15 AM, Alberto Marqués wrote: I have a question about like updating the data in Lucene. Supposing that I have indexed a directory if I want to refresh index (to return to index single files that has been modified). In order to maintain the data updated. There is

Re: similar ArrayIndexOutOfBoundsException on searching and optimizing

2006-05-23 Thread Patrick Kimber
Hi Adam We are getting the same error. Did you manage to work out what was causing the problem? Thanks Patrick On 21/04/06, Adam Constabaris <[EMAIL PROTECTED]> wrote: This is a puzzler, I'm not sure if I'm doing something wrong or whether I have a poisoned document, a corrupted index (failin

Re: Running 20mil queries against an index

2006-05-23 Thread Michael Chan
I think I've fixed the problem by changing/fixing RAMOutputStream.java. On 5/23/06, Muralidharan V <[EMAIL PROTECTED]> wrote: On 5/23/06, Michael Chan <[EMAIL PROTECTED]> wrote: > > As I have quite a bit of RAM (~20gb) And I once had a 486 with 2MB RAM, which was later 'upgraded' to 4MB :-)

updating the data in Lucene

2006-05-23 Thread Alberto Marqués
I have a question about like updating the data in Lucene. Supposing that I have indexed a directory if I want to refresh index (to return to index single files that has been modified). In order to maintain the data updated. There is faster form to do the one that using: IndexWriter(indexDir, new

Re: Making SpanQuery more effiicent

2006-05-23 Thread Michael Chan
Hi Erik, Unfortunately, I want to have subqueries inside my query (e.g. (t1 AND t2) NEAR (t3 OR t4)), and PhraseQuery seems to allow only Terms inside it. Michael On 5/23/06, Erik Hatcher <[EMAIL PROTECTED]> wrote: PhraseQuery has a slop factor also - would it work for you instead of SpanNearQ

Re: Making SpanQuery more effiicent

2006-05-23 Thread Erik Hatcher
PhraseQuery has a slop factor also - would it work for you instead of SpanNearQuery? Erik On May 23, 2006, at 1:36 AM, Michael Chan wrote: Hi, As I use SpanQuery purely for the use of slop, I was wondering how to make SpanQuery more efficient,. Since I don't need any span informatio

Re: Running 20mil queries against an index

2006-05-23 Thread Muralidharan V
On 5/23/06, Michael Chan <[EMAIL PROTECTED]> wrote: As I have quite a bit of RAM (~20gb) And I once had a 486 with 2MB RAM, which was later 'upgraded' to 4MB :-)

Re: Running 20mil queries against an index

2006-05-23 Thread Michael Chan
Thanks for that. Does anyone know how much RAM a 5gb index will need? With mx set to 27gb, it crashes when it flushes buffer at one point. "bash-2.03$ Exception in thread "main" java.lang.ExceptionInInitializerError at TaxonomyFinder.RelatedCatsFinder.(RelatedCatsFinder.java:46) at

ORing complementary queries gives no results

2006-05-23 Thread Satuluri, Venu_Madhav
Hi all, I build Query objects programmatically. I do this by getting a TermQuery/PhraseQuery/whatever for each term in the user query, make a BooleanClause by specifying isRequired and isProhibited depending on whether the term has an "and" or an "or" or an "or not" etc prefixed before it (I use 1

RE: Changing the scoring (newest doc date first)

2006-05-23 Thread Marcus Falck
Hmm. Not sure that I understand exactly what you mean. Doesn't your solution require me to add all documents in correct date range? Since I will index articles from different systems I can't guarantee that all articles will be added to the index in correct date order. / Marcus _

RE: Searching API: QueryParser vs Programatic queries

2006-05-23 Thread Irving, Dave
> The QueryParser then adds the -- parsing -- on top of this, but can delegate for query delegation. That should be "query creation", of course. > -Original Message- > From: Irving, Dave [mailto:[EMAIL PROTECTED] > Sent: 23 May 2006 08:30 > To: java-user@lucene.apache.org > Subject: RE:

RE: Searching API: QueryParser vs Programatic queries

2006-05-23 Thread Irving, Dave
Chris Hostetter wrote: > typically, when build queries up from form data, each piece > of data falls into one of 2 categories: > > 1) data which doesn't need analyzed because the field it's going to > query on wasn't tokenized (ie: a date field, or a > numeric field, or a > boolean

Re: Running 20mil queries against an index

2006-05-23 Thread Daniel Naber
On Dienstag 23 Mai 2006 08:26, Michael Chan wrote: > As I have quite a > bit of RAM (~20gb), is there a way I could store the index in RAM or > any other way that makes use of it to improve performance? RAMDirectory has just been fixed (in SVN) to work with indexes > 2 GB. Regards Daniel -- h