A pretty big variable here in trying to find a "clever" solution to your
problem is: how many results do you want?
Do you need all of them for some sort of downstream processing, or are you
only interested in the first M? ... how big is M?
Assuming M is something manageable, i would try writing a
> -Original Message-
> From: karl wettin [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, May 23, 2006 6:44 PM
> To: java-user@lucene.apache.org
> Subject: Re: Removing search results that fall within a time range
>
> On Tue, 2006-05-23 at 17:38 -0400, Benjamin Stein wrote:
> > I have a requ
Erik Hatcher wrote:
On May 23, 2006, at 8:15 AM, Alberto Marqués wrote:
I have a question about updating the data in Lucene. Suppose
I have indexed a directory and want to refresh the index (re-indexing
only the files that have been modified), in order to keep
the data
On Tue, May 23, 2006 at 02:49:40PM -0700, Chris Hostetter wrote:
>
> : Is it possible to have an IndexWriter apply different Similarity
> : models to different Fields?
>
> As far as i know, the only way Similarity comes into play when using an
> IndexWriter is lengthNorm, and it is passed the fie
I was expecting it to be sorted alphabetically by a field
I think I may have figured out my own question. I was tokenizing the
field I wanted to sort. Changed it so that it's not tokenizing that
field and I'm getting the results that I was expecting.
Thanks,
Van Nguyen
Wynne Systems, Inc.
On Tue, 2006-05-23 at 17:38 -0400, Benjamin Stein wrote:
> I have a requirement to only return one result for all documents whose
> timestamps fall within N seconds of one another. (where timestamp is a
> field and N is an integer).
>
> For example, Document A is timestamped "12:00:00" and Documen
On Tue, 2006-05-23 at 15:42 -0700, Van Nguyen wrote:
>
> Does anyone have any sorting issues in lucene? When lucene is
> returning results from my query, I get results similar to this:
>
> E.D. BULLARD
> E.D. BULLARD
> MINE SAFETY APPL MSA
> NORTH SAFETY PRODUCT
> NORTH SAFETY PRODUCT
> MINE SA
Does anyone have any sorting issues in lucene? When lucene is returning
results from my query, I get results similar to this:
E.D. BULLARD
E.D. BULLARD
MINE SAFETY APPL MSA
NORTH SAFETY PRODUCT
NORTH SAFETY PRODUCT
MINE SAFETY APPL MSA
MINE SAFETY APPL MSA
NORTH SAFETY PRODUCT
...
Van
This co
On Tue, 2006-05-23 at 15:03 -0700, Chris Hostetter wrote:
>
> Why wouldn't you just provide your own Similarity instance that looked
> at the fieldName passed to the lengthNorm method?
Perhaps one reason could be.. hmm. that it would make it one similarity
per field and IndexWriter and he really
On Tue, 2006-05-23 at 14:49 -0700, Chris Hostetter wrote:
> i've definitely wished more than once that they took in a field name
> as a parameter.
+1 for starting a branch with non-deprecated radically reconstructed
fields after release of 2.0.
I'd be happy to document all design discussions wit
: Refactor :
:
: class DocumentWriter {
: private final void writeNorms(String segment) throws IOException {
: for(int n = 0; n < fieldInfos.size(); n++){
: FieldInfo fi = fieldInfos.fieldInfo(n);
: if(fi.isIndexed && !fi.omitNorms){
: float norm = fieldBoosts[n] * simila
On Tue, 2006-05-23 at 14:29 -0700, Marvin Humphrey wrote:
> Greets,
>
> Is it possible to have an IndexWriter apply different Similarity
> models to different Fields?
You only want to apply this to the norms? Are you up for an ad-hoc solution?
Refactor :
class DocumentWriter {
private final voi
: Is it possible to have an IndexWriter apply different Similarity
: models to different Fields?
As far as i know, the only way Similarity comes into play when using an
IndexWriter is lengthNorm, and it is passed the fieldName so it's easy to
make its behavior field specific (SimilarityDelegator
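The field-specific idea above can be sketched without Lucene on the classpath. Everything below is a hypothetical stand-in, not Lucene's Similarity API: only the lengthNorm(fieldName, numTokens) shape comes from the thread, and the 1/sqrt(numTokens) default and the "flat" alternative are assumptions chosen for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the single method discussed in the thread.
// This is NOT Lucene's Similarity class.
interface LengthNorm {
    float lengthNorm(String fieldName, int numTokens);
}

// Dispatches on the field name; fields without an entry use the fallback.
class PerFieldLengthNorm implements LengthNorm {
    private final Map<String, LengthNorm> byField = new HashMap<>();
    private final LengthNorm fallback;

    PerFieldLengthNorm(LengthNorm fallback) {
        this.fallback = fallback;
    }

    // Registers a field-specific model and returns this for chaining.
    PerFieldLengthNorm with(String fieldName, LengthNorm norm) {
        byField.put(fieldName, norm);
        return this;
    }

    @Override
    public float lengthNorm(String fieldName, int numTokens) {
        return byField.getOrDefault(fieldName, fallback).lengthNorm(fieldName, numTokens);
    }
}

class PerFieldLengthNormDemo {
    // Assumed default: shorter fields weigh more, 1 / sqrt(numTokens).
    static final LengthNorm SQRT = (field, n) -> (float) (1.0 / Math.sqrt(n));
    // Assumed alternative: ignore field length entirely.
    static final LengthNorm FLAT = (field, n) -> 1.0f;

    public static void main(String[] args) {
        LengthNorm norms = new PerFieldLengthNorm(SQRT).with("title", FLAT);
        System.out.println("title: " + norms.lengthNorm("title", 4));
        System.out.println("body:  " + norms.lengthNorm("body", 4));
    }
}
```

In a real index the same dispatch would live inside a Similarity subclass; whether that covers Marvin's use case depends on whether he needs more than the norms to differ per field.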
I have a requirement to only return one result for all documents whose
timestamps fall within N seconds of one another. (where timestamp is a
field and N is an integer).
For example, Document A is timestamped "12:00:00" and Document B has
timestamp "12:00:30", Document B should be discarded. On t
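One way to read this requirement, sketched independently of Lucene: walk the hits in ascending timestamp order and keep a hit only if it is more than N seconds after the last hit that was kept. The class and method names are hypothetical, timestamps are assumed to be plain seconds, and comparing against the last kept hit (rather than the last seen hit) is an assumption the original message does not settle.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TimeWindowCollapser {
    /**
     * Returns the timestamps that survive collapsing.
     * The input must already be sorted in ascending order (seconds).
     * A timestamp is dropped when it lies within windowSeconds of the
     * most recently kept timestamp.
     */
    static List<Long> collapse(List<Long> sortedTimestamps, long windowSeconds) {
        List<Long> kept = new ArrayList<>();
        for (long ts : sortedTimestamps) {
            if (kept.isEmpty() || ts - kept.get(kept.size() - 1) > windowSeconds) {
                kept.add(ts);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // 12:00:00, 12:00:30 and 12:02:00 as seconds since midnight, N = 60.
        List<Long> hits = Arrays.asList(43200L, 43230L, 43320L);
        System.out.println(collapse(hits, 60));
    }
}
```

Because this needs the hits in timestamp order, it fits best when only the first M results are required, which is the question raised in the reply at the top of this thread.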
On Tue, 2006-05-23 at 18:45 +0200, karl wettin wrote:
> On Tue, 2006-05-23 at 10:50 -0500, James Maes wrote:
>
> > It seems to be related to Strings and when they are created.
>
> the bug where the per instance float for maximum score (accuracy) is
> modified instead of using it local in the met
Greets,
Is it possible to have an IndexWriter apply different Similarity
models to different Fields?
Marvin Humphrey
Rectangular Research
http://www.rectangular.com/
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additiona
Doug,
Thanks much for the clarification. That helps put everything in the right
frame for me!
-tree
On 5/22/06, Doug Cutting <[EMAIL PROTECTED]> wrote:
Tom Emerson wrote:
> Thanks for the clarification. What then is the difference between a
> MultiSearcher and using an IndexSearcher on a M
Marc,
To me it sounds like Lucene may certainly be a good tool for what you describe.
In terms of hit caching: depending on exactly how you are using it,
Lucene can be insanely fast, making hit caching unnecessary. I
believe it also does some internal caching as long as you
are usin
The usage you describe sounds perfectly suited for Solr ... without even
needing heavy customizations or custom plugins...
: Hi Erik, many thanks for your response - a typical search application
: that will consume the web service will typically want to display 25
: results per page. Most users
Hi Erik, many thanks for your response - a typical search application that will
consume the web service will typically want to display 25 results per page.
Most users will only be interested in the first few pages, but there are
certain searches with users that will want to examine many pages o
: Unfortunately, I want to have subqueries inside my query (e.g. (t1 AND
: t2) NEAR (t3 OR t4)), and PhraseQuery seems to allow only Terms inside
: it.
In that case, you aren't just using SpanQuery for the use of slop -- you
are using the Span information, you just don't realize it (that's how al
I suspect the final query structure isn't what you think it is ... take a
look at the toString on your query.
in general, there is no way to just do a "NOT foo" type query ...
prohibiting things only makes sense in the context of selecting something
else ... i'm guessing the query structure you a
On May 23, 2006, at 1:41 PM, Marc Dauncey wrote:
Has anyone used this as a delivery mechanism for Lucene query results?
A quick search on Google reveals a Lucene Web Service project on
SourceForge, but what i want to know is whether people on the list
know of any big drawbacks, specificall
Has anyone used this as a delivery mechanism for Lucene query results?
A quick search on Google reveals a Lucene Web Service project on SourceForge,
but what i want to know is whether people on the list know of any big
drawbacks, specifically, how well could I expect this to perform, as compared
I'll give it a try, thanks.
From: "Yonik Seeley" <[EMAIL PROTECTED]>
Reply-To: java-user@lucene.apache.org
To: java-user@lucene.apache.org
Subject: Re: Performance ...
Date: Mon, 22 May 2006 11:40:46 -0400
On 5/22/06, Dragon Fly <[EMAIL PROTECTED]> wrote:
The search results of my Lucene applic
On Mon, 2006-05-22 at 13:07 -0700, Van Nguyen wrote:
> I'm pretty new to lucene and was wondering if there are any resources on
> how to do incremental updates in lucene.
What do you mean by incremental updates? You add data to your corpus by
using the IndexWriter.
--
On Tue, 2006-05-23 at 10:50 -0500, James Maes wrote:
> It seems to be related to Strings and when they are created.
Try to create a new instance of SpellChecker for each suggestion. Will
it work? Then you have hit the bug where the per instance float for
maximum score (accuracy) is modified ins
Here is the problem:
We have implemented the Lucene engine within our application server, which is
built on top of Tomcat. We've had no problems creating the indexes or
searching them. The problems we are having are all related to the
SpellChecker part of the system.
It seems to be related to St
On 5/23/06, Jimmy the Geek <[EMAIL PROTECTED]> wrote:
Or any other suggestions on good ways to prevent duplicates? I am
indexing with a field that has a unique ID, so it should be fairly
straightforward...
Solr does this efficiently:
http://www.mail-archive.com/java-user@lucene.apache.org/msg05
Hi Adam
Thanks for your help.
Patrick
On 23/05/06, Adam Constabaris <[EMAIL PROTECTED]> wrote:
Patrick Kimber wrote:
> Hi Adam
>
> We are getting the same error. Did you manage to work out what was
> causing the problem?
>
> Thanks
> Patrick
I can't say anything definitive about this, but I
Hi experts,
I'm currently indexing the New INEX collection using lucene and pondering this
question.
When searching, how do I retrieve the score for a section or paragraph,
rather than the document score, when the documents are indexed in multiple
fields (XML)?
Can anyone point me in the correc
Hold the presses. I can't get my junit test to show this as a problem. So
I'm exploring further. It may be some weird interaction with my index. I'll
post more later.
Sorry for the spam.
Erick
Thanks Jeff. :)
--
View this message in context:
http://www.nabble.com/Analyzer+question-t1650271.html#a4524125
Sent from the Lucene - Java Users forum at Nabble.com.
Patrick Kimber wrote:
Hi Adam
We are getting the same error. Did you manage to work out what was
causing the problem?
Thanks
Patrick
I can't say anything definitive about this, but I think it was due to a
corrupted index; on the hunch that the index creation/update threads
were reliably pu
Any chance I could get my hands on code to "de-dup"? My current
method is, I think, quite sub-optimal, as I am searching the index for a
dup on every insert. Not a very good method, I think...
Or any other suggestions on good ways to prevent duplicates? I am
indexing with a field that has a
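As an alternative to searching the index on every insert, duplicates can be collapsed in memory before a batch is written, so each unique ID reaches the index at most once per batch. This is a minimal sketch under that assumption: the class name and the {id, body} record layout are hypothetical, and it does not cover documents that are already in the index from earlier batches.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class BatchDeduper {
    /**
     * Each record is a two-element array: {uniqueId, body}.
     * When an ID occurs more than once, the last record wins;
     * output order follows the first appearance of each ID.
     */
    static List<String[]> dedupe(List<String[]> records) {
        Map<String, String[]> latest = new LinkedHashMap<>();
        for (String[] record : records) {
            latest.put(record[0], record);
        }
        return new ArrayList<>(latest.values());
    }

    public static void main(String[] args) {
        List<String[]> batch = new ArrayList<>();
        batch.add(new String[] {"a", "first version"});
        batch.add(new String[] {"b", "only version"});
        batch.add(new String[] {"a", "second version"});
        for (String[] record : dedupe(batch)) {
            System.out.println(record[0] + " -> " + record[1]);
        }
    }
}
```

Documents already indexed under the same ID would still need to be deleted by that ID before the new version is added.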
I'm constructing a BooleanQuery across several fields with SpanNearQuerys.
In the degenerate case of spanning *one* term, AND adding a non-span clause,
I get an exception thrown. Of course you can argue that a span query over
one term is silly and shouldn't be done, but I thought I'd mention this.
On May 23, 2006, at 8:15 AM, Alberto Marqués wrote:
I have a question about updating the data in Lucene. Suppose
I have indexed a directory and want to refresh the index
(re-indexing only the files that have been modified), in order to
keep the data up to date. There is
Hi Adam
We are getting the same error. Did you manage to work out what was
causing the problem?
Thanks
Patrick
On 21/04/06, Adam Constabaris <[EMAIL PROTECTED]> wrote:
This is a puzzler, I'm not sure if I'm doing something wrong or whether
I have a poisoned document, a corrupted index (failin
I think I've fixed the problem by changing/fixing RAMOutputStream.java.
On 5/23/06, Muralidharan V <[EMAIL PROTECTED]> wrote:
On 5/23/06, Michael Chan <[EMAIL PROTECTED]> wrote:
>
> As I have quite a bit of RAM (~20gb)
And I once had a 486 with 2MB RAM, which was later 'upgraded' to 4MB :-)
I have a question about updating the data in Lucene. Suppose I have
indexed a directory and want to refresh the index (re-indexing only the files
that have been modified), in order to keep the data up to date. Is there a faster
way to do this than using: IndexWriter(indexDir, new
Hi Erik,
Unfortunately, I want to have subqueries inside my query (e.g. (t1 AND
t2) NEAR (t3 OR t4)), and PhraseQuery seems to allow only Terms inside
it.
Michael
On 5/23/06, Erik Hatcher <[EMAIL PROTECTED]> wrote:
PhraseQuery has a slop factor also - would it work for you instead of
SpanNearQ
PhraseQuery has a slop factor also - would it work for you instead of
SpanNearQuery?
Erik
On May 23, 2006, at 1:36 AM, Michael Chan wrote:
Hi,
As I use SpanQuery purely for the use of slop, I was wondering how to
make SpanQuery more efficient. Since I don't need any span
informatio
On 5/23/06, Michael Chan <[EMAIL PROTECTED]> wrote:
As I have quite a bit of RAM (~20gb)
And I once had a 486 with 2MB RAM, which was later 'upgraded' to 4MB :-)
Thanks for that. Does anyone know how much RAM a 5gb index will need?
With mx set to 27gb, it crashes when it flushes buffer at one point.
"bash-2.03$ Exception in thread "main" java.lang.ExceptionInInitializerError
at TaxonomyFinder.RelatedCatsFinder.(RelatedCatsFinder.java:46)
at
Hi all,
I build Query objects programmatically. I do this by getting a
TermQuery/PhraseQuery/whatever for each term in the user query, make a
BooleanClause by specifying isRequired and isProhibited depending on
whether the term has an "and" or an "or" or an "or not" etc prefixed
before it (I use 1
Hmm.
Not sure that I understand exactly what you mean.
Doesn't your solution require me to add all documents in correct date range?
Since I will index articles from different systems I can't guarantee that all
articles will be added to the index in correct date order.
/
Marcus
_
> The QueryParser then adds the -- parsing -- on top of this, but can
> delegate for query delegation.
That should be "query creation", of course.
> -Original Message-
> From: Irving, Dave [mailto:[EMAIL PROTECTED]
> Sent: 23 May 2006 08:30
> To: java-user@lucene.apache.org
> Subject: RE:
Chris Hostetter wrote:
> typically, when build queries up from form data, each piece
> of data falls into one of 2 categories:
>
> 1) data which doesn't need to be analyzed because the field it's going to
> query on wasn't tokenized (ie: a date field, or a numeric field, or a
> boolean
On Tuesday 23 May 2006 08:26, Michael Chan wrote:
> As I have quite a
> bit of RAM (~20gb), is there a way I could store the index in RAM or
> any other way that makes use of it to improve performance?
RAMDirectory has just been fixed (in SVN) to work with indexes > 2 GB.
Regards
Daniel
--
h