Thanks Yonik & Ken for both answers; I think the explanations went a
little over my head, but I think you understood what I was talking
about! Basically, a better filter to remove all possible accents (&
umlauts as a bonus, for completeness' sake; I personally would have no
use for it).
I thin
Chris Hostetter wrote:
: > undesired words as a sort of stoplist. But surely there's a better way
: > to do it (the inverted index structure seems like this should be
: > natural). Any pointers would be most helpful.
I've never given this much thought, but I know that merging indexes can be
do
> I found out how to determine the number of documents in which a term
> appeared by looking at the Luke code, but how does one determine the
> number of times it occurs in each document?
Use TermDocs -
http://lucene.apache.org/java/docs/api/org/apache/lucene/index/TermDocs.html
Something like -
: > undesired words as a sort of stoplist. But surely there's a better way
: > to do it (the inverted index structure seems like this should be
: > natural). Any pointers would be most helpful.
I've never given this much thought, but I know that merging indexes can be
done with IndexReaders, an
On Wednesday 13 September 2006 15:41, Miles Efron wrote:
> This question surely shows how new I am to Lucene... but I'm interested
> in removing terms from a lucene index. In particular, I'd like to be
> able to delete all terms that appear in fewer than x documents (say
> x=3). This is in eff
I think it is not possible, by only modifying Similarity, to make the total
score only count for documents boosts (which is the original request in
this discussion).
This is because a higher level scorer always sums the scores of "its"
sub-scorers - is this right...? if so there are probably two
Hi all,
There is an issue opened on Lucene:
http://issues.apache.org/jira/browse/LUCENE-665
that I'd like to draw your attention to and summarize here because
recently users have hit it.
The gist of the issue is: on Windows, you sometimes see intermittent
"Access Denied" errors in renaming
1) This is not Java. Since it's not Java, I can't even begin to guess
what odd eccentricities might exist in whatever Lucene port you are
using.
2) If this *were* Java then it wouldn't work the way you want it to, since
you have the tf function returning "1" regardless of the frequency ..
I see. This is what I was curious about. Thanks!
On 9/14/06, Andrzej Bialecki <[EMAIL PROTECTED]> wrote:
Huinan wrote:
> Thanks, Ronnie. But why does it work in some cases (when there is a small
> number of documents inside the index)?
The Hits class retrieves the first 50 results, and caches t
Miles,
I understand you are trying to solve your problem by changing the index
contents (removing documents). Would it be possible to work around it and
achieve this at search time, by returning only the relevant documents
and ignoring the rest?
Just my 2 cents...
Tom
Miles Efron wrote:
RangeQueries are evil.
http://wiki.apache.org/jakarta-lucene/FilteringOptions
- Original Message
From: Bhavin Pandya <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Wednesday, 13 September, 2006 3:22:38 PM
Subject: range query
Hi,
I have been using Lucene for the last few months...
: I found out how to determine the number of documents in which a term
: appeared by looking at the Luke code, but how does one determine the
: number of times it occurs in each document?
take a look at the TermDocs class.
-Hoss
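
To illustrate the difference between "number of documents containing a term" and "number of times the term occurs in each document", here is a minimal sketch in plain Java. It is a toy in-memory postings map, not the Lucene TermDocs API; the class name ToyPostings and its methods are hypothetical and exist only for this illustration.

```java
import java.util.*;

public class ToyPostings {
    // term -> (docId -> number of occurrences of the term in that document)
    private final Map<String, Map<Integer, Integer>> postings = new TreeMap<>();

    // Splits the text on non-word characters and counts each token per document.
    public void addDocument(int docId, String text) {
        for (String token : text.toLowerCase().split("\\W+")) {
            if (token.isEmpty()) continue;
            postings.computeIfAbsent(token, t -> new TreeMap<>())
                    .merge(docId, 1, Integer::sum);
        }
    }

    // Number of documents in which the term appears.
    public int docFreq(String term) {
        Map<Integer, Integer> docs = postings.get(term);
        return docs == null ? 0 : docs.size();
    }

    // Number of times the term occurs in one particular document.
    public int termFreq(String term, int docId) {
        Map<Integer, Integer> docs = postings.get(term);
        return docs == null ? 0 : docs.getOrDefault(docId, 0);
    }

    public static void main(String[] args) {
        ToyPostings index = new ToyPostings();
        index.addDocument(0, "lucene index lucene");
        index.addDocument(1, "index merge");
        System.out.println(index.docFreq("lucene"));     // prints 1
        System.out.println(index.termFreq("lucene", 0)); // prints 2
    }
}
```

In this toy model the per-document count is simply the value stored next to each document id in the term's postings entry.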
Hi,
I have been using Lucene for the last few months... I have a question about
range query performance...
Is there any alternative to a range query, or can I run a range query on a small
set of documents so that it is less expensive...
- Bhavin pandya
http://www.googlified.com/55fun.php
It's very entertaining; you start reading and can't stop...
you can find the PDF version of the book here:
http://www.55fun.com/book.pdf
--
Thank you for your attention.
Kind regards,
Alex. S.
This question surely shows how new I am to Lucene... but I'm interested
in removing terms from a lucene index. In particular, I'd like to be
able to delete all terms that appear in fewer than x documents (say
x=3). This is in efforts to reduce the feature set for some research
I'm doing.
I
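
The idea in the question above (dropping every term that appears in fewer than x documents) can be sketched on a plain in-memory map. This is only a toy model of the feature-selection step, assuming postings are held as term -> set of document ids; the class ToyTermPruner and its prune method are hypothetical, and nothing here touches a real Lucene index.

```java
import java.util.*;

public class ToyTermPruner {
    /**
     * Returns a new map holding only the terms that occur in at least
     * minDocs documents. The input map is left unchanged.
     */
    public static Map<String, Set<Integer>> prune(Map<String, Set<Integer>> postings,
                                                  int minDocs) {
        Map<String, Set<Integer>> kept = new TreeMap<>();
        for (Map.Entry<String, Set<Integer>> e : postings.entrySet()) {
            if (e.getValue().size() >= minDocs) {
                kept.put(e.getKey(), e.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, Set<Integer>> postings = new HashMap<>();
        postings.put("rare", new HashSet<>(Arrays.asList(7)));
        postings.put("common", new HashSet<>(Arrays.asList(1, 2, 3)));
        // With x = 3, only "common" survives.
        System.out.println(prune(postings, 3).keySet()); // prints [common]
    }
}
```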
Huinan wrote:
Thanks, Ronnie. But why does it work in some cases (when there is a small
number of documents inside the index)?
The Hits class retrieves the first 50 results, and caches them.
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __
[__ |
On Sep 13, 2006, at 3:39 AM, Paul Elschot wrote:
On Wednesday 13 September 2006 09:30, Venkateshprasanna wrote:
Is it possible for me to store the number of occurrences of a token in
a particular document or a collection of documents?
When the token is indexed as a term, an IndexReader pro
Thank you very much.
Yes, I'm very new to Lucene. I'm sorry.
With the help of Lucene we want to classify 724,827 legal files whose
first line contains the word "Auto" or "Providencia". We want to separate them
into two groups. That's why I indexed these files with Lucene first, and we
thought t
Example:
Enter query:
AllText:Microsoft
score: 0,01476238 2002-02-19 05:09:00(122578) Qwest pins recovery hopes on
long-distance
score: 0,01476227 2002-02-19 05:07:00(122547) Microsoft ordered to
let states see Windows code
Enter query:
AllText:Microsoft OR AllText:IBM
score: 0,02949772
It didn't really work for BooleanQueries either. I thought it was working for
some hours, but to my great disappointment I realized that this was not the case.
I'm using two IndexReaders (RAM and FS) and one MultiReader, creating one
IndexSearcher by passing the MultiReader as the constructor argument
I'm assuming that you're new to Lucene, so if you're an old pro you probably
already know all this
I think you'll have difficulty here. Lucene has no concept of lines, just
tokens and offsets. So here are a couple of suggestions off the top of my
head...
If the first line is the *only* way
I agree.
Thanks.
On 9/13/06, Ronnie Kolehmainen <[EMAIL PROTECTED]> wrote:
This might be related to filesystem, internal lucene buffering/caching,
or practically anything that an implementor does not need to have
knowledge of.
The only thing that you, the implementor, *do* need to know is th
This might be related to filesystem, internal lucene buffering/caching,
or practically anything that an implementor does not need to have
knowledge of.
The only thing that you, the implementor, *do* need to know is that you
should *not* access a Hits object after the searcher is closed ;)
/R
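
The rule above, together with the remark elsewhere in this thread that Hits caches the first results, can be illustrated with a toy model. This is not Lucene's Hits implementation; ToySearcher and ToyHits are hypothetical classes, and the prefetch size is an arbitrary parameter chosen for the sketch. It only shows why entries cached before close() can still be read while anything fetched afterwards fails.

```java
import java.util.*;

public class LazyResultsDemo {
    /** Hypothetical stand-in for a searcher: serves documents until closed. */
    static class ToySearcher implements AutoCloseable {
        private final List<String> docs;
        private boolean closed = false;

        ToySearcher(List<String> docs) { this.docs = docs; }

        String fetch(int id) {
            if (closed) throw new IllegalStateException("searcher is closed");
            return docs.get(id);
        }

        int size() { return docs.size(); }

        @Override public void close() { closed = true; }
    }

    /** Hypothetical lazy result list: prefetches a few entries, loads the rest on demand. */
    static class ToyHits {
        private final ToySearcher searcher;
        private final Map<Integer, String> cache = new HashMap<>();

        ToyHits(ToySearcher searcher, int prefetch) {
            this.searcher = searcher;
            for (int i = 0; i < Math.min(prefetch, searcher.size()); i++) {
                cache.put(i, searcher.fetch(i));
            }
        }

        // Returns a cached entry, or asks the searcher if it is not cached yet.
        String doc(int n) { return cache.computeIfAbsent(n, searcher::fetch); }
    }

    public static void main(String[] args) {
        ToySearcher searcher = new ToySearcher(Arrays.asList("a", "b", "c", "d"));
        ToyHits hits = new ToyHits(searcher, 2);
        searcher.close();
        System.out.println(hits.doc(0)); // was cached before close, still readable
        try {
            hits.doc(3);                 // not cached, needs the closed searcher
        } catch (IllegalStateException e) {
            System.out.println("failed: " + e.getMessage());
        }
    }
}
```

In this model a small result set fits entirely in the prefetched cache, so reading after close() appears to work; a larger one does not. Either way, the safe ordering is to finish with the results before closing the searcher.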
Thanks, Ronnie. But why does it work in some cases (when there is a small number
of documents inside the index)?
On 9/13/06, Ronnie Kolehmainen <[EMAIL PROTECTED]> wrote:
Do not close the searcher until you are done with the Hits object.
See the javadocs for Searchable.close()
http://lucene.apa
Do not close the searcher until you are done with the Hits object.
See the javadocs for Searchable.close()
http://lucene.apache.org/java/docs/api/org/apache/lucene/search/Searchable.html#close()
/Ronnie
Huinan wrote:
Hi,
I'm having a weird problem:
I created an index using IndexWriter. Then
Thanks Doron/Erick,
Option A did work and looks like I wasn't adding the fields to the
Doc object correctly. I still used UN_TOKENIZED as these were plain
strings that I wanted to do a full string comparison against.
So basically the query +booktype:guides +content:management
+(subtype:accoun
Hi,
I'm having a weird problem:
I created an index using IndexWriter. Then I had a piece of code which
searches the index, then prints out a particular field of the first document
of the hits. (See the following code.) As simple as that.
Hits hits = IndexSearchUtil.getHits(defaultIndexLocat
Hi all,
I've got an index and now I'm trying to create a query with lucene-2.0.0.
I'd like to find files whose first line contains the following:
AND Word2
I tried with org.apache.lucene.demo.SearchFiles
but I get files where the word "Word2" is not in the first line.
I don't k
On Wednesday 13 September 2006 09:30, Venkateshprasanna wrote:
>
> Is it possible for me to store the number of occurrences of a token in a
> particular document or a collection of documents?
When the token is indexed as a term, an IndexReader provides
access to the total number of documents conta
Is it possible for me to store the number of occurrences of a token in a
particular document or a collection of documents?
Regards,
Venkateshprasanna
--
View this message in context:
http://www.nabble.com/Storing-no.-of-occurances-of-a-token-tf2263455.html#a6280422
Sent from the Lucene - Java Us