Re: UTF8 accents & umlauts filter?

2006-09-13 Thread Michael Imbeault
Thanks, Yonik and Ken, for both answers. The explanations went a little over my head, but I think you understood what I was talking about! Basically, I'm after a better filter to remove all possible accents (and umlauts as a bonus, for completeness' sake; I personally would have no use for it). I thin
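
One way to picture the kind of accent-removing filter being asked for, as a minimal sketch in plain Java rather than anything from the thread: decompose each character with java.text.Normalizer and drop the combining marks. The class name AccentStripper is hypothetical and is not part of Lucene; wiring this into an analysis chain is left out.

```java
import java.text.Normalizer;

// Hypothetical helper, not a Lucene class: strips accents and umlauts
// by decomposing characters and removing the combining marks.
class AccentStripper {
    static String strip(String input) {
        // NFD splits a letter such as e-acute into "e" plus a combining accent.
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        // \p{M} matches the combining marks left behind by the decomposition.
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source file ASCII-only.
        System.out.println(strip("caf\u00e9 M\u00fcller na\u00efve"));
    }
}
```

Letters that have no canonical decomposition (for example the German sharp s or the Scandinavian o with stroke) pass through unchanged, so a complete filter would likely need an extra mapping table for those.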

Re: removing a term from a lucene index

2006-09-13 Thread Andrzej Bialecki
Chris Hostetter wrote: : > undesired words as a sort of stoplist. But surely there's a better way : > to do it (the inverted index structure seems like this should be : > natural). Any pointers would be most helpful. I've never given this much thought, but i know that merging indexes can be do

Re: Storing no. of occurances of a token

2006-09-13 Thread Doron Cohen
> I found out how to determine the number of documents in which a term appeared by looking at the Luke code, but how does one determine the number of times it occurs in each document? Use TermDocs - http://lucene.apache.org/java/docs/api/org/apache/lucene/index/TermDocs.html Something like -

Re: removing a term from a lucene index

2006-09-13 Thread Chris Hostetter
: > undesired words as a sort of stoplist. But surely there's a better way : > to do it (the inverted index structure seems like this should be : > natural). Any pointers would be most helpful. I've never given this much thought, but i know that merging indexes can be done with IndexReaders, an

Re: removing a term from a lucene index

2006-09-13 Thread Paul Elschot
On Wednesday 13 September 2006 15:41, Miles Efron wrote: > This question surely shows how new I am to Lucene... but I'm interested in removing terms from a lucene index. In particular, I'd like to be able to delete all terms that appear in fewer than x documents (say x=3). This is in eff

Re: SV: SV: Changing the Scoring api

2006-09-13 Thread Doron Cohen
I think it is not possible, by modifying Similarity alone, to make the total score reflect only document boosts (which is the original request in this discussion). This is because a higher-level scorer always sums the scores of "its" sub-scorers. Is this right? If so, there are probably two

intermittant "Access Denied" IOExceptions on Windows

2006-09-13 Thread Michael McCandless
Hi all, There is an issue opened on Lucene: http://issues.apache.org/jira/browse/LUCENE-665 that I'd like to draw your attention to and summarize here because users have recently hit it. The gist of the issue is: on Windows, you sometimes see intermittent "Access Denied" errors in renaming

Re: SV: SV: Changing the Scoring api

2006-09-13 Thread Chris Hostetter
1) This is not Java. Since it's not Java, I can't even begin to guess what odd eccentricities might exist in whatever Lucene port you are using. 2) If this *were* Java then it wouldn't work the way you want it to, since you have the tf function returning "1" regardless of the frequency ..

Re: IOException when calling hits.doc(int)

2006-09-13 Thread Huinan
I see. This is what I was curious about. Thanks! On 9/14/06, Andrzej Bialecki <[EMAIL PROTECTED]> wrote: Huinan wrote: > Thanks, Ronnie. But why does it work in some cases (when there is a small number of documents inside the index)? The Hits class retrieves the first 50 results, and caches t

Re: removing a term from a lucene index

2006-09-13 Thread Tom Bouctou
Miles, I understand you are trying to solve your problem by changing the index contents (removing documents). Would it be possible to work around it and achieve this during search instead, by returning only the relevant documents and ignoring the rest? Just my 2 cents... Tom Miles Efron wrote:

Re: range query

2006-09-13 Thread mark harwood
RangeQueries are evil. http://wiki.apache.org/jakarta-lucene/FilteringOptions - Original Message From: Bhavin Pandya <[EMAIL PROTECTED]> To: java-user@lucene.apache.org Sent: Wednesday, 13 September, 2006 3:22:38 PM Subject: range query Hi, I have been using Lucene for the last few months...

Re: Storing no. of occurances of a token

2006-09-13 Thread Chris Hostetter
: I found out how to determine the number of documents in which a term appeared by looking at the Luke code, but how does one determine the number of times it occurs in each document? Take a look at the TermDocs class. -Hoss
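
To make the two quantities in the question concrete, here is a small plain-Java model, offered as a sketch only: the number of documents containing a term (document frequency) versus the number of times the term occurs in one document (term frequency). It does not use the Lucene API; the class TermCounts and its methods are hypothetical names for illustration. In Lucene itself the reply above points to TermDocs, which is iterated with next() and exposes doc() and freq() for each posting.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model of a tiny inverted index; none of these names are Lucene classes.
class TermCounts {
    // term -> (document number -> occurrences of the term in that document)
    private final Map<String, Map<Integer, Integer>> postings = new HashMap<>();

    // Splits the text on non-word characters and counts each token for this document.
    void addDocument(int docId, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (token.isEmpty()) {
                continue; // split() yields an empty first token if the text starts with punctuation
            }
            postings.computeIfAbsent(token, t -> new TreeMap<>())
                    .merge(docId, 1, Integer::sum);
        }
    }

    // Number of documents that contain the term.
    int docFreq(String term) {
        return postings.getOrDefault(term, Collections.<Integer, Integer>emptyMap()).size();
    }

    // Number of times the term occurs in the given document.
    int termFreq(String term, int docId) {
        return postings.getOrDefault(term, Collections.<Integer, Integer>emptyMap())
                       .getOrDefault(docId, 0);
    }

    public static void main(String[] args) {
        TermCounts counts = new TermCounts();
        counts.addDocument(0, "lucene index lucene");
        counts.addDocument(1, "index merge");
        System.out.println("docFreq(index) = " + counts.docFreq("index"));
        System.out.println("termFreq(lucene, doc 0) = " + counts.termFreq("lucene", 0));
    }
}
```

The nested map mirrors what a postings list stores: one entry per document that contains the term, each carrying the within-document count.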

range query

2006-09-13 Thread Bhavin Pandya
Hi, I have been using Lucene for the last few months, and I have a question about range query performance. Is there any alternative to a range query, or can I run a range query on a small set of documents so that it is less expensive? - Bhavin pandya

55 Ways to Have Fun With Google

2006-09-13 Thread P. Alex. Salamanca R.
http://www.googlified.com/55fun.php It's very entertaining; you start reading and can't stop... you can find the PDF version of the book here: http://www.55fun.com/book.pdf -- Thank you for your attention. Kind regards, Alex. S.

removing a term from a lucene index

2006-09-13 Thread Miles Efron
This question surely shows how new I am to Lucene... but I'm interested in removing terms from a lucene index. In particular, I'd like to be able to delete all terms that appear in fewer than x documents (say x=3). This is in an effort to reduce the feature set for some research I'm doing. I

Re: IOException when calling hits.doc(int)

2006-09-13 Thread Andrzej Bialecki
Huinan wrote: Thanks, Ronnie. But why does it work in some cases (when there is a small number of documents inside the index)? The Hits class retrieves the first 50 results, and caches them. -- Best regards, Andrzej Bialecki

Re: Storing no. of occurances of a token

2006-09-13 Thread Bill Taylor
On Sep 13, 2006, at 3:39 AM, Paul Elschot wrote: On Wednesday 13 September 2006 09:30, Venkateshprasanna wrote: Is it possible for me to store the number of occurrences of a token in a particular document or a collection of documents? When the token is indexed as a term, an IndexReader pro

RE: Queries in Lucene

2006-09-13 Thread mcarcelen
Thank you very much. Yes, I'm very new to Lucene, I'm sorry. With the help of Lucene we want to classify 724,827 legal files that contain the word "Auto" or "Providencia" in the first line. We want to separate them into two groups. That's why I've indexed these files with Lucene before, and we thought t

SV: SV: Changing the Scoring api

2006-09-13 Thread Marcus Falck
Example: Enter query: AllText:Microsoft score: 0,01476238 2002-02-19 05:09:00(122578) Qwest pins recovery hopes on long-distance score: 0,01476227 2002-02-19 05:07:00(122547) Microsoft ordered to let states see Windows code Enter query: AllText:Microsoft OR AllText:IBM score: 0,02949772

SV: SV: Changing the Scoring api

2006-09-13 Thread Marcus Falck
It didn't really work for BooleanQueries either. For some hours I thought it was working, but to my great disappointment I realized that this was not the case. I'm using two IndexReaders (RAM and FS) and one MultiReader, creating one IndexSearcher by passing the MultiReader as constructor argument

Re: Queries in Lucene

2006-09-13 Thread Erick Erickson
I'm assuming that you're new to Lucene, so if you're an old pro you probably already know all this. I think you'll have difficulty here. Lucene has no concept of lines, just tokens and offsets. So here are a couple of suggestions off the top of my head... If the first line is the *only* way

Re: IOException when calling hits.doc(int)

2006-09-13 Thread Huinan
I agree. Thanks. On 9/13/06, Ronnie Kolehmainen <[EMAIL PROTECTED]> wrote: This might be related to filesystem, internal lucene buffering/caching, or practically anything that an implementor does not need to have knowledge of. The only thing that you, the implementor, *do* need to know is th

Re: IOException when calling hits.doc(int)

2006-09-13 Thread Ronnie Kolehmainen
This might be related to filesystem, internal lucene buffering/caching, or practically anything that an implementor does not need to have knowledge of. The only thing that you, the implementor, *do* need to know is that you should *not* access a Hits object after the searcher is closed ;) /R

Re: IOException when calling hits.doc(int)

2006-09-13 Thread Huinan
Thanks, Ronnie. But why does it work in some cases (when there is a small number of documents inside the index)? On 9/13/06, Ronnie Kolehmainen <[EMAIL PROTECTED]> wrote: Do not close the searcher until you are done with the Hits object. See the javadocs for Searchable.close() http://lucene.apa

Re: IOException when calling hits.doc(int)

2006-09-13 Thread Ronnie Kolehmainen
Do not close the searcher until you are done with the Hits object. See the javadocs for Searchable.close() http://lucene.apache.org/java/docs/api/org/apache/lucene/search/Searchable.html#close() /Ronnie Huinan wrote: Hi, I'm having a weird problem: I created an index using IndexWriter. Then

RE: group field selection of the form field:(a b c)

2006-09-13 Thread Pramodh Shenoy
Thanks Doron/Erick, Option A did work and looks like I wasn't adding the fields to the Doc object correctly. I still used UN_TOKENIZED as these were plain strings that I wanted to do a full string comparison against. So basically the query +booktype:guides +content:management +(subtype:accoun

IOException when calling hits.doc(int)

2006-09-13 Thread Huinan
Hi, I'm having a weird problem: I created an index using IndexWriter. Then I had a piece of code which searches the index, then prints out a particular field of the first document of the hits (see the following code). As simple as that. Hits hits = IndexSearchUtil.getHits(defaultIndexLocat

Queries in Lucene

2006-09-13 Thread mcarcelen
Hi all, I've got an index and now I'm trying to create a query with lucene-2.0.0. I'd like to find files that have the following in the first line: AND Word2 I've tried the package org.apache.lucene.demo.SearchFiles but I get files where the word "Word2" is not in the first line. I don't k

Re: Storing no. of occurances of a token

2006-09-13 Thread Paul Elschot
On Wednesday 13 September 2006 09:30, Venkateshprasanna wrote: > Is it possible for me to store the number of occurrences of a token in a particular document or a collection of documents? When the token is indexed as a term, an IndexReader provides access to the total number of documents conta

Storing no. of occurances of a token

2006-09-13 Thread Venkateshprasanna
Is it possible for me to store the number of occurrences of a token in a particular document or a collection of documents? Regards, Venkateshprasanna -- View this message in context: http://www.nabble.com/Storing-no.-of-occurances-of-a-token-tf2263455.html#a6280422 Sent from the Lucene - Java Us