Re: How to retrieve number of documents based on a query ?

2008-06-25 Thread Daniel Noll
On Thursday 26 June 2008 15:09:44 java_is_everything wrote: > Hi all. > > Is there a way to obtain the number of documents in the Lucene index > (2.0.0), having a particular term indexed, much like what we do in a > database ? I suspect the normal way is a HitCollector which does nothing but incre

How to retrieve number of documents based on a query ?

2008-06-25 Thread java_is_everything
Hi all. Is there a way to obtain the number of documents in the Lucene index (2.0.0), having a particular term indexed, much like what we do in a database ? Looking forward to a reply. Ajay Garg -- View this message in context: http://www.nabble.com/How-to-retrieve-number-of-documents-based-o

IndexDeletionPolicy to delete commits after N minutes

2008-06-25 Thread Alex Cheng
hi, what is the correct way to instruct the indexwriter (or other classes?) to delete old commit points after N minutes ? I tried to write a customized IndexDeletionPolicy that uses the parameters to schedule future jobs to perform file deletion. However, I am only getting the filenames through the

IndexDeletionPolicy to delete after N minutes

2008-06-25 Thread Alex Cheng
hi, what is the correct way to instruct the indexwriter to delete old commit points after N minutes ? I tried to write a customized IndexDeletionPolicy that uses the parameters to schedule future jobs to do file deletion. However, I am only getting the filenames, and not absolute file names. thank

instruct IndexDeletionPolicy to delete old commits after N minutes

2008-06-25 Thread Alex Cheng
hi, what is the correct way to instruct the indexwriter to delete old commit points after N minutes ? I tried to write a customized IndexDeletionPolicy that uses the parameters to schedule future jobs to do file deletion. However, I am only getting the filenames, and not absolute file names. thank

Re: yet again: getting the minimum and maximum value of a field

2008-06-25 Thread Chris Hostetter
: There the final solution suggestion from Hoss was to try it with a binary : search : on the TermEnum my suggestion at the time to do a binary search was a bit naive (i was not as familiar with Lucene as I am now). : Because of the tree-like architecture of the index, where the letters are som

Re: case insensitivity

2008-06-25 Thread Chris Hostetter
: I imagined (and maybe I am over simplifying it!) that somewhere in the API : there must be a string comparison using 'String.equals()' that determines if a : document contains the term or not - and that use of 'equals()' has permanently : locked Lucene into case-sensitive searching. The values

Re: Wildcard and Literal Searches combined

2008-06-25 Thread Chris Hostetter
: My users require wildcard searches. Sometimes their search phrases contain : spaces. I am having trouble trying to implement a wildcard search on strings : containing spaces, so if the term includes spaces I force a literal search : by adding double quotes to the search term. : So the search str

Re: Score 0

2008-06-25 Thread Yonik Seeley
On Wed, Jun 25, 2008 at 3:47 PM, Paolo Valleri <[EMAIL PROTECTED]> wrote: > To get the docids of all documents in the index, do I need to write a class > that implements IndexReader, or is there another method? MatchAllDocsQuery does it. -Yonik

Re: Score 0

2008-06-25 Thread Paolo Valleri
Thanks for the answer. To get the docids of all documents in the index, do I need to write a class that implements IndexReader, or is there another method? paolo 2008/6/25 Toke Eskildsen <[EMAIL PROTECTED]>: > On Wed, 2008-06-25 at 09:29 +0200, Paolo Valleri wrote: > > For several reasons I need also to kn

Re: Searching any part of a string

2008-06-25 Thread Erick Erickson
Warning: I don't understand ngrams at all, so you should read this as a plea for those who do to tell me I'm off base. But I wonder if indexing as n-grams would be a way to cope with this issue that lots of people have. *assuming* you are thinking about single terms, then it seems that "smith" w
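To make the n-gram idea concrete, here is a minimal sketch in plain Java (no Lucene classes) of splitting a single term into overlapping character n-grams. The class name CharNGrams and the choice of n = 3 are assumptions for illustration only; the post does not say how the grams would be produced or what size to use.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: splits one term into character n-grams.
class CharNGrams {
    static List<String> ngrams(String term, int n) {
        List<String> out = new ArrayList<String>();
        // Slide a window of width n across the term, one character at a time.
        for (int i = 0; i + n <= term.length(); i++) {
            out.add(term.substring(i, i + n));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(ngrams("smith", 3)); // prints [smi, mit, ith]
    }
}
```

If each gram were indexed as its own term, a query for "mit" would become an ordinary term lookup rather than a scan for an infix. Whether that trade of index size for query speed is worthwhile is exactly the open question in the post.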

Searching any part of a string

2008-06-25 Thread Mark Ferguson
Hello, I am currently keeping an index of all our client's usernames. The search functionality is implemented using a PrefixFilter. However, we would like to expand the functionality to be able to search any part of a user's name, rather than requiring that it begin with the query string. So for e

Re: case insensitivity

2008-06-25 Thread Erick Erickson
I suppose something like that might work, but I still think that presenting a user with matches that sometimes work case sensitive and sometimes don't would be...er..fraught. If you can programmatically restrict your query construction and you're *sure* this is what your users expect, you can m

Re: yet again: getting the minimum and maximum value of a field

2008-06-25 Thread Jason Rutherglen
I looked heavily at this. It requires a customization of TermInfosReader whereby the tii (term dictionary) SegmentTermEnum is traversed looking for the last term with a particular field. Once found, from that position in the tis SegmentTermEnum would need to be traversed again for the last term w

yet again: getting the minimum and maximum value of a field

2008-06-25 Thread Christian Reuschling
Hello people, I'm sorry if I have sent this message twice - my gmail interface merges the mails in the 'send' folder with incoming mails from my address - strange, but I can't say if the mail was sent - I only see it in the send-folder (with only one label on it, which brings me to send it again

Re: Different indices for different searches?

2008-06-25 Thread Erick Erickson
It depends (tm, Erik Hatcher). How many docs in your index? How much information for each doc? What is the size of your index? You could have two different indexes. You could index the same data in different fields in the same index and only have one. There have been several discussions about this

Re: case insensitivity

2008-06-25 Thread John Byrne
What I had in mind was actually very simple: when you create a Term (programmatically) you normally set the text and the field. I would also like to be able to set the case sensitivity to true or false for that specific Term object. I imagined (and maybe I am over simplifying it!) that somewhe

Different indices for different searches?

2008-06-25 Thread Sascha Fahl
Hi, I have 2 kinds of searches. One kind is like the wikipedia suggestions and the other one is pretty classic. So does it make sense to have different indices for these 2 search styles? best, sascha

Re: case insensitivity

2008-06-25 Thread Erick Erickson
Well, it depends on what you mean by "per term". There's already PerFieldAnalyzerWrapper for each field, but I don't think that's what you want. How do you expect a per term analyzer to behave? I'm having a hard time thinking of a use case that's general. You could always roll your own analyzer th

Re: Problem with search an exact word and stemming

2008-06-25 Thread Erick Erickson
The way I've solved this is to index the stemmed *and* a special token at the same position (see Synonym Analyzer). From your example, say you're indexing progresser. You'd go ahead and index the stemmed version, "progress", AND you'd also index "progresser$" at the same offset. Now, when you
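As a rough illustration of the token stream described here, the sketch below models it in plain Java rather than with Lucene's TokenStream classes. The Token class, the toy stem() method, and the use of a position increment of 0 to stack the exact form on the stemmed form are assumptions for illustration; only the "$" suffix and the "same position" idea come from the post.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the emitted tokens; not Lucene's own Token class.
class ExactAndStemTokens {
    // One emitted token: its text and how far it advances the position counter.
    static final class Token {
        final String text;
        final int positionIncrement;

        Token(String text, int positionIncrement) {
            this.text = text;
            this.positionIncrement = positionIncrement;
        }

        @Override
        public String toString() {
            return text + "(+" + positionIncrement + ")";
        }
    }

    // Toy stand-in for a real stemmer; only handles the example words.
    static String stem(String word) {
        if (word.endsWith("ion")) return word.substring(0, word.length() - 3);
        if (word.endsWith("er")) return word.substring(0, word.length() - 2);
        return word;
    }

    static List<Token> tokens(String[] words) {
        List<Token> out = new ArrayList<Token>();
        for (String w : words) {
            out.add(new Token(stem(w), 1)); // stemmed form advances the position
            out.add(new Token(w + "$", 0)); // exact form stacked on the same position
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [progress(+1), progresser$(+0)]
        System.out.println(tokens(new String[] {"progresser"}));
    }
}
```

With tokens laid out this way, a stemmed search would look up "progress" while an exact search would look up the "$"-suffixed form, so the literal word "progress" is found as "progress$" and no longer collides with the stem of "progresser".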

Re: Concurrent query benchmarks, with 1,2,4,8 readers

2008-06-25 Thread Grant Ingersoll
Note, you can do all kinds of tests like this and others with the contrib/benchmark code built right into Lucene. -Grant On Jun 24, 2008, at 11:09 PM, Rakesh Shete wrote: Hi Glen, Is your source code available? I would like to have a look at it and check if whatever I have tried makes sen

Re: Indexing the spider content

2008-06-25 Thread Grant Ingersoll
If it has an API that lets you get the content that needs to be indexed, then, sure, you can index from the spider. If it doesn't have an API, presumably, you would need to somehow extract the docs from the files it builds. This is, of course, assuming it stores the crawled files in some

yet again: getting the minimum and maximum value of a field

2008-06-25 Thread Christian Reuschling
Hello people, yes, there were several threads about this topic, but I sadly have to respawn it, I'm sorry. The first I found was a discussion from May 2005: http://mail-archives.apache.org/mod_mbox/lucene-java-user/200505.mbox/[EMAIL PROTECTED] There the final solution suggestion from Hoss wa

case insensitivity

2008-06-25 Thread John Byrne
Hi, I know that case-insensitive searching is normally done by creating an all-lower-case version of the documents, and turning the search terms into lower case whenever this field is searched, but this approach has its disadvantages. Let's say, for example, you want to find "Dell" (with a

Re: Indexing the spider content

2008-06-25 Thread yugana
We are using the VSpider... Yug John Wang wrote: > > Maybe building a Lucene gateway to hook in with VSpider. > Are you using VSpider or K2Spider? > > -John > > On Tue, Jun 24, 2008 at 8:35 PM, yugana <[EMAIL PROTECTED]> wrote: > >> >> Hi Otis, >> >> Thanks for the reply. So you mean it is

Re: Requesting MultipleIndeces

2008-06-25 Thread Konstantyn Smirnov
If you have good hardware with tons of RAM, you can use ParallelMultiSearcher, which searches all indices simultaneously. If you are short on that, you must search one index at a time, using MultiSearcher. -- View this message in context: http://www.nabble.com/Requesting-MultipleIndec

Re: Score 0

2008-06-25 Thread Toke Eskildsen
On Wed, 2008-06-25 at 09:29 +0200, Paolo Valleri wrote: > For several reasons I need also to know the documents that don't match the > input query. For example with score 0. Make a list of the docid for all the non-deleted documents in the index. Collect the docids from the search-result. Subtract
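The subtraction suggested here can be sketched with a java.util.BitSet. This is plain Java under stated assumptions: the class and parameter names are hypothetical, and the docids of deleted documents and of search hits are assumed to have been collected already (how they are read from the index is not shown).

```java
import java.util.BitSet;

// Hypothetical helper: computes the live docids that a search did NOT return.
class NonMatchingDocs {
    // maxDoc:  one more than the largest docid in the index
    // deleted: docids marked as deleted
    // hits:    docids returned by the search
    static BitSet nonMatching(int maxDoc, int[] deleted, int[] hits) {
        BitSet rest = new BitSet(maxDoc);
        rest.set(0, maxDoc);                    // start with every docid
        for (int d : deleted) rest.clear(d);    // drop deleted documents
        for (int h : hits) rest.clear(h);       // subtract the search result
        return rest;
    }

    public static void main(String[] args) {
        BitSet rest = nonMatching(6, new int[] {4}, new int[] {1, 3});
        System.out.println(rest); // prints {0, 2, 5}
    }
}
```

A bit set keeps the cost at roughly one bit per document, which matters when the index is large and most documents do not match.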

Problem with search an exact word and stemming

2008-06-25 Thread renou oki
Hello, I have a stemmed index, but I want to search the exact form of a word. I use French Analyzer, so for instance "progression", "progresser" are indexed with the linguistic root "progress". But if I want to search the word "progress" (and only this word), I have too many hits (because of "progr

Problem with search an exact word and stemming...

2008-06-25 Thread renou oki
Hello, I have a stemmed index, but I want to search the exact form of a word. I use French Analyzer, so for instance "progression", "progresser" are indexed with the linguistic root "progress". But if I want to search the word "progress" (and only this word), I have too many hits (because of "progr

Score 0

2008-06-25 Thread Paolo Valleri
Hi, I'm using lucene to compute the score of some documents. For several reasons I need also to know the documents that don't match the input query. For example with score 0. I don't know the engine of lucene and I was wondering how difficult this change would be. Thanks. -- Paolo Valleri