On Thursday 26 June 2008 15:09:44 java_is_everything wrote:
> Hi all.
>
> Is there a way to obtain the number of documents in the Lucene index
> (2.0.0), having a particular term indexed, much like what we do in a
> database ?
I suspect the normal way is a HitCollector which does nothing but incre
Hi all.
Is there a way to obtain the number of documents in the Lucene index
(2.0.0), having a particular term indexed, much like what we do in a
database ?
Looking forward to a reply.
Ajay Garg
--
View this message in context:
http://www.nabble.com/How-to-retrieve-number-of-documents-based-o
hi,
what is the correct way to instruct the indexwriter (or other
classes?) to delete old
commit points after N minutes ?
I tried to write a customized IndexDeletionPolicy that uses the
parameters to schedule future
jobs to perform file deletion. However, I am only getting the
filenames through the
hi,
what is the correct way to instruct the indexwriter to delete old
commit points after N minutes ?
I tried to write a customized IndexDeletionPolicy that uses the
parameters to schedule future
jobs to do file deletion. However, I am only getting the filenames,
and not absolute file names.
thank
: There the final solution suggestion from Hoss was to try it with a binary
: search
: on the TermEnum
my suggestion at the time to do a binary search was a bit naive (i was
not as familiar with Lucene as I am now).
: Because of the tree-like architecture of the index, where the letters are som
: I imagined (and maybe I am oversimplifying it!) that somewhere in the API
: there must be a string comparison using 'String.equals()' that determines if a
: document contains the term or not - and that use of 'equals()' has permanently
: locked Lucene into case-sensitive searching. The values
: My users require wildcard searches. Sometimes their search phrases contain
: spaces. I am having trouble trying to implement a wildcard search on strings
: containing spaces, so if the term includes spaces I force a literal search
: by adding double quotes to the search term.
: So the search str
On Wed, Jun 25, 2008 at 3:47 PM, Paolo Valleri <[EMAIL PROTECTED]> wrote:
> To get the docids of all documents in the index, do I need to write a class
> that implements IndexReader, or is there another method?
MatchAllDocsQuery does it.
-Yonik
---
Thanks for the answer.
To get the docids of all documents in the index, do I need to write a class
that implements IndexReader, or is there another method?
paolo
2008/6/25 Toke Eskildsen <[EMAIL PROTECTED]>:
> On Wed, 2008-06-25 at 09:29 +0200, Paolo Valleri wrote:
> > For several reasons I need also to kn
Warning: I don't understand n-grams at all, so you should
read this as a plea for those who do to tell me I'm off base.
But I wonder if indexing as n-grams would be a way to
cope with this issue that lots of people have. *assuming*
you are thinking about single terms, then it seems that
"smith" w
Hello,
I am currently keeping an index of all our client's usernames. The search
functionality is implemented using a PrefixFilter. However, we would like to
expand the functionality to be able to search any part of a user's name,
rather than requiring that it begin with the query string. So for e
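
For readers unfamiliar with the n-gram idea raised elsewhere in this listing as a possible way to match inside a term, here is a minimal sketch in plain Java of what character n-grams of a single term look like. It does not use Lucene; the class and method names are hypothetical, and how the grams would be fed to an analyzer is left open.

```java
import java.util.ArrayList;
import java.util.List;

final class NGramSketch {
    /** Returns every contiguous substring of length n, in left-to-right order. */
    public static List<String> ngrams(String term, int n) {
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= term.length(); i++) {
            grams.add(term.substring(i, i + n));
        }
        return grams;
    }

    public static void main(String[] args) {
        // Prints [smi, mit, ith]
        System.out.println(ngrams("smith", 3));
    }
}
```

If each gram were indexed as its own term, a query string of at least n characters could be split the same way and matched against the grams, which is the rough intuition behind using n-grams for "contains" searches.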
I suppose something like that might work, but I still think that presenting
a user with matches that are sometimes case-sensitive and sometimes not
would be...er...fraught.
If you can programmatically restrict your query construction and you're
*sure* this is what your users expect, you can m
I looked heavily at this. It requires a customization of TermInfosReader
whereby the tii (term dictionary) SegmentTermEnum is traversed looking for
the last term with a particular field. Once found, from that position in
the tis SegmentTermEnum would need to be traversed again for the last term
w
Hello people,
I'm sorry if I have sent this message twice - my gmail interface merges the
mails in the 'send' folder with incoming mails from my address - strange, but
I can't say if the mail was sent - I only see it in the send-folder (with
only one label on it, which brings me to send it again
It depends (tm, Erik Hatcher). How many docs in your index?
How much information for each doc? What is the size of
your index?
You could have two different indexes. You could index the same
data in different fields in the same index and only have one. There
have been several discussions about this
What I had in mind was actually very simple: when you create a Term
(programmatically) you normally set the text and the field. I would also
like to be able to set the case sensitivity to true or false for that
specific Term object.
I imagined (and maybe I am oversimplifying it!) that somewhe
Hi,
I have 2 kinds of searches. One kind is like the Wikipedia suggestions
and the other one is pretty classic. So does it make sense to have
different indices for these 2 search styles?
best,
sascha
Well, it depends on what you mean by "per term". There's already
PerFieldAnalyzerWrapper for each field, but I don't think that's what
you want.
How do you expect a per term analyzer to behave? I'm having a hard
time thinking of a use case that's general. You could always
roll your own analyzer th
The way I've solved this is to index the stemmed *and* a special
token at the same position (see Synonym Analyzer). From your
example, say you're indexing progresser. You'd go ahead and index the
stemmed version, "progress", AND you'd also index "progresser$"
at the same offset. Now, when you
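
A rough illustration of the "two tokens at one position" idea described above, written in plain Java without Lucene. Everything here is an assumption for illustration: `Tok`, `toyStem` and `expand` are hypothetical names, and `toyStem` is a toy stand-in for a real stemmer. The position increment of 0 mirrors how a token can be stacked on the previous token's position.

```java
import java.util.ArrayList;
import java.util.List;

final class ExactFormTokens {
    /** A term plus its position increment (0 means "same position as the previous token"). */
    static final class Tok {
        final String term;
        final int posInc;
        Tok(String term, int posInc) { this.term = term; this.posInc = posInc; }
        @Override public String toString() { return term + "(+" + posInc + ")"; }
    }

    /** Toy stand-in for a real stemmer: strips a couple of suffixes. */
    public static String toyStem(String word) {
        for (String suffix : new String[] {"ion", "er"}) {
            if (word.endsWith(suffix)) {
                return word.substring(0, word.length() - suffix.length());
            }
        }
        return word;
    }

    /** Emits the stem, then the original word marked with '$' at the same position. */
    public static List<Tok> expand(List<String> words) {
        List<Tok> out = new ArrayList<>();
        for (String w : words) {
            out.add(new Tok(toyStem(w), 1)); // the stem advances the position
            out.add(new Tok(w + "$", 0));    // the exact form is stacked on it
        }
        return out;
    }
}
```

With this layout, a search for the stem would hit every inflected form, while a search for the `$`-marked term would hit only the exact word.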
Note, you can do all kinds of tests like this and others with the
contrib/benchmark code built right into Lucene.
-Grant
On Jun 24, 2008, at 11:09 PM, Rakesh Shete wrote:
Hi Glen,
Is your source code available? I would like to have a look at it and
check if whatever I have tried makes sen
If it has an API that lets you get the content that needs to be
indexed, then, sure, you can index from the spider. If it doesn't
have an API, presumably, you would need to somehow extract the docs
from the files it builds. This is, of course, assuming it stores the
crawled files in some
Hello people,
yes, there were several threads about this topic, but I sadly have to respawn
it, I'm sorry.
The first I found was a discussion from May 2005:
http://mail-archives.apache.org/mod_mbox/lucene-java-user/200505.mbox/[EMAIL
PROTECTED]
There the final solution suggestion from Hoss wa
Hi,
I know that case-insensitive searching is normally done by creating an
all-lower-case version of the documents, and turning the search terms
into lower case whenever this field is searched, but this approach has
its disadvantages.
Let's say, for example, you want to find "Dell" (with a
We are using the VSpider...
Yug
John Wang wrote:
>
> Maybe building a Lucene gateway to hook in with VSpider.
> Are you using VSpider or K2Spider?
>
> -John
>
> On Tue, Jun 24, 2008 at 8:35 PM, yugana <[EMAIL PROTECTED]> wrote:
>
>>
>> Hi Otis,
>>
>> Thanks for the reply. So you mean it is
If you have good hardware with tons of RAM, you can use
ParallelMultiSearcher, which searches all indices simultaneously.
If you are short on that, you must search one index at a time, using
MultiSearcher.
--
View this message in context:
http://www.nabble.com/Requesting-MultipleIndec
On Wed, 2008-06-25 at 09:29 +0200, Paolo Valleri wrote:
> For several reasons I need also to know the documents that don't match the
> input query. For example with score 0.
Make a list of the docid for all the non-deleted documents in the index.
Collect the docids from the search-result. Subtract
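
One possible reading of the steps above, sketched with `java.util.BitSet` rather than Lucene classes. The names are hypothetical, and the inputs (the index's maximum docid, the deleted docids, and the docids collected from the search) are assumed to have been gathered already.

```java
import java.util.BitSet;

final class NonMatchingDocs {
    /**
     * @param maxDoc  one greater than the largest docid in the index
     * @param deleted docids that are marked deleted
     * @param matched docids collected from the search result
     * @return docids of live documents that the query did not match
     */
    public static BitSet nonMatching(int maxDoc, BitSet deleted, BitSet matched) {
        BitSet result = new BitSet(maxDoc);
        result.set(0, maxDoc);   // start with every docid
        result.andNot(deleted);  // keep only non-deleted documents
        result.andNot(matched);  // subtract the search hits
        return result;
    }
}
```

For example, with six documents, docid 2 deleted and docids 0 and 4 matched, the result holds docids 1, 3 and 5.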
Hello,
I have a stemmed index, but I want to search the exact form of a word.
I use French Analyzer, so for instance "progression", "progresser" are
indexed with the linguistic root "progress".
But if I want to search the word "progress" (and only this word), I have too
many hits (because of "progr
Hi, I'm using Lucene to compute the score of some documents.
For several reasons I also need to know the documents that don't match the
input query, for example those with score 0.
I don't know Lucene's internals and I was wondering how difficult this
change would be.
Thanks.
--
Paolo Valleri
31 matches