Hi all,
As far as I know, I don't find any Lucene API for updating an index document.
What I have to do is to delete the existing index document and insert a new
one. However, this is going to be 2 separate operations (delete and update). If
the first operation succeeds while the second operatio
You mentioned before that you can't "batch" your updates ... I can
understand not being able to batch updates by number of updates -- but why
can't you batch by time?
It may sound bad to only process updates once an hour, or once every half
hour, or once every 5 minutes, or even once every 30 sec
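To make the "batch by time" idea concrete, here is a minimal sketch in plain Java. It does not use any Lucene API: TimedBatcher, submit and drainIfDue are hypothetical names made up for illustration, and the clock is passed in by the caller so the behaviour is easy to check.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy time-based batcher: buffers updates and releases them at most once per interval. */
class TimedBatcher<T> {
    private final long intervalMillis;
    private long lastFlushMillis;
    private final List<T> pending = new ArrayList<>();

    TimedBatcher(long intervalMillis, long startMillis) {
        this.intervalMillis = intervalMillis;
        this.lastFlushMillis = startMillis;
    }

    /** Queue an update; nothing touches the index yet. */
    void submit(T update) {
        pending.add(update);
    }

    /**
     * Returns the buffered updates if the interval has elapsed since the last
     * flush, otherwise an empty list. The caller applies the returned batch.
     */
    List<T> drainIfDue(long nowMillis) {
        if (nowMillis - lastFlushMillis < intervalMillis || pending.isEmpty()) {
            return new ArrayList<>();
        }
        List<T> batch = new ArrayList<>(pending);
        pending.clear();
        lastFlushMillis = nowMillis;
        return batch;
    }
}
```

With a 30 second interval, updates submitted at any point are applied together on the next drainIfDue call that falls on or after the 30 second mark, so the index is never more than one interval stale.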
: one is sorting on doesn't even have to exist in all the documents. I
: think it would be even more confusing for an invalid query suddenly
: becoming a valid query in the future just because someone added a doc
Or worse, a query that does work today, stops working tomorrow because one
doc was r
All,
I've just released Zilverline version 1.2.0.
This version is fully webbased, all settings, collections, preferences
can be set via the web interface. You don't need to edit any config
files anymore. Also I'm adding Powerpoint and Excel Extractors.
The source will be made available as well ve
Also, it's more flexible. You can easily implement stricter checking
on top of a "lax" model (use a term enumerator to see if the field
exists before you call search), but not vice versa.
-Yonik
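A rough sketch of that "stricter check on top of a lax model" idea, in plain Java rather than real Lucene code. It models the term dictionary as a sorted set ordered by field and then text, and seeks to the first term at or after (field, ""), which is how I understand a term enumerator positioned with new Term(field, "") behaves. TermDictionary, fieldExists and checkSortable are hypothetical names, and the NUL separator is just a trick for this toy model.

```java
import java.util.TreeSet;

/** Toy term dictionary: terms sorted by field, then by text. */
class TermDictionary {
    private static final char SEP = '\u0000'; // assumed never to occur in field names
    private final TreeSet<String> terms = new TreeSet<>();

    void add(String field, String text) {
        terms.add(field + SEP + text);
    }

    /** Lax model: asking about a missing field is not an error, it is just false. */
    boolean fieldExists(String field) {
        String prefix = field + SEP;
        // Seek to the first term >= (field, ""), like positioning a term enumerator.
        String first = terms.ceiling(prefix);
        return first != null && first.startsWith(prefix);
    }

    /** Strict check layered on top, for callers who want an exception. */
    void checkSortable(String field) {
        if (!fieldExists(field)) {
            throw new IllegalArgumentException("no terms in field " + field);
        }
    }
}
```

The point of the layering is that a caller who wants the exception can have it by calling checkSortable first, while a caller who is happy with the lax behaviour never sees one.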
On 4/14/05, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> Hmmm, that's a great lucene architecture questi
Hmmm, that's a great lucene architecture question.
Should one be allowed to sort on a field that doesn't exist?
One *can* query on fields that don't exist (and that's correct in my view).
The thing is, lucene field creation is lazy... just because the field
doesn't exist now doesn't mean that it w
On Thursday 14 April 2005 16:28, Yonik Seeley wrote:
> I haven't tried it, but I think the fix should be easy... never throw
> that exception.
As Lucene does not have the concept of a "warning" I think it should throw
exceptions when someone tries to do something that doesn't make sense
(even i
On Apr 14, 2005, at 4:32 PM, Martin May wrote:
I've got the book (which is great, btw). I used Luke to get explanations
of the results, but I don't see any boosts in the explanations.
The index-time boosts are folded into the field normalization factor,
so you won't see boost by itself. That fie
I've got the book (which is great, btw). I used Luke to get explanations
of the results, but I don't see any boosts in the explanations.
Martin
On Thu, 2005-04-14 at 13:24 -0700, Otis Gospodnetic wrote:
> I'd look at the output of Explain to see how ranking score is calculated
>
> Look at this:
I'd look at the output of Explain to see how ranking score is calculated
Look at this: http://lucenebook.com/search?query=explain (hit #1 is
from a free chapter)
Otis
--- Martin May <[EMAIL PROTECTED]> wrote:
>
> I have a bunch of documents in my index, some of which have values
> for a
> certa
I have a bunch of documents in my index, some of which have values for a
certain field while others don't. I'd like the ones that do have a value
to always show up before the ones who don't when sorting by relevance.
I tried to accomplish this by checking whether there are values for the
field, and
On Thursday 14 April 2005 16:44, Luis Medina wrote:
> primarily reporting lock issues (except no lock files
> were found in the directory).
With "that directory", do you mean the index directory? The lock files are
not there, but in /tmp (by default). It's only okay to remove the lock
file manu
Roy Klein wrote:
I think this is a better way of asking my original questions:
"Why was this designed this way?"
In order to optimize updates.
"Can it be changed to optimize updates?"
Updates are fastest when additions and deletions are separately batched.
That is the design.
Doug
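For what it's worth, the "separately batched" shape can be sketched with a toy model in plain Java. This is not Lucene code: ToyIndex and its methods are hypothetical, the index is just an append-only list plus a deleted-bit set, and documents are identified by an assumed unique id string.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Map;

/** Toy append-only index: adds append, deletes only mark a bit. */
class ToyIndex {
    private final List<String[]> docs = new ArrayList<>(); // each entry is {id, body}
    private final BitSet deleted = new BitSet();

    /** "Writer" side: append a document and return its document number. */
    int add(String id, String body) {
        docs.add(new String[] {id, body});
        return docs.size() - 1;
    }

    /** "Reader" side: find the document numbers for an id and mark them deleted. */
    int deleteById(String id) {
        int count = 0;
        for (int i = 0; i < docs.size(); i++) {
            if (!deleted.get(i) && docs.get(i)[0].equals(id)) {
                deleted.set(i);
                count++;
            }
        }
        return count;
    }

    /** Apply a batch of updates as two separate phases: all deletes, then all adds. */
    void applyBatch(Map<String, String> updates) {
        for (String id : updates.keySet()) {
            deleteById(id);
        }
        for (Map.Entry<String, String> e : updates.entrySet()) {
            add(e.getKey(), e.getValue());
        }
    }

    /** Bodies of all live (not deleted) documents with the given id. */
    List<String> liveBodies(String id) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            if (!deleted.get(i) && docs.get(i)[0].equals(id)) {
                out.add(docs.get(i)[1]);
            }
        }
        return out;
    }

    int liveCount() {
        return docs.size() - deleted.cardinality();
    }
}
```

In the real library the two phases would mean opening a reader once for all the deletes and a writer once for all the adds, rather than switching between them for every single update.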
-
Hi,
I guess I didn't ask my question very well. I do understand that you can
only do a delete via a reader based on the current sources, what I don't
understand is why the delete function couldn't be incorporated into a
writer, so that updates could be all done within the context of a writer?
Fo
Hi,
Erik Hatcher wrote:
No, this hasn't been done except for the basic Query.toString() output
which for the most part is parsable again.
The question is, what do you do about the analysis process? It's a
one-way transformation - and parsing again may not yield the same query.
We (the SDX de
Yonik Seeley wrote:
There are times, however, when it would be nice for
deletes to be able to be concurrent with adds.
It would also be nice if good coffee was free.
Q: can docids change after an add() (with merging segments going on
behind the scenes) or is optimize() the only call that ends up
ch
Paul Libbrecht wrote:
I am currently evaluating the need for an elaborate query data-structure
(to be exchanged over XML-RPC) as opposed to working with plain strings.
I'd opt for both. For example:
"java based" -coffee
site
apache.org
d
> An IndexReader is required to, given a term, find the document number to
> mark deleted.
Yeah, most of the time it makes sense to do deletions off the
IndexReader. There are times, however, when it would be nice for
deletes to be able to be concurrent with adds.
Q: can docids change after an add(
On Apr 14, 2005, at 11:32 AM, Paul Libbrecht wrote:
Hi,
I am currently evaluating the need for an elaborate query
data-structure (to be exchanged over XML-RPC) as opposed to working
with plain strings.
One thing that would heavily vote for strings would be to have query
objects returne
Roy Klein wrote:
So one thing I've been wondering: Why do you need to do deletes from an
indexreader?
Is this not in the FAQ? It should be...
IndexWriter can only append documents to an index.
An IndexReader is required to, given a term, find the document number to
mark deleted.
Also, in the cu
Paul Smith wrote:
So it sounds like there isn't a perfect solution, but I think the best
tradeoff for me is to put them all in the same position unless
anyone has more input on the subject?
If they're all at the same position you can still use slop to match the
phrase. So if 'power', 'query'
On Thursday 14 Apr 2005 15:15, Pablo Gomes Ludermir wrote:
> Hello all,
>
> I would like to get the following information from the index:
>
> 1. Given a term, how many times the term occurs in each document.
> Something like a triple:
> <Term, Doc1, Freq>, <Term, Doc2, Freq>, <Term, Doc3, Freq>, ...
>
> Is possible to do that?
>
>
Hi,
I am currently evaluating the need for an elaborate query
data-structure (to be exchanged over XML-RPC) as opposed to working
with plain strings.
One thing that would heavily vote for strings would be to have query
objects returned by Query-parser reconvertible to a string (and bac
On 14 Apr 05, at 17:15, Pablo Gomes Ludermir wrote:
I would like to get the following information from the index:
1. Given a term, how many times the term occurs in each document.
Something like a triple:
<Term, Doc1, Freq>, <Term, Doc2, Freq>, <Term, Doc3, Freq>, ...
Is possible to do that?
Luke did this to my index with good s
Hi,
> From: Pablo Gomes Ludermir [mailto:[EMAIL PROTECTED]
> I would like to get the following information from the index:
>
> 1. Given a term, how many times the term occurs in each document.
> Something like a triple:
> <Term, Doc1, Freq>, <Term, Doc2, Freq>, <Term, Doc3, Freq>, ...
>
> Is possible to do that?
See IndexRead
> if (termEnum==null || term.field() != field) break; // CHANGE here
Errr, that should be term==null of course.
> if (term==null || term.field() != field) break; // CHANGE here
And it *may* be slightly speedier to check for null just before the
do/while loop instead:
Hello all,
I would like to get the following information from the index:
1. Given a term, how many times the term occurs in each document.
Something like a triple:
<Term, Doc1, Freq>, <Term, Doc2, Freq>, <Term, Doc3, Freq>, ...
Is possible to do that?
Regards,
Pablo
--
Pablo Gomes Ludermir
[EMAIL PROTECTED]
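As an illustration of the output being asked for, here is a small plain-Java sketch that produces the triples for one term over a handful of in-memory documents. It is a toy, not Lucene code: TermFreqs and triples are hypothetical names, and tokenization is a naive lowercase whitespace split. In Lucene itself, as far as I recall, the equivalent walk is IndexReader.termDocs(Term) followed by TermDocs.next(), doc() and freq().

```java
import java.util.ArrayList;
import java.util.List;

/** Toy per-document term frequency listing. */
class TermFreqs {
    /** Returns one "<term, docN, freq>" triple per document that contains the term. */
    static List<String> triples(String term, List<String> docs) {
        List<String> out = new ArrayList<>();
        for (int docId = 0; docId < docs.size(); docId++) {
            int freq = 0;
            // Naive analysis: lowercase and split on whitespace.
            for (String token : docs.get(docId).toLowerCase().split("\\s+")) {
                if (token.equals(term)) {
                    freq++;
                }
            }
            if (freq > 0) {
                out.add("<" + term + ", doc" + docId + ", " + freq + ">");
            }
        }
        return out;
    }
}
```

Documents that do not contain the term are skipped, which matches how a postings list only mentions documents where the term occurs.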
-Original message-
From: Roy Klein [mailto:[EMAIL PROTECTED]
Sent: Thursday, 14 April 2005 15:40
To: java-user@lucene.apache.org
Subject: Update performance/indexwriter.delete()?
>>I've got an application that will be doing
>>constant updates to an index.
>>I've looked i
Hi Everyone,
The company I work for uses Lucene search on 2 of their sites. Each site's
configuration is (almost) a mirror image of the other. The only difference
here is the content. We use a servlet to start up a Lucene maintenance
utility that keeps the indexes up to date. This servlet is set to
I haven't tried it, but I think the fix should be easy... never throw
that exception. Either check for null before the loop, or in the
loop.
Original code for native int sorting:
TermEnum termEnum = reader.terms (new Term (field, ""));
try {
if (termEnum.term() == null)
I've got an application that will be doing constant updates to an index.
I've looked into batching those updates, however, based on the way the
application works, the updates can't be batched. (Well, I figure with a lot
of work, I might be able to batch ~10% of the transactions) Another
requiremen
Maher Martin wrote:
* The user's access rights would be read from Active Directory (i.e
windows group membership, etc)
* On the submission of a query to Lucene - the user / group access
rights would be appended as required search criteria and Lucene would
filter out all results that the user should
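A minimal sketch of the "append access rights as required criteria" step, in plain Java that only builds a query string in the usual QueryParser syntax (+ for required clauses, field:"value", OR). The field name acl, the class AclQuery and the method restrict are assumptions for illustration; the snippet above does not say how the rights are actually indexed.

```java
import java.util.List;
import java.util.StringJoiner;

/** Wraps a user query so that a group-membership clause is also required. */
class AclQuery {
    static String restrict(String userQuery, List<String> groups) {
        if (groups.isEmpty()) {
            // No groups means no access; refuse rather than run an unrestricted query.
            throw new IllegalArgumentException("user has no groups");
        }
        StringJoiner acl = new StringJoiner(" OR ", "(", ")");
        for (String group : groups) {
            // Quote the group name and escape backslashes and quotes inside it.
            String escaped = group.replace("\\", "\\\\").replace("\"", "\\\"");
            acl.add("acl:\"" + escaped + "\"");
        }
        return "+(" + userQuery + ") +" + acl;
    }
}
```

Note that this trusts the user's query text to have balanced parentheses; building the combined query programmatically, or using a filter instead of extra query clauses, would avoid that weakness.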
Hi Eric,
I haven't tested it personally, but I have had reports
that it works OK with CJKAnalyzer. This was reported
after I added support for overlapping tokens in
tokenstreams last July.
Cheers,
Mark
--- Eric Chow <[EMAIL PROTECTED]> wrote:
> Hello,
>
> Is there any good Highlighter for Asian
Here is a demo:
http://grassland.cnblog.org/
Che Dong
Eric Chow wrote:
Hello,
Is there any good Highlighter for Asian languages (Chinese, Japanese, Korean)?
Eric
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-m
Hello,
Is there any good Highlighter for Asian languages (Chinese, Japanese, Korean)?
Eric