Daniel Noll <[EMAIL PROTECTED]> wrote on 01/03/2007 22:10:15:
> > API IndexWriter.updateDocument() may be useful.
>
> Whoa, nice convenience method.
>
> I don't suppose the new document happens to be given the same ID as the
> old one. That would make many people's lives much easier. :-)
Oh no,
Doron Cohen wrote:
Once indexing the database_id field this way, also the newly added
API IndexWriter.updateDocument() may be useful.
Whoa, nice convenience method.
I don't suppose the new document happens to be given the same ID as the
old one. That would make many people's lives much easier. :-)
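Per the exchange above, an update does not keep the same internal document number. A toy model in plain Java (not Lucene internals; `UpdateDemo` and its method are invented for illustration) shows why: an update is a delete followed by a re-add, and the re-added document gets the next number at the end of the index.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model (not Lucene internals) of why an update does not preserve the
// internal document number: it is a delete plus a re-add, and the re-added
// document gets the next number at the end of the index.
public class UpdateDemo {
    private final List<String> docs = new ArrayList<String>();      // docNumber -> key
    private final Map<String, Integer> byKey = new HashMap<String, Integer>();

    int updateDocument(String key) {
        Integer old = byKey.get(key);
        if (old != null) {
            docs.set(old, null);   // the delete leaves a gap at the old number
        }
        docs.add(key);             // the re-add lands at the end
        int newNumber = docs.size() - 1;
        byKey.put(key, newNumber);
        return newNumber;
    }

    public static void main(String[] args) {
        UpdateDemo idx = new UpdateDemo();
        System.out.println(idx.updateDocument("db-1")); // 0
        System.out.println(idx.updateDocument("db-2")); // 1
        System.out.println(idx.updateDocument("db-1")); // 2, not 0: the doc moved
    }
}
```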
The odds increase significantly in correlation to patches
submitted! :-) The odds increase slightly by at least filing an
"enhancement" issue in JIRA. They increase a tiny bit by bringing it
up here! I may have some time in the not too distant future for
this, but we always appreciate t
What are the odds (or reasons against) bubbling up doc(int,
FieldSelector) to Searcher? I would love to take advantage of the
selective field loading but I am working with MultiSearchers and
Searchers so I cannot count on getReader (in IndexSearcher) for access.
- Mark
--
If all you want to do is find docs containing dates within a range, it
probably doesn't make much difference whether you give dates their own
field or put them into your content field. It'll probably be easier to
just add them into the token stream since that's the way the analyzer
architecture wan
I'm still having issues with long running queries.
I'm using a custom HitCollector to bring back ALL docs that match a search,
as suggested in a previous post/reply (e.g. Nutch LuceneQueryOptimizer).
This solution works most of the time; however, in testing a very complex
query using several ran
On Mar 1, 2007, at 1:35 PM, Neal Richter wrote:
Collex is quite open source, it's just ugly source :) We're the
'patacriticism' project at SourceForge, under the "collex" directory
in Subversion.
Collex implements tagging by implementing JOIN cross-references
between user/tag documents and regu
On 3/1/07, Saravana <[EMAIL PROTECTED]> wrote:
Does this still hold good now? Thanks for your reply.
Probably most of that still applies to some extent. However, it is
unclear whether it will speed up your application.
First thing is to find out what your bottleneck is. Looking at the
stats
Collex is quite open source, it's just ugly source :) We're the
'patacriticism' project at SourceForge, under the "collex" directory
in Subversion.
Collex implements tagging by implementing JOIN cross-references
between user/tag documents and regular object documents. Its
scalability is not goi
Erik Hatcher wrote:
I'm pretty sure this has been done, I'm just not 100% sure where. Does
Nutch index link text?
Nutch does do this sort of thing, but I'm not quite sure how. It
isn't doing any operations to the Lucene index beyond what plain ol'
Lucene does.
Nutch maintains a set of s
On Feb 28, 2007, at 8:59 AM, Steven Parkes wrote:
Are unindexed fields stored separately from the main inverted
index?
If so, then one could implement the field value change as a
delete and
re-add of just that value?
The short answer is that won't work. Field values are
Thank you all for the suggestions steering me down the right path.
As an aside, the easy part, at least for me, is extracting the dates
-- Peter was dead on about how to do that: heuristics, multiple
regular expressions, and data structures. As Steve pointed out, this
isn't as trivial as it soun
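A minimal sketch of the "multiple regular expressions" heuristic mentioned above; the two patterns below are illustrative stand-ins, not the ones the posters actually used, and a real extractor would need many more formats.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical date extractor: try each pattern in turn and collect all
// matches. Real-world extraction needs far more patterns plus validation.
public class DateExtractor {
    private static final Pattern[] PATTERNS = {
        Pattern.compile("\\b\\d{4}-\\d{2}-\\d{2}\\b"),      // e.g. 2007-03-01
        Pattern.compile("\\b\\d{1,2}/\\d{1,2}/\\d{4}\\b"),  // e.g. 3/1/2007
    };

    static List<String> extract(String text) {
        List<String> hits = new ArrayList<String>();
        for (Pattern p : PATTERNS) {
            Matcher m = p.matcher(text);
            while (m.find()) {
                hits.add(m.group());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(extract("Posted 2007-03-01, updated 3/1/2007."));
    }
}
```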
I can't speak to where you can get a copy of the original code, but the
modified code I have is not GPL licenced - the license header in at
least one file is as follows:
/* Copyright 2004 Ryan Ackley
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this f
On Feb 23, 2007, at 2:00 PM, [EMAIL PROTECTED]
wrote:
Re: TextMining.org Word extractor
Someone noted that textmining.org gets hacked. There is test-
mining.org which appears to be a commercial site. Can someone tell
me where to get the download of the original GPL textmining.org
so
Hi,
You need just the counts? And you want to do just whole-field matching, not
word matching? In that case, Lucene might be overkill for you. Or, if you
do use Lucene, make sure to use "keyword" (untokenized) fields, not
"tokenized" fields.
Sorry for not elaborating my requirement more. Actu
Hello-
One of the fields in my index is an ID, which maps to a full text
description behind the scenes. Now I want to sort the search results
alphabetically according to the description, not the ID. This can be
done via SortComparatorSource and a ScoreDocComparator without
problems. But t
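Stripped of the Lucene plumbing, the core of such a sort is just a comparator that resolves each ID to its description before comparing (a plain-Java sketch; the `Map` here stands in for whatever lookup the SortComparatorSource would consult).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sort IDs by the description they map to, not by the ID itself.
public class DescriptionSort {
    static List<String> sortIdsByDescription(List<String> ids,
                                             final Map<String, String> descriptions) {
        List<String> sorted = new ArrayList<String>(ids);
        Collections.sort(sorted, new Comparator<String>() {
            public int compare(String a, String b) {
                // compare the looked-up descriptions, not the raw IDs
                return descriptions.get(a).compareTo(descriptions.get(b));
            }
        });
        return sorted;
    }

    public static void main(String[] args) {
        Map<String, String> desc = new HashMap<String, String>();
        desc.put("7", "zebra");
        desc.put("42", "apple");
        // "apple" sorts before "zebra", so ID 42 comes first
        System.out.println(sortIdsByDescription(Arrays.asList("7", "42"), desc));
    }
}
```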
Yes, it will affect the search performance because you need to merge the
results from the different indexes. The best performance is from a
single index. The more indexes you have, the more time it takes to
search.
Aviran
http://www.aviransplace.com
-Original Message-
From: Raaj [mailto:[
Hi!
My problem is to retrieve the term positions in a "general" query with more
than one term.
It seems that with the phrase query it's possible (with SpanQuery), but with
"AND" and "OR" queries I can't get the position for each document I search.
I'm looking for a high level implementation because
Sachin,
A lot of the questions you are asking are covered either in the FAQ or on the
Lucene site somewhere, or in various Lucene articles or in LIA. You should
check those places first (the traffic on java-user is already high!), you'll
save yourself a lot of time. For this particular questio
Erick,
I think you're right because you wouldn't know the max score before the
comparisons. I'm just thinking about a rounding algorithm that involves
comparing the raw scores to the theoretical maximum score, which I think
could be computed from the Similarity class and knowing the max boost v
Yeah, I too am looking forward to this feature: using a thread pool and
minimizing the remote calls in ParallelSearcher.
[EMAIL PROTECTED] wrote:
e.g. I've changed original ParallelSearcher to use thread pool
(java.util.concurrent.ThreadPoolExecutor from jdk 1.5).
But implementing multi-host insta
Peter:
About a custom ScoreComparator. The problem I couldn't get past was that I
needed to know the max score of all the docs in order to divide the raw
scores into quintiles since I was dealing with raw scores. I didn't see how
to make that work with ScoreComparator, but I confess that I didn't
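For reference, the bucketing itself is simple once the maximum score is known (a sketch under exactly that assumption; the difficulty raised above is that a comparator sees scores one at a time, before the maximum is available).

```java
// Map a raw score into quintile buckets 0..4, given the maximum raw score.
public class Quintiles {
    static int quintile(float score, float maxScore) {
        int q = (int) (score / maxScore * 5);
        return Math.min(q, 4); // score == maxScore belongs in the top bucket
    }

    public static void main(String[] args) {
        System.out.println(quintile(0.9f, 1.0f)); // 4
        System.out.println(quintile(0.1f, 1.0f)); // 0
    }
}
```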
Hi all,
Is it possible in Lucene for an index to span multiple files? If so
what is the recommendation in this case? Is it better to span after the
index reaches a particular size? Furthermore, does Lucene ever span a
single record between two or more index files in this case or does it
ensure
GATE is the other entity extraction framework ( http://gate.ac.uk) and comes
out of the box with a lot of this stuff.
Even once you've parsed the dates your next problem is representing and
querying time - you referred to the fact that documents could represent single
dates, multiple dates or t
"DECAFFMEYER MATHIEU" <[EMAIL PROTECTED]> wrote:
> I deleted the lock file, now it seems to work ...
>
> When can such an error happen?
See my response I just sent to java-user on this same error. Even though
you are running Lucene 2.0, the same causes can lead to that "Lock obtain
timed out"
Ah, I once worked in a place where we did exactly that - recognition and
extraction of useful nuggets from emails - dates, emails, URLs, attachments,
people, places...see divmod.com for the next generation of that. I believe Zoe
subsequently did something very similar. I think Zoe is still fre
"Jerome Chauvin" <[EMAIL PROTECTED]> wrote:
> We encounter issues while updating the lucene index, here is the stack
> trace:
>
> Caused by: java.io.IOException: Lock obtain timed out:
> SimpleFSLock@/data/www/orcanta/lucene/store1/write.lock
> at org.apache.lucene.store.Lock.obtain(Lock.java:6
If you decide to cache stored field values in memory, FieldCache may be
useful for this - so you don't have to implement your own cache - you can
access the field values with something like:
FieldCache fieldCache = FieldCache.DEFAULT;
String db_id_field[] =
fieldCache.getStrings(indexReader,"
I deleted the lock file, now it seems to work ...
When can such an error happen?
__
Matt
From: DECAFFMEYER MATHIEU [mailto:[EMAIL PROTECTED]
Sent: Thursday, March 01, 2007 9:56 AM
To: java-user@lucene.apache.org
Subjec
Hi,
While updating my index I have the following error:
[3/1/07 9:44:19:214 CET] 76414c82 SystemErr R java.io.IOException:
Lock obtain timed out:
[EMAIL PROTECTED]:\TEMP\lucene-b56f455aea0a705baecaa4411d590aa2-write.lock
[3/1/07 9:44:19:214 CET] 76414c82 SystemErr R at
org.apache.l
On Tue, Feb 27, 2007, Saravana wrote about "indexing performance":
> Hi,
>
> Is it possible to scale lucene indexing like 2000/3000 documents per
> second?
I don't know about the actual numbers, but one trick I've used in the past
to get really fast indexing was to create several independent inde
e.g. I've changed original ParallelSearcher to use thread pool
(java.util.concurrent.ThreadPoolExecutor from jdk 1.5).
But implementing multi-host installation still requires a lot of changes
since ParallelSearcher calls underlying Searchables too many times (e.g. a
separate network call for ev
All,
We encounter issues while updating the lucene index, here is the stack trace:
Caused by: java.io.IOException: Lock obtain timed out:
SimpleFSLock@/data/www/orcanta/lucene/store1/write.lock
at org.apache.lucene.store.Lock.obtain(Lock.java:69)
at org.apache.lucene.index.IndexReader.aquir