Otis Gospodnetic writes:
>
> The link to list archives should be on lucene.apache.org.
>
It should, but the link there does not work.
All you get is 'Error occurred
Required parameter "listId" or "listName" is missing or invalid'
from mail-archives.apache.org
Something seems to be broken.
So t
On Apr 20, 2005, at 04:09, Wesley MacDonald wrote:
UID consists of a unique number based on a hashcode, system time and a
counter, and a VMID contains a UID and adds a SHA hash based on IP
address.
Hmmm... UUID?
http://en.wikipedia.org/wiki/UUID
http://java.sun.com/j2se/1.5.0/docs/api/java/util/UUI
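For comparison, java.util.UUID (new in J2SE 5.0) generates identifiers with no RMI dependency. A minimal sketch (the demo class name is mine, not from the thread):

```java
import java.util.UUID;

public class UuidDemo {
    public static void main(String[] args) {
        // A type 4 (random) UUID; uniqueness does not rely on
        // IP address, system time, or a counter.
        UUID id = UUID.randomUUID();
        System.out.println(id);           // e.g. 3f2504e0-4f89-41d3-9a0c-...
        System.out.println(id.version()); // prints 4 for randomUUID()
    }
}
```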
Hi Otis,
Thanks for your answer on the integer issue. I was not
sure if the index was actually limited, or if it was
just the numDocs method call. I guess it really does
not matter which it is; and for me, I don't think my
index will ever get that large! I do have a couple of
questions for you
On Wednesday, 20 April 2005 at 04:07, Mufaddal Khumri wrote:
> The 2 products I mentioned are 2 rows. I get the products in
> bulk by using a limit clause.
>
> I am using hibernate with MySQL server on a 2.8GHz, 1.00GB Ram machine.
Maybe your session-level cache in Hibernate grows
Hello,
Yes, there is a limit, but it's pretty high:
http://java.sun.com/j2se/1.5.0/docs/api/java/lang/Integer.html#MAX_VALUE
Iterating the index like that is OK, but each call to
reader.document(int) pulls the entire Document off the disk, which can
get expensive.
The link to list archives shoul
Hi,
I posted this in the past:
Java has GUID classes called java.rmi.server.UID and
java.rmi.dgc.VMID.
The UID class can generate identifiers that are unique over time within
a JVM. The VMID class provides uniqueness across ALL JVMs.
UID consists of a unique number based on a hashcode, syste
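A minimal sketch of the two RMI classes in use (the class names are from the JDK; the demo wrapper is mine):

```java
import java.rmi.dgc.VMID;
import java.rmi.server.UID;

public class IdDemo {
    public static void main(String[] args) {
        UID uid = new UID();     // unique over time within this JVM
        VMID vmid = new VMID();  // unique across all JVMs (adds host identity)
        System.out.println("UID:  " + uid);
        System.out.println("VMID: " + vmid);
    }
}
```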
Mike Baranczak wrote:
First of all, a big thanks to all the Lucene hackers - I've only been
using your product for a couple of weeks, and I've been very impressed
by what I've seen.
Here's my question: I have an index with a little over 3 million
documents in it, with more on the way. Each docu
First of all, a big thanks to all the Lucene hackers - I've only been
using your product for a couple of weeks, and I've been very impressed
by what I've seen.
Here's my question: I have an index with a little over 3 million
documents in it, with more on the way. Each document has an "URL" fiel
Le 19 avr. 05, à 22:50, Erik Hatcher a écrit :
The only catch that I know of is that an Analyzer is invoked on a
per-field basis. I can't tell exactly what you have in mind, but a
Lucene Analyzer cannot split data into separate fields itself - it has
to have been split prior.
That's an easy one
Mufaddal,
First, you should add some timing code to determine whether your
database is slow, or your indexing (I think tokenization occurs in the
call to writer.addDocument()). Assuming your database query is the
slowdown, read on...
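A minimal timing sketch of the kind suggested above. Here fetchRow() and indexRow() are hypothetical stand-ins for the real JDBC fetch and the writer.addDocument() call:

```java
public class TimingDemo {
    // Hypothetical stand-in for the database fetch (one row per call).
    static String fetchRow(int i) {
        return "row " + i + ": some product data";
    }

    // Hypothetical stand-in for writer.addDocument(), where Lucene
    // would tokenize the document.
    static void indexRow(String row) {
        row.toLowerCase().split("\\s+"); // simulate some tokenizing work
    }

    // Times the two phases separately; returns {dbNanos, indexNanos}.
    static long[] measure(int rows) {
        long dbNanos = 0, indexNanos = 0;
        for (int i = 0; i < rows; i++) {
            long t0 = System.nanoTime();
            String row = fetchRow(i);
            long t1 = System.nanoTime();
            indexRow(row);
            long t2 = System.nanoTime();
            dbNanos += t1 - t0;
            indexNanos += t2 - t1;
        }
        return new long[] { dbNanos, indexNanos };
    }

    public static void main(String[] args) {
        long[] t = measure(1000);
        System.out.println("fetch: " + t[0] / 1000000.0 + " ms");
        System.out.println("index: " + t[1] / 1000000.0 + " ms");
    }
}
```

Whichever total dominates tells you where to optimize first.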
Depending on the details of your database (which fields are in
Daniel Herlitz wrote:
I would suggest you simply do not create unusable indexes. :-) Handle
catch/throw/finally correctly and it should not present any problems.
In some use scenarios it's not that simple... Anyway, back to the
original question: indexExists() just checks for the presence of the
Hello Everyone,
I need to be able to iterate through the entire set of
documents within the index to perform some auditing. I
originally tried the following code snip:
int ndoc = idxReader.numDocs();
for (int i = 0; i < ndoc; i++) {
    Document doc = idxReader.document(i);
    ...
}
T
Hi,
The 2 products I mentioned are 2 rows. I get the products in
bulk by using a limit clause.
I am using hibernate with MySQL server on a 2.8GHz, 1.00GB Ram machine.
I am baffled that 1.2 or 1.5 million records are being indexed in 20
minutes compared to the 2 records I am indexin
On Apr 19, 2005, at 3:55 PM, Paul Libbrecht wrote:
Hi,
I am working on an index to search XML data in a fixed format that I
master well...
The idea is that the XML content (which I have as a JDOM object)
actually carries the semantics, which would be best converted directly
into tokens by something
Hm well, I unconsciously extrapolated natural language into Java syntax.
Just in case, should be:
try: Build the index in a separate catalogue.
if all ok: remove ('rm') production index and move ('mv') newly built
index to its place. Notify using app that it should reopen its IndexReader.
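The try / if-all-ok recipe above can be sketched in Java like this (a sketch only: the directory names and the fake build step are illustrative assumptions, not from the original post):

```java
import java.io.IOException;
import java.nio.file.*;
import java.nio.file.attribute.BasicFileAttributes;

public class IndexSwap {
    public static void main(String[] args) throws IOException {
        Path production = Paths.get("index");
        Path scratch = Paths.get("index.new");

        // Stand-in for the real build step: in reality an IndexWriter
        // would write a complete new index into the scratch directory.
        Files.createDirectories(scratch);
        Files.createFile(scratch.resolve("segments"));

        // Only once the build has finished cleanly: remove ('rm') the
        // production index and move ('mv') the new one into its place.
        deleteRecursively(production);
        Files.move(scratch, production);
        // ...then notify the searching app to reopen its IndexReader.
    }

    // Recursively delete a directory tree, if it exists.
    static void deleteRecursively(Path root) throws IOException {
        if (!Files.exists(root)) return;
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path f, BasicFileAttributes a)
                    throws IOException {
                Files.delete(f);
                return FileVisitResult.CONTINUE;
            }
            @Override
            public FileVisitResult postVisitDirectory(Path d, IOException e)
                    throws IOException {
                Files.delete(d);
                return FileVisitResult.CONTINUE;
            }
        });
    }
}
```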
/D
I
I would suggest you simply do not create unusable indexes. :-) Handle
catch/throw/finally correctly and it should not present any problems.
Assume one app builds the index, another uses it:
try: Build the index in a separate catalogue.
finally: remove ('rm') production index and move ('mv') newl
The only time I have seen corrupted indexes is when the java process is
killed during the indexing process.
If you shut down Tomcat (or whatever you are running for Java) during the
indexing process you will end up with a corrupted index.
- Original Message -
From: "Andy Roberts" <[EMAIL
Hi,
Seems like an odd request, I'm sure. However, my application relies on an index,
and should the index become unusable for some unfortunate reason, I'd like my
app to gracefully cope with this situation.
Firstly, I need to know how to detect a broken index. Opening an IndexReader
can potentiall
Hi,
I am working on an index to search XML data in a fixed format that I
master well...
The idea is that the XML content (which I have as a JDOM object) actually
carries the semantics, which would be best converted directly into tokens
by something like an analyzer. However, adding fields is done no
Hi Volodymyr,
About the trick you described about wildcard search
replacement, you mentioned:
> So I found the following workaround. I index this field
> as a sequence of terms, each containing a single
> digit from the needed value. (For example, I have a
> 123214213 value
> that needs to be indexed. Then i
Agree. We run an index with about 2.5 million documents and around 30
fields. The indexing itself of 2 items should only take a few
seconds on a reasonably fast machine.
/D
Kevin L. Cobb wrote:
I think your bottleneck is most likely the DB hit. I assume by 2
products you mean 2 disti
I think your bottleneck is most likely the DB hit. I assume by 2
products you mean 2 distinct entries into the Lucene Index, i.e.
2 rows in the DB to select from.
I index about 1.5 million rows from a SQL Server 2000 database with
several fields for each entry and it finishes in about
Hi,
I am sure this question must have been raised before, and maybe it has even
been answered. I would be grateful if someone could point me in the right
direction or give their thoughts on this topic.
The problem:
I have approximately over 2 products that I need to index. At the
moment I get X num
Hi,
I need some clarification on the indexing process.
A process is initiated for indexing 1000 documents. If for some reason, the
process fails mid-way during the indexing activity, say while indexing the
501st document, what is the status of the index files? Does it commit after
each docu
On Apr 19, 2005, at 13:37, Eric Chow wrote:
Is there any RTF text extractor for Lucene ?
import javax.swing.text.Document;
import javax.swing.text.rtf.RTFEditorKit;

RTFEditorKit aKit = new RTFEditorKit();
Document aDocument = aKit.createDefaultDocument();
aKit.read(anInputStream, aDocument, 0);
String text = aDocument.getText(0, aDocument.getLength());
On Apr 19, 2005, at 7:37 AM, Eric Chow wrote:
Hello,
Is there any RTF text extractor for Lucene ?
You can use some Swing classes to do this. This is from the Lucene in
Action code (http://www.lucenebook.com/search?query=rtf)
public Document getDocument(InputStream is)
throws DocumentHandle
Hello,
Is there any RTF text extractor for Lucene ?
Eric
Hi,
> From: Eric Chow [mailto:[EMAIL PROTECTED]
>
> I mean, if I use a wildcard query, it cannot highlight any terms?
>
> Any idea to do this or any existing example ?
Try rewriting the query before highlighting.
Pasha Bizhan
Use query.rewrite() to expand the query before calling
the highlighter. See the JUnit test or javadocs for
the QueryTermExtractor class.
--- Eric Chow <[EMAIL PROTECTED]> wrote:
> Hello,
>
> I downloaded the term highlighting in sandbox.
> But it seems not support wildcard searching.
>
> I mean
Hello,
I downloaded the term highlighting in sandbox.
But it seems it does not support wildcard searching.
I mean, if I use a wildcard query, it cannot highlight any terms?
Any idea to do this or any existing example ?
Best regards,
Eric
--