Re: numDocs method of IndexReader

2005-04-19 Thread Morus Walter
Otis Gospodnetic writes: > > The link to list archives should be on lucene.apache.org. > It should, but the link there does not work. All you get is 'Error occurred Required parameter "listId" or "listName" is missing or invalid' from mail-archives.apache.org. Something seems to be broken. So t

Re: globally unique field

2005-04-19 Thread PA
On Apr 20, 2005, at 04:09, Wesley MacDonald wrote: UID consists of a unique number based on a hashcode, system time and a counter, and a VMID contains a UID and adds a SHA hash based on IP address. Hmmm... UUID? http://en.wikipedia.org/wiki/UUID http://java.sun.com/j2se/1.5.0/docs/api/java/util/UUI
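
A minimal sketch of the java.util.UUID alternative PA is pointing at (assumes Java 5, where java.util.UUID was introduced; the field name "uid" is purely illustrative):

    import java.util.UUID;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;

    public class UuidFieldExample {
        public static void main(String[] args) {
            // Random (version 4) UUID; no dependence on IP address or system time.
            String uid = UUID.randomUUID().toString();

            Document doc = new Document();
            doc.add(Field.Keyword("uid", uid));   // stored, indexed, not tokenized
            System.out.println(uid);
        }
    }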

Re: numDocs method of IndexReader

2005-04-19 Thread Tomcat Programmer
Hi Otis, Thanks for your answer on the integer issue. I was not sure if the index was actually limited, or if it was just the numDocs method call. I guess it really does not matter which it is; and for me, I don't think my index will ever get that large! I do have a couple of questions from you

Re: Lucene bulk indexing

2005-04-19 Thread skoptelov
In a message of 20 April 2005 04:07, Mufaddal Khumri wrote: > The 2 products I mentioned are 2 rows. I get the products in > bulk by using a limit clause. > > I am using hibernate with MySQL server on a 2.8GHz, 1.00GB Ram machine. Maybe your session-level cache in Hibernate grow

Re: numDocs method of IndexReader

2005-04-19 Thread Otis Gospodnetic
Hello, Yes, there is a limit, but it's pretty high: http://java.sun.com/j2se/1.5.0/docs/api/java/lang/Integer.html#MAX_VALUE Iterating the index like that is OK, but each call to reader.document(int) pulls the entire Document off the disk, which can get expensive. The link to list archives shoul

RE: globally unique field

2005-04-19 Thread Wesley MacDonald
Hi, I posted this in the past: Java has GUID-like classes called java.rmi.server.UID and java.rmi.dgc.VMID. The UID class can generate identifiers that are unique over time within a JVM. The VMID class provides uniqueness across ALL JVMs. UID consists of a unique number based on a hashcode, syste
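
For reference, a minimal sketch of the two classes Wesley mentions (standard java.rmi API; the printed formats are implementation-dependent):

    import java.rmi.dgc.VMID;
    import java.rmi.server.UID;

    public class RmiIdExample {
        public static void main(String[] args) {
            // Unique over time within this JVM.
            System.out.println("UID:  " + new UID().toString());
            // Adds host-derived data, so it aims to be unique across JVMs.
            System.out.println("VMID: " + new VMID().toString());
        }
    }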

Re: globally unique field

2005-04-19 Thread Chuck Williams
Mike Baranczak wrote: First of all, a big thanks to all the Lucene hackers - I've only been using your product for a couple of weeks, and I've been very impressed by what I've seen. Here's my question: I have an index with a little over 3 million documents in it, with more on the way. Each docu

globally unique field

2005-04-19 Thread Mike Baranczak
First of all, a big thanks to all the Lucene hackers - I've only been using your product for a couple of weeks, and I've been very impressed by what I've seen. Here's my question: I have an index with a little over 3 million documents in it, with more on the way. Each document has an "URL" fiel
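
One common way to keep such a field unique, sketched below with Lucene 1.4-era calls and under the assumption that "URL" is stored as an untokenized keyword field (this is a generic pattern, not necessarily what the replies in this thread propose): delete any existing document carrying the same URL term before adding the new one.

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.Term;

    public class UniqueUrlUpdater {
        // Replace-by-URL: remove any older document with the same URL, then add the new one.
        public static void update(String indexDir, String url, String body) throws Exception {
            IndexReader reader = IndexReader.open(indexDir);
            reader.delete(new Term("URL", url));   // no-op if the URL is not in the index yet
            reader.close();                        // commits the deletion and releases the lock

            IndexWriter writer = new IndexWriter(indexDir, new StandardAnalyzer(), false);
            Document doc = new Document();
            doc.add(Field.Keyword("URL", url));    // stored, indexed, not tokenized
            doc.add(Field.Text("contents", body));
            writer.addDocument(doc);
            writer.close();
        }
    }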

Re: Passing XML objects to the analyzer ?

2005-04-19 Thread Paul Libbrecht
On 19 Apr 05, at 22:50, Erik Hatcher wrote: The only catch that I know of is that an Analyzer is invoked on a per-field basis. I can't tell exactly what you have in mind, but a Lucene Analyzer cannot split data into separate fields itself - it has to have been split prior. That's an easy one

Re: Lucene bulk indexing

2005-04-19 Thread Chris Lamprecht
Mufaddal, First, you should add some timing code to determine whether your database is slow, or your indexing (I think tokenization occurs in the call to writer.addDocument()). Assuming your database query is the slowdown, read on... Depending on the details of your database (which fields are in
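
A rough sketch of the kind of timing Chris suggests. The iterator and the toDocument() helper are placeholders for whatever the application actually does (e.g. a lazy Hibernate result iterator and your own row-to-Document mapping):

    import java.util.Iterator;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.IndexWriter;

    public class IndexTiming {
        /** rows: a lazy iterator over DB results; next() should trigger the actual fetch. */
        public static void run(Iterator rows, IndexWriter writer) throws Exception {
            long dbMillis = 0, indexMillis = 0;
            while (rows.hasNext()) {
                long t0 = System.currentTimeMillis();
                Object row = rows.next();                 // DB fetch happens here if the iterator is lazy
                dbMillis += System.currentTimeMillis() - t0;

                Document doc = toDocument(row);           // placeholder mapping
                long t1 = System.currentTimeMillis();
                writer.addDocument(doc);                  // tokenization happens inside addDocument()
                indexMillis += System.currentTimeMillis() - t1;
            }
            System.out.println("DB: " + dbMillis + " ms, indexing: " + indexMillis + " ms");
        }

        private static Document toDocument(Object row) {
            return new Document();                        // placeholder: map your row's fields here
        }
    }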

Re: Best way to purposely corrupt an index?

2005-04-19 Thread Andrzej Bialecki
Daniel Herlitz wrote: I would suggest you simply do not create unusable indexes. :-) Handle catch/throw/finally correctly and it should not present any problems. In some use scenarios it's not that simple... Anyway, back to the original question: indexExists() just checks for the presence of the
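
A minimal sketch of the check Andrzej describes (Lucene 1.4-era API): indexExists() only tells you a segments file is present; actually opening an IndexReader is what surfaces a damaged index.

    import java.io.IOException;
    import org.apache.lucene.index.IndexReader;

    public class IndexCheck {
        /** Returns true if the index looks present and can actually be opened. */
        public static boolean isUsable(String indexDir) {
            if (!IndexReader.indexExists(indexDir)) {
                return false;                      // no segments file at all
            }
            try {
                IndexReader reader = IndexReader.open(indexDir);
                reader.close();
                return true;
            } catch (IOException e) {
                return false;                      // segments file present but index unreadable
            }
        }
    }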

numDocs method of IndexReader

2005-04-19 Thread Tomcat Programmer
Hello Everyone, I need to be able to iterate through the entire set of documents within the index to perform some auditing. I originally tried the following code snip: int ndoc = idxReader.numDocs(); for (int i = 0; i < ndoc; i++) { Document doc = idxReader.document(i); ... } T
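
For reference, a sketch of the usual iteration pattern (Lucene 1.4-era API): loop up to maxDoc() rather than numDocs() and skip deleted slots, since document numbers are not contiguous once deletions have happened.

    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.IndexReader;

    public class IndexAudit {
        public static void main(String[] args) throws Exception {
            IndexReader idxReader = IndexReader.open(args[0]);
            for (int i = 0; i < idxReader.maxDoc(); i++) {
                if (idxReader.isDeleted(i)) {
                    continue;                          // slot belongs to a deleted document
                }
                Document doc = idxReader.document(i);  // note: loads the whole stored document
                // ... audit the document here ...
            }
            idxReader.close();
        }
    }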

RE: Lucene bulk indexing

2005-04-19 Thread Mufaddal Khumri
Hi, The 2 products I mentioned are 2 rows. I get the products in bulk by using a limit clause. I am using hibernate with MySQL server on a 2.8GHz, 1.00GB Ram machine. I am baffled that 1.2 or 1.5 million records are being indexed in 20 minutes compared to the 2 records I am indexin

Re: Passing XML objects to the analyzer ?

2005-04-19 Thread Erik Hatcher
On Apr 19, 2005, at 3:55 PM, Paul Libbrecht wrote: Hi, I am working on an index to search XML data in a fixed format that I master well... The idea is that the XML content (which I have as JDOM object) actually carries the semantic which would be best converted directly into tokens by something

Re: Best way to purposely corrupt an index?

2005-04-19 Thread Daniel Herlitz
Hm well, I unconsciously extrapolated natural language into Java syntax. Just in case, it should be: try: Build the index in a separate catalogue. if all ok: remove ('rm') the production index and move ('mv') the newly built index to its place. Notify the app using the index that it should reopen its IndexReader. /D I
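
A sketch of that recipe in Java, assuming plain filesystem index directories; the rename-based swap and the directory names are illustrative (the original posts describe the same idea with rm/mv):

    import java.io.File;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;

    public class RebuildAndSwap {
        public static void rebuild(File liveDir, File buildDir) throws Exception {
            // 1. Build the new index in a separate directory.
            IndexWriter writer = new IndexWriter(buildDir, new StandardAnalyzer(), true);
            // ... addDocument() calls go here ...
            writer.optimize();
            writer.close();

            // 2. Only if the build succeeded: replace the production index.
            deleteRecursively(liveDir);
            if (!buildDir.renameTo(liveDir)) {
                throw new RuntimeException("could not move " + buildDir + " to " + liveDir);
            }
            // 3. Notify the searching application so it reopens its IndexReader.
        }

        private static void deleteRecursively(File dir) {
            File[] files = dir.listFiles();
            if (files != null) {
                for (int i = 0; i < files.length; i++) {
                    deleteRecursively(files[i]);
                }
            }
            dir.delete();
        }
    }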

Re: Best way to purposely corrupt an index?

2005-04-19 Thread Daniel Herlitz
I would suggest you simply do not create unusable indexes. :-) Handle catch/throw/finally correctly and it should not present any problems. Assume one app builds the index, another uses it: try: Build the index in a separate catalogue. finally: remove ('rm') production index and move ('mv') newl

Re: Best way to purposely corrupt an index?

2005-04-19 Thread Luke Shannon
The only time I have seen corrupted indexes is when the Java process is killed during the indexing process. If you shut down Tomcat (or whatever you are running for Java) during the indexing process you will end up with a corrupted index. - Original Message - From: "Andy Roberts" <[EMAIL

Best way to purposely corrupt an index?

2005-04-19 Thread Andy Roberts
Hi, Seems like an odd request I'm sure. However, my application relies on an index, and should the index become unusable for some unfortunate reason, I'd like my app to gracefully cope with this situation. Firstly, I need to know how to detect a broken index. Opening an IndexReader can potentiall

Passing XML objects to the analyzer ?

2005-04-19 Thread Paul Libbrecht
Hi, I am working on an index to search XML data in a fixed format that I know well... The idea is that the XML content (which I have as a JDOM object) actually carries the semantics, which would be best converted directly into tokens by something like an analyzer. However, adding fields is done no
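
Since an Analyzer only ever sees one field's text, one workaround is to walk the JDOM tree yourself and emit one Lucene field per element before any Analyzer runs. A sketch, assuming JDOM 1.0; the "one field per leaf element" mapping is only an illustration of the idea:

    import java.util.Iterator;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.jdom.Element;

    public class JdomIndexer {
        /** Turn a JDOM element tree into a Lucene Document, one field per leaf element. */
        public static Document toLucene(Element root) {
            Document doc = new Document();
            addFields(root, doc);
            return doc;
        }

        private static void addFields(Element element, Document doc) {
            if (element.getChildren().isEmpty()) {
                // Leaf element: index its text under a field named after the element.
                doc.add(Field.Text(element.getName(), element.getTextTrim()));
            } else {
                for (Iterator it = element.getChildren().iterator(); it.hasNext();) {
                    addFields((Element) it.next(), doc);
                }
            }
        }
    }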

WildCard search replacement

2005-04-19 Thread Aalap Parikh
Hi Volodymyr, About the trick you described for wildcard search replacement, you mentioned: > So I found the following workaround. I index this field as a > sequence of terms, each containing a single digit of the > needed value. (For example I have “123214213” value > that needs to be indexed. Then i
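
A sketch of how that digit-per-term trick can be fed to Lucene, assuming an analyzer that splits on whitespace and keeps digits (e.g. WhitespaceAnalyzer); the field name "digits" is illustrative:

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;

    public class DigitsField {
        /** "123214213" -> "1 2 3 2 1 4 2 1 3", so each digit becomes its own term. */
        public static String toDigitTerms(String value) {
            StringBuffer buf = new StringBuffer();
            for (int i = 0; i < value.length(); i++) {
                if (i > 0) buf.append(' ');
                buf.append(value.charAt(i));
            }
            return buf.toString();
        }

        public static void addTo(Document doc, String value) {
            doc.add(Field.Text("digits", toDigitTerms(value)));  // tokenized, one term per digit
        }
    }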

Re: Lucene bulk indexing

2005-04-19 Thread Daniel Herlitz
Agree. We run an index with about 2.5 million documents and around 30 fields. The indexing itself of 2 items should only take a few seconds on a reasonably fast machine. /D Kevin L. Cobb wrote: I think your bottleneck is most likely the DB hit. I assume by 2 products you mean 2 disti

RE: Lucene bulk indexing

2005-04-19 Thread Kevin L. Cobb
I think your bottleneck is most likely the DB hit. I assume by 2 products you mean 2 distinct entries into the Lucene Index, i.e. 2 rows in the DB to select from. I index about 1.5 million rows from a SQL Server 2000 database with several fields for each entry and it finishes in about

Lucene bulk indexing

2005-04-19 Thread Mufaddal Khumri
Hi, I am sure this question must be raised before and maybe it has been even answered. I would be grateful, if someone could point me in the right direction or give their thoughts on this topic. The problem: I have approximately over 2 products that I need to index. At the moment I get X num

FW: Indexing aborts in mid-process

2005-04-19 Thread Jayakumar.V
Hi, I need some clarification on the indexing process. A process is initiated for indexing 1000 documents. If for some reason, the process fails mid-way during the indexing activity, say while indexing the 501st document, what is the status of the index files? Does it commit after each docu
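
A common defensive pattern around this question (a sketch only; it does not change how Lucene flushes segments, it just guarantees the writer is closed and the write lock released whatever happens mid-run):

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.IndexWriter;

    public class SafeIndexing {
        public static void indexAll(String indexDir, java.util.List docs) throws Exception {
            IndexWriter writer = new IndexWriter(indexDir, new StandardAnalyzer(), false);
            try {
                for (int i = 0; i < docs.size(); i++) {
                    writer.addDocument((Document) docs.get(i));
                }
                writer.optimize();   // optional; merges segments once everything is added
            } finally {
                writer.close();      // flushes buffered documents and releases the write lock
            }
        }
    }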

Re: RTF text extractor ?

2005-04-19 Thread PA
On Apr 19, 2005, at 13:37, Eric Chow wrote: Is there any RTF text extractor for Lucene ? import javax.swing.text.Document; import javax.swing.text.rtf.RTFEditorKit; RTFEditorKit aKit = new RTFEditorKit(); Document aDocument = aKit.createDefaultDocument(); aKit.read( anInputStream, aDocume
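
The same Swing-based approach as a self-contained sketch (standard javax.swing.text API; how you then feed the string into a Lucene field is up to you):

    import java.io.InputStream;
    import javax.swing.text.rtf.RTFEditorKit;

    public class RtfText {
        /** Extract the plain text of an RTF stream using Swing's RTFEditorKit. */
        public static String extract(InputStream in) throws Exception {
            RTFEditorKit kit = new RTFEditorKit();
            javax.swing.text.Document doc = kit.createDefaultDocument();
            kit.read(in, doc, 0);                       // parse the RTF into the styled document
            return doc.getText(0, doc.getLength());     // plain text, formatting stripped
        }
    }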

Re: RTF text extractor ?

2005-04-19 Thread Erik Hatcher
On Apr 19, 2005, at 7:37 AM, Eric Chow wrote: Hello, Is there any RTF text extractor for Lucene ? You can use some Swing classes to do this. This is from the Lucene in Action code (http://www.lucenebook.com/search?query=rtf) public Document getDocument(InputStream is) throws DocumentHandle

RTF text extractor ?

2005-04-19 Thread Eric Chow
Hello, Is there any RTF text extractor for Lucene ? Eric

RE: Wildcard searching with Highlight support ?

2005-04-19 Thread Pasha Bizhan
Hi, > From: Eric Chow [mailto:[EMAIL PROTECTED] > > I mean if I use a wildcard query, it cannot highlight any terms ? > > Any idea how to do this or any existing example ? Try rewriting the query before highlighting. Pasha Bizhan

Re: Wildcard searching with Highlight support ?

2005-04-19 Thread mark harwood
Use query.rewrite() to expand the query before calling the highlighter. See the JUnit test or javadocs for the QueryTermExtractor class. --- Eric Chow <[EMAIL PROTECTED]> wrote: > Hello, > > I downloaded the term highlighting package from the sandbox. > But it seems it does not support wildcard searching. > > I mean
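
A sketch of what that looks like with the sandbox highlighter of that era (class and method names as I recall them from the contrib Highlighter, so treat the exact signatures as assumptions): rewrite() expands the WildcardQuery into the concrete terms it matches in the index, which is what the highlighter scores against.

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.highlight.Highlighter;
    import org.apache.lucene.search.highlight.QueryScorer;

    public class WildcardHighlight {
        public static String bestFragment(Query query, IndexReader reader,
                                          Analyzer analyzer, String text) throws Exception {
            // Expand wildcard/prefix queries into the actual terms present in the index.
            Query rewritten = query.rewrite(reader);
            Highlighter highlighter = new Highlighter(new QueryScorer(rewritten));
            return highlighter.getBestFragment(analyzer, "contents", text);
        }
    }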

Wildcard searching with Highlight support ?

2005-04-19 Thread Eric Chow
Hello, I downloaded the term highlighting package from the sandbox. But it seems it does not support wildcard searching. I mean if I use a wildcard query, it cannot highlight any terms ? Any idea how to do this or any existing example ? Best regards, Eric --