Re: numDocs method of IndexReader

2005-04-19 Thread Morus Walter
Otis Gospodnetic writes: > > The link to list archives should be on lucene.apache.org. > It should, but the link there does not work. All you get is 'Error occurred Required parameter "listId" or "listName" is missing or invalid' from mail-archives.apache.org. Something seems to be broken. So t

Re: globally unique field

2005-04-19 Thread PA
On Apr 20, 2005, at 04:09, Wesley MacDonald wrote: UID consists of a unique number based on a hashcode, system time and a counter, and a VMID contains a UID and adds a SHA hash based on IP address. Hmmm... UUID? http://en.wikipedia.org/wiki/UUID http://java.sun.com/j2se/1.5.0/docs/api/java/util/UUI
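
A minimal sketch of the java.util.UUID alternative PA is pointing at (assumes Java 5, where java.util.UUID was introduced; the field name "uid" is purely illustrative):

    import java.util.UUID;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;

    public class UuidFieldExample {
        public static void main(String[] args) {
            // Random (version 4) UUID; no dependence on IP address or system time.
            String uid = UUID.randomUUID().toString();

            Document doc = new Document();
            doc.add(Field.Keyword("uid", uid));   // stored, indexed, not tokenized
            System.out.println(uid);
        }
    }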

Re: numDocs method of IndexReader

2005-04-19 Thread Tomcat Programmer
Hi Otis, Thanks for your answer on the integer issue. I was not sure if the index was actually limited, or if it was just the numDocs method call. I guess it really does not matter which it is; and for me, I don't think my index will ever get that large! I do have a couple of questions from you

Re: Lucene bulk indexing

2005-04-19 Thread skoptelov
In a message of 20 April 2005 04:07, Mufaddal Khumri wrote: > The 2 products I mentioned are 2 rows. I get the products in > bulk by using a limit clause. > > I am using hibernate with MySQL server on a 2.8GHz, 1.00GB Ram machine. Maybe your session-level cache in Hibernate grow

Re: numDocs method of IndexReader

2005-04-19 Thread Otis Gospodnetic
Hello, Yes, there is a limit, but it's pretty high: http://java.sun.com/j2se/1.5.0/docs/api/java/lang/Integer.html#MAX_VALUE Iterating the index like that is OK, but each call to reader.document(int) pulls the entire Document off the disk, which can get expensive. The link to list archives shoul

RE: globally unique field

2005-04-19 Thread Wesley MacDonald
Hi, I posted this in the past: Java has GUID-like classes called java.rmi.server.UID and java.rmi.dgc.VMID. The UID class can generate identifiers that are unique over time within a JVM. The VMID class provides uniqueness across ALL JVMs. UID consists of a unique number based on a hashcode, syste
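
For reference, a minimal sketch of the two classes Wesley mentions (standard java.rmi API; the printed formats are implementation-dependent):

    import java.rmi.dgc.VMID;
    import java.rmi.server.UID;

    public class RmiIdExample {
        public static void main(String[] args) {
            // Unique over time within this JVM.
            System.out.println("UID:  " + new UID().toString());
            // Adds host-derived data, so it aims to be unique across JVMs.
            System.out.println("VMID: " + new VMID().toString());
        }
    }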

Re: globally unique field

2005-04-19 Thread Chuck Williams
Mike Baranczak wrote: First of all, a big thanks to all the Lucene hackers - I've only been using your product for a couple of weeks, and I've been very impressed by what I've seen. Here's my question: I have an index with a little over 3 million documents in it, with more on the way. Each docu

globally unique field

2005-04-19 Thread Mike Baranczak
First of all, a big thanks to all the Lucene hackers - I've only been using your product for a couple of weeks, and I've been very impressed by what I've seen. Here's my question: I have an index with a little over 3 million documents in it, with more on the way. Each document has an "URL" fiel
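
One common way to keep such a field unique, sketched below with Lucene 1.4-era calls and under the assumption that "URL" is stored as an untokenized keyword field (this is a generic pattern, not necessarily what the replies in this thread propose): delete any existing document carrying the same URL term before adding the new one.

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.Term;

    public class UniqueUrlUpdater {
        // Replace-by-URL: remove any older document with the same URL, then add the new one.
        public static void update(String indexDir, String url, String body) throws Exception {
            IndexReader reader = IndexReader.open(indexDir);
            reader.delete(new Term("URL", url));   // no-op if the URL is not in the index yet
            reader.close();                        // commits the deletion and releases the lock

            IndexWriter writer = new IndexWriter(indexDir, new StandardAnalyzer(), false);
            Document doc = new Document();
            doc.add(Field.Keyword("URL", url));    // stored, indexed, not tokenized
            doc.add(Field.Text("contents", body));
            writer.addDocument(doc);
            writer.close();
        }
    }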

Re: Passing XML objects to the analyzer ?

2005-04-19 Thread Paul Libbrecht
On 19 Apr 05, at 22:50, Erik Hatcher wrote: The only catch that I know of is that an Analyzer is invoked on a per-field basis. I can't tell exactly what you have in mind, but a Lucene Analyzer cannot split data into separate fields itself - it has to have been split prior. That's an easy one

Re: Lucene bulk indexing

2005-04-19 Thread Chris Lamprecht
Mufaddal, First, you should add some timing code to determine whether your database is slow, or your indexing (I think tokenization occurs in the call to writer.addDocument()). Assuming your database query is the slowdown, read on... Depending on the details of your database (which fields are in
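
A rough sketch of the kind of timing Chris suggests. The iterator and the toDocument() helper are placeholders for whatever the application actually does (e.g. a lazy Hibernate result iterator and your own row-to-Document mapping):

    import java.util.Iterator;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.IndexWriter;

    public class IndexTiming {
        /** rows: a lazy iterator over DB results; next() should trigger the actual fetch. */
        public static void run(Iterator rows, IndexWriter writer) throws Exception {
            long dbMillis = 0, indexMillis = 0;
            while (rows.hasNext()) {
                long t0 = System.currentTimeMillis();
                Object row = rows.next();                 // DB fetch happens here if the iterator is lazy
                dbMillis += System.currentTimeMillis() - t0;

                Document doc = toDocument(row);           // placeholder mapping
                long t1 = System.currentTimeMillis();
                writer.addDocument(doc);                  // tokenization happens inside addDocument()
                indexMillis += System.currentTimeMillis() - t1;
            }
            System.out.println("DB: " + dbMillis + " ms, indexing: " + indexMillis + " ms");
        }

        private static Document toDocument(Object row) {
            return new Document();                        // placeholder: map your row's fields here
        }
    }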

Re: Best way to purposely corrupt an index?

2005-04-19 Thread Andrzej Bialecki
Daniel Herlitz wrote: I would suggest you simply do not create unusable indexes. :-) Handle catch/throw/finally correctly and it should not present any problems. In some use scenarios it's not that simple... Anyway, back to the original question: indexExists() just checks for the presence of the
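
A minimal sketch of the check Andrzej describes (Lucene 1.4-era API): indexExists() only tells you a segments file is present; actually opening an IndexReader is what surfaces a damaged index.

    import java.io.IOException;
    import org.apache.lucene.index.IndexReader;

    public class IndexCheck {
        /** Returns true if the index looks present and can actually be opened. */
        public static boolean isUsable(String indexDir) {
            if (!IndexReader.indexExists(indexDir)) {
                return false;                      // no segments file at all
            }
            try {
                IndexReader reader = IndexReader.open(indexDir);
                reader.close();
                return true;
            } catch (IOException e) {
                return false;                      // segments file present but index unreadable
            }
        }
    }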

numDocs method of IndexReader

2005-04-19 Thread Tomcat Programmer
Hello Everyone, I need to be able to iterate through the entire set of documents within the index to perform some auditing. I originally tried the following code snip: int ndoc = idxReader.numDocs(); for (int i = 0; i < ndoc; i++) { Document doc = idxReader.document(i); ... } T
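
For reference, a sketch of the usual iteration pattern (Lucene 1.4-era API): loop up to maxDoc() rather than numDocs() and skip deleted slots, since document numbers are not contiguous once deletions have happened.

    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.IndexReader;

    public class IndexAudit {
        public static void main(String[] args) throws Exception {
            IndexReader idxReader = IndexReader.open(args[0]);
            for (int i = 0; i < idxReader.maxDoc(); i++) {
                if (idxReader.isDeleted(i)) {
                    continue;                          // slot belongs to a deleted document
                }
                Document doc = idxReader.document(i);  // note: loads the whole stored document
                // ... audit the document here ...
            }
            idxReader.close();
        }
    }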

RE: Lucene bulk indexing

2005-04-19 Thread Mufaddal Khumri
Hi, The 2 products I mentioned are 2 rows. I get the products in bulk by using a limit clause. I am using hibernate with MySQL server on a 2.8GHz, 1.00GB Ram machine. I am baffled that 1.2 or 1.5 million records are being indexed in 20 minutes compared to the 2 records I am indexin

Re: Passing XML objects to the analyzer ?

2005-04-19 Thread Erik Hatcher
On Apr 19, 2005, at 3:55 PM, Paul Libbrecht wrote: Hi, I am working on an index to search XML data in a fixed format that I master well... The idea is that the XML content (which I have as JDOM object) actually carries the semantic which would be best converted directly into tokens by something

Re: Best way to purposely corrupt an index?

2005-04-19 Thread Daniel Herlitz
Hm well, I unconsciously extrapolated natural language into Java syntax. Just in case, it should be: try: Build the index in a separate catalogue. if all ok: remove ('rm') the production index and move ('mv') the newly built index to its place. Notify the app using the index that it should reopen its IndexReader. /D I
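
A sketch of that recipe in Java, assuming plain filesystem index directories; the rename-based swap and the directory names are illustrative (the original posts describe the same idea with rm/mv):

    import java.io.File;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;

    public class RebuildAndSwap {
        public static void rebuild(File liveDir, File buildDir) throws Exception {
            // 1. Build the new index in a separate directory.
            IndexWriter writer = new IndexWriter(buildDir, new StandardAnalyzer(), true);
            // ... addDocument() calls go here ...
            writer.optimize();
            writer.close();

            // 2. Only if the build succeeded: replace the production index.
            deleteRecursively(liveDir);
            if (!buildDir.renameTo(liveDir)) {
                throw new RuntimeException("could not move " + buildDir + " to " + liveDir);
            }
            // 3. Notify the searching application so it reopens its IndexReader.
        }

        private static void deleteRecursively(File dir) {
            File[] files = dir.listFiles();
            if (files != null) {
                for (int i = 0; i < files.length; i++) {
                    deleteRecursively(files[i]);
                }
            }
            dir.delete();
        }
    }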

Re: Best way to purposely corrupt an index?

2005-04-19 Thread Daniel Herlitz
I would suggest you simply do not create unusable indexes. :-) Handle catch/throw/finally correctly and it should not present any problems. Assume one app builds the index, another uses it: try: Build the index in a separate catalogue. finally: remove ('rm') production index and move ('mv') newl

Re: Best way to purposely corrupt an index?

2005-04-19 Thread Luke Shannon
The only time I have seen corrupted indexes is when the Java process is killed during the indexing process. If you shut down Tomcat (or whatever you are running for Java) during the indexing process you will end up with a corrupted index. - Original Message - From: "Andy Roberts" <[EMAIL

Best way to purposely corrupt an index?

2005-04-19 Thread Andy Roberts
Hi, Seems like an odd request I'm sure. However, my application relies on an index, and should the index become unusable for some unfortunate reason, I'd like my app to gracefully cope with this situation. Firstly, I need to know how to detect a broken index. Opening an IndexReader can potentiall

Passing XML objects to the analyzer ?

2005-04-19 Thread Paul Libbrecht
Hi, I am working on an index to search XML data in a fixed format that I know well... The idea is that the XML content (which I have as a JDOM object) actually carries the semantics, which would be best converted directly into tokens by something like an analyzer. However, adding fields is done no
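
Since an Analyzer only ever sees one field's text, one workaround is to walk the JDOM tree yourself and emit one Lucene field per element before any Analyzer runs. A sketch, assuming JDOM 1.0; the "one field per leaf element" mapping is only an illustration of the idea:

    import java.util.Iterator;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.jdom.Element;

    public class JdomIndexer {
        /** Turn a JDOM element tree into a Lucene Document, one field per leaf element. */
        public static Document toLucene(Element root) {
            Document doc = new Document();
            addFields(root, doc);
            return doc;
        }

        private static void addFields(Element element, Document doc) {
            if (element.getChildren().isEmpty()) {
                // Leaf element: index its text under a field named after the element.
                doc.add(Field.Text(element.getName(), element.getTextTrim()));
            } else {
                for (Iterator it = element.getChildren().iterator(); it.hasNext();) {
                    addFields((Element) it.next(), doc);
                }
            }
        }
    }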

WildCard search replacement

2005-04-19 Thread Aalap Parikh
Hi Volodymyr, About the trick you described for wildcard search replacement, you mentioned: > So I found the following workaround. I index this field as a > sequence of terms, each containing a single digit of the > needed value. (For example I have “123214213” value > that needs to be indexed. Then i
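
A sketch of how that digit-per-term trick can be fed to Lucene, assuming an analyzer that splits on whitespace and keeps digits (e.g. WhitespaceAnalyzer); the field name "digits" is illustrative:

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;

    public class DigitsField {
        /** "123214213" -> "1 2 3 2 1 4 2 1 3", so each digit becomes its own term. */
        public static String toDigitTerms(String value) {
            StringBuffer buf = new StringBuffer();
            for (int i = 0; i < value.length(); i++) {
                if (i > 0) buf.append(' ');
                buf.append(value.charAt(i));
            }
            return buf.toString();
        }

        public static void addTo(Document doc, String value) {
            doc.add(Field.Text("digits", toDigitTerms(value)));  // tokenized, one term per digit
        }
    }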

Re: Lucene bulk indexing

2005-04-19 Thread Daniel Herlitz
Agree. We run an index with about 2.5 million documents and around 30 fields. The indexing itself of 2 items should only take a few seconds on a reasonably fast machine. /D Kevin L. Cobb wrote: I think your bottleneck is most likely the DB hit. I assume by 2 products you mean 2 disti

RE: Lucene bulk indexing

2005-04-19 Thread Kevin L. Cobb
I think your bottleneck is most likely the DB hit. I assume by 2 products you mean 2 distinct entries into the Lucene Index, i.e. 2 rows in the DB to select from. I index about 1.5 million rows from a SQL Server 2000 database with several fields for each entry and it finishes in about

Lucene bulk indexing

2005-04-19 Thread Mufaddal Khumri
Hi, I am sure this question must be raised before and maybe it has been even answered. I would be grateful, if someone could point me in the right direction or give their thoughts on this topic. The problem: I have approximately over 2 products that I need to index. At the moment I get X num

FW: Indexing aborts in mid-process

2005-04-19 Thread Jayakumar.V
Hi, I need some clarification on the indexing process. A process is initiated for indexing 1000 documents. If for some reason, the process fails mid-way during the indexing activity, say while indexing the 501st document, what is the status of the index files? Does it commit after each docu
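
A common defensive pattern around this question (a sketch only; it does not change how Lucene flushes segments, it just guarantees the writer is closed and the write lock released whatever happens mid-run):

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.IndexWriter;

    public class SafeIndexing {
        public static void indexAll(String indexDir, java.util.List docs) throws Exception {
            IndexWriter writer = new IndexWriter(indexDir, new StandardAnalyzer(), false);
            try {
                for (int i = 0; i < docs.size(); i++) {
                    writer.addDocument((Document) docs.get(i));
                }
                writer.optimize();   // optional; merges segments once everything is added
            } finally {
                writer.close();      // flushes buffered documents and releases the write lock
            }
        }
    }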

Re: RTF text extractor ?

2005-04-19 Thread PA
On Apr 19, 2005, at 13:37, Eric Chow wrote: Is there any RTF text extractor for Lucene ? import javax.swing.text.Document; import javax.swing.text.rtf.RTFEditorKit; RTFEditorKit aKit = new RTFEditorKit(); Document aDocument = aKit.createDefaultDocument(); aKit.read( anInputStream, aDocume
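
The same Swing-based approach as a self-contained sketch (standard javax.swing.text API; how you then feed the string into a Lucene field is up to you):

    import java.io.InputStream;
    import javax.swing.text.rtf.RTFEditorKit;

    public class RtfText {
        /** Extract the plain text of an RTF stream using Swing's RTFEditorKit. */
        public static String extract(InputStream in) throws Exception {
            RTFEditorKit kit = new RTFEditorKit();
            javax.swing.text.Document doc = kit.createDefaultDocument();
            kit.read(in, doc, 0);                       // parse the RTF into the styled document
            return doc.getText(0, doc.getLength());     // plain text, formatting stripped
        }
    }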

Re: RTF text extractor ?

2005-04-19 Thread Erik Hatcher
On Apr 19, 2005, at 7:37 AM, Eric Chow wrote: Hello, Is there any RTF text extractor for Lucene ? You can use some Swing classes to do this. This is from the Lucene in Action code (http://www.lucenebook.com/search?query=rtf) public Document getDocument(InputStream is) throws DocumentHandle

RTF text extractor ?

2005-04-19 Thread Eric Chow
Hello, Is there any RTF text extractor for Lucene ? Eric

RE: Wildcard searching with Highlight support ?

2005-04-19 Thread Pasha Bizhan
Hi, > From: Eric Chow [mailto:[EMAIL PROTECTED] > > I mean if I use a wildcard query, it cannot highlight any terms ? > > Any idea how to do this or any existing example ? Try rewriting the query before highlighting. Pasha Bizhan

Re: Wildcard searching with Highlight support ?

2005-04-19 Thread mark harwood
Use query.rewrite() to expand the query before calling the highlighter. See the JUnit test or javadocs for the QueryTermExtractor class. --- Eric Chow <[EMAIL PROTECTED]> wrote: > Hello, > > I downloaded the term highlighting package from the sandbox. > But it seems it does not support wildcard searching. > > I mean
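
A sketch of what that looks like with the sandbox highlighter of that era (class and method names as I recall them from the contrib Highlighter, so treat the exact signatures as assumptions): rewrite() expands the WildcardQuery into the concrete terms it matches in the index, which is what the highlighter scores against.

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.highlight.Highlighter;
    import org.apache.lucene.search.highlight.QueryScorer;

    public class WildcardHighlight {
        public static String bestFragment(Query query, IndexReader reader,
                                          Analyzer analyzer, String text) throws Exception {
            // Expand wildcard/prefix queries into the actual terms present in the index.
            Query rewritten = query.rewrite(reader);
            Highlighter highlighter = new Highlighter(new QueryScorer(rewritten));
            return highlighter.getBestFragment(analyzer, "contents", text);
        }
    }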

Wildcard searching with Highlight support ?

2005-04-19 Thread Eric Chow
Hello, I downloaded the term highlighting package from the sandbox. But it seems it does not support wildcard searching. I mean if I use a wildcard query, it cannot highlight any terms ? Any idea how to do this or any existing example ? Best regards, Eric --