Re: How to implement AJAX search~Lucene Search part?

2007-06-08 Thread Chris Lu
Thanks to all who answered with their experience and insights! LUCENE-625 is very interesting, but I am not sure about its scalability. "Begin completion only with 3 letters or more" is reasonable for special cases, but not ideal. What I want to implement is fairly general-purpose software. WildcardTermE

Re: Question about querying for files in a zip file

2007-06-08 Thread Chris Hostetter
: We would also entertain alternative indexing approaches. We even considered concatenating all the text of the contained docs into a doc indexed as the zipfile, but lucene only indexes part of a large file and even if that were resolved, proximity searches can return false positives. pr

Re: IndexWriter.Optimize() is too slow and IOException! How Can I do?

2007-06-08 Thread Erick Erickson
First, when asking a new question, it's best to start a new subject. Your question has nothing to do with the rest of the thread. That said, you want to create a Reader to pass along. I'd think about doing this by subclassing your MSWord class from the Reader class and providing the necessary
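A minimal sketch of the Reader idea above, using only the Java standard library. The class name ExtractedTextReader and the Supplier-based extractor are assumptions made for illustration; they stand in for whatever MS Word extractor is actually in use and are not part of Lucene or of the original post.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.function.Supplier;

/**
 * Hypothetical Reader that exposes text extracted from a Word document.
 * The extractor is an assumption: any code that returns the document text.
 */
class ExtractedTextReader extends Reader {
    private final Supplier<String> extractor; // hypothetical text extractor
    private Reader delegate;                  // created lazily on first read

    ExtractedTextReader(Supplier<String> extractor) {
        this.extractor = extractor;
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        if (delegate == null) {
            // Run the extraction only when the text is first requested.
            delegate = new StringReader(extractor.get());
        }
        return delegate.read(buf, off, len);
    }

    @Override
    public void close() throws IOException {
        if (delegate != null) {
            delegate.close();
        }
    }
}
```

A Reader like this could be handed to any API that accepts a java.io.Reader, which avoids writing the extracted text to an intermediate .txt file first.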

Re: Indexing MSword Documents

2007-06-08 Thread Wayne Graham
Jim, There are a few things you can do to make extracting text easier on yourself. There are several libraries that can assist you; POI and TextMining.org both have excellent text extractors for Word. As Mathieu suggests, you need to take a look at Document. Essentially, you do everything you're

Question about querying for files in a zip file

2007-06-08 Thread Eric Scott
This isn't a "How do I index a zip file?" question. It's a bit more complicated than that. We have an index where zip files are broken apart and the contained files are indexed. The index also contains a doc for the zip file itself. The user has the option of (A) querying for the contained file

Re: Indexing MSword Documents

2007-06-08 Thread jim shirreffs
I looked at Nutch's code but it is too complicated for me to follow. I do not understand the guts of Lucene and how analyzers, parsers, readers, etc. all fit together. I suppose I will be forced to learn it all someday, but at the moment I am adhering to KISS, Keep It Simple Stupid. thanks for

Re: Indexing MSword Documents

2007-06-08 Thread jim shirreffs
many thanks I will try that, thanks again! jim s - Original Message - From: "Donna L Gresh" <[EMAIL PROTECTED]> To: Sent: Friday, June 08, 2007 12:52 PM Subject: Re: Indexing MSword Documents I do this exact thing. "text" (the second input to the Field constructor) is MSWord text

Re: Indexing MSword Documents

2007-06-08 Thread Donna L Gresh
I do this exact thing. "text" (the second input to the Field constructor) is MSWord text that I've extracted from the Word document. textField = new org.apache.lucene.document.Field(textFieldName, text, org.apache.lucene.document.Field.Store.NO, org.apache.lucene.document.Field.Index.TOKENIZED);

Re: Indexing MSword Documents

2007-06-08 Thread Mathieu Lecarme
Why not use Document? http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/org/apache/lucene/document/Document.html HTMLDocument handles HTML-specific details such as encoding and headers. Nutch uses specific Word tools (http://lucene.apache.org/nutch/apidocs/org/ap

Indexing MSword Documents

2007-06-08 Thread jim shirreffs
Hi, I am trying to index msword documents. I've got things working but I do not think I am doing things properly. To index msword docs I use an extractor to extract the text. Then I write the text to a .txt file and index that using an HTMLDocument object. Seems to me that since I have the te

Re: IndexWriter.Optimize() is too slow and IOException! How Can I do?

2007-06-08 Thread jim shirreffs
I am trying to index msword documents. I’ve got things working but I do not think I am doing things properly. To index msword docs I use an extractor to extract the text. Then I write the text to a .txt file and index that using an HTMLDocument object. Seems to me that since I have the text

RE : How to implement AJAX search~Lucene Search part?

2007-06-08 Thread DZISIAK Jean-Paul
Hello, I have successfully implemented a keyword-based search feature with MyFaces / Tomahawk. Tomahawk has an Ajax-based component: JSF page: <%-- wait dialog box --%> Backing Bean: /** * Suggested keywords for Ajax lis

Re: How to implement AJAX search~Lucene Search part?

2007-06-08 Thread Mathieu Lecarme
If you do that, you enumerate every term! If you use an alphabetically sorted collection, you can stop when the matches stop, but you still have to test every term before the first match. Lucene gives you tools to match the beginning of a term; just use them! M. On 8 June 07 at 14:57, Patrick Turcotte wrote: H

Re: Documentation Promotion is in Motion!

2007-06-08 Thread Erick Erickson
OK, I actually added a page. Now if anyone would like to make it pretty, please feel free. I assume that the first few entries will be heavily edited to establish a "look and feel" so the ensuing pages can use them as a model. Best Erick On 6/7/07, Grant Ingersoll <[EMAIL PROTECTED]> wrote:

Re: How to implement AJAX search~Lucene Search part?

2007-06-08 Thread Erick Erickson
You can get the information pretty quickly by using a WildcardTermEnum (NOT query). Especially if you terminate after some number of characters. Erick On 6/7/07, Chris Lu <[EMAIL PROTECTED]> wrote: Hi, I would like to implement an AJAX search. Basically when user types in several character

Re: How to implement AJAX search~Lucene Search part?

2007-06-08 Thread Patrick Turcotte
Hi, What we did was this: 1) When your application starts, it scans the index for term values and stores them in a map or a similar structure. 2) When you receive "ajax requests", you compare with the map data and return the relevant part. Works quite fast for us, without round trips to Lucene. Patrick C
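The two steps above could be sketched roughly as follows, in plain Java with no Lucene calls. The class name TermSuggester, the lower-casing of terms, and the limit parameter are assumptions for illustration; the original post does not say which data structure was used. A sorted set is chosen here so the lookup can jump straight to the first candidate and stop at the first term that no longer matches the prefix.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;

/** Hypothetical in-memory term cache for prefix suggestions; not a Lucene class. */
class TermSuggester {
    private final TreeSet<String> terms = new TreeSet<>();

    /** Step 1: load the term values once, e.g. at application startup. */
    TermSuggester(Collection<String> indexedTerms) {
        for (String t : indexedTerms) {
            terms.add(t.toLowerCase(Locale.ROOT));
        }
    }

    /** Step 2: answer a request with at most 'limit' terms starting with 'prefix'. */
    List<String> suggest(String prefix, int limit) {
        List<String> out = new ArrayList<>();
        if (prefix == null || prefix.isEmpty() || limit <= 0) {
            return out;
        }
        String p = prefix.toLowerCase(Locale.ROOT);
        // tailSet starts at the first term >= prefix; matching terms are contiguous.
        for (String t : terms.tailSet(p, true)) {
            if (!t.startsWith(p) || out.size() >= limit) {
                break; // sorted order: no later term can match
            }
            out.add(t);
        }
        return out;
    }
}
```

The cache has to be rebuilt or updated when the index changes, and capping the number of suggestions keeps each response small.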

Re: How can I search over all documents NOT in a certain subset?

2007-06-08 Thread Steven Rowe
Hi Hilton, Hilton Campbell wrote: > Yes, that's actually come up. The document ids are indeed changing which is > causing problems. I'm still trying to work it out myself, but any help > would most definitely be appreciated. > > Thanks, > Hilton Campbell > > -Original Message- > From:

Re: How can I search over all documents NOT in a certain subset?

2007-06-08 Thread Antony Bowesman
Hilton Campbell wrote: Yes, that's actually come up. The document ids are indeed changing which is causing problems. I'm still trying to work it out myself, but any help would most definitely be appreciated. If you have an application Id per document, then you could cache that field for each

Re: How to implement AJAX search~Lucene Search part?

2007-06-08 Thread Mathieu Lecarme
Have a look at the opensearch.org specification; your auto-completion will work with IE7 and Firefox 2. JSON serialization is quicker than XML. Be careful to limit the number of responses. A search for "test*" works very well in my project with ten thousand documents. Begin completion onl

Re: How to implement AJAX search~Lucene Search part?

2007-06-08 Thread karl wettin
On 8 Jun 2007 at 03:31, Chris Lu wrote: Hi, I would like to implement an AJAX search. Basically when user types in several characters, I will try to search the Lucene index and find all possible matching items. Seems I need to use wildcard query like "test*" to match anything. Is this the o