Thanks to all who answered with their experience and insights!
LUCENE-625 is very interesting, but I'm not sure about its scalability.
"Begin completion only with 3 letters or more" is reasonable for
special cases, but not ideal. What I wanted to implement is a fairly
general piece of software.
WildcardTermEnum
: We would also entertain alternative indexing approaches. We even
: considered concatenating all the text of the contained docs into a doc
: indexed as the zip file, but Lucene only indexes part of a large file, and
: even if that were resolved, proximity searches can return false
: positives.
First, when asking a new question, it's best to start a new subject;
your question has nothing to do with the rest of the thread.
That said, you want to create a Reader to pass along. I'd think about
doing this by deriving your MSWord class from the Reader class
and providing the necessary methods.
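For readers following along, here is a rough sketch of that idea. Rather than literally subclassing Reader, it wraps the already-extracted text in a StringReader and hands it to the Field(String, Reader) constructor; the class name, field names, and the idea of storing the path alongside are illustrative assumptions, not Erick's actual code.

import java.io.Reader;
import java.io.StringReader;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

public class WordDocBuilder {
    /** Builds a Lucene Document from text already pulled out of a .doc file. */
    public static Document toDocument(String fileName, String extractedText) {
        Document doc = new Document();
        // Keep the file name so hits can be traced back to the original file.
        doc.add(new Field("path", fileName, Field.Store.YES, Field.Index.UN_TOKENIZED));
        // Field(String, Reader) is tokenized and unstored, so no temporary .txt file is needed.
        Reader contents = new StringReader(extractedText);
        doc.add(new Field("contents", contents));
        return doc;
    }
}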
Jim,
There are a few things you can do to make extracting text easier on
yourself. There are several libraries that can assist you; POI and
TextMining.org both have excellent text extractors for Word.
As Mathieu suggests, you need to take a look at Document. Essentially,
you do everything you're doing now, just without the intermediate .txt file.
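For example, here is a minimal sketch of the extraction step using Apache POI's WordExtractor (from the POI scratchpad jar); this is only an illustration of the library call, not code from the thread, and the class name is made up.

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import org.apache.poi.hwpf.extractor.WordExtractor;

public class DocTextExtractor {
    /** Returns the plain text of a .doc file using POI's WordExtractor. */
    public static String extract(String path) throws IOException {
        InputStream in = new FileInputStream(path);
        try {
            WordExtractor extractor = new WordExtractor(in);
            return extractor.getText();
        } finally {
            in.close();
        }
    }
}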
This isn't a "How do I index a zip file?" question. It's a bit more
complicated than that.
We have an index where zip files are broken apart and the contained
files are indexed. The index also contains a doc for the zip file
itself. The user has the option of (A) querying for the contained file
I looked at Nutch's code but it is too complicated for me to follow.
I do not understand the guts of Lucene and how analyzers, parsers, readers,
etc. all fit together. I suppose I will be forced to learn it all someday, but
at the moment I am adhering to KISS: Keep It Simple, Stupid.
Thanks for the help.
Many thanks, I will try that. Thanks again!
jim s
- Original Message -
From: "Donna L Gresh" <[EMAIL PROTECTED]>
To:
Sent: Friday, June 08, 2007 12:52 PM
Subject: Re: Indexing MSword Documents
I do this exact thing. "text" (the second input to the Field constructor)
is MSWord text that I've extracted from the Word document:
textField = new org.apache.lucene.document.Field(textFieldName, text,
        org.apache.lucene.document.Field.Store.NO,
        org.apache.lucene.document.Field.Index.TOKENIZED);
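For context, here is a sketch of how a field like that typically fits into the rest of the indexing step. The index path, analyzer choice, and field names are assumptions for illustration, not Donna's actual setup.

import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

public class IndexWordText {
    /** Adds one document of extracted Word text to the index at indexDir. */
    public static void index(String indexDir, String fileName, String text)
            throws IOException {
        // true = create a new index (wipes any existing one at that path).
        IndexWriter writer = new IndexWriter(indexDir, new StandardAnalyzer(), true);
        try {
            Document doc = new Document();
            // Store the file name so it can be shown in results.
            doc.add(new Field("path", fileName, Field.Store.YES, Field.Index.UN_TOKENIZED));
            // The extracted body text: tokenized for searching, not stored.
            doc.add(new Field("contents", text, Field.Store.NO, Field.Index.TOKENIZED));
            writer.addDocument(doc);
        } finally {
            writer.close();
        }
    }
}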
Why not use Document?
http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/org/apache/lucene/document/Document.html
HTMLDocument manages HTML specifics like encoding, headers, and so on.
Nutch uses dedicated Word tools (http://lucene.apache.org/nutch/apidocs/org/ap
Hi,
I am trying to index MSWord documents. I've got things working but I do not
think I am doing things properly.
To index MSWord docs I use an extractor to extract the text. Then I write
the text to a .txt file and index that using an HTMLDocument object. Seems
to me that since I have the text already, I should be able to index it
directly without the intermediate .txt file.
Hello,
I have successfully implemented a keyword-based search feature with MyFaces
/ Tomahawk. Tomahawk has an Ajax-based component:
JSF page:
<%-- wait dialog box --%>
Backing Bean:
/**
* Suggested keywords for the Ajax list
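For anyone wanting the general shape of such a backing-bean suggest method, here is a hedged sketch using a stored "keyword" field and a PrefixQuery; the class and field names, the index path, the minimum prefix length, and the result cap are all illustrative assumptions, not the poster's actual implementation.

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.document.Document;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.PrefixQuery;

public class KeywordSuggestBean {
    private IndexSearcher searcher;

    public KeywordSuggestBean() throws IOException {
        // Illustrative path; in a web app this would come from configuration.
        searcher = new IndexSearcher("/path/to/index");
    }

    /** Called by the Ajax suggest component with whatever the user has typed so far. */
    public List suggestKeywords(String prefix) throws IOException {
        List suggestions = new ArrayList();
        if (prefix == null || prefix.length() < 3) {
            return suggestions; // begin completion only with 3 letters or more
        }
        // PrefixQuery rewrites to a BooleanQuery over the matching terms, so watch
        // for BooleanQuery.TooManyClauses if the prefix is very common.
        Hits hits = searcher.search(new PrefixQuery(new Term("keyword", prefix.toLowerCase())));
        int max = Math.min(hits.length(), 10); // limit the number of responses
        for (int i = 0; i < max; i++) {
            Document doc = hits.doc(i);
            String keyword = doc.get("keyword"); // requires the field to be stored
            if (keyword != null && !suggestions.contains(keyword)) {
                suggestions.add(keyword);
            }
        }
        return suggestions;
    }
}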
If you do that, you enumerate every term!
If you use an alphabetically sorted collection you can stop once the
matches stop, but you still have to test every term up to the match.
Lucene gives you tools to match the beginning of a term; just use them!
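Concretely, one such tool is IndexReader.terms(Term), which positions a TermEnum at the first term greater than or equal to the one you pass in, so you can start right at the prefix and stop as soon as it no longer matches. A small sketch, with the field name and limit as placeholders:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermEnum;

public class PrefixTermLookup {
    /** Collects up to max terms in the given field that start with prefix. */
    public static List termsStartingWith(IndexReader reader, String field,
                                         String prefix, int max) throws IOException {
        List matches = new ArrayList();
        // terms(Term) positions the enumeration at the first term >= the given one,
        // so we start at the prefix instead of walking the whole term dictionary.
        TermEnum terms = reader.terms(new Term(field, prefix));
        try {
            do {
                Term t = terms.term();
                if (t == null || !t.field().equals(field) || !t.text().startsWith(prefix)) {
                    break; // we have walked past the terms sharing the prefix
                }
                matches.add(t.text());
            } while (matches.size() < max && terms.next());
        } finally {
            terms.close();
        }
        return matches;
    }
}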
M.
On 8 Jun 07, at 14:57, Patrick Turcotte wrote:
OK, I actually added a page. Now if anyone would like to make it
pretty, please feel free. I assume that the first few entries will be
heavily edited
to establish a "look and feel" so the ensuing pages can use them as a
model.
Best
Erick
On 6/7/07, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
You can get the information pretty quickly by using a
WildcardTermEnum (NOT a WildcardQuery), especially if you
terminate after some number of characters.
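For anyone trying this, here is a minimal sketch of walking a WildcardTermEnum directly; the field name and pattern are made up, and the cap on collected terms is an assumption about where to stop, not Erick's exact advice.

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.WildcardTermEnum;

public class WildcardTermLister {
    /** Returns up to max index terms matching a wildcard pattern such as "test*". */
    public static List matchingTerms(IndexReader reader, String field,
                                     String pattern, int max) throws IOException {
        List matches = new ArrayList();
        WildcardTermEnum terms = new WildcardTermEnum(reader, new Term(field, pattern));
        try {
            do {
                Term t = terms.term();
                if (t == null) {
                    break; // no more matching terms
                }
                matches.add(t.text());
            } while (matches.size() < max && terms.next());
        } finally {
            terms.close();
        }
        return matches;
    }
}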
Erick
On 6/7/07, Chris Lu <[EMAIL PROTECTED]> wrote:
Hi,
I would like to implement an AJAX search. Basically, when the user types in
several characters, I will try to search the Lucene index and find all
possible matching items.
Hi,
What we did was this:
1) When your application starts, it scans the index for term values and
stores them in a map or something.
2) When you receive "ajax requests", you compare against the map data and
return the relevant part. Works quite fast for us, without round trips
to Lucene.
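A rough sketch of that approach, assuming the suggestions come from a single field (the field name and result cap are placeholders): scan the term dictionary once at startup into a sorted set, then answer each Ajax request from memory with a tailSet starting at the typed prefix.

import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.TermEnum;

public class TermCache {
    private final SortedSet terms = new TreeSet();

    /** Scan the index once at application startup and keep all term texts in memory. */
    public TermCache(IndexReader reader, String field) throws IOException {
        TermEnum termEnum = reader.terms();
        try {
            while (termEnum.next()) {
                if (field.equals(termEnum.term().field())) {
                    terms.add(termEnum.term().text());
                }
            }
        } finally {
            termEnum.close();
        }
    }

    /** Answer an Ajax request from memory: terms starting with prefix, capped at max. */
    public List suggest(String prefix, int max) {
        List result = new ArrayList();
        Iterator it = terms.tailSet(prefix).iterator(); // first term >= prefix
        while (it.hasNext() && result.size() < max) {
            String term = (String) it.next();
            if (!term.startsWith(prefix)) {
                break; // past the prefix range
            }
            result.add(term);
        }
        return result;
    }
}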
Patrick
Hi Hilton,
Hilton Campbell wrote:
> Yes, that's actually come up. The document ids are indeed changing which is
> causing problems. I'm still trying to work it out myself, but any help
> would most definitely be appreciated.
>
> Thanks,
> Hilton Campbell
>
> -Original Message-
> From:
Hilton Campbell wrote:
Yes, that's actually come up. The document ids are indeed changing which is
causing problems. I'm still trying to work it out myself, but any help
would most definitely be appreciated.
If you have an application Id per document, then you could cache that field for
each document.
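A sketch of that caching idea, assuming the application id is indexed in a field called "appId" (the field and class names are illustrative): FieldCache gives one array slot per Lucene document, so the unstable Lucene doc ids can be translated to your own stable ids, and the array is simply rebuilt whenever the reader is reopened.

import java.io.IOException;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.FieldCache;

public class AppIdLookup {
    private final String[] appIds; // index = Lucene doc id, value = stable application id

    /** Rebuild this whenever the IndexReader is reopened, since doc ids may have changed. */
    public AppIdLookup(IndexReader reader) throws IOException {
        appIds = FieldCache.DEFAULT.getStrings(reader, "appId");
    }

    /** Translate a (possibly freshly assigned) Lucene doc id into the stable application id. */
    public String appIdFor(int luceneDocId) {
        return appIds[luceneDocId];
    }
}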
Have a look at the opensearch.org specification; your self-completion
will work with IE7 and Firefox 2.
JSON serialization is quicker than the XML alternative.
Be careful to limit the number of responses.
A search on "test*" works very well in my project with tens of thousands
of documents.
Begin completion only with 3 letters or more.
On 8 Jun 2007, at 03.31, Chris Lu wrote:
Hi,
I would like to implement an AJAX search. Basically, when the user types in
several characters, I will try to search the Lucene index and find
all possible matching items.
Seems I need to use a wildcard query like "test*" to match anything.
Is this the only way to do it?