t I am not sure how to do that.
If necessary, I will paste in samples of my code for creating the indexes
and doing the search.
Thanks.
Bill Taylor
On Feb 23, 2007, at 2:00 PM, [EMAIL PROTECTED]
wrote:
Re: TextMining.org Word extractor
Someone noted that textmining.org gets hacked. There is a
textmining.org which appears to be a commercial site. Can someone tell
me where to get the download of the original GPL textmining.org
so
to the C version of Lucene.
Has anyone built a multi-million document index with the C version?
Where should I go to start learning about it?
Thanks.
Bill Taylor
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
much.
Bill Taylor
It is not THAT hard to write a custom analyzer, that is what I did. I
found that there is a bug in the setup, however, in that there are two
incompatible definitions of Token. The generated file
Tokenizer.java refers to the wrong definition of Token, so I have to
patch it before it will compile.
On Oct 16, 2006, at 5:44 AM, Christoph Pächter wrote:
Hi,
I know that I can index pdf-files (using a third-party library).
Could you please tell me where to find this library?
Is it possible to search the index for a phrase, getting not only the
document, but also the page number in the (pd
be displayed in alphabetical order, use a
TreeMap instead of a HashMap.
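The TreeMap tip above can be shown in plain Java (names here are illustrative, not from the original mail):

```java
import java.util.*;

class SortedWordList {
    // A TreeMap keeps its keys in sorted (here: alphabetical) order,
    // whereas a HashMap makes no ordering guarantee. Copying a HashMap
    // into a TreeMap is enough to get alphabetical iteration.
    static List<String> alphabetical(Map<String, Integer> counts) {
        return new ArrayList<>(new TreeMap<>(counts).keySet());
    }
}
```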
Any help or pointer would be greatly appreciated.
I would appreciate your telling me which stemmer for English words is
easiest to incorporate into Lucene and where to find it. Thanks.
Bill Taylor
IN THEORY, EJB containers are better able than Tomcat to spread
incoming requests over a multitude of servers. There was considerable
discussion some time ago about index search speed on a single
processor. I do not remember the details, but there was some
information about how fast a search
When I went there, I got a message that there were no shared folders in
the briefcase.
It never gave me an opportunity to enter the password.
Thanks.
Bill Taylor
On Oct 12, 2006, at 6:34 AM, sachin wrote:
Hello,
I have got lot of personal emails for sharing the "Lucene
Investig
I am indexing individual pages of books.
I get no results from the query
accurate AND book:"first title"
Each lucene document which represents one page of one book gets a field
"book" which is indexed, stored, and not tokenized to store the title
of the book.
The word "accurate" appears on
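The message is cut off here, but one common cause of this symptom is worth sketching (plain Java, not confirmed by the thread): a field indexed without tokenization is stored as a single exact term, while a phrase query is analyzed into lowercase tokens, so the two never match.

```java
import java.util.*;

class UntokenizedFieldMismatch {
    // An untokenized field is stored as one exact term.
    static Set<String> bookFieldTerms = new HashSet<>(Arrays.asList("first title"));

    // A tokenizing query parser looks for the analyzed pieces instead,
    // so none of them are found among the untokenized terms.
    static boolean phraseMatches(String phrase) {
        for (String token : phrase.toLowerCase().split("\\s+")) {
            if (!bookFieldTerms.contains(token)) return false;
        }
        return true;
    }

    // Matching the untokenized field requires the exact stored string.
    static boolean exactMatches(String phrase) {
        return bookFieldTerms.contains(phrase);
    }
}
```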
ave looked at it.
Are you thinking of doing a spell check on the queries people type? It
might be better simply to check each word and see if it is found in the
index. That will be a lot less work than adapting the spell checker to
Lucene.
B
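The suggested check, looking each query word up among the index's terms, can be sketched like this (plain Java; the term set stands in for the index's term dictionary):

```java
import java.util.*;

class QueryWordCheck {
    // Flag query words that do not occur anywhere in the index.
    static List<String> unknownWords(String query, Set<String> indexedTerms) {
        List<String> unknown = new ArrayList<>();
        for (String word : query.toLowerCase().split("\\s+")) {
            if (!indexedTerms.contains(word)) unknown.add(word);
        }
        return unknown;
    }
}
```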
t POSSIBLY be the first person to have wanted to do this. Does
anyone know of software for detecting such combinations in English?
Rumor hath it that Google does this sort of thing without telling you;
that's one way they can find m
Depending on the size of your index, you might want to put it in the
downloaded page. I have a small index of maybe 1,500 words so I have
the word list in the page. This is simpler than Ajax, but will not
work for big indexes, of course.
On Sep 15, 2006, at 8:02 AM, Mark Müller wrote:
Hi a
On Sep 13, 2006, at 3:39 AM, Paul Elschot wrote:
On Wednesday 13 September 2006 09:30, Venkateshprasanna wrote:
Is it possible for me to store the number of occurrences of a token in
a particular document or a collection of documents?
When the token is indexed as a term, an IndexReader pro
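The reply is truncated, but the idea of a per-document count is what Lucene calls term frequency; it can be sketched in plain Java, which stands in for what an IndexReader exposes for an indexed term (names are illustrative):

```java
import java.util.*;

class TermFrequency {
    // Per-document term frequency: how often a token occurs in each
    // document, keyed by document id; documents without the token are omitted.
    static Map<Integer, Integer> termFreq(List<String> docs, String token) {
        Map<Integer, Integer> freq = new HashMap<>();
        for (int docId = 0; docId < docs.size(); docId++) {
            int n = 0;
            for (String t : docs.get(docId).toLowerCase().split("\\s+")) {
                if (t.equals(token)) n++;
            }
            if (n > 0) freq.put(docId, n);
        }
        return freq;
    }
}
```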
e you know how to implement a new one, just do it.
If you just want to modify StandardTokenizer, you can get the code and
rename it to your class, then modify whatever you dislike. I think
it's such simple stuff; why do you make it so complicated?
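The "just write your own" advice can be illustrated with a minimal standalone tokenizer sketch in plain Java (this is not Lucene's Tokenizer API, and the jargon-friendly pattern is an assumed example):

```java
import java.util.*;
import java.util.regex.*;

class JargonTokenizer {
    // Keep identifiers like "10-CFR-50.59" as single tokens instead of
    // letting a standard tokenizer split them; the pattern is illustrative.
    private static final Pattern TOKEN = Pattern.compile("[A-Za-z0-9][A-Za-z0-9.\\-]*");

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) tokens.add(m.group().toLowerCase());
        return tokens;
    }
}
```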
On 8/29/06, Bill Taylor <[EMAIL PROT
On Aug 29, 2006, at 7:12 PM, Mark Miller wrote:
2. The ParseException that is generated when making the
StandardAnalyzer must be killed because there is another
ParseException class (maybe in queryparser?) that must be used
instead. The lucene build file excludes the StandardAnalyzer
Parse
might work for you without as much work...
Best
[EMAIL PROTECTED]'mNowBeyondMyCompetence.WhyDoTheyStillEmployMeHere?
On 8/29/06, Bill Taylor <[EMAIL PROTECTED]> wrote:
On Aug 29, 2006, at 2:47 PM, Chris Hostetter wrote:
>
> : Have a look at PerFieldAnalyzerWrapper:
>
> :
>
I gave each of my documents a special field named date and I put in a
normalized Lucene date with a precision of one day. This date is
yyyymmdd so that it can be sorted. Having done that, however, I am
unsure how to ask Lucene to sort on that date, but I'll figure it out
in time or someone wi
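The reason the yyyymmdd form works is that its string order coincides with chronological order, so a plain lexicographic sort of the field doubles as a date sort (illustrative sketch, not the Lucene sort API itself):

```java
import java.util.*;

class DateField {
    // A yyyymmdd string sorts correctly as plain text: year, then month,
    // then day, each zero-padded, so string order == date order.
    static List<String> sortByDate(List<String> yyyymmdd) {
        List<String> sorted = new ArrayList<>(yyyymmdd);
        Collections.sort(sorted);
        return sorted;
    }
}
```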
On Aug 29, 2006, at 2:47 PM, Chris Hostetter wrote:
: Have a look at PerFieldAnalyzerWrapper:
:
http://lucene.apache.org/java/docs/api/org/apache/lucene/analysis/
PerFieldAnalyzerWrapper.html
...which can be specified in the constructors for IndexWriter and
QueryParser.
As I understand
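The per-field idea can be sketched without Lucene as a dispatch table with a default, which mirrors what PerFieldAnalyzerWrapper does; the analysis functions here are hypothetical stand-ins for real analyzers:

```java
import java.util.*;
import java.util.function.Function;

class PerFieldAnalysis {
    // Map from field name to analysis function, with a fallback used for
    // any field that has no specific analyzer registered.
    private final Map<String, Function<String, List<String>>> perField = new HashMap<>();
    private final Function<String, List<String>> fallback;

    PerFieldAnalysis(Function<String, List<String>> fallback) {
        this.fallback = fallback;
    }

    void addAnalyzer(String field, Function<String, List<String>> analyzer) {
        perField.put(field, analyzer);
    }

    List<String> analyze(String field, String text) {
        return perField.getOrDefault(field, fallback).apply(text);
    }
}
```

Using the same wrapper for both IndexWriter and QueryParser is what keeps indexing and querying consistent per field.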
er can't find. I suspect I have to use
the same Analyzer on both, right?
On 8/29/06, Bill Taylor <[EMAIL PROTECTED]> wrote:
I am indexing documents which are filled with government jargon. As
one would expect, the standard tokenizer has problems with
governmenteese.
In particular,
interested in */
}
}
Krovi.
-Original Message-
From: Bill Taylor [mailto:[EMAIL PROTECTED]
Sent: Tuesday, August 29, 2006 8:10 PM
To: java-user@lucene.apache.org
Subject: Installing a custom tokenizer
I am indexing documents which are filled with government jargon. As
one would expect
string I want to index, then does
doc.add(new Field(DocFormatters.CONTENT_FIELD, ,
Field.Store.YES, Field.Index.TOKENIZED));
I suspect that my issue is getting the Field constructor to use a
different tokenizer. Can anyone help?
Thank
[EMAIL PROTECTED] told me that the highlighter ALWAYS does this
under certain conditions. In my case, it is when the string ends with
. He knew why but I did not. I just fixed it in my code by
putting things back.
On Aug 16, 2006, at 3:17 AM, Ramesh Salla wrote:
which version of Lucene a
already done something similar.
thank you.
Bill Taylor