I have a question about phrase query with stop words

2007-04-12 Thread Bill Taylor
t I am not sure how to do that. If necessary, I will paste in samples of my code for creating the indexes and doing the search. Thanks. Bill Taylor

Re: TextMining.org Word extractor

2007-03-01 Thread Bill Taylor
On Feb 23, 2007, at 2:00 PM, [EMAIL PROTECTED] wrote: Re: TextMining.org Word extractor Someone noted that textmining.org gets hacked. There is test-mining.org, which appears to be a commercial site. Can someone tell me where to get the download of the original GPL textmining.org so

Is the new version of the Lucene book available in any form?

2007-01-26 Thread Bill Taylor
to the C version of Lucene. Has anyone built a multi-million document index with the C version? Where should I go to start learning about it? Thanks. Bill Taylor - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional

How many documents in the biggest Lucene index to date?

2007-01-25 Thread Bill Taylor
much. Bill Taylor

Re: Help with Custom Analyzer

2006-10-16 Thread Bill Taylor
It is not THAT hard to write a custom analyzer; that is what I did. I found that there is a bug in the setup, however, in that there are two incompatible definitions of Token. The generated file Tokenizer.java refers to the wrong definition of Token so I have to patch it before it will compil

Re: Searching pdf, getting page number

2006-10-16 Thread Bill Taylor
On Oct 16, 2006, at 5:44 AM, Christoph Pächter wrote: Hi, I know that I can index pdf-files (using a third-party library). Could you please tell me where to find this library? Is it possible to search the index for a phrase, getting not only the document, but also the page number in the (pd

Re: Looking for a stemmer that can return all inflected forms

2006-10-14 Thread Bill Taylor
be displayed in alphabetical order, use a TreeMap instead of a HashMap. Any help or pointer would be greatly appreciated. I would appreciate your telling me which stemmer for English words is easiest to incorporate into Lucene and where to find it. Thanks. Bill Taylor -
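The TreeMap suggestion above can be sketched as follows. This is a minimal illustration, not code from the thread: the class name WordOrder, the method sortedByKey, and the sample words are hypothetical. It only shows that copying a HashMap into a TreeMap yields keys in String's natural (lexicographic) order.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, for illustration only.
class WordOrder {
    // A TreeMap keeps its keys sorted; for String keys that is
    // lexicographic order, which is alphabetical for lowercase words.
    static Map<String, Integer> sortedByKey(Map<String, Integer> counts) {
        return new TreeMap<>(counts);
    }

    public static void main(String[] args) {
        // A HashMap makes no promise about iteration order.
        Map<String, Integer> counts = new HashMap<>();
        counts.put("walk", 3);
        counts.put("run", 5);
        counts.put("index", 2);

        // Iterating the TreeMap copy visits index, run, walk in that order.
        for (Map.Entry<String, Integer> e : sortedByKey(counts).entrySet()) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

Mixed-case keys would sort with uppercase letters first; passing String.CASE_INSENSITIVE_ORDER to the TreeMap constructor is one way around that.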

Re: a design question

2006-10-12 Thread Bill Taylor
IN THEORY, EJB containers are better able than Tomcat to spread incoming requests over a multitude of servers. There was considerable discussion some time ago about index search speed on a single processor. I do not remember the details, but there was some information about how fast a search

Re: Document on Indexing in Lucene

2006-10-12 Thread Bill Taylor
When I went there, I got a message that there were no shared folders in the briefcase. It never gave me an opportunity to enter the password. Thanks. Bill Taylor On Oct 12, 2006, at 6:34 AM, sachin wrote: Hello, I have got lot of personal emails for sharing the "Lucene Investig

A question about query syntax, has it changed?

2006-10-02 Thread Bill Taylor
I am indexing individual pages of books. I get no results from the query accurate AND book:"first title" Each Lucene document, which represents one page of one book, gets a field "book" that is indexed, stored, and not tokenized; it stores the title of the book. The word "accurate" appears on

Re: spell checker with lucene

2006-09-26 Thread Bill Taylor
ave looked at it. Are you thinking of doing a spell check on the queries people type? It might be better simply to check each word and see if it is found in the index. That will be a lot less work than adapting the spell checker to Lucene. B
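The idea of checking each query word against the index can be sketched as below. This is only an illustration under stated assumptions: a plain Set<String> stands in for the index's term list, the class QueryWordCheck and method unknownWords are hypothetical, and the whitespace split with lowercasing is a simplification of whatever analyzer the real index uses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, for illustration only; no Lucene classes are used.
class QueryWordCheck {
    // Returns the query words that do not occur in the vocabulary.
    // The vocabulary set is an assumed stand-in for the index's terms.
    static List<String> unknownWords(String query, Set<String> vocabulary) {
        List<String> unknown = new ArrayList<>();
        for (String word : query.toLowerCase(Locale.ROOT).split("\\s+")) {
            // split can yield an empty first element when the query starts with whitespace
            if (!word.isEmpty() && !vocabulary.contains(word)) {
                unknown.add(word);
            }
        }
        return unknown;
    }

    public static void main(String[] args) {
        Set<String> vocabulary = new HashSet<>(Arrays.asList("accurate", "book", "title"));
        // "acurate" is not in the vocabulary, so it is reported as unknown.
        System.out.println(unknownWords("Acurate book title", vocabulary));
    }
}
```

A word flagged this way is not necessarily misspelled; it is simply absent from the indexed text, which is usually what matters to the person searching.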

Does anyone know of software for handling English plurals, "ing," etc?

2006-09-24 Thread Bill Taylor
t POSSIBLY be the first person to have wanted to do this. Does anyone know of software for detecting such combinations in English? Rumor hath it that Google does this sort of thing without telling you; that's one way they can find m

Re: Lucene Suggest ?

2006-09-15 Thread Bill Taylor
Depending on the size of your index, you might want to put it in the downloaded page. I have a small index of maybe 1,500 words, so I have the word list in the page. This is simpler than Ajax, but will not work for big indexes, of course. On Sep 15, 2006, at 8:02 AM, Mark Müller wrote: Hi a
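The lookup over a small in-page word list can be sketched as below. In the setup described above it would run in the browser; the Java here only illustrates the logic. The class WordSuggest, the method suggest, the limit parameter, and the sample words are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, for illustration only.
class WordSuggest {
    // Returns up to 'limit' words that start with the typed prefix,
    // in the order they appear in the (assumed pre-sorted) word list.
    static List<String> suggest(List<String> words, String prefix, int limit) {
        List<String> matches = new ArrayList<>();
        for (String word : words) {
            if (matches.size() >= limit) {
                break;
            }
            if (word.startsWith(prefix)) {
                matches.add(word);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("index", "indexer", "inflect", "query");
        System.out.println(suggest(words, "ind", 5));
    }
}
```

A linear scan like this is adequate for a list of around 1,500 words; a much larger vocabulary would call for a server-side lookup instead.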

Re: Storing no. of occurances of a token

2006-09-13 Thread Bill Taylor
On Sep 13, 2006, at 3:39 AM, Paul Elschot wrote: On Wednesday 13 September 2006 09:30, Venkateshprasanna wrote: Is it possible for me to store the number of occurrences of a token in a particular document or a collection of documents? When the token is indexed as a term, an IndexReader pro

Re: Installing a custom tokenizer

2006-08-29 Thread Bill Taylor
e you know how to implement a new one, just do it. If you just want to modify StandardTokenizer, you can get the code, rename it to your own class, and then modify whatever you dislike. I think it is quite simple; why do you make it so complicated? On 8/29/06, Bill Taylor <[EMAIL PROT

Re: Installing a custom tokenizer

2006-08-29 Thread Bill Taylor
On Aug 29, 2006, at 7:12 PM, Mark Miller wrote: 2. The ParseException that is generated when making the StandardAnalyzer must be killed because there is another ParseException class (maybe in queryparser?) that must be used instead. The lucene build file excludes the StandardAnalyzer Parse

Re: Installing a custom tokenizer

2006-08-29 Thread Bill Taylor
ht work for you without as much work... Best [EMAIL PROTECTED]'mNowBeyondMyCompetence.WhyDoTheyStillEmployMeHere? On 8/29/06, Bill Taylor <[EMAIL PROTECTED]> wrote: On Aug 29, 2006, at 2:47 PM, Chris Hostetter wrote: > > : Have a look at PerFieldAnalyzerWrapper: > > : &g

Re: Sort by Date

2006-08-29 Thread Bill Taylor
I gave each of my documents a special field named date and I put in a normalized Lucene date with a precision of one day. This date is mmdd so that it can be sorted. Having done that, however, I am unsure how to ask Lucene to sort on that date, but I'll figure it out in time or someone wi
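Why a normalized day-precision date string sorts correctly can be sketched as follows. This is an illustration under an assumption: it uses a fixed-width yyyyMMdd layout, which may differ from the exact format used in the message above. The class SortableDates and its methods are hypothetical, and no Lucene classes are involved.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, for illustration only.
class SortableDates {
    // BASIC_ISO_DATE renders a LocalDate as yyyyMMdd (assumed layout).
    // With a fixed width and year first, string order equals date order.
    static String toKey(LocalDate date) {
        return date.format(DateTimeFormatter.BASIC_ISO_DATE);
    }

    // Converts the dates to keys and sorts the keys as plain strings.
    static List<String> sortedKeys(List<LocalDate> dates) {
        List<String> keys = new ArrayList<>();
        for (LocalDate date : dates) {
            keys.add(toKey(date));
        }
        Collections.sort(keys);
        return keys;
    }

    public static void main(String[] args) {
        List<LocalDate> dates = Arrays.asList(
                LocalDate.of(2006, 10, 12),
                LocalDate.of(2006, 8, 29),
                LocalDate.of(2007, 1, 25));
        System.out.println(sortedKeys(dates));
    }
}
```

The same property is what lets a search library sort on such a field as a string: no date parsing is needed at query time.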

Re: Installing a custom tokenizer

2006-08-29 Thread Bill Taylor
On Aug 29, 2006, at 2:47 PM, Chris Hostetter wrote: : Have a look at PerFieldAnalyzerWrapper: : http://lucene.apache.org/java/docs/api/org/apache/lucene/analysis/PerFieldAnalyzerWrapper.html ...which can be specified in the constructors for IndexWriter and QueryParser. As I understand

Re: Installing a custom tokenizer

2006-08-29 Thread Bill Taylor
er can't find. I suspect I have to use the same Analyzer on both, right? On 8/29/06, Bill Taylor <[EMAIL PROTECTED]> wrote: I am indexing documents which are filled with government jargon. As one would expect, the standard tokenizer has problems with governmenteese. In particular,

Re: Installing a custom tokenizer

2006-08-29 Thread Bill Taylor
interested in */ } } Krovi. -Original Message- From: Bill Taylor [mailto:[EMAIL PROTECTED] Sent: Tuesday, August 29, 2006 8:10 PM To: java-user@lucene.apache.org Subject: Installing a custom tokenizer I am indexing documents which are filled with government jargon. As one would expect

Installing a custom tokenizer

2006-08-29 Thread Bill Taylor
tring I want to index, then does doc.add(new Field(DocFormatters.CONTENT_FIELD, , Field.Store.YES, Field.Index.TOKENIZED)); I suspect that my issue is getting the Field constructor to use a different tokenizer. Can anyone help? Thank

Re: Highlighter

2006-08-16 Thread Bill Taylor
[EMAIL PROTECTED] told me that the highlighter ALWAYS does this under certain conditions. In my case, it is when the string ends with . He knew why but I did not. I just fixed it in my code by putting things back. On Aug 16, 2006, at 3:17 AM, Ramesh Salla wrote: which version of Lucene a

is there a simple way to make a list of all words in an index?

2006-08-04 Thread Bill Taylor
already done something similar. thank you. Bill Taylor