Re: Phrase search using quotes -- special Tokenizer

2006-09-01 Thread Mark Miller
Well Philip...bad news. I should have thought of this before...I think the query parser is the problem. You are tokenizing "all in the quotes" to one token...but when QueryParser sees that, it doesn't matter what analyzer you use, it's going to see the quotes and strip them right off. Then it pas

Re: Phrase search using quotes -- special Tokenizer

2006-09-01 Thread Mark Miller
I am out of ideas. If I'm feeling perky I'll build you one in the morning. No, I've never used Luke. Is there an easy way to examine my RAMDirectory index? I can create the index with no quoted keywords, and when I search for a keyword, I get back the expected results (just can't search for a p

Re: Phrase search using quotes -- special Tokenizer

2006-09-01 Thread Philip Brown
No, I've never used Luke. Is there an easy way to examine my RAMDirectory index? I can create the index with no quoted keywords, and when I search for a keyword, I get back the expected results (just can't search for a phrase that has whitespace in it). If I create the index with phrases in quo

Re: Phrase search using quotes -- special Tokenizer

2006-09-01 Thread Erick Erickson
OK, I've gotta ask. Have you examined your index with Luke to see if what you *think* is in the index actually *is*??? Erick On 9/1/06, Philip Brown <[EMAIL PROTECTED]> wrote: Interesting...just ran a test where I put double quotes around everything (including single keywords) of source text

Re: Phrase search using quotes -- special Tokenizer

2006-09-01 Thread Philip Brown
Interesting...just ran a test where I put double quotes around everything (including single keywords) of source text and then ran searches for a known keyword with and without double quotes -- doesn't find either time. Mark Miller-5 wrote: > > Sorry to hear you're having trouble. You indeed nee

Re: Phrase search using quotes -- special Tokenizer

2006-09-01 Thread Philip Brown
Added the to the other section and reran the javacc and imported the new files...but, I still get the same result -- no results. (Quotes are in the source text and query string.) Anything else I might be missing? Philip Mark Miller-5 wrote: > > Sorry to hear you're having trouble. You indee

Re: Phrase search using quotes -- special Tokenizer

2006-09-01 Thread Mark Miller
Sorry to hear you're having trouble. You indeed need the double quotes in the source text. You will also need them in the query string. Make sure they are in both places. My machine is hosed right now or I would do it for you real quick. My guess is that I forgot to mention...not only do you need t

Re: Phrase search using quotes -- special Tokenizer

2006-09-01 Thread Philip Brown
Well, I tried that, and it doesn't seem to work still. I would be happy to zip up the new files, so you can see what I'm using -- maybe you can get it to work. The first time, I tried building the documents without quotes surrounding each phrase. Then, I retried by enclosing every phrase within

Re: Proximity Query Parser

2006-09-01 Thread Paul Elschot
On Friday 01 September 2006 19:46, Mark Miller wrote: > Eric also gave me the idea of using a SpanNear with maximum slop as a > boolean to connect spans. Using this and SpanOr seems to make my time spent > on the distribution of proximity clauses a little foolish :) Is that true? There is practice

Re: Phrase search using quotes -- special Tokenizer

2006-09-01 Thread Mark Miller
That is a good point. I was just thinking that it would be a pain for searchers to have to include the quotes when searching, but I guess there is little way around it. The best you could do is have an option that specified a quoted search...and you might as well make that option be to put the

Re: Phrase search using quotes -- special Tokenizer

2006-09-01 Thread Philip Brown
Thanks, but I don't "think" I need that. But curious, how will it know it's a phrase if it's not enclosed in quotes? Won't all its terms be treated separately then? Philip Mark Miller-5 wrote: > > One more tip...if you would like to be able to search phrases without > putting in the quotes

Re: SV: GetMoreDocs question

2006-09-01 Thread Chris Hostetter
: But why do I have to retrieve at least 1 document when I'm using the TopDocs? : (If I set nDoc to 0 it will throw an exception). i didn't say you had to, i just said "maybe" ... i don't know what the behavior is if you use 0 -- ideally it would work fine, but in practice i don't know if anyone ha

Re: Proximity Query Parser

2006-09-01 Thread Mark Miller
Eric also gave me the idea of using a SpanNear with maximum slop as a boolean to connect spans. Using this and SpanOr seems to make my time spent on the distribution of proximity clauses a little foolish :) Is that true? Is there any disadvantage to the max slop SpanNear, SpanOr solution? Any adva

Re: Proximity Query Parser

2006-09-01 Thread Mark Miller
Thanks for the tip Paul. It is embarrassing, but I only realized how OrSpan queries worked a day or two ago based on a tip from Eric. The way I assumed it would create the spans before was just wrong and I never had researched further. Now I see that it would be a nice optimization for what I have

Re: Proximity Query Parser

2006-09-01 Thread Paul Elschot
On Friday 01 September 2006 12:54, Mark Miller wrote: > Hi Paul, > > I also have to treat things differently depending on if I am in a > proximity clause or boolean clause. A wildcard in a boolean is mapped to > a wildcard query. A wildcard in a proximity is mapped to a regex span > that has b

RE: retrieving LowestDoc

2006-09-01 Thread Ramana Jelda
Collect search results in your own HitCollector and return results however you like. :) Jelda > -Original Message- > From: Rupinder Singh Mazara [mailto:[EMAIL PROTECTED] > Sent: Friday, September 01, 2006 5:13 PM > To: java-user@lucene.apache.org > Subject: retrieving LowestDoc > >

retrieving LowestDoc

2006-09-01 Thread Rupinder Singh Mazara
Hi all, the search implementation that I have requires not the top 1000 documents but the lowest 1000 documents to be returned. I do not want to store the entire result set in memory and go to the last 1000. Is there any implementation / suggestions on how to achieve this? Thanks
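
A rough sketch of one way to keep only the lowest-scoring N hits without holding the whole result set in memory: a bounded max-heap that evicts its highest-scoring entry whenever a lower-scoring hit arrives. This assumes "lowest" means lowest score. The class and method names (LowestHits, Hit, lowestFirst) are hypothetical and not part of Lucene; the idea is that the collect(int doc, float score) callback of your own HitCollector would simply forward to something like this.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** Keeps only the n lowest-scoring hits seen so far (hypothetical helper, not a Lucene class). */
class LowestHits {
    /** A (doc, score) pair; hypothetical name. */
    static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    private final int capacity;
    // Max-heap on score: the head is the highest score currently kept,
    // i.e. the first candidate to evict when a lower-scoring hit arrives.
    private final PriorityQueue<Hit> heap;

    LowestHits(int capacity) {
        if (capacity < 1) throw new IllegalArgumentException("capacity must be >= 1");
        this.capacity = capacity;
        this.heap = new PriorityQueue<>(capacity, (a, b) -> Float.compare(b.score, a.score));
    }

    /** Offer one hit; memory use stays bounded by capacity. */
    void collect(int doc, float score) {
        if (heap.size() < capacity) {
            heap.add(new Hit(doc, score));
        } else if (score < heap.peek().score) {
            heap.poll();                 // drop the highest score kept so far
            heap.add(new Hit(doc, score));
        }
    }

    /** Returns the kept hits ordered from lowest score to highest. */
    List<Hit> lowestFirst() {
        List<Hit> out = new ArrayList<>(heap);
        out.sort((a, b) -> Float.compare(a.score, b.score));
        return out;
    }
}
```

Each hit costs O(log N) at most, and only N entries are ever held, so for N = 1000 the memory footprint is independent of the total number of matches.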

Re: SpanRegex speed

2006-09-01 Thread Mark Miller
Erick Erickson wrote: OK, a not very helpful answer, but "of course they're slower, they do more work" (the span versions). But that's fairly useless, since the question is really "is it enough slower in my situation that I need to find an alternative?". And the only way I know of to answer tha

Re: Phrase search using quotes -- special Tokenizer

2006-09-01 Thread Mark Miller
One more tip...if you would like to be able to search phrases without putting in the quotes you must strip them with the analyzer. In StandardFilter (in the standard analyzer code) add this: private static final String QUOTED_TYPE = tokenImage[QUOTED]; - you'll see where to put that and you'll s

Re: SpanRegex speed

2006-09-01 Thread Erick Erickson
OK, a not very helpful answer, but "of course they're slower, they do more work" (the span versions). But that's fairly useless, since the question is really "is it enough slower in my situation that I need to find an alternative?". And the only way I know of to answer that question is to make som

Re: Phrase search using quotes -- special Tokenizer

2006-09-01 Thread Mark Miller
So this will recognize anything in quotes as a single token and '_' and '-' will not break up words. There may be some repercussions for the NUM token but nothing I'd worry about. maybe you want to use Unicode for '-' and '_' as well...I wouldn't worry about it myself. - Mark TOKEN : {

Re: Phrase search using quotes -- special Tokenizer

2006-09-01 Thread Mark Miller
Philip Brown wrote: Do you mean StandardTokenizer.jj (org.apache.lucene.analysis.standard)? I'm not seeing StandardAnalyzer.jj in the Lucene source download. Mark Miller-5 wrote: Philip Brow

Re: Phrase search using quotes -- special Tokenizer

2006-09-01 Thread Philip Brown
Do you mean StandardTokenizer.jj (org.apache.lucene.analysis.standard)? I'm not seeing StandardAnalyzer.jj in the Lucene source download. Mark Miller-5 wrote: > > Philip Brown wrote: >> Hi, >> >

RE: graphically representing an index

2006-09-01 Thread SOMMERIA KLEIN Ariel Ext VIACCESS-BU_DRM
Hi Andrzej, Thanks for the tip, it does what I want. You are right, though, it's of limited use for helping the user access data. But I'm sure it will come in handy for my own analysis. Best, Ariel -Original Message- From: Andrzej Bialecki [mailto:[EMAIL PROTECTED] Sent: Thursday, 31 August

Re: SpanRegex speed

2006-09-01 Thread Mark Miller
Erick Erickson wrote: Let me chime in here on a different note before you get happy with wildcard queries, take a look at the thread "I just don't get wildcards at all". There is lots of good info that Erik, Chris and Otis provided me. The danger with prefixquery and wildcard query is that

Re: Phrase search using quotes -- special Tokenizer

2006-09-01 Thread Mark Miller
Philip Brown wrote: Hi, After running some tests using the StandardAnalyzer, and getting 0 results from the search, I believe I need a special Tokenizer/Analyzer. Does anybody have something that parses like the following: - doesn't parse apart phrases (in quotes) - doesn't parse/separate hyph

Re: read past EOF

2006-09-01 Thread Michael McCandless
Yes, I am sure only one writer at a time is accessing the index. No, I am not getting any other exception, and there is no disk space problem either. Right now I have a backup copy of the indexes, so whenever one index gets corrupted I replace it with the backup one and start the indexer again from that durat

Re: Lock error attempting update of RAMDirectory index

2006-09-01 Thread Michael McCandless
You probably forgot to close an IndexWriter? Well, I wish it were that easy...I open one IndexWriter to write the documents to the index after it is created, and then call writer.optimize() and writer.close(). Your suggestion is a good one in that, from what I've read, the writer needs to be clo

Re: Proximity Query Parser

2006-09-01 Thread Mark Miller
Paul Elschot wrote: Mark, On Thursday 31 August 2006 23:18, Mark Miller wrote: I am not a huge fan of the queryparser's syntax so I have started an open source project to create a viable alternative. I could really use some help testing it out. The more I can get it tested the better ch

Re: Lock error attempting update of RAMDirectory index

2006-09-01 Thread karl wettin
On Thu, 2006-08-31 at 19:34 -0700, Philip Brown wrote: karl wettin-3 wrote: > > > > On Thu, 2006-08-31 at 15:24 -0700, Philip Brown wrote: > >> > >> I'm getting the following error trying to instantiate an IndexModifier > >> on a RAMDirectory index: > >> > >> java.io.IOException: Lock obtain tim

Document and JMSObjects

2006-09-01 Thread Kinnar Kumar Sen, Noida
Hi I am trying to build an application which uses JMS objects and Lucene. I am creating Lucene Documents and sending them through JMS objects to a queue( I am using IBM MQ Series ). There is a listener which listens to this queue and Indexes these documents. The problem I am facing is there i

SV: GetMoreDocs question

2006-09-01 Thread Marcus Falck
Thx Hoss. But why do I have to retrieve at least 1 document when I'm using the TopDocs? (If I set nDoc to 0 it will throw an exception). / Marcus -Original Message- From: Chris Hostetter [mailto:[EMAIL PROTECTED] Sent: 31 August 2006 20:09 To: java-user@lucene.apache.org

Re: Proximity Query Parser

2006-09-01 Thread Paul Elschot
Mark, On Thursday 31 August 2006 23:18, Mark Miller wrote: > I am not a huge fan of the queryparser's syntax so I have started an > open source project to create a viable alternative. I could really use > some help testing it out. The more I can get it tested the better > chance it has of se