Phrase search using quotes -- special Tokenizer

2006-08-31 Thread Philip Brown
Hi, After running some tests using the StandardAnalyzer, and getting 0 results from the search, I believe I need a special Tokenizer/Analyzer. Does anybody have something that parses like the following: - doesn't parse apart phrases (in quotes) - doesn't parse/separate hyphenated or underscore

Re: Lock error attempting update of RAMDirectory index

2006-08-31 Thread Philip Brown
Well, I wish it were that easy...I open one IndexWriter to write the documents to the index after it is created, and then call writer.optimize() and writer.close(). Your suggestion is a good one in that, from what I've read, the writer needs to be closed to release the lock file. Apparently, the

Re: Lock error attempting update of RAMDirectory index

2006-08-31 Thread karl wettin
On Thu, 2006-08-31 at 15:24 -0700, Philip Brown wrote: > > I'm getting the following error trying to instantiate an IndexModifier > on a RAMDirectory index: > > java.io.IOException: Lock obtain timed out: > [EMAIL PROTECTED] You probably forgot to close an IndexWriter? ---

Re: word frequency list?

2006-08-31 Thread Jason Pump
Thanks Boris, Jason Boris Aleksandrovsky wrote: Jason, You can look here: http://www.cs.ualberta.ca/~lindek/downloads.htm for Word frequency counts from a 1.5B word corpus (TREC disks 1-5 and the Reuters corpus ). The words are norma

highlight using a MemoryIndex

2006-08-31 Thread Daniel J. Williams
I was able to get the following code to work using a RAMDirectory, but after reading the description of the MemoryIndex, I wanted to try to use it instead for speed reasons. I removed the RAMDir code, and replaced the references with the MemoryIndex, and all seems to go well till I start steppin

Lock error attempting update of RAMDirectory index

2006-08-31 Thread Philip Brown
I'm getting the following error trying to instantiate an IndexModifier on a RAMDirectory index: java.io.IOException: Lock obtain timed out: [EMAIL PROTECTED] at org.apache.lucene.store.Lock.obtain(Lock.java(Compiled Code)) at org.apache.lucene.index.IndexWriter.(IndexWriter.java:2

Re: FuzzyQurey in SpanQuery

2006-08-31 Thread karl wettin
On Thu, 2006-08-31 at 17:33 -0400, Mark Miller wrote: > > Bad news for me. Any hope of a speedier fuzzy span? Using a spell checker comes to mind. A speedier index is another way to go. RAMDirectory is n times faster than FSDirectory and issue 550-index is 5x faster than RAMDirectory if you onl

Re: FuzzyQurey in SpanQuery

2006-08-31 Thread Mark Miller
karl wettin wrote: On Thu, 2006-08-31 at 17:17 -0400, Mark Miller wrote: I want to use it for my query parser so you can do a fuzzy search inside of a proximity search. Is it any slower than a standard fuzzy query? I find it to be extremely slow. All terms in the index need to be enume

Re: FuzzyQurey in SpanQuery

2006-08-31 Thread karl wettin
On Thu, 2006-08-31 at 17:17 -0400, Mark Miller wrote: > > I want to use it for my query parser so you can do a fuzzy search > inside of a proximity search. Is it any slower than a standard fuzzy > query? I find it to be extremely slow. All terms in the index need to be enumerated (or a subset if
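
To illustrate why enumerating every term makes a fuzzy match slow, here is a minimal sketch in plain Java. It is not Lucene's FuzzyQuery implementation; the class FuzzyScanSketch and its methods are hypothetical names used only for illustration. It assumes the index's terms are available as a simple list and compares each one against the query term by edit distance, so the cost grows with the number of terms.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: a brute-force fuzzy scan over a term list.
public class FuzzyScanSketch {

    // Levenshtein distance computed with two rolling rows.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance to the empty prefix of b
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1] + 1, prev[j] + 1);
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            int[] tmp = prev; // swap rows for the next iteration
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Visits every term once; this full enumeration is the expensive part.
    static List<String> fuzzyMatches(List<String> terms, String query, int maxEdits) {
        List<String> matches = new ArrayList<>();
        for (String term : terms) {
            if (editDistance(term, query) <= maxEdits) {
                matches.add(term);
            }
        }
        return matches;
    }
}
```

For example, FuzzyScanSketch.fuzzyMatches(terms, "lucene", 1) keeps only the terms within one edit of "lucene", but it still has to look at every entry in terms to find them.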

Re: word frequency list?

2006-08-31 Thread Boris Aleksandrovsky
Jason, You can look here: http://www.cs.ualberta.ca/~lindek/downloads.htm for Word frequency counts from a 1.5B word corpus (TREC disks 1-5 and the Reuters corpus ). The words are normalized as follows: ALL CAP words are prepended with a_

Proximity Query Parser

2006-08-31 Thread Mark Miller
I am not a huge fan of the queryparser's syntax so I have started an open source project to create a viable alternative. I could really use some help testing it out. The more I can get it tested the better chance it has of serving the community. The parser is called Qsol. I am right up again

Re: FuzzyQurey in SpanQuery

2006-08-31 Thread Mark Miller
karl wettin wrote: On Thu, 2006-08-31 at 14:27 -0400, Mark Miller wrote: When is a query rewritten? I build my query and then before using it, I would like to print it out to double check it. Not possible? Does the rewrite happen inside search? Right, you can't do a toString prior to

Re: FuzzyQurey in SpanQuery

2006-08-31 Thread karl wettin
On Thu, 2006-08-31 at 14:27 -0400, Mark Miller wrote: > When is a query rewritten? I build my query and then before using it, I > would like to print it out to double check it. Not possible? Does the > rewrite happen inside search? Right, you can't do a toString prior to rewriting it. The probl

Re: FuzzyQurey in SpanQuery

2006-08-31 Thread Mark Miller
karl wettin wrote: On Thu, 2006-08-31 at 06:55 -0400, Mark Miller wrote: Anyone know of a way to get a fuzzy query into a spanquery? http://issues.apache.org/jira/browse/LUCENE-522 - To unsubscribe, e-mail: [EMAIL PR

Re: GetMoreDocs question

2006-08-31 Thread Chris Hostetter
: I have some questions regarding the GetMoreDocs(50) call in the : constructors of the Hits class. : First off, what's the purpose of this call? Hits is designed to meet the simple needs of simple clients -- the assumption is that clients using Hits want simple paginated results - so Hits goes a

Re: Sorting based on a selling rate

2006-08-31 Thread John Pailet
Putting selling rate in the index is OK for me, I also think that is a good idea. The problem is: I don't know how to store the sell rate of the product that depends on a specific query Can you please give me your idea about how to store it in the Lucene document ? (field/value) Thank you very

Using Lucene Index for Business Intelligence / Analytics

2006-08-31 Thread Saurabh Dani
I don't have a lot of experience with reporting tools and how data is stored by high priced tools which use OLAP and other similar storage types but we needed a solution for drill down reports and searching within large application logs / web server logs. So, w

Re: graphically representing an index

2006-08-31 Thread Erick Erickson
Take a look at Luke (http://www.getopt.org/luke/). I think this does a lot of what you're asking for. It's opensource, so you could see how it's done. There are screenshots at the link above so you can see if it's actually what you want. You might also want to look at the Term* classes in the

Re: graphically representing an index

2006-08-31 Thread Andrzej Bialecki
SOMMERIA KLEIN Ariel Ext VIACCESS-BU_DRM wrote: Hi all, I'm a newbie with Lucene and I'm looking to implement the following: I want to index posts from a forum, and, rather than proposing a search on the contents, graphically represent the contents of the index. More precisely, I would like to ha

Re: GetMoreDocs question

2006-08-31 Thread Erick Erickson
See below... On 8/31/06, Marcus Falck <[EMAIL PROTECTED]> wrote: Hi, I have some questions regarding the GetMoreDocs(50) call in the constructors of the Hits class. What constructors? I just get a Hits object returned from the Searcher. Or are you looking in the source? First off, what's

graphically representing an index

2006-08-31 Thread SOMMERIA KLEIN Ariel Ext VIACCESS-BU_DRM
Hi all, I'm a newbie with Lucene and I'm looking to implement the following: I want to index posts from a forum, and, rather than proposing a search on the contents, graphically represent the contents of the index. More precisely, I would like to have a list of the most popular words, with a number
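
As a rough illustration of the "most popular words" list described above, here is a minimal sketch in plain Java. It does not use the Lucene API; the class WordFrequencySketch and the method topWords are hypothetical names, and the tokenization (lowercasing, then splitting on anything that is not a letter or digit in a-z, 0-9) is an assumption made for the example. A real setup would more likely read term statistics from the index itself.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration only: count words across forum posts and rank them.
public class WordFrequencySketch {

    // Returns up to n (word, count) entries, most frequent first;
    // ties are broken alphabetically so the order is deterministic.
    static List<Map.Entry<String, Integer>> topWords(List<String> posts, int n) {
        Map<String, Integer> counts = new HashMap<>();
        for (String post : posts) {
            // Assumed tokenization: lowercase, split on non-alphanumeric runs.
            for (String token : post.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
                if (!token.isEmpty()) {
                    counts.merge(token, 1, Integer::sum);
                }
            }
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((x, y) -> {
            int byCount = Integer.compare(y.getValue(), x.getValue());
            return byCount != 0 ? byCount : x.getKey().compareTo(y.getKey());
        });
        return new ArrayList<>(entries.subList(0, Math.min(n, entries.size())));
    }
}
```

The resulting list of word and count pairs could then be passed to whatever component draws the graphical view.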

Re: SpanRegex speed

2006-08-31 Thread Erick Erickson
Let me chime in here on a different note before you get happy with wildcard queries, take a look at the thread "I just don't get wildcards at all". There is lots of good info that Erik, Chris and Otis provided me. The danger with prefixquery and wildcard query is that they will throw TooManyC

Re: Escaping escape char

2006-08-31 Thread Erik Hatcher
On Aug 31, 2006, at 5:43 AM, WATHELET Thomas wrote: Hi, QueryParser queryparser = new QueryParser(field, new SimpleAnalyzer()); +doccontent:"european parliament resolution on the commission report on the regional meetings arranged by the commission in on the common fisheries policy a

Re: FuzzyQurey in SpanQuery

2006-08-31 Thread karl wettin
On Thu, 2006-08-31 at 06:55 -0400, Mark Miller wrote: > Anyone know of a way to get a fuzzy query into a spanquery? http://issues.apache.org/jira/browse/LUCENE-522

Re: FuzzyQurey in SpanQuery

2006-08-31 Thread mark harwood
Something like this? Query expandedQuery=fuzzyQuery.rewrite(reader); HashSet termsSet=new HashSet(); expandedQuery.extractTerms(termsSet); ArrayList termsList=new ArrayList(); for (Iterator iter = termsSet.iterator(); iter.hasNext();) { Term term = (Term) iter.next(); SpanTermQuery st

FuzzyQurey in SpanQuery

2006-08-31 Thread Mark Miller
Anyone know of a way to get a fuzzy query into a spanquery? - Mark

Escaping escape char

2006-08-31 Thread WATHELET Thomas
Hi, I have an index with a field 'content' (tokenized, stored, indexed) using Lucene 1.9.1. I tried to search this text in exact string: "european parliament resolution on the Commission report on the regional meetings arranged by the Commission in 1998-1999 on the common fisheries policy after 2

RE: Scoring Technique based on Relevance Feeback & other Parameters

2006-08-31 Thread sachin
Hello, Very small and sweet Question? Does Apache allow me to change the Final classes which are distributed by Apache for Scorers? Or can I copy and paste some of the Lucene code into my commercial application within my organization? TermScorer, BooleanScorer are final classes. But all other sc

GetMoreDocs question

2006-08-31 Thread Marcus Falck
Hi, I have some questions regarding the GetMoreDocs(50) call in the constructors of the Hits class. First off, what's the purpose of this call? Why do I have to make this call if I only want to get out a count of the matching documents and don't want to retrieve any document from the ind