Re: Error Tolerant Query Parser

2007-04-04 Thread Mohsen Saboorian
Thanks, Absolute tolerance :) It seems that there is no other way. I found something here: http://www.nabble.com/Error-tolerant-query-parsing-tf108987.html Otis Gospodnetic wrote: > > Hm, error tolerant query parser? How do you want to handle queries with > invalid syntax? > > Here is one w

Re: distinct term values?

2007-04-04 Thread Ryan McKinley
TermEnum works like a charm, no need to optimize (yet). Enjoy the Merlot! On 4/4/07, Erick Erickson <[EMAIL PROTECTED]> wrote: Sorry if this is a double post, but my last attempt failed.. Not that I know of, but I think you'll be surprised how fast TermEnum will walk the list of terms.

Re: Benchmarking LUCENE-584 with contrib/benchmark

2007-04-04 Thread Otis Gospodnetic
Hi Doron, Yes, this was great help, thanks! I've got my: 1. MatchTask (just like ReadTask, but with searcher.match(Query, new MatchCollector() )) 2. SearchMatchTask (just like SearchTask, but extends MatchTask), so I was able to use "SearchMatch" in the alg file where "Search" was before.

Re: Range search in numeric fields

2007-04-04 Thread Peter W.
Andy, MemoryCachedRangeFilter looks nice, can't wait for it to be included with other goodies in the next 2.x point release! Numeric range search questions come up often for Lucene, best practices probably include working with BitSets directly (which I have been unable to grok), using queries li

Re: distinct term values?

2007-04-04 Thread Erick Erickson
Sorry if this is a double post, but my last attempt failed.. Not that I know of, but I think you'll be surprised how fast TermEnum will walk the list of terms. I think you misunderstand TermEnum. It will NOT enumerate a term twice, so there's no need for a hash, just a simple increment of a

Re: distinct term values?

2007-04-04 Thread Yonik Seeley
On 4/4/07, Ryan McKinley <[EMAIL PROTECTED]> wrote: Is there an efficient way to know how many distinct terms there are for a given field name? I know I can walk through a TermEnum and put them into a hash No hash needed... just walk through the TermEnum and count. -Yonik ---
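
The "just walk and count" advice can be sketched without Lucene itself. The snippet below is only an illustration under stated assumptions: `Term`, `countTerms` and `sampleTerms` are hypothetical stand-ins, not Lucene classes, and the iterator is assumed to behave like a term enumeration that yields terms sorted by field and then text, each exactly once. Under that assumption a plain counter is enough and no hash is needed.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

final class DistinctTermCount {
    /** Hypothetical stand-in for an index term: a (field, text) pair. */
    static final class Term {
        final String field;
        final String text;

        Term(String field, String text) {
            this.field = field;
            this.text = text;
        }
    }

    /**
     * Counts the terms belonging to one field. Assumes the iterator yields
     * terms sorted by field and then text, each exactly once.
     */
    static int countTerms(Iterator<Term> sortedTerms, String field) {
        int count = 0;
        while (sortedTerms.hasNext()) {
            Term t = sortedTerms.next();
            if (t.field.equals(field)) {
                count++;            // each term appears once, so just increment
            } else if (count > 0) {
                break;              // sorted order: the field's run has ended
            }
        }
        return count;
    }

    /** Small sample "term dictionary", already in sorted order. */
    static List<Term> sampleTerms() {
        return Arrays.asList(
            new Term("color", "blue"),
            new Term("color", "red"),
            new Term("size", "large"),
            new Term("size", "medium"),
            new Term("size", "small"));
    }
}
```

Calling `countTerms(sampleTerms().iterator(), "size")` walks the sample list once and stops as soon as the field changes. With a real index the walk would normally start at the first term of the field rather than at the beginning of the dictionary.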

distinct term values?

2007-04-04 Thread Ryan McKinley
Is there an efficient way to know how many distinct terms there are for a given field name? I know I can walk through a TermEnum and put them into a hash, but it would be useful to know beforehand if you are going to get 4 distinct values or 40,000. I don't need to know what the terms are, just h

Re: Better parsing of Queries

2007-04-04 Thread Erick Erickson
About all you can do is roll your own. I suspect a decent regular expression would work, or you could let Lucene escape the query and then re-replace all \: with : Erick On 4/4/07, Simon Wistow <[EMAIL PROTECTED]> wrote: I'm looking for some advice on dealing with malformed queries. If a user
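
The "escape everything, then re-replace \: with :" idea might look roughly like this. It is a minimal sketch, not the library's implementation: `escapeAll` is a hypothetical stand-in for the parser's own escape routine, and the `SPECIAL` character set is a simplified assumption.

```java
final class ColonPreservingEscape {
    // Simplified assumption: characters treated as query syntax in this sketch.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?";

    /** Hypothetical stand-in for the parser's own escape routine. */
    static String escapeAll(String query) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < query.length(); i++) {
            char c = query.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');    // prefix every special character with a backslash
            }
            sb.append(c);
        }
        return sb.toString();
    }

    /** Escapes everything, then turns "\:" back into ":" so field prefixes still work. */
    static String escapeKeepingFields(String query) {
        return escapeAll(query).replace("\\:", ":");
    }
}
```

With this sketch, `escapeKeepingFields("title:yow!")` keeps the `title:` prefix usable while the `!` stays escaped. The trade-off is that every colon becomes syntax again, including colons the user meant literally.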

Better parsing of Queries

2007-04-04 Thread Simon Wistow
I'm looking for some advice on dealing with malformed queries. If a user searches for "yow!" then I get an exception from the query parser. I can get round this by using QueryParser.escape(query) first but then that prevents them from searching using other bits of the query syntax such as "

Re: Design Problem: Searching large set of protected documents

2007-04-04 Thread Paul Elschot
On Wednesday 04 April 2007 01:32, Erick Erickson wrote: > I thought you could simply add a ConstantScoreQuery (whose > constructor takes a Filter) to a BooleanQuery. It seems that doing > this at the very top level with a MUST would do the trick. I have not tried this myself, but indeed this m

Re: How many Searches is a Searcher Worth?

2007-04-04 Thread Otis Gospodnetic
No reason that I can think of. What makes you think the problem is with the IndexSearcher? Maybe it's something else in your code, for instance. Make sure you have the same version of Java on both ends of the call. Also, Java 6 made our RMI calls a lot more stable than even 1.5. Otis . . . .

How many Searches is a Searcher Worth?

2007-04-04 Thread Craig W Conway
I am using an RMI architecture for calling a remote service which uses an IndexSearcher in its own JVM. I am starting the service with the following provisions for memory allocation and garbage collection: java -server -Xmx1024m -XX:+UseConcMarkSweepGC -XX:+UseParNewGC After about 1000 search c

Re: Error Tolerant Query Parser

2007-04-04 Thread Otis Gospodnetic
Hm, error tolerant query parser? How do you want to handle queries with invalid syntax? Here is one way: try { QueryParser qp = new QueryParser(.); Query q = qp.parse(); } catch (Throwable t) { // tolerate any exception } ;) Bad but quite tolerant. Otis . . . . . . . . . . . .
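
A somewhat less blunt variant of the catch-everything approach is to catch only the syntax error and retry with escaped input. The sketch below is an assumption-laden illustration: `Parser`, `SyntaxError`, `parseTolerantly` and `toyParse` are hypothetical names, not Lucene types, and the toy parser only rejects a trailing unescaped `!`.

```java
import java.util.function.UnaryOperator;

final class TolerantParse {
    /** Hypothetical unchecked exception standing in for a parser's syntax error. */
    static final class SyntaxError extends RuntimeException {
        SyntaxError(String message) {
            super(message);
        }
    }

    /** Hypothetical parser abstraction; not a Lucene type. */
    interface Parser<Q> {
        Q parse(String text);
    }

    /** Tries the raw input first; on a syntax error, retries with the escaped input. */
    static <Q> Q parseTolerantly(Parser<Q> parser, UnaryOperator<String> escaper, String input) {
        try {
            return parser.parse(input);
        } catch (SyntaxError e) {
            // Only the syntax error is tolerated; other failures still propagate.
            return parser.parse(escaper.apply(input));
        }
    }

    /** Toy parser for demonstration: rejects an unescaped '!' at the end. */
    static String toyParse(String text) {
        if (text.endsWith("!") && !text.endsWith("\\!")) {
            throw new SyntaxError("dangling '!' in: " + text);
        }
        return "Query(" + text + ")";
    }
}
```

Well-formed input keeps its full syntax because it never reaches the escaper; only input that fails the first parse is escaped and retried.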

Re: Indexing multiple instances of the same field and counting their frequency afterward

2007-04-04 Thread Sengly Heng
Thank you. But I found that the result is always 1, even if I input a token that I don't even have in the doc. What happened? Best, Sengly On 4/4/07, Laxmilal Menaria <[EMAIL PROTECTED]> wrote: hello, you can try this code : IndexReader ISer= IndexReader.open("C:/Testindex"); Ter
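
One possible explanation, offered as an assumption since the full thread is not shown here: the quoted code seeks a term enumeration to the requested term and then asks for the document frequency of whatever term the enumeration landed on. A seek of this kind is positioned at the first term equal to or greater than the requested one, so a missing token silently resolves to its neighbour. Document frequency also counts documents rather than occurrences, so a one-document index reports 1 either way. The sketch below mimics this with a `TreeMap`; `sampleDictionary`, `docFreqAfterSeek` and `docFreqExact` are hypothetical names.

```java
import java.util.Map;
import java.util.TreeMap;

final class SeekVersusExact {
    /** Hypothetical term dictionary for one field: term text -> number of documents containing it. */
    static TreeMap<String, Integer> sampleDictionary() {
        TreeMap<String, Integer> dict = new TreeMap<>();
        dict.put("beautiful", 1);
        dict.put("blue", 1);
        dict.put("red", 1);
        return dict;
    }

    /** Mimics seeking: returns the count of the first term >= the requested one. */
    static int docFreqAfterSeek(TreeMap<String, Integer> dict, String term) {
        Map.Entry<String, Integer> e = dict.ceilingEntry(term);
        return e == null ? 0 : e.getValue();
    }

    /** Looks the requested term up exactly; absent terms count as 0. */
    static int docFreqExact(TreeMap<String, Integer> dict, String term) {
        return dict.getOrDefault(term, 0);
    }
}
```

Here "green" is absent from the dictionary, yet the seek-based lookup lands on "red" and reports its count, while the exact lookup reports 0. Checking that the term returned by the enumeration equals the requested term, or asking for the frequency of the requested term directly, avoids the surprise.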

Re: Indexing multiple instances of the same field and counting their frequency afterward

2007-04-04 Thread Sengly Heng
Thanks so much for your explanation. But there is one thing that I want to make sure of: if I add the same token to the same field, is it stored redundantly internally? And if I have many fields, what is the best way to list the frequency of all the tokens from different fiel

Explanation from FunctionQuery

2007-04-04 Thread Annona Keene
I'm hoping someone can offer some insight into the FunctionQuery. I've just discovered this, and I think it's exactly what I've been looking for, but I'm having some trouble getting it to work. I can create and execute the query, but if I try to see the Explanation, I get a java.lang.Unsuppor

Re: IndexWriter Quandary

2007-04-04 Thread Michael McCandless
"Kvailis" <[EMAIL PROTECTED]> wrote: > I'm pretty new to Lucene (2.0.0) and am having an issue with the > IndexWriter: if I set the boolean argument to 'true' it goes ahead and > writes indexes that turn out to be perfectly usable; taking the same exact > code and switching the boolean to 'false'

Re: Indexing multiple instances of the same field and counting their frequency afterward

2007-04-04 Thread Erick Erickson
See below On 4/4/07, Sengly Heng <[EMAIL PROTECTED]> wrote: Dear all, My problem is a little bit strange. Instead of parsing the content of the document to the indexer, I am adding one by one. Here is a piece of my code : Document doc = new Document(); doc.add(Field.Text("Features", "blue"));

Re: Unique City, State results from index based on zip

2007-04-04 Thread Erick Erickson
The default operator for QueryParser is OR, so what you may really be getting is hits on Mill, and Valley is irrelevant. But this is just a guess; it'd be way more helpful if you told us what your index structure was and what query you actually submitted, for which query.toString is really helpful

Re: Indexing multiple instances of the same field and counting their frequency afterward

2007-04-04 Thread Laxmilal Menaria
hello, you can try this code : IndexReader ISer= IndexReader.open("C:/Testindex"); TermEnum te=ISer.terms(new Term("Features","blue")); Term te1= te.term(); System.out.println("Frequency of blue "+ISer.docFreq(te1)); regards, -LM On 4/4/07, Sengly Heng <[EMAIL

Re: Field.lazy setter method?

2007-04-04 Thread jafarim
So, what's the usage of this property in the Field class? On 4/4/07, Yonik Seeley <[EMAIL PROTECTED]> wrote: On 4/4/07, jafarim <[EMAIL PROTECTED]> wrote: > Any way, is there any way to tell lucene that a field is to be lazy-loaded, > from the very beginning of field construction? No, that da

Re: Field.lazy setter method?

2007-04-04 Thread Grant Ingersoll
Lazy loading is handled through the FieldSelector interface on IndexReader.doc() and some variations. There is nothing special that need be done during indexing to mark a field as lazy. The isLazy method merely lets you know later, after loading a Document, if the field is, indeed, lazy.
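
The point that laziness is decided when a document is loaded, not when it is indexed, can be illustrated with a small Lucene-free sketch. Everything here is hypothetical: `LoadedField`, `load` and the `Predicate` used as a selector are stand-ins chosen for illustration and do not reproduce the FieldSelector API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;
import java.util.function.Supplier;

final class LazyLoadSketch {
    /** Hypothetical loaded field: remembers whether it was loaded lazily. */
    static final class LoadedField {
        private final Supplier<String> source;
        private final boolean lazy;
        private String value;       // cached after the first read
        private boolean resolved;

        LoadedField(Supplier<String> source, boolean lazy) {
            this.source = source;
            this.lazy = lazy;
            if (!lazy) {
                stringValue();      // eager fields are read immediately
            }
        }

        /** Only reports how the field was loaded; it changes nothing. */
        boolean isLazy() {
            return lazy;
        }

        String stringValue() {
            if (!resolved) {
                value = source.get();
                resolved = true;
            }
            return value;
        }
    }

    /** Loads a document; the selector decides per field name which fields are lazy. */
    static Map<String, LoadedField> load(Map<String, Supplier<String>> stored,
                                         Predicate<String> lazySelector) {
        Map<String, LoadedField> doc = new LinkedHashMap<>();
        for (Map.Entry<String, Supplier<String>> e : stored.entrySet()) {
            doc.put(e.getKey(), new LoadedField(e.getValue(), lazySelector.test(e.getKey())));
        }
        return doc;
    }
}
```

The stored data carries no lazy flag at all in this sketch; the same stored map can be loaded eagerly by one caller and lazily by another, which mirrors why a setter on the field would have nothing to act on.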

Re: Field.lazy setter method?

2007-04-04 Thread Yonik Seeley
On 4/4/07, jafarim <[EMAIL PROTECTED]> wrote: Any way, is there any way to tell lucene that a field is to be lazy-loaded, from the very beginning of field construction? No, that data is not stored in the index. Lazy field loading is specified only when retrieving the stored fields of a document

Field.lazy setter method?

2007-04-04 Thread jafarim
Hi, I wonder why there is no setter method for the "lazy" member variable in Field class. Does that mean the property is nominal and setting it does not have any effect, or am I missing some point? Any way, is there any way to tell lucene that a field is to be lazy-loaded, from the very beginning

Re: Unique City, State results from index based on zip

2007-04-04 Thread Jokin Cuadrado
don't index the city names with the zip codes.
indexed text - Stored Value
---
94941 - 94941 Mill Valley
94114 - 94114 Mill Valley
Mill Valley - Mill Valley
29715 - 29715 Fort Mill
29708 - 29708 Fort Mill

Indexing multiple instances of the same field and counting their frequency afterward

2007-04-04 Thread Sengly Heng
Dear all, My problem is a little bit strange. Instead of parsing the content of the document to the indexer, I am adding one by one. Here is a piece of my code : Document doc = new Document(); doc.add(Field.Text("Features", "blue")); doc.add(Field.Text("Features","beautiful")); doc.add(Field.Text(