Facet searching on single field with multiple words value

2007-06-20 Thread Sawan Sharma
Hi friends, I tried to implement facet searching in some sample code. When I tested it with various cases, I found no results in one case. I wanted to narrow by one field, "title", and gave it a multiple-word value, or say a phrase. So first, I am preparing the Lucene query and converting it into QueryF

Re: ways to minimize index size?

2007-06-20 Thread Steve Liles
Compression aside, you could index the "contents" as terms in separate fields instead of tokenized text, and disable storing of norms: String outgoingNumber="9198408365809"; String incomingNumber="9840861114"; _doc.add(new Field("outgoingNumber", outgoingNumber, Store.NO, Index.NO_NORMS)); _doc

query parser behavior with operator AND

2007-06-20 Thread Antony Sequeira
Hi, the following is output from my test code, where the left-hand side is the query and the right-hand side is the printed parsed query. There are two sets, one with the default operator set to OR, the other with the default set to AND. The output looks funny for the first example when the opera

Re: Highlighter that works with phrase and span queries

2007-06-20 Thread Mark Miller
I will work up some performance numbers over the next day or two to share with you. I have spent the last day or two with a profiler trying to find the biggest performance drains. Unfortunately, I will probably not be able to squeeze out much more performance than the current Highlighter. When

Re: Highlighter that works with phrase and span queries

2007-06-20 Thread Yonik Seeley
On 6/20/07, Chris Lu <[EMAIL PROTECTED]> wrote: Agree. But I think another reason why highlighting is slow could also be the need to retrieve the document's content. Quite likely it's on the hard drive, which usually takes around 10ms for each small document, and more for larger documents. I'm not

Re: Highlighter that works with phrase and span queries

2007-06-20 Thread Chris Lu
Agree. But I think another reason why highlighting is slow could also be the need to retrieve the document's content. Quite likely it's on the hard drive, which usually takes around 10ms for each small document, and more for larger documents. -- Chris Lu - Instant Scalable Ful

Re: Position of matches to affect scoring

2007-06-20 Thread Otis Gospodnetic
Hi, have you looked at using the HitCollector? Otis -- Simpy -- http://www.simpy.com/ - Tag - Search - Share - Original Message From: Jesse Prabawa <[EMAIL PROTECTED]> To: java-user@lucene.apache.org Sent: Wednesday, June 2

Re: Highlighter that works with phrase and span queries

2007-06-20 Thread Otis Gospodnetic
Hi Mark, I know one large user (meaning: high query/highlight rates) of the current Highlighter and this user wasn't too happy with its performance. I don't know the details, other than it was inefficient. So now I'm wondering if you've benchmarked your Highlighter against that/current Highli

Re: Highlighter that works with phrase and span queries

2007-06-20 Thread Mike Klaas
On 19-Jun-07, at 3:39 PM, Mark Miller wrote: I have been working on extending the Highlighter with a new Scorer that correctly scores phrase and span queries. The highlighter is working great for me, but could really use some more banging on. If you have a need or an interest in a more accu

Re: The localized Languages.

2007-06-20 Thread Doron Cohen
Hi Kevin, are you looking for the sources under contrib\analyzers? Javadocs have both "core" and "contrib" together, but they are separated in the source tree (and separate jars are created for them in the binary dist). Doron sejourne kevin <[EMAIL PROTECTED]> wrote on 20/06/2007 15:31:20: >

Re: The localized Languages.

2007-06-20 Thread Grant Ingersoll
Is contrib/analyzers what you are looking for? On Jun 20, 2007, at 6:31 PM, sejourne kevin wrote: Hi, It seems that all localized language Analyzers are absent from org.apache.lucene.analysis.* in the latest 2.2 source release of Lucene. Is this normal or not? Regards, Kévin.

The localized Languages.

2007-06-20 Thread sejourne kevin
Hi, It seems that all localized language Analyzers are absent from org.apache.lucene.analysis.* in the latest 2.2 source release of Lucene. Is this normal or not? Regards, Kévin.

Re: ways to minimize index size?

2007-06-20 Thread Sebastin
Hi Erick, do you have any idea on this? jm-27 wrote: > Hi, I want to make my index as small as possible. I noticed field.setOmitNorms(true); I read on the list that the difference is 1 byte per field per doc, not huge but hey... Is the only effect the score being different? I hardly mind abo
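
To put the "1 byte per field per doc" figure from this thread in perspective, here is a rough back-of-the-envelope sketch in plain Java. The class name, method name, and the example document and field counts are hypothetical and not from the thread; the only input taken from the message is the one-byte cost itself.

```java
final class NormsSizeSketch {
    // Assumption taken from the thread: one norm byte per field per document.
    static final long BYTES_PER_NORM = 1L;

    // Estimated bytes saved by omitting norms on `fields` fields across `docs` documents.
    static long normsBytes(long docs, int fields) {
        return Math.multiplyExact(Math.multiplyExact(docs, (long) fields), BYTES_PER_NORM);
    }

    public static void main(String[] args) {
        // Hypothetical index: 10 million documents, norms omitted on 2 fields.
        long saved = normsBytes(10_000_000L, 2);
        System.out.println(saved + " bytes"); // prints "20000000 bytes"
    }
}
```

Under these made-up numbers the saving is about 20 MB, which is small next to the stored and indexed content of most indexes but can still matter when every field is short.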

Re: zero termfreq for some search strings with special characters

2007-06-20 Thread Erick Erickson
You don't. You don't have an actual term "emp-id" in your index. You have "emp" and "id". So "emp-id" isn't a term. If you really want to control this sort of thing, and none of the stock analyzers work exactly as you require, you need to write your own Analyzer that breaks the stream however you
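
The point about "emp-id" not being a term can be illustrated with a small plain-Java sketch. It does not use Lucene: the tokenize method below is a hypothetical, simplified stand-in that splits on every non-alphanumeric character and lowercases, which only approximates what an analyzer does. All names here are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class TermSplitSketch {
    // Simplified stand-in for an analyzer (NOT Lucene's StandardTokenizer):
    // split on any run of non-alphanumeric characters, then lowercase.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.split("[^A-Za-z0-9]+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // Count how often an exact term occurs among the indexed tokens.
    static int termFreq(List<String> tokens, String term) {
        int count = 0;
        for (String token : tokens) {
            if (token.equals(term)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> tokens = tokenize("emp-id Aq234 kaith creating document for search");
        System.out.println(termFreq(tokens, "emp-id")); // prints 0: no such term was indexed
        System.out.println(termFreq(tokens, "emp"));    // prints 1
        System.out.println(termFreq(tokens, "id"));     // prints 1
    }
}
```

A phrase query still matches because the query text is run through the same splitting step, while a direct term lookup compares the raw string "emp-id" against the stored tokens and finds nothing.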

Re: FW: Lucene indexing vs RDBMS insertion.

2007-06-20 Thread Erick Erickson
That's a tough one. What I still don't get is why your 1,000 records/sec is important. If you're really inserting records that fast, for very long, you must have an impressive piece of hardware ... That said, it might be possible to do something like have your base index out on disk somewhere, an

Re: zero termfreq for some search strings with special characters

2007-06-20 Thread SK R
Hi, Thanks for your reply. But how do I get the termfreq of that term ("emp-id")? Does Lucene have any other way to handle this? I would appreciate any solution to this problem. Regards SenthilKumaran On 6/20/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote: You are right! "emp-id" w

RE: FW: Lucene indexing vs RDBMS insertion.

2007-06-20 Thread Chew Yee Chuang
Greetings Erick, my index needs to have the latest data (almost real time, but a delay of less than 1 minute is acceptable). Thus there is no way to schedule the indexing. What I can do is find a solution that minimizes the delay so the system can get "almost" real-time data to display. Thanks. --- eChuan

RE: zero termfreq for some search strings with special characters

2007-06-20 Thread Liu_Andy2
You are right! "emp-id" will be separated into two terms, CONTENT:"emp" and CONTENT:"id", by the standard tokenizer for indexing and searching. But a term written directly (CONTENT:"emp-id") will not. Andy -Original Message- From: SK R [mailto:[EMAIL PROTECTED] Sent: Wednesday, June 20, 2007 5:24 PM

zero termfreq for some search strings with special characters

2007-06-20 Thread SK R
Hi, I'm using the standard tokenizer for both the indexing and searching process. My indexed value is like "emp-id Aq234 kaith creating document for search". I can get search results for the query CONTENT:"emp-id" by using hits = indexSearcher.search(query). But if I try to get the term frequency of t

RE: Understanding QueryParserTokenManager and QueryParser classes

2007-06-20 Thread Liu_Andy2
These two classes are generated by QueryParser.jj. Perhaps you should first look at how to use JavaCC and modify QueryParser.jj to meet your requirement. Andy -Original Message- From: Mahdi Rahimi [mailto:[EMAIL PROTECTED] Sent: Wednesday, June 20, 2007 4:14 PM To: java-user@lucene.apach

Understanding QueryParserTokenManager and QueryParser classes

2007-06-20 Thread Mahdi Rahimi
Hello. I want to add an operator like (*) to the query syntax, so I need to understand the QueryParserTokenManager and QueryParser classes well. But I don't know where to find documentation about these classes and the algorithms for identifying tokens and other things in the query string. Thanks for y

Re: Position of matches to affect scoring

2007-06-20 Thread Jesse Prabawa
Oh I think I have found some clues at: [1] http://www.gossamer-threads.com/lists/lucene/java-user/38967#38967 [2] http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/org/apache/lucene/search/package-summary.html#changingSimilarity Thanks! Jes On 6/20/07, Jesse Prabawa <[EMAIL

Re: Position of matches to affect scoring

2007-06-20 Thread Jesse Prabawa
Hi Steve, Thanks for the advice and your detailed explanation. I have another question though. I understand that Lucene normalizes the scores based on field length. Is there a way for me to avoid this? Or perhaps have better control over how the scores are normalized? Best regards, Jes On 6/19