Re: Google finance-like suggestible search field

2009-01-14 Thread Paul Libbrecht
(sorry to respond to myself) On 15 Jan 2009, at 08:13, Paul Libbrecht wrote: We have a suggestion engine and we only auto-complete from 3 characters (or a number). http://draft.i2geo.net/SearchI2G/skills-text-box-editor.jsp?language=en What would be nice for your case and maybe for ours is

Re: Google finance-like suggestible search field

2009-01-14 Thread Paul Libbrecht
We have a suggestion engine and we only auto-complete from 3 characters (or a number). http://draft.i2geo.net/SearchI2G/skills-text-box-editor.jsp?language=en What would be nice for your case and maybe for ours is that this expansion done in PrefixQuery is made more explicit so that one cou

Re: Google finance-like suggestible search field

2009-01-14 Thread Erick Erickson
Sorry, hit the send too quickly. That last should read: "much more suitable than forming a query". Best Erick On Wed, Jan 14, 2009 at 9:57 PM, Erick Erickson wrote: > First, it's a legitimate question whether matching on single-letter > prefixes is useful for the user. If you're running into To

Re: Google finance-like suggestible search field

2009-01-14 Thread Erick Erickson
First, it's a legitimate question whether matching on single-letter prefixes is useful for the user. If you're running into TooManyClauses, that means (if you haven't changed the defaults) that there are more than 1024 possibilities. Which is far too many for the user to scan through. You could lo

RE: Google finance-like suggestible search field

2009-01-14 Thread Hayes, Peter
Yes Jack that is what we found. One approach we kicked around is using a standard TermQuery but breaking up each word into its prefixes. For example, the word 'IBM' would be added to a document broken into 'I', 'IB', 'IBM'. The downsides seem to be a lot of waste in the index. Any thoughts on
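
A minimal sketch of the prefix-splitting idea described in this message, in plain Java with no Lucene dependency. The class name `PrefixExpander` and the method `prefixes` are hypothetical names chosen for illustration; in a real index each returned string would be added as its own term so that an exact term lookup on what the user has typed so far can match.

```java
import java.util.ArrayList;
import java.util.List;

public class PrefixExpander {

    /** Returns every leading prefix of the word, shortest first. */
    public static List<String> prefixes(String word) {
        List<String> result = new ArrayList<>();
        // end is exclusive, so it runs from 1 up to the full word length
        for (int end = 1; end <= word.length(); end++) {
            result.add(word.substring(0, end));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(prefixes("IBM")); // prints [I, IB, IBM]
    }
}
```

A word of n characters yields n terms, which is the index overhead the message refers to.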

Re: Google finance-like suggestible search field

2009-01-14 Thread Jack Stahl
Eric, I don't think that will work. The PrefixQuery generates a giant BooleanQuery that ORs one TermQuery for each matching term in the index for that prefix. So the problem isn't the number of fields, but that PrefixQueries don't scale to large indices. Jack On Wed, Jan 14, 2009 at 6:18 PM, An
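
A rough sketch of the expansion described in this message, assuming the index's terms are modelled as a sorted in-memory set rather than a real Lucene term dictionary. `PrefixExpansion`, `expand`, and `exceedsLimit` are hypothetical names, and the limit of 1024 is the default clause count quoted elsewhere in this thread, hard-coded here for illustration only.

```java
import java.util.SortedSet;
import java.util.TreeSet;

public class PrefixExpansion {

    // Default clause limit quoted in the thread; an assumption for this sketch.
    static final int MAX_CLAUSES = 1024;

    /** Terms starting with the prefix; each one would become an OR-ed clause. */
    static SortedSet<String> expand(TreeSet<String> terms, String prefix) {
        // Lower bound is inclusive, upper bound is exclusive.
        return terms.subSet(prefix, prefix + Character.MAX_VALUE);
    }

    /** True if the expansion would produce more clauses than the limit allows. */
    static boolean exceedsLimit(TreeSet<String> terms, String prefix) {
        return expand(terms, prefix).size() > MAX_CLAUSES;
    }

    public static void main(String[] args) {
        TreeSet<String> terms = new TreeSet<>();
        for (int i = 0; i < 2000; i++) {
            terms.add("a" + i); // 2000 distinct terms sharing the prefix "a"
        }
        terms.add("ibm");
        System.out.println(expand(terms, "a").size());
        System.out.println(exceedsLimit(terms, "a"));
        System.out.println(expand(terms, "ib"));
    }
}
```

The number of clauses depends only on how many distinct terms share the prefix, which is why short prefixes on a large index hit the limit regardless of how many fields are queried.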

RE: Google finance-like suggestible search field

2009-01-14 Thread Angel, Eric
Peter, Why don't you put all your "autocompletable" values into a single document field and just query a single field? Google seems to only use two fields for autocomplete - symbol and company name. Eric -----Original Message----- From: Hayes, Peter [mailto:peter.ha...@fmr.com] Sent: Wednesday

Google finance-like suggestible search field

2009-01-14 Thread Hayes, Peter
Hi all, We are trying to implement a Google finance-like suggest as you type search field. The index is quite large and comprised of multiple fields to search across so our initial implementation was to use a BooleanQuery with multiple PrefixQuery across each field. We quickly ran into the TooMa

Testing Precision and Recall on Lucene

2009-01-14 Thread david muchangi
Dear All, I would like to quickly test how Lucene performs in terms of precision and recall. Does anyone have a small application that I can use without having to program against the APIs? Thanks. David

Re: Using analyzer while constructing Lucene queries

2009-01-14 Thread Erick Erickson
I'm pretty sure that StandardAnalyzer does NOT stem, BTW. But to your main question. I'm confused by your use of the term "manually". When creating a query, you really have two choices: 1> let the query parser do your work for you. The result of the parse operation is a Query, which can be added

Re: Using analyzer while constructing Lucene queries

2009-01-14 Thread Jack Stahl
Hi Rajesh, TermQueries (and likewise other queries) take a Term object, which in turn takes a String. That String should be the analyzed version ("play") of your original query word ("playing"). To get that, you need to feed your analyzer a Reader of the string you wish to parse ("playing"). I

Re: Using analyzer while constructing Lucene queries

2009-01-14 Thread Rajesh parab
Thanks Ian. I agree with you on lowercasing of characters. My main concern is specific to stemming done by analyzers. For example, StandardAnalyzer will stem words like playing, played, plays, etc. to a common token "play" which will be stored in the index. Now, during searches, we would need

Re: ShingleMatrixFilter for synonyms

2009-01-14 Thread Karl Wettin
Hi Eric, ShingleMatrixFilter does not add some sort of multiple-token synonym feature on top of a plain old Lucene index; it does, however, create permutations of tokens in a matrix. My suggestion is that you first look at what shingles are and make sure this is something you feel is intere