RE: "People you might know" (a la Facebook) - *slightly offtopic*

2009-03-17 Thread Max Metral
I'm not sure this would fall primarily under recommenders... I would assume Facebook is doing "look-ahead" on connections, i.e. A->B, B->C, so suggest A->C. Then they weight the suggestions by the number of indirect links between A and C and probably other factors (which is where the generic "
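The "look-ahead" idea in this message can be sketched in a few lines. This is only an illustration of counting indirect links, not a claim about what Facebook does; the `graph` dictionary and the function name `suggest_connections` are assumptions made for the example.

```python
from collections import Counter


def suggest_connections(graph, user):
    """Rank friends-of-friends by how many direct connections lead to them.

    `graph` maps each person to the set of people they are connected to
    (a hypothetical in-memory structure, used only for this sketch).
    """
    direct = graph.get(user, set())
    counts = Counter()
    for friend in direct:
        for candidate in graph.get(friend, set()):
            # Skip the user and anyone already connected to the user.
            if candidate != user and candidate not in direct:
                counts[candidate] += 1
    # Most indirect links first; ties broken by name for stable output.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


graph = {
    "A": {"B", "D"},
    "B": {"A", "C"},
    "C": {"B", "D"},
    "D": {"A", "C", "E"},
    "E": {"D"},
}
print(suggest_connections(graph, "A"))
```

Here C is reachable from A through both B and D, so it outranks E, which is reachable only through D. The "other factors" mentioned in the message would be additional weights on top of this count.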

RE: double metaphone for misspellings

2008-12-18 Thread Max Metral
Somehow I seem to have missed (and can't find) your original mail, but it seems like you're asking about using double metaphone for place names. We've done this on our site (http://boston.povo.com) for street and place names, and I can't say we've been happy with the results. We're toying with ngr

Snowball Analyzer and apostrophes

2008-06-17 Thread Max Metral
So I'm using Snowball Analyzer on a field for business titles. The value "Charlie's Sandwich Shoppe" becomes "charli sandwich shopp". This happens partly because the StandardAnalyzer strips off the apostrophe-s entirely, and then the Snowballer takes off the e. The problem is when someone comes

RE: Word split problems

2008-04-18 Thread Max Metral
It's probably about 100,000 entries per "thing that it would care about at once". -Original Message- From: Karl Wettin [mailto:[EMAIL PROTECTED] Sent: Thursday, April 17, 2008 3:17 PM To: java-user@lucene.apache.org Subject: Re: Word split problems Max Metral skrev:

Word split problems

2008-04-17 Thread Max Metral
In our app, we search for businesses. So here's an example: "Lululemon Athletica". I'd like any of these search terms to work for this: "Lulu lemon", "Lu Lu Lemon", "Lululemon". What strategy would be optimal for this kind of thing (of course keeping in mind negative matches are also bad)?
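One possible strategy for the question above, sketched outside Lucene: strip whitespace and punctuation from the query and compare it with equally "squashed" runs of consecutive words from the business name. The helper names `squash` and `matches_business` are hypothetical, the sketch handles ASCII letters and digits only, and it is not taken from any reply in the thread.

```python
import re


def squash(text):
    """Lowercase and drop everything except ASCII letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def matches_business(query, name):
    """True if the squashed query equals a squashed run of consecutive name words."""
    target = squash(query)
    if not target:
        return False
    tokens = name.split()
    for i in range(len(tokens)):
        for j in range(i + 1, len(tokens) + 1):
            if squash(" ".join(tokens[i:j])) == target:
                return True
    return False


for q in ["Lulu lemon", "Lu Lu Lemon", "Lululemon", "Lemon"]:
    print(q, matches_business(q, "Lululemon Athletica"))
```

The three spellings from the message match, while "Lemon" on its own does not, which keeps the false-positive rate down at the cost of ignoring partial words.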

RE: Using lucene with a Geospatial catalog

2008-02-17 Thread Max Metral
We're doing this for our site (http://boston.povo.com) the simple way: have Lucene return all matches based on non-geo criteria and then fetch the items from the db by id and run our geo logic. We store some "rough" positioning in Lucene, such as the region and use that for first level rejectio
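The second stage described here (run the geo logic on items that already passed the text search) might look roughly like the sketch below. The candidate dictionaries, the `region` field, and the function names are assumptions for illustration; in the setup described in the message, the region check would happen inside Lucene rather than in application code.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def geo_filter(candidates, center, radius_km, region):
    """Keep candidates in `region` within `radius_km` of `center`, nearest first.

    Each candidate is a dict with "id", "region", "lat" and "lon" keys
    (a hypothetical shape for rows fetched from the database by id).
    """
    lat0, lon0 = center
    hits = []
    for c in candidates:
        if c["region"] != region:  # cheap first-level rejection
            continue
        distance = haversine_km(lat0, lon0, c["lat"], c["lon"])
        if distance <= radius_km:
            hits.append((distance, c["id"]))
    return [cid for _, cid in sorted(hits)]
```

The coarse region test removes most candidates before the more expensive distance computation runs.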

RE: Spell checking street names

2008-01-31 Thread Max Metral
gic, it's obviously not the best metric. Is there an appropriate edit distance metric that takes phonetics into account? -Original Message- From: Karl Wettin [mailto:[EMAIL PROTECTED] Sent: Thursday, January 31, 2008 6:12 AM To: java-user@lucene.apache.org Subject: Re: Spell checking s

Spell checking street names

2008-01-30 Thread Max Metral
I'm using Lucene to spell check street names. Right now, I'm using Double Metaphone on the street name (we have a sophisticated regex to parse out the NAME as opposed to the unit, number, street type, or suffix). I think that Double Metaphone is probably overkill/wrong, and a spell checking appro
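For comparison with the phonetic approach, a plain edit-distance lookup could be sketched as follows. This is standard Levenshtein distance with no phonetic weighting; the function names, the sample street names, and the `max_distance` threshold are assumptions for the example, not something stated in the thread.

```python
def levenshtein(a, b):
    """Classic edit distance (insert, delete, substitute each cost 1)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def closest_streets(query, known_names, max_distance=2):
    """Known street names within `max_distance` edits of the query, nearest first."""
    q = query.lower()
    scored = sorted((levenshtein(q, name.lower()), name) for name in known_names)
    return [name for distance, name in scored if distance <= max_distance]


print(closest_streets("Bolyston", ["Boylston", "Beacon", "Tremont"]))
```

A transposition such as "Bolyston" for "Boylston" costs two edits under this metric, which is why the threshold in the sketch defaults to 2.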

Maximum phrase query?

2007-07-30 Thread Max Metral
I have a set of tags associated with content in my corpus. I also have normal text. Our system tries to figure out which "words" are tags and which are text, and falls back on text when tags fail. I'm wondering, is there anything in Lucene which might help disambiguate multi-word tags from text?
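One way to approach this outside Lucene is a greedy longest-match pass over the query words against the known tag set, with everything unmatched falling back to text. The function name, the `max_tag_words` limit, and the sample tags are hypothetical; this is a sketch of the idea, not a Lucene feature.

```python
def split_tags_and_text(words, known_tags, max_tag_words=3):
    """Greedily peel the longest known tag off the front of the remaining words.

    `known_tags` is a set of lowercase tag strings, possibly multi-word.
    Words that start no known tag are returned as plain text.
    """
    tags, text = [], []
    i = 0
    while i < len(words):
        for n in range(min(max_tag_words, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + n]).lower()
            if candidate in known_tags:
                tags.append(candidate)
                i += n
                break
        else:
            # No tag starts at this position: treat the word as text.
            text.append(words[i])
            i += 1
    return tags, text


print(split_tags_and_text("best ice cream in Boston".split(), {"ice cream", "boston"}))
```

Greedy matching prefers "ice cream" over separate "ice" and "cream" tags when both exist, which may or may not be the desired tie-break.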

RE: Problem using wildcardsearch in phrase search

2007-05-13 Thread Max Metral
Instead of using QueryParser.Parse, what if you make a WildcardQuery directly? I had similar troubles getting prefix queries (for ajax) working properly, never did solve it. -Original Message- From: Paul Taylor [mailto:[EMAIL PROTECTED] Sent: Sunday, May 13, 2007 5:02 AM To: java-user@lu

Help with Ajax-based prefix query?

2007-05-04 Thread Max Metral
Hi. I'm trying to design a proper index and query mechanism for looking up a business listing using an Ajax-style autocompleting text box. While I have gotten "versions" to work, I'm wondering what the optimal approach is. Someone may be looking for "Appleton Café." That listing might be
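As a baseline for the autocomplete problem, independent of Lucene, a sorted list of accent-folded names plus a binary search gives prefix lookup. The `fold` helper and `PrefixIndex` class are hypothetical names for this sketch, and it only matches prefixes of the whole listing name, not of words inside it.

```python
import bisect
import unicodedata


def fold(text):
    """Lowercase and strip combining accents, so "Café" folds to "cafe"."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).lower()


class PrefixIndex:
    """Sorted list of folded names supporting prefix lookup via binary search."""

    def __init__(self, names):
        self._entries = sorted((fold(name), name) for name in names)
        self._keys = [key for key, _ in self._entries]

    def lookup(self, prefix, limit=10):
        key = fold(prefix)
        start = bisect.bisect_left(self._keys, key)
        results = []
        for folded, original in self._entries[start:]:
            if not folded.startswith(key) or len(results) >= limit:
                break
            results.append(original)
        return results


index = PrefixIndex(["Appleton Café", "Apple Store", "Boston Bagels"])
print(index.lookup("appl"))
```

Folding both the stored names and the typed prefix means a user typing "appleton cafe" still finds "Appleton Café".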