Re: [HACKERS] String Similarity

2006-09-27 Thread tomas
On Tue, Sep 26, 2006 at 09:09:33AM +0800, Pang Zaihu wrote: > Hello! > Would you like to give me a simple introduction to the Levenshtein distance > function? Better than I could explain: > Thank yo

Re: [HACKERS] String Similarity

2006-09-27 Thread Pang Zaihu
Hello! Would you like to give me a simple introduction to the Levenshtein distance function? Thank you! On 2006-05-19 19:54, Martijn van Oosterhout wrote: > On Fri, May 19, 2006 at 04:00:48PM -0400, Mark Woodward wrote: > > (3) Is there also a desire for a Levenshtein distance function fo

Re: [HACKERS] String Similarity

2006-05-22 Thread Mark Woodward
> Try contrib/pg_trgm... Tri-graphs are interesting, and I'll try to reconsider whether they fit or not, but I suspect that they do not. (You are the second to recommend it.) Anything based on a word parser is probably not appropriate; the example I first gave is a little misleading in that it is not th

Re: [HACKERS] String Similarity

2006-05-21 Thread Christopher Kings-Lynne
Try contrib/pg_trgm... Chris Mark Woodward wrote: I have a side project that needs to "intelligently" know if two strings are contextually similar. Think about how CDDB information is collected and sorted. It isn't perfect, but there should be enough information to be usable. Think about this:

Re: [HACKERS] String Similarity

2006-05-20 Thread Mark Woodward
> What I was hoping someone had was a function that could find the substring > runs in something less than a strlen1*strlen2 number of operations and a > numerically sane way of representing the similarity or difference. Actually, it is more like strlen1*strlen2*N, where N is the number of valid r
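
For reference, the strlen1*strlen2 baseline mentioned above matches the usual dynamic-programming search for the longest common substring. The following is a minimal Python sketch of that baseline only, not the stratest program from this thread; the function name is hypothetical, and it returns a single longest run rather than every run.

```python
def longest_common_substring(a, b):
    """Return one longest run of characters that appears in both a and b.

    Fills a table with len(a) * len(b) cells, keeping only two rows in memory.
    """
    best_len = 0   # length of the longest run found so far
    best_end = 0   # index in a just past the end of that run
    previous = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        current = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the run that ended at the previous character pair.
                current[j] = previous[j - 1] + 1
                if current[j] > best_len:
                    best_len = current[j]
                    best_end = i
        previous = current
    return a[best_end - best_len:best_end]
```

Finding several runs, as the thread discusses, would mean repeating the search on the leftover text, which is roughly where an extra factor such as N could come from.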

Re: [HACKERS] String Similarity

2006-05-20 Thread Mark Woodward
> Get pg_trgm http://www.sai.msu.su/~megera/oddmuse/index.cgi/ReadmeTrgm > It doesn't depend on language. That's an interesting approach. This is what I got: apps$ ./stratest "pink floyd dark side of the moon money" "dark side of the moon pink floyd" Match: dark side of the moon Match: pink flo

Re: [HACKERS] String Similarity

2006-05-19 Thread Oleg Bartunov
Get pg_trgm http://www.sai.msu.su/~megera/oddmuse/index.cgi/ReadmeTrgm It doesn't depend on language. Oleg On Fri, 19 May 2006, Mark Woodward wrote: I have a side project that needs to "intelligently" know if two strings are contextually similar. Think about how CDDB information is collected a
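
To illustrate the trigram idea behind this suggestion, here is a simplified Python sketch. It is not pg_trgm's code: the function names are hypothetical, and the padding (two spaces before each word, one after) and the shared-over-total ratio are assumptions modelled on how the pg_trgm documentation describes trigram similarity.

```python
import re


def trigrams(text):
    """Return the set of three-character groups found in the words of text."""
    grams = set()
    for word in re.findall(r"\w+", text.lower()):
        # Assumed padding: two spaces before and one after each word.
        padded = "  " + word + " "
        for i in range(len(padded) - 2):
            grams.add(padded[i:i + 3])
    return grams


def trigram_similarity(a, b):
    """Shared trigrams divided by all distinct trigrams, from 0.0 to 1.0."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)
```

Because the trigrams are collected per word into a set, word order does not affect the score, which is why the two CDDB-style strings from the original post compare as close matches under this measure.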

Re: [HACKERS] String Similarity

2006-05-19 Thread Mark Woodward
>> I have a side project that needs to "intelligently" know if two strings >> are contextually similar. > > The examples you gave seem heavy on word order and whitespace > consideration, > before applying any algorithms. Here's a quick perl ve

Re: [HACKERS] String Similarity

2006-05-19 Thread Josh Berkus
> > I have a side project that needs to "intelligently" know if two > > strings are contextually similar. Also check out the "fuzzystrmatch" module in /contrib, which offers soundex, metaphone and levenshtein functions. -- --Josh Josh Berkus PostgreSQL @ Sun San Francisco --

Re: [HACKERS] String Similarity

2006-05-19 Thread Greg Sabino Mullane
> I have a side project that needs to "intelligently" know if two strings > are contextually similar. The examples you gave seem heavy on word order and whitespace consideration, before applying any algorithms. Here's a quick perl version that does

Re: [HACKERS] String Similarity

2006-05-19 Thread Mark Woodward
> Mark Woodward wrote: >> I have a side project that needs to "intelligently" know if two strings >> are contextually similar. Think about how CDDB information is collected >> and sorted. It isn't perfect, but there should be enough information to >> be >> usable. >> >> Think about this: >> >> "pin

Re: [HACKERS] String Similarity

2006-05-19 Thread Mark Dilger
Mark Woodward wrote: > I have a side project that needs to "intelligently" know if two strings > are contextually similar. Think about how CDDB information is collected > and sorted. It isn't perfect, but there should be enough information to be > usable. > > Think about this: > > "pink floyd - d

Re: [HACKERS] String Similarity

2006-05-19 Thread Andrew Dunstan
Mark Woodward wrote: (3) Is there also a desire for a Levenshtein distance function for text and varchars? I experimented with it, and was forced to write the function in item #1. fuzzystrmatch in contrib already has a Levenshtein function. cheers andrew ---(end
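
For readers who want a quick picture of what a Levenshtein function computes, here is a minimal Python sketch of the standard dynamic-programming formulation. It is an illustration only, not the fuzzystrmatch implementation, and the function name is hypothetical.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, or
    substitutions needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    # previous[j] holds the distance between the prefix of a seen so far
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute, or keep a match
            ))
        previous = current
    return previous[-1]
```

The distance counts character edits, so reordering whole words (as in the CDDB examples from this thread) produces a large distance even when the strings contain the same words.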

Re: [HACKERS] String Similarity

2006-05-19 Thread Martijn van Oosterhout
On Fri, May 19, 2006 at 04:00:48PM -0400, Mark Woodward wrote: > (3) Is there also a desire for a Levenshtein distance function for text > and varchars? I experimented with it, and was forced to write the function > in item #1. Postgres already has a Levenshtein distance function, see fuzzystrmatc

[HACKERS] String Similarity

2006-05-19 Thread Mark Woodward
I have a side project that needs to "intelligently" know if two strings are contextually similar. Think about how CDDB information is collected and sorted. It isn't perfect, but there should be enough information to be usable. Think about this: "pink floyd - dark side of the moon - money" "dark s