Re: [GENERAL] efficient way to do "fuzzy" join

2014-04-15 Thread Rémi Cura
A little related bonus : when doing the time-join, the next step is to interpolate to have a more accurate estimation : --- DROP FUNCTION IF EXISTS range_interpolate(nr anyrange,obs anyelement) ; CREATE OR

Re: [GENERAL] efficient way to do "fuzzy" join

2014-04-14 Thread Rémi Cura
2014-04-12 15:04 GMT+02:00 Andy Colson : > On 04/12/2014 06:29 AM, Rémi Cura wrote: > >> (please note that this random string function is NOT the good way to >> do it, i should random int then use it as index to an array >> containing all the letter) >> >> Thanks a lot for this new version! It see

Re: [GENERAL] efficient way to do "fuzzy" join

2014-04-12 Thread Andy Colson
On 04/12/2014 06:29 AM, Rémi Cura wrote: (please note that this random string function is NOT the good way to do it, i should random int then use it as index to an array containing all the letter) Thanks a lot for this new version! It seems to be slower than your first solution (no index use I g

Re: [GENERAL] efficient way to do "fuzzy" join

2014-04-12 Thread Rémi Cura
(please note that this random string function is NOT the good way to do it, i should random int then use it as index to an array containing all the letter) Thanks a lot for this new version! It seems to be slower than your first solution (no index use I guess, I gave up after 5 minutes vs 5 sec fo

Re: [GENERAL] efficient way to do "fuzzy" join

2014-04-11 Thread Rémi Cura
Wow many thanks! I had thought about the order by and limit because it is the natural way to express the problem, but I had discarded it for fear of suchbad complexity (theoretically, for each row of B , compute the distance to every other row of A!) . And it's okay if 2 row from B share the same

Re: [GENERAL] efficient way to do "fuzzy" join

2014-04-11 Thread Andy Colson
On 4/11/2014 12:16 PM, Andy Colson wrote: On 4/11/2014 7:50 AM, Rémi Cura wrote: Hey dear List, I'm looking for some advice about the best way to perform a "fuzzy" join, that is joining two table based on approximate matching. It is about temporal matching given a table A with rows containing

Re: [GENERAL] efficient way to do "fuzzy" join

2014-04-11 Thread Andy Colson
On 4/11/2014 7:50 AM, Rémi Cura wrote: Hey dear List, I'm looking for some advice about the best way to perform a "fuzzy" join, that is joining two table based on approximate matching. It is about temporal matching given a table A with rows containing data and a control_time (for instance 1 ; 5

Re: [GENERAL] efficient way to do "fuzzy" join

2014-04-11 Thread Andy Colson
2014-04-11 17:09 GMT+02:00 Andy Colson mailto:a...@squeakycode.net>>: On 4/11/2014 7:50 AM, Rémi Cura wrote: Hey dear List, I'm looking for some advice about the best way to perform a "fuzzy" join, that is joining two table based on approximate matching. It

Re: [GENERAL] efficient way to do "fuzzy" join

2014-04-11 Thread Rémi Cura
Hey, thanks for your answer. I think you are right, range type with index could at least provide a fast matching, thus avoiding the numrow(A) * numrow(B) complexity . Though I don't see how to use it to interpolate for more than 1st order. Cheers, Rémi-C 2014-04-11 17:09 GMT+02:00 Andy Colson

Re: [GENERAL] efficient way to do "fuzzy" join

2014-04-11 Thread Andy Colson
On 4/11/2014 7:50 AM, Rémi Cura wrote: Hey dear List, I'm looking for some advice about the best way to perform a "fuzzy" join, that is joining two table based on approximate matching. It is about temporal matching given a table A with rows containing data and a control_time (for instance 1 ; 5