You can probably do this without plpgsql through liberal use of CTEs (WITH) and sub-queries.
Also look at arrayed types for "saving" matches and filtering out already tested pairs. David J. On Sep 19, 2011, at 10:37, Henry Drexler <alonup...@gmail.com> wrote: > Thanks you that is the kind of suggestion I was looking for - I will look > into plpgsql. > > Yes, there are several optimizations in it - though due to the actual data > the first few characters cannot be tested. Some of the actual optimizations > are only to reach out to the surrounding 100 rows and to skip numbers in the > characters. > > On Mon, Sep 19, 2011 at 10:17 AM, David Johnston <pol...@yahoo.com> wrote: > Look at this module for the actual comparison algorithms (found in Appendix F) > > > > “fuzzystrmatch” > > > > Performance would be my only concern but you have that issue either way. > With “plpgsql” you can do most things in the database you could do in VBA. > Whether you want to bog the DB down with a processor intensive process like > this is another question to consider. > > > > I am hoping you are putting in limits such as requiring that the first > character (or even first partial word) are equal before even checking for an > off-by-one error. With the “Levenshtein” algorithm you’d be looking for a > value of “1” to match your current behavior. > > > > In short, what you are doing (given your specification below) in VBA is also > doable in PostgreSQL. > > > > David J. > > > > > > From: pgsql-general-ow...@postgresql.org > [mailto:pgsql-general-ow...@postgresql.org] On Behalf Of Henry Drexler > Sent: Monday, September 19, 2011 9:10 AM > To: pgsql-general > Subject: [GENERAL] General guidance if there is an in dadabase solution or > should stay as excel vba solution. > > > > I have no problem doing this in excel vba, though as the list grows larger > obviously excel has row limits. > > > > > > What is being done: > > There is a column of data imported into the db - they are just > text strings, there are about 80,000 rows of them. The goal is to do a > single character elimination to find matches. > > > > so for instance the data is a bunch of rows of this: > > > > hello there > > what is your name > > happy birthday > > we are winner > > we are winners > > we like the sky > > task to do > > tasks to do > > > > so for the above in excel I created a macro that will remove one character > and compare and do this for each character of each text string. > > > > The final product: > > > > hello there > > what is your name > > happy birthday > > we are winner we are winners > > we are winners we are winner > > we like the sky > > task to do tasks to do > > tasks to do task to do > > > > > > so you can see that it found the matches with being one character off. > > > > > > Is this something best done outside of the db and in excel as I am doing or > is it possible to do it in db? > > > > Note I am not looking for someone to give a whole solution - just if they > know it can be done let me know the direction so I can research it and figure > it out. > > > > Any advice is welcome. > > > >