excellent - thank you again. On Mon, Sep 19, 2011 at 11:05 AM, David Johnston <pol...@yahoo.com> wrote:
> You can probably do this without plpgsql through liberal use of CTEs (WITH) > and sub-queries. > > Also look at arrayed types for "saving" matches and filtering out already > tested pairs. > > David J. > > > On Sep 19, 2011, at 10:37, Henry Drexler <alonup...@gmail.com> wrote: > > Thanks you that is the kind of suggestion I was looking for - I will look > into plpgsql. > > Yes, there are several optimizations in it - though due to the actual data > the first few characters cannot be tested. Some of the actual optimizations > are only to reach out to the surrounding 100 rows and to skip numbers in the > characters. > > On Mon, Sep 19, 2011 at 10:17 AM, David Johnston < <pol...@yahoo.com> > pol...@yahoo.com> wrote: > >> Look at this module for the actual comparison algorithms (found in >> Appendix F)**** >> >> ** ** >> >> “fuzzystrmatch”**** >> >> ** ** >> >> Performance would be my only concern but you have that issue either way. >> With “plpgsql” you can do most things in the database you could do in VBA. >> Whether you want to bog the DB down with a processor intensive process like >> this is another question to consider.**** >> >> ** ** >> >> I am hoping you are putting in limits such as requiring that the first >> character (or even first partial word) are equal before even checking for an >> off-by-one error. With the “Levenshtein” algorithm you’d be looking for a >> value of “1” to match your current behavior.**** >> >> ** ** >> >> In short, what you are doing (given your specification below) in VBA is >> also doable in PostgreSQL.**** >> >> ** ** >> >> David J.**** >> >> ** ** >> >> ** ** >> >> *From:* <pgsql-general-ow...@postgresql.org> >> pgsql-general-ow...@postgresql.org >> [mailto:<pgsql-general-ow...@postgresql.org> >> pgsql-general-ow...@postgresql.org] *On Behalf Of *Henry Drexler >> *Sent:* Monday, September 19, 2011 9:10 AM >> *To:* pgsql-general >> *Subject:* [GENERAL] General guidance if there is an in dadabase solution >> or should stay as excel vba solution.**** >> >> ** ** >> >> I have no problem doing this in excel vba, though as the list grows larger >> obviously excel has row limits.**** >> >> ** ** >> >> ** ** >> >> What is being done:**** >> >> There is a column of data imported into the db - they are >> just text strings, there are about 80,000 rows of them. The goal is to do a >> single character elimination to find matches.**** >> >> ** ** >> >> so for instance the data is a bunch of rows of this:**** >> >> ** ** >> >> hello there**** >> >> what is your name**** >> >> happy birthday**** >> >> we are winner**** >> >> we are winners**** >> >> we like the sky**** >> >> task to do**** >> >> tasks to do**** >> >> ** ** >> >> so for the above in excel I created a macro that will remove one character >> and compare and do this for each character of each text string.**** >> >> ** ** >> >> The final product:**** >> >> ** ** >> >> hello there**** >> >> what is your name**** >> >> happy birthday**** >> >> we are winner we are winners**** >> >> we are winners we are winner**** >> >> we like the sky**** >> >> task to do tasks to do**** >> >> tasks to do task to do**** >> >> ** ** >> >> ** ** >> >> so you can see that it found the matches with being one character off.*** >> * >> >> ** ** >> >> ** ** >> >> Is this something best done outside of the db and in excel as I am doing >> or is it possible to do it in db?**** >> >> ** ** >> >> Note I am not looking for someone to give a whole solution - just if they >> know it can be done let me know the direction so I can research it and >> figure it out.**** >> >> ** ** >> >> Any advice is welcome.**** >> >> ** ** >> > >