You can probably do this without plpgsql through liberal use of CTEs (WITH) and 
sub-queries.

Also look at array types for "saving" matches and filtering out pairs that 
have already been tested.
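
As a rough illustration of that bookkeeping (in Python rather than SQL, purely 
to show the logic; the function names are made up for this sketch), each 
string's one-character deletions can be looked up in a set, so every matching 
pair is found once without comparing all rows against each other. It assumes 
an exact match after removing a single character and ignores your other 
optimizations (the 100-row window, skipping numbers):

```python
from collections import defaultdict


def deletion_variants(text):
    """Return every string obtainable by removing exactly one character."""
    return {text[:i] + text[i + 1:] for i in range(len(text))}


def one_deletion_matches(rows):
    """Map each row to the other rows it equals after dropping one character.

    Hypothetical helper for illustration: matches are stored in both
    directions, mirroring the two-column "final product" in the original post.
    """
    present = set(rows)          # set membership replaces pairwise comparison
    matches = defaultdict(set)   # "saved" matches, keyed by row text
    for longer in present:
        # Only variants that actually exist as rows count as matches.
        for shorter in deletion_variants(longer) & present:
            matches[longer].add(shorter)
            matches[shorter].add(longer)
    return dict(matches)
```

Calling one_deletion_matches() on the sample rows from the original post 
should pair "we are winner" with "we are winners" and "task to do" with 
"tasks to do", and leave the other rows unmatched.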

David J.

On Sep 19, 2011, at 10:37, Henry Drexler <alonup...@gmail.com> wrote:

> Thank you, that is the kind of suggestion I was looking for - I will look 
> into plpgsql.
> 
> Yes, there are several optimizations in it - though due to the actual data 
> the first few characters cannot be tested.  The optimizations actually in 
> place are to compare only against the surrounding 100 rows and to skip 
> numeric characters.
> 
> On Mon, Sep 19, 2011 at 10:17 AM, David Johnston <pol...@yahoo.com> wrote:
> Look at the “fuzzystrmatch” module (found in Appendix F) for the actual 
> comparison algorithms.
> 
> Performance would be my only concern, but you have that issue either way.  
> With “plpgsql” you can do most things in the database that you could do in 
> VBA.  Whether you want to bog the DB down with a processor-intensive process 
> like this is another question to consider.
> 
> I am hoping you are putting in limits, such as requiring that the first 
> character (or even the first partial word) be equal before even checking for 
> an off-by-one error.  With the “Levenshtein” algorithm you’d be looking for 
> a value of “1” to match your current behavior.
> 
> In short, what you are doing (given your specification below) in VBA is also 
> doable in PostgreSQL.
> 
> David J.
> 
> From: pgsql-general-ow...@postgresql.org 
> [mailto:pgsql-general-ow...@postgresql.org] On Behalf Of Henry Drexler
> Sent: Monday, September 19, 2011 9:10 AM
> To: pgsql-general
> Subject: [GENERAL] General guidance if there is an in database solution or 
> should stay as excel vba solution.
> 
> I have no problem doing this in Excel VBA, though as the list grows larger 
> Excel's row limits obviously become a problem.
> 
> What is being done:
> 
> A column of data is imported into the db. The values are just text strings, 
> and there are about 80,000 rows of them.  The goal is to do a 
> single-character elimination to find matches.
> 
> So, for instance, the data is a bunch of rows like this:
> 
> hello there
> what is your name
> happy birthday
> we are winner
> we are winners
> we like the sky
> task to do
> tasks to do
> 
> For the above, in Excel I created a macro that removes one character and 
> compares, and does this for each character of each text string.
> 
> The final product:
> 
> hello there
> what is your name
> happy birthday
> we are winner        we are winners
> we are winners       we are winner
> we like the sky
> task to do           tasks to do
> tasks to do          task to do
> 
> So you can see that it found the matches that are one character off.
> 
> Is this something best done outside of the db, in Excel as I am doing now, 
> or is it possible to do it in the db?
> 
> Note I am not looking for someone to give a whole solution - if you know it 
> can be done, just let me know the direction so I can research it and figure 
> it out.
> 
> Any advice is welcome.
