I freely admit this may be my problem. Writing regular expression patterns is 
more an art than a skill. However, I am getting an unexpected result from 
regex_replace().

I have a table that is partially defined as follows (names and email addresses 
hidden for privacy):

user_name       user_email
“AAAAA"                                                 
“BBBBB”         “bbbbb(xxxxx)"
“CCCCC”         "ccccct(home)"          
“DDDDD"                                                                 
“EEEEE"         "aeeeee(home)”

The second entry is an email address - bbbbb - followed by the name of an 
individual (xxxxx) in parentheses. The email address for CCCCC and EEEEE have 
the word “home” in parentheses appended to the email address.

I want to delete the parenthetical expression including the parentheses for all 
email addresses. I also have a column (not shown) called email_list that 
contains a comma separated list of all email addresses associated with each 
name or NULL if there is no list. I create a table:

CREATE TABLE "households_with_email" AS 
SELECT user_name, regexp_replace(user_email, E'\(.*\)', '') AS user_email, 
family_list, street_address, city, state, zip, phone_list, email_list
FROM "household_data"
WHERE email_list != ‘';

I expected the regex_replace to the parenthetical text with the null character. 
Instead, it replaces the whole string in user_email with the null string:

user_name       user_email
"Rodriguez”     ""
"Armstrong"     ""
"Bauer"         ""
"Berst"         ""
"Berst”         ""

I realize there may be some characteristic such as greedy matching that is 
causing this result, but if so, I don’t see how. The pattern indicates first 
find the ‘(‘ character, then match all characters until a ‘)’ character 
arrives. Those characters, including the parentheses should then be replaced 
with the null string.

Or am I misinterpreting the pattern?

Dan

Reply via email to