On Tuesday, September 9, 2003, at 08:58 PM, perlwannabe wrote:

I have a text file that has various addresses in different formats. I
need to remove any items that are not part of the address so the output is
standard. Here is an example of the input file:


Address:<tab>1234<sp>Mockingbird<sp>Lane<tab>City:<tab>Groton<tab>State:<tab>CT
Address:<tab>2933<sp>Hummingbird<tab>St.<tab>City:<tab>Groton<tab>State:<tab>CT
Address:<sp>4321<tab>Sparrow<sp>Ave.<tab>City:<tab>Groton<tab>State:<tab>CT
. . .


What I want to do is get all of the data between Address: and City: and
strip the <tab> and replace with spaces.  The only problem is that the
data between Address: and City: changes.  What I want in the end is:

Address:<tab>1234<sp>Mockingbird<sp>Lane<tab>City:<tab>Groton<tab>State:<tab>CT
Address:<tab>2933<sp>Hummingbird<sp>St.<tab>City:<tab>Groton<tab>State:<tab>CT
Address:<tab>4321<sp>Sparrow<sp>Ave.<tab>City:<tab>Groton<tab>State:<tab>CT


(notice that any <tab> in the address itself is now a <sp> with <tab> both
before and after the address.)


I know it involves using the s/// operator to both strip the tabs and
replace with <sp> but the problem is that it requires using an array for
each address...and that is what is creating problems for me.


Thanks...

Hmm, let me think out loud a little.


I think I see a pattern, so let's first change all of the tabs to spaces. That's easy enough:

s/\t/ /g;

Now, if we switch all the spaces that are supposed to be tabs back to tabs, we're done, right? I bet we can handle that. What about:

s/ ([A-Za-z]+:) /\t$1\t/g;  # labels such as "City:" get a tab on each side
# and...
s/^([A-Za-z]+:) /$1\t/;  # The first one is a special case

# or, most complex...

s/(^| )([A-Za-z]+:) / length($1) ? "\t$2\t" : "$2\t" /eg;

You may have to adjust it a little if any of my assumptions are wrong, but it should be close to what you need, I think. Does that help any?
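Put together, the idea might look something like the sketch below. The normalize_line name and the sample records are only for illustration, and it assumes that every label is a run of letters ending in a colon, that no word inside an address ends in a colon, and that fields are separated by a single tab or space:

```perl
use strict;
use warnings;

# Hypothetical helper (the name is made up for this sketch): normalize one record.
sub normalize_line {
    my ($line) = @_;
    $line =~ s/\t/ /g;                    # every tab becomes a space
    $line =~ s/ ([A-Za-z]+:) /\t$1\t/g;   # inner labels get a tab on each side
    $line =~ s/^([A-Za-z]+:) /$1\t/;      # the leading label only needs a trailing tab
    return $line;
}

# Sample records modeled on the input shown earlier in the thread.
my @samples = (
    "Address:\t2933 Hummingbird\tSt.\tCity:\tGroton\tState:\tCT",
    "Address: 4321\tSparrow Ave.\tCity:\tGroton\tState:\tCT",
);

print normalize_line($_), "\n" for @samples;
```

If a record ever has two tabs in a row, or an address that contains something like "Attn:", the patterns would need tightening, so treat this as a starting point.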

James

