I have a text file that has various addresses in different formats. I
need to remove any items that are not part of the address so the output is
standard. Here is an example of the input file:
Address:<tab>1234<sp>Mockingbird<sp>Lane<tab>City:<tab>Groton<tab>State:<tab>CT
Address:<tab>2933<sp>Hummingbird<tab>St.<tab>City:<tab>Groton<tab>State:<tab>CT
Address:<sp>4321<tab>Sparrow<sp>Ave.<tab>City:<tab>Groton<tab>State:<tab>CT
. . .
What I want to do is get all of the data between Address: and City: and strip the <tab> and replace with spaces. The only problem is that the data between Address: and City: changes. What I want in the end is:
Address:<tab>1234<sp>Mockingbird<sp>Lane<tab>City:<tab>Groton<tab>State:<tab>CT
Address:<tab>2933<sp>Hummingbird<sp>St.<tab>City:<tab>Groton<tab>State:<tab>CT
Address:<tab>4321<sp>Sparrow<sp>Ave.<tab>City:<tab>Groton<tab>State:<tab>CT
(notice that any <tab> in the address itself is now a <sp> with <tab> both
before and after the address.)
I know it involves using the s/// operator to both strip the tabs and
replace with <sp> but the problem is that it requires using an array for
each address...and that is what is creating problems for me.
Thanks...
Hmm, let me think out loud a little.
I think I see a pattern, so let's first change all of them to spaces. That's easy enough:
s/\t/ /g;
Now, if we switch all the spaces that are supposed to be tabs back to tabs, we're done, right? I bet we can handle that. What about:
s/ ([A-Za-z]+:) /\t$1\t/g;
# and...
s/^([A-Za-z]+:) /$1\t/;  # The first one is a special case
# or, most complex...
s/(^| )([A-Za-z]+:) / length($1) ? "\t$2\t" : "$2\t" /eg;
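Put together, the two steps might look something like the sketch below. It assumes each record sits on one line and that every field label is a single word followed by a colon and a space. The helper name normalize_line and the sample records are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: tabs only around "Label:" fields, spaces elsewhere.
sub normalize_line {
    my ($line) = @_;
    $line =~ s/\t/ /g;                    # step 1: every tab becomes a space
    $line =~ s/ ([A-Za-z]+:) /\t$1\t/g;   # step 2: labels inside the line
    $line =~ s/^([A-Za-z]+:) /$1\t/;      # step 3: the label at the line start
    return $line;
}

# Made-up sample records in the same shape as the input file.
my @samples = (
    "Address:\t2933 Hummingbird\tSt.\tCity:\tGroton\tState:\tCT\n",
    "Address: 4321\tSparrow Ave.\tCity:\tGroton\tState:\tCT\n",
);
print normalize_line($_) for @samples;
```

To run it over a real file, you would replace the sample loop with a loop over the file's lines.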
You may have to adjust it a little if any of my assumptions are wrong, but it should be close to what you need, I think. Does that help any?
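If the assumption about the labels turns out to be wrong, a narrower alternative is to touch only the text between Address: and City:, which is closer to how the question was phrased. This is a different technique from the substitutions above, and only a sketch: it assumes each line starts with Address: and contains City: later on. The name fix_address_field is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite only the span between "Address:" and "City:".
sub fix_address_field {
    my ($line) = @_;
    $line =~ s{^(Address:)\s+(.*?)\s+(City:)}{
        my ($label, $addr, $next) = ($1, $2, $3);
        $addr =~ tr/\t/ /;
        "$label\t$addr\t$next";
    }e;
    return $line;
}

# Made-up sample record in the same shape as the input file.
print fix_address_field("Address: 4321\tSparrow Ave.\tCity:\tGroton\tState:\tCT\n");
```

The tr/\t/ / turns each tab inside the address into a single space, and everything after City: is left exactly as it was.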
James