Re: Counting the number of times a string matches in another string

Rob Dixon Thu, 13 Mar 2003 11:36:19 -0800

Scott E Robinson wrote:
> Okay, since i wasn't clear the first time, let me try again.  Sorry,
> I'm not a professional programmer, I'm a true beginner.


Don't apologise for not being a professional programmer: some
would consider it a thing to be proud of :-)

>
> SHORT VERSION:
>
> What I want is Joseph's option 2b, if I understand him correctly.
> Given a
> string of the form
>
>     :M260:
>
> I want to get a count of its occurrences in a single other string of
> the form
>
>     :L000:W000:M260:B271:8:A:
>
> (Incidentally, the first string, :M260:, came from a line like
> :L121:M260:B250:L000:, but saying so probably only confused the
> issue.
> Since I want to compare each colon-delimited "chunk" of the source string
> to the target string, I assume I will need to loop through each chunk in
> the source string, comparing each "chunk" to the target string.)

[snip long version]

The way I understand this is that you have a list of names which
are stored as a pseudo-phonetic sequence of components. Given
a key name represented the same way, you want to calculate
values indicating how well each member of the list matches
the key. This is to be done by counting the number of
components of the key which appear in each list member.

Assuming this is correct, it helps a lot. You can use the
alternation capability of regexes to find any one of the
components of the key. Given a key like

    my $key = ':L000:W000:M260:B271:8:A:';

you can find the number of its components in $line with

    my $count = () = ($line =~/\b(L000|W000|M260|B271|8|A)\b/g);
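
In case the "= () =" part looks odd, here is a minimal sketch of what
it does. The sample value of $line and the non-matching component Z999
are made up for illustration only.

```perl
use strict;
use warnings;

# Hypothetical sample line, in the same format as the data in this thread.
my $line = ':L000:W000:M260:B271:8:A:';

# In list context, m//g returns every match. Assigning that list to the
# empty list () and then to a scalar gives the number of matches found.
my $count = () = $line =~ /\b(L000|M260|Z999)\b/g;

print "$count\n";    # L000 and M260 are present, Z999 is not
```

Without the "() =" the match would run in scalar context and $count
would only tell you whether there was a match, not how many.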

Now all you need to do is to build the regex from the original
key, like this

    my $key = ':L000:W000:M260:B271:8:A:';
    my $regex = join '|', ($key =~ m/\w+/g);
    my $count = () = ($line =~/\b($regex)\b/g);

and it's done. A complete script which uses my previous attempt
as a basis follows. Come back if there's anything you're not clear
about.
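
If you would rather have the counting in one place, the same idea can be
sketched as a subroutine. The name count_components is my own invention,
and the quotemeta call and the empty-key check are extra precautions that
the snippets above do not have; treat this as a sketch, not a finished
design.

```perl
use strict;
use warnings;

# count_components is a hypothetical helper name used for illustration.
sub count_components {
    my ($key, $line) = @_;

    # Split the key into components and escape each one. Escaping changes
    # nothing for \w+ tokens, but it protects the regex if the splitting
    # rule is ever loosened to allow other characters.
    my @parts = map { quotemeta($_) } $key =~ /\w+/g;

    # An empty alternation would match at every word boundary.
    return 0 unless @parts;

    my $regex = join '|', @parts;
    my $count = () = $line =~ /\b(?:$regex)\b/g;
    return $count;
}

print count_components(':B000:W000:M260:8:',
                       ':L520:T400:M260:S000:M260:14:E214:'), "\n";
```

The call at the end uses the candidate and one of the data lines from
the script below, where M260 appears twice.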

Just one thing though. I would make sure
you can get this working as it is before you start improving
the capability of the algorithm with things like a second pass.
I imagine you will want to add a lot of tweaks before you're
done, and it will make your life easier if you adjust a working
program rather than aiming straight away for the end product.

HTH,

Rob


#perl

use strict;
use warnings;

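# Read the sample data, keeping only the first whitespace-delimited
# field of each line (this drops the trailing annotations and newline).
my @target = map { s/\s.*//s; $_ } <DATA>;
close DATA;

my $candidate = ':B000:W000:M260:8:';

# Build an alternation of the candidate's components: B000|W000|M260|8
my $regex = join '|', ($candidate =~ m/\w+/g);

# Count how many components of each target match any candidate component.
foreach (@target) {
    my $count = () = /\b($regex)\b/g;
    print "$_ => $count\n";
}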

__DATA__
:L520:T400:C000:S000:L200:8:          <-bare numbers are possible, like this :8:
:L520:T400:C000:S000:L200:8:
:L520:T400:C000:S000:L200:24:E214:
:L520:T400:C000:S000:M:24:E214:       <-note the :M: string, just for variety
:L520:T400:C000:S000:L200:14:E214:
:L520:T400:C000:S000:L200:14:E214:
:L520:M260:C000:S000:L200:14:E214:    <-this should match once
:L520:T400:M260:S000:M260:14:E214:    <-this should match twice
:L520:T400:C000:S000:L200:14:E214:



