On Fri, Jun 5, 2009 at 09:19, William <esia...@yahoo.com> wrote:
>
> I have been trying for hours, to make this data structure into hash, I need 
> help. Thanks.
>
> $str =
> "
>    (dr1
>        <predicate>foo
>        <1>(dr2 <predicate>bar)
>        <2>a
>    )
> ";
>
> $hash = {
>    "dr1" => {
>            "<1>" => {"dr2" => {"predicate"=> "bar"}},
>            "<2>" => "a"
>    }
> };
snip
> What could be the regular expression and algorithm to make the resulting hash 
> ?
snip

This isn't really a job for a regex.  Note that you have a recursive
structure (parentheses can contain parentheses), and once a structure
is recursive you are moving out of the realm of regexes and into the
realm of parsers.  A good general-purpose parser is
Parse::RecDescent[1], and you should learn to use it, but your example
is simple enough that we don't need it.  We can tokenize the stream
fairly easily by splitting on '(', ')', '<', '>', and whitespace, then
throwing out the whitespace and empty tokens[2].  Once we have the
tokens we deal with each () group recursively:

#!/usr/bin/perl

use strict;
use warnings;

my $s = "
   (dr1
       <predicate>foo
       <1>(dr2 <predicate>bar)
       <2>a
   )
";

my @tokens = grep { /\S/ } split /([()<>\s])/, $s;

my $hash = recurse(\@tokens);

use Data::Dumper;
print Dumper $hash;

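sub recurse {
        my $tokens = shift;

        # Every group starts with '(' followed by the group's name.
        my $paren  = shift @$tokens;
        my $key    = shift @$tokens;
        die "invalid structure"
                unless defined $paren and $paren eq '(' and defined $key;

        my %hash;
        while (@$tokens) {
                # Each entry is '<', the subkey, '>', then a value.
                my $start  = shift @$tokens;
                my $subkey = shift @$tokens;
                my $end    = shift @$tokens;
                die "invalid structure"
                        unless defined $end and $start eq '<' and $end eq '>';

                # Running out of tokens here means the input was truncated.
                die "invalid structure" unless @$tokens;

                if ($tokens->[0] eq '(') {
                        # The value is a nested group: parse it recursively.
                        $hash{$key}{$subkey} = recurse($tokens);
                } else {
                        # The value is a plain word.
                        $hash{$key}{$subkey} = shift @$tokens;
                }

                # A ')' closes the current group.
                if (@$tokens and $tokens->[0] eq ')') {
                        shift @$tokens;
                        return \%hash;
                }
        }

        # The tokens ran out before the group was closed.
        die "invalid structure";
}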
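
To see what the parser is actually handed, here is a small sketch that
prints the token list for the inner group only.  The helper name
tokenize() is just for illustration and is not part of the script
above; the split and grep are the same ones used there.

```perl
#!/usr/bin/perl

use strict;
use warnings;

# Hypothetical helper wrapping the same split/grep as the script above.
# The capturing group makes split return the delimiters as well as the
# text between them; grep then drops whitespace and empty strings.
sub tokenize {
        my ($s) = @_;
        return grep { /\S/ } split /([()<>\s])/, $s;
}

my @tokens = tokenize("(dr2 <predicate>bar)");

# Show the tokens separated by single spaces.
print join(' ', @tokens), "\n";
```

For that input the script prints "( dr2 < predicate > bar )", i.e.
eight tokens, which is the sequence recurse() consumes with shift.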

1. http://search.cpan.org/dist/Parse-RecDescent/lib/Parse/RecDescent.pm
2. if whitespace is allowed between '<' and '>', this will need to
change slightly
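
One possible way to handle that case, as a rough sketch rather than a
drop-in replacement: match tokens with a regex instead of splitting, so
that a whole <...> key becomes a single token and may contain
whitespace.  The names tokenize_keys() and parse_group() and the sample
input are my own assumptions, not part of the script above.

```perl
#!/usr/bin/perl

use strict;
use warnings;

# Hypothetical alternative tokenizer: a token is a single paren, a
# complete <...> key (whitespace allowed inside), or a bare word.
sub tokenize_keys {
        my ($s) = @_;
        return $s =~ / ( [()] | <[^>]*> | [^\s()<>]+ ) /gx;
}

# Same recursive idea as recurse(), adapted to one-token keys.
sub parse_group {
        my ($tokens) = @_;

        # A group is '(' followed by its name.
        my $paren = shift @$tokens;
        die "invalid structure" unless defined $paren and $paren eq '(';
        my $name = shift @$tokens;
        die "invalid structure" unless defined $name and $name !~ /^[()<]/;

        my %fields;
        while (@$tokens) {
                my $tok = shift @$tokens;

                # A ')' closes the current group.
                return +{ $name => \%fields } if $tok eq ')';

                # Otherwise we expect a <key> token; strip the brackets.
                die "invalid structure" unless $tok =~ /^<(.*)>$/s;
                my $key = $1;
                die "invalid structure"
                        if !@$tokens or $tokens->[0] eq ')';

                # The value is either a nested group or a plain word.
                $fields{$key} = $tokens->[0] eq '('
                        ? parse_group($tokens)
                        : shift @$tokens;
        }

        # The tokens ran out before the group was closed.
        die "invalid structure";
}

my @tokens = tokenize_keys("(dr1 <first name>foo <1>(dr2 <predicate>bar))");
my $tree   = parse_group(\@tokens);
print $tree->{dr1}{'first name'}, "\n";
```

This prints "foo".  If you want the keys to keep their angle brackets,
as in the hash from the original question, store $tok instead of $1.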

-- 
Chas. Owens
wonkden.net
The most important skill a programmer can have is the ability to read.
