On Sat, 25 Sep 2010 21:05:13 -0700 (PDT), Xah Lee <xah...@gmail.com> wrote:

>here's a interesting toy list processing problem.
>
>I have a list of lists, where each sublist is labelled by
>a number. I need to collect together the contents of all sublists
>sharing the same label. So if I have the list
>
>((0 a b) (1 c d) (2 e f) (3 g h) (1 i j) (2 k l) (4 m n) (2 o p) (4 q r) (5 s t))
>
>where the first element of each sublist is the label, I need to
>produce:
>
>output:
>((a b) (c d i j) (e f k l o p) (g h) (m n q r) (s t))
>
[snip]

>anyone care to give a solution in Python, Perl, javascript, or other
>lang? am guessing the scheme solution can be much improved... perhaps
>using some lib but that seems to show scheme is pretty weak if the lib
>is non-standard.
>

Crossposting to Lisp, Python and Perl because the weird list of lists looks
like Lisp or something else, and you mention other languages so I'm throwing
this out for Perl.

It appears the string you have there is actually list syntax from another
language. If it is, it's the job of that language to parse the data out. Why,
then, do you want to put it into another language's form? At runtime, once the
data is in variables, as dictated by the syntax, you can do whatever data
manipulation you want (combining arrays, etc.).
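
To illustrate the "once the data is in variables" case, here is a minimal Python sketch (Python because the original question asks for it). It assumes the list has already been parsed into nested Python lists; the function name `merge_by_label` is made up for this example and is not from the original post.

```python
def merge_by_label(sublists):
    """Concatenate the tails of sublists that share a first element (label)."""
    merged = {}  # plain dicts keep insertion order (Python 3.7+)
    for label, *items in sublists:
        # First sighting of a label creates its bucket; later ones extend it.
        merged.setdefault(label, []).extend(items)
    return list(merged.values())


data = [
    [0, "a", "b"], [1, "c", "d"], [2, "e", "f"], [3, "g", "h"],
    [1, "i", "j"], [2, "k", "l"], [4, "m", "n"], [2, "o", "p"],
    [4, "q", "r"], [5, "s", "t"],
]
print(merge_by_label(data))
```

The groups come out in the order their labels first appear, which matches the output asked for in the question.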

So, in the spirit of a preprocessor, and given that the text is balanced with
proper closure, i.e.:
      ( (data) (data) )    is ok.
      ( data (data) )      is not ok.

the code below does simple text manipulation, joining like-labeled sublists
without going into the runtime guts of internalizing the data itself. Done
internally, on parsed data, this would be too simple.
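
The "ok / not ok" condition above could be checked with something like the following Python sketch. It rests on my reading of that condition, which is an assumption: parentheses must balance, and no list may mix bare atoms with sublists. The name `is_well_formed` is hypothetical.

```python
import re


def is_well_formed(text):
    """True if parens balance and no list mixes bare atoms with sublists."""
    tokens = re.findall(r"[()]|[^\s()]+", text)
    stack = []  # one set per open list: the kinds of children seen so far
    for tok in tokens:
        if tok == "(":
            if stack:
                stack[-1].add("list")
            stack.append(set())
        elif tok == ")":
            if not stack:
                return False  # closing paren with nothing open
            if len(stack.pop()) > 1:
                return False  # this list held both atoms and sublists
        else:
            if not stack:
                return False  # atom outside any list
            stack[-1].add("atom")
    return bool(tokens) and not stack


print(is_well_formed("( (data) (data) )"))
print(is_well_formed("( data (data) )"))
```

The first call should report the input as acceptable and the second as not, mirroring the two cases listed above.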

-sln
-----------------
Alternate input:
(
  (
    (0 a b) (1 c d) (2 e f )
  )
  (3 g h) (1 i j) (2 k l) (4 m n) (2 o p) (4 q r) (5 s t)
)
------------------
use strict;
use warnings;

my $input = <<EOI;
((0 a b) (1 c d) (2 e f) (3 g h) (1 i j) (2 k l) (4 m n) (2 o p) (4 q r)
 (5 s t))
EOI
my $output = $input;

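# A run of adjacent flat sublists, "(...) (...) ...". $1 is the whole run and
# $2 is the whitespace after its last sublist.
my $regxout = qr/
  ( (?: \( \s* [^()]+ \s* \) (\s*) )+ )
/x;


# Rewrite each run in place. The replacement side is Perl code (the e flag).
$output =~ 
s{ $regxout }
 {
    # Save the captures first: the inner match below overwrites $1 and $2.
    my ( $list, $format ) = ( $1, $2 );
    my ( %hseen,    # label => merged contents
         @order,    # labels in first-seen order
         $replace
    );
    # For each sublist, $1 is the label and $2 is the rest of its contents.
    while ($list =~  /\(\s* (\S+) \s* (.+?) \s*\)/xsg) {
        if ( exists $hseen{$1} ) {
            $hseen{$1} .= " $2";
            next;
        }
        push @order, $1;
        $hseen{$1} = $2;
    }
    # Emit one merged sublist per label, without the label itself.
    for my $id (@order) {
        $replace .= "($hseen{$id}) ";
    }
    $replace =~ s/ $//;
    # Put back the whitespace that followed the run.
    $replace . $format
 }xeg;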

print "Input  -\n$input\n";
print "Output -\n$output";

__END__

Input  -
((0 a b) (1 c d) (2 e f) (3 g h) (1 i j) (2 k l) (4 m n) (2 o p) (4 q r)
 (5 s t))

Output -
((a b) (c d i j) (e f k l o p) (g h) (m n q r) (s t))
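
For the Python side of the crosspost, the same text-level idea might look like the sketch below. It is a rough port of the Perl above, not a tested equivalent in every edge case; the names `RUN`, `ITEM`, `merge_run` and `merge_text` are made up for this example.

```python
import re

# A run of adjacent flat sublists, e.g. "(0 a b) (1 c d)". Group 1 is the
# whole run, group 2 the whitespace after its last sublist.
RUN = re.compile(r"((?:\(\s*[^()]+\s*\)(\s*))+)")
# One flat sublist: the label, then the rest of its contents.
ITEM = re.compile(r"\(\s*(\S+)\s*(.+?)\s*\)", re.S)


def merge_run(match):
    """Join like-labeled sublists inside one run, dropping the labels."""
    merged = {}  # label -> merged contents, in first-seen order
    for label, rest in ITEM.findall(match.group(1)):
        merged[label] = f"{merged[label]} {rest}" if label in merged else rest
    return " ".join(f"({body})" for body in merged.values()) + match.group(2)


def merge_text(text):
    """Rewrite every run of flat sublists found in the text."""
    return RUN.sub(merge_run, text)


source = ("((0 a b) (1 c d) (2 e f) (3 g h) (1 i j) (2 k l) (4 m n) (2 o p) (4 q r)\n"
          " (5 s t))\n")
print(merge_text(source), end="")
```

As in the Perl version, each run of adjacent sublists is merged on its own, so with the alternate input the nested group and the outer group are combined separately.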

