I have a need for a Perl routine to do a topological sort (a la the Unix
tsort command), and was unable to locate one in CPAN.  (The closest I got
was a perl version of the tsort command in the Perl Power Tools collection
by Jeffrey Haemer, but is not set up as a subroutine, and does not identify 
the cycles when found).
Anyhow, it was easy enough to write a routine to meet my requirements, with
some peaking at the GNU tsort C code, and I was considering contributing the
module to CPAN.  In interests of making the proposed module more useful, was
interested if anyone has feedback re the interface.  Still working on a good
set of tests for the module and docs, so it is a good point at which to make
changes in the API.

I was proposing a module Algorithm::TopologicalSort.

Currently consists of a single exported routine tsort, which accepts as input
a list pairwise orderings, and returns as list consisting of the number of
cycles found in the input, followed by a 'solution' to the topological sort.
The solution consists of nodes of the inputted set in an order consistent with
the given pairwise orderings.  If cycles were detected, it is reflected in the
cycle count, and at an appropriate place in the returned solution, instead of
a single node there is a ref to an anon array consisting of the nodes comprising
the cycle.  E.g., 
my ($cyc,@ordered) = tsort 
                'a'=>'b','b'=>'c','c'=>'d', 'd'=>'e','c'=>'b';
would return $cyc=1 and @ordered = ( 'a', ['b','c'], 'd', 'e' );
(actually, the order of 'b' and 'c' in anon array is unspecified).

The only option right now is a global $AlwaysUseAnonArrays which if set will
cause even singleton nodes to be put in an anon array, so in above example will
return @ordered = ( ['a'], ['b', 'c'], ['d'], ['e'])
This is primiarily intended for use if the nodes are not simple scalars (I have
not done anything else to make it work with other than simple scalars, but
believe Perl's stringification of references will allow such to work.)

The algorithm currently only does breadth first sorts, debating as to whether
should add a depth first option (e.g. whether 'a'=>'e','c'=>'d', 'a'=>'b'
should return 'a','c','b','d','e' or 'a','b','e','c','d').

In particular, looking for any advice/suggestions on making the input/output
more Perlish.  One idea was an alternate input format consisting of a list
of lists, so the 1st example could be simplified to
($cyc, @ordered) = tsort [ 'a','b','c','d','e' ] , ['c','b'];
with the first anon array being an abbreviated form for 'a'=>'b', 'b'=>'c', etc.
If we insist on simple scalar nodes, this could even be overloaded (and/or
allow a mixture of two formats, e.g., allow omission of anon array brackets
around ['c','b'] in above).

Any feedback appreciated.


Tom Payerle     
Dept of Physics                         [EMAIL PROTECTED]
University of Maryland                  (301) 405-6973
College Park, MD 20742-4111             Fax: (301) 314-9525

Reply via email to