Re: The Sort Problem

Uri Guttman Wed, 11 Feb 2004 21:08:44 -0800

>>>>> "GNP" == Gregor N Purdy <[EMAIL PROTECTED]> writes:


  GNP> The Schwartzian is the usual way, of course:

  GNP>   map { $_->[1] }
  GNP>   sort { $^a[0] <=> $^b[0] }
  GNP>   map { [ $_.foo('bar').compute, $_ ] }
  GNP>   @unsorted;

and the GRT is another. 

let me stick my nose in here. sorting is really composed of 3 parts,
extracting keys, comparing them and then getting back the records in
sorted order. the p5 sort block idioms require duplicating the extract
code and usually a binary operator. the GRT eliminates the binary
operator as it uses the default string compare in sort but it still
requires key extraction and then record extraction (similar to the first
map (last executed) in the ST).
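
to make the two idioms concrete, here is a minimal sketch of both. it is written in python rather than perl purely so it runs as-is; the sample records and the extract_key helper are made up for illustration and appear nowhere in this thread. the GRT half assumes keys are non-negative integers that fit in 32 bits.

```python
import struct

# made-up sample records: "<name> <number>"
records = ["pear 3", "apple 10", "fig 2"]

def extract_key(record):
    # hypothetical key extraction: the number after the space
    return int(record.split()[1])

# ST: decorate each record with its key, sort on the key, undecorate
decorated = [(extract_key(r), r) for r in records]
decorated.sort(key=lambda pair: pair[0])
st_sorted = [r for _, r in decorated]

# GRT: pack the key into a fixed-width big-endian byte string, append
# the record, sort with the plain default byte compare, strip the key
packed = [struct.pack(">I", extract_key(r)) + r.encode() for r in records]
packed.sort()
grt_sorted = [p[4:].decode() for p in packed]
```

in both cases the key extraction runs once per record, and the last step pulls the record back out of whatever was sorted. the difference is that the GRT needs no comparison code at all.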

  GNP> But we really aren't being concise here.

  GNP> (Aside) I wonder if there will be a new Perl 6 idiom to replace this
  GNP> using properties, something like this:

  GNP>   map { $_.really }
  GNP>   sort : numerically
  GNP>   map { $_.foo('bar').compute but really($_) }
  GNP>   @unsorted;

  GNP> or:

  GNP>   @unsorted
  GNP>     ==> map { $_.foo('bar').compute but really($_) }
  GNP>     ==> sort : numerically
  GNP>     ==> map { $_.really };

  GNP> But we *still* aren't concise.

that is the problem. the merging of the extracted key (in the ST or GRT)
with the record and the extraction of the record need to be hidden as
that is common code.

  GNP> Maybe we should have a two-block sort? The second block gives the
  GNP> transform to apply to the elements before applying the comparison
  GNP> block:

  GNP>   sort { $^a <=> $^b } { $_.foo('bar').compute } @unsorted;

duplicate args again.

  GNP> Or maybe even something like this:

  GNP>   sort :numerically :as { $^a.foo('bar').compute } @unsorted;

that looks best but how do you replace the :numerically comparison with,
say, a unicode aware string compare?

i have been working on a Sort::GRT in my head (finally!) and have some
ideas.

first, key extraction should work on $_ which is set for each element of
the unsorted list so there is no need for $^a. also how can you extract
multiple keys? and each key needs an ascending/descending option. and
each key needs a type so it can be compared correctly.

so here is a (very rough and probably broken) syntax idea building on
that:

sort :key { :descend :string .foo('bar').substr( 10, 3) }
     :key {          :int .foo('baz') }
     :key {          :float .foo('amount') } @unsorted ;

sort takes a list of :key's which each has a block (or whatever) that
specifies the key type, ordering and extraction code (that works on the
record in $_). the extraction code is in a block in itself so you can
declare lexicals and use multiple statements. the last value (in list
context) is the key. this is so you can do (p5 style) /foo(\d+)bar/ and
extract that digit string as the key.
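
a rough sketch of what such a list of key specs could mean, again in python only so that it is runnable. the (extract, convert, descend) tuple layout, the sort_by_keys name and the sample records are all assumptions for illustration, not a proposed API. it leans on a stable sort and applies the keys from last to first, which is one simple way to get multi-key ordering without writing a binary comparison.

```python
def sort_by_keys(records, keys):
    """keys is a list of (extract, convert, descend) tuples, most
    significant key first (hypothetical layout)."""
    result = list(records)
    # a stable sort lets us apply the least significant key first
    for extract, convert, descend in reversed(keys):
        result.sort(key=lambda rec: convert(extract(rec)), reverse=descend)
    return result

# made-up records; count is stored as a string on purpose
records = [
    {"name": "beta", "count": "10"},
    {"name": "alpha", "count": "2"},
    {"name": "beta", "count": "9"},
]

keys = [
    (lambda r: r["name"][:3], str, True),   # like :descend :string
    (lambda r: r["count"], int, False),     # like :int
]

result = sort_by_keys(records, keys)
```

each spec only says how to get the key, what type it is and which direction it sorts in, which is the point of the syntax above.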

side note: i was pondering the problem of where two different keys have
to be extracted from nearby places deep in a nested record
structure. this means duplicate code will be used to get into the deep
part. in my module, i was planning to allow a 'pre-extraction' piece of
code to be used and the value (e.g. a ref to the deep part) stored in a
lexical which can be used later in both key extractions. i don't know
any way to do that with the above syntax. there has to be a larger block
around all the :key's to allow something like that here. the p5 module
will be wrapping all the key extract code in a block so this would work
there.
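
the pre-extraction idea can be sketched like this (python, for runnability). the sort_with_pre name, the pre callback and the nested sample records are hypothetical and are not the interface of the planned module. the pre step runs once per record and its result is handed to every key extractor.

```python
def sort_with_pre(records, pre, extractors):
    def merged_key(rec):
        deep = pre(rec)  # walk into the nested structure once per record
        return tuple(extract(deep) for extract in extractors)
    return sorted(records, key=merged_key)

# made-up nested records
records = [
    {"id": 1, "a": {"b": {"city": "Oslo", "zip": 200}}},
    {"id": 2, "a": {"b": {"city": "Bergen", "zip": 500}}},
    {"id": 3, "a": {"b": {"city": "Oslo", "zip": 100}}},
]

ordered = sort_with_pre(
    records,
    pre=lambda rec: rec["a"]["b"],  # the shared 'deep part'
    extractors=[lambda deep: deep["city"], lambda deep: deep["zip"]],
)
ids = [rec["id"] for rec in ordered]
```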

internally, each record has its keys extracted and they are merged using
the GRT or ST or similar. in either case the record must be attached
(via a pointer or appended or some other way) to the merged key. the
keys are sorted and then the attached records are grabbed back and
returned. sorting in place can be done too.
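
one way the internals could look with a GRT style merge, sketched in python. this is an illustration under stated assumptions, not the actual implementation: both keys are taken to be integers in the range 0 to 2**32 - 1, the grt_sort name is made up, and the record is attached by appending its list index to the merged key (standing in for a pointer).

```python
import struct

MAX_U32 = 0xFFFFFFFF

def grt_sort(records, extract_desc, extract_asc):
    merged = []
    for index, rec in enumerate(records):
        merged.append(struct.pack(
            ">III",
            MAX_U32 - extract_desc(rec),  # complement flips the order
            extract_asc(rec),
            index,                        # attaches the record to its key
        ))
    merged.sort()  # plain byte compare, no comparison callback
    # grab the records back via the index stored in the last 4 bytes
    return [records[struct.unpack(">I", key[-4:])[0]] for key in merged]

# made-up (score, age) records: score descending, then age ascending
records = [(5, 30), (9, 20), (5, 10)]
ordered = grt_sort(records, lambda r: r[0], lambda r: r[1])
```

the descending key is handled by complementing it before packing, so the sort itself stays a single ascending byte compare.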

what is good about the above syntax is that it removes all the common
code, removes the binary operator (order and type are all that matter)
and allows for a simple way to express multiple key extractions.

it can be implemented internally in c to be far faster than any perl or
parrot level code could do. the biggest hurdle i see is proper support
for unicode stuff. maybe another key attribute would be the collating
name? how this affects internal sorting is a question. can you generate
a merged key with numbers, ascii and unicode strings that will sort with
simple string compares? maybe the unicode extraction has to do a char by
char conversion to (32 bit?) integers using the collate sequence and
those numbers can easily be merged and sorted. note that only the
extracted key text needs to be converted so that shouldn't be too
expensive. so it may look like this:

sort :key { :descend :string :collate('iso_some_country') 
                          .foo('bar').substr( 10, 3) }
     :key {          :int .foo('baz') } @unsorted ;

what is nice about that is that each key could have its own collate
type. the :string attribute could be optional since the :collate implies
a string.
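
the char by char conversion can be sketched as follows (python, for runnability). the collation table here is a made-up toy in which "ä" sorts right after "a"; it is not a real locale or the unicode collation algorithm, which has more levels than a single weight per character. weights start at 1 on the assumption that 0 would be kept free as a terminator when the string is merged with other keys.

```python
import struct

# hypothetical toy collate sequence, not a real locale
COLLATE_ORDER = "a\u00e4bcdefghijklmnopqrstuvwxyz"
WEIGHT = {ch: i + 1 for i, ch in enumerate(COLLATE_ORDER)}

def collate_key(text):
    # one 32 bit big-endian integer per character, so the result
    # sorts correctly under a plain byte compare
    return b"".join(struct.pack(">I", WEIGHT[ch]) for ch in text)

words = ["zebra", "\u00e4pfel", "apple", "bar"]
by_collation = sorted(words, key=collate_key)
by_codepoint = sorted(words)
```

only the extracted key text goes through the table, and the resulting bytes can sit in the same merged key as packed numbers.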

just trying to sort things out,

uri

-- 
Uri Guttman  ------  [EMAIL PROTECTED]  -------- http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs  ----------------------------  http://jobs.perl.org
