>>>>> "PM" == Ph Marek <[EMAIL PROTECTED]> writes:

  >> ...
  >> so here is a (very rough and probably broken) syntax idea building on
  >> that:
  >> 
  >> sort :key { :descend :string .foo('bar').substr( 10, 3) }
  >>      :key {          :int .foo('baz') }
  >>      :key {          :float .foo('amount') } @unsorted ;

  PM> I see a kind of problem here: If the parts of the key are not
  PM> fixed-length but can vary, you can put them into strings *only* after
  PM> processing them all and verifying the needed length.

varying keys have no effect on this. you still build up a merged key of
whatever length is needed for each record. perl5 does that with pack or
.= or sprintf (all used in variants of the GRT).
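
to illustrate, here is a minimal perl5 sketch of that merged key (not the
real internals -- the field names 'bar' and 'baz' are placeholders, the
records are assumed to be hashrefs, the string field is assumed to have
no NUL bytes and the int to be a non-negative 32 bit value). the variable
length string key gets a NUL terminator so shorter strings sort first,
and the int key and the record index are packed at fixed width:

    my @sorted =
        map  { $unsorted[ unpack 'N', substr $_, -4 ] }  # recover the record by index
        sort                                             # plain string sort, no callback
        map  {
            my $key = $unsorted[$_]{bar} . "\0";         # variable length string key + NUL
            $key   .= pack 'N', $unsorted[$_]{baz};      # fixed width unsigned int key
            $key   .= pack 'N', $_;                      # record index for recovery
            $key;
        } 0 .. $#unsorted;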

  PM> Example:
  PM> sort :key { :descend :string .foo('bar') }
  PM>       :key {          :int .foo('baz') }
  PM>       :key {          :float .foo('amount') } @unsorted ;

  PM> Now .foo('bar') isn't bounded by any length, so you don't know how much
  PM> space to reserve.

who needs to reserve space? you just let the guts do a realloc like p5
does now.

  PM> And I believe that
  PM> - generating keys for every list element,
  PM> - storing them in an array (an array of arrays),
  PM> - checking the lengths after having processed them all,
  PM> - then generating the to-be-sorted strings, and
  PM> - sorting

  PM> isn't the optimal way.
  PM> BTW: this requires that *all* keys are generated.
  PM> In cases like
  PM> - by name,
  PM> - by age,
  PM> - by height,
  PM> - by number of toes left,
  PM> - and finally sort by the social security number

  PM> most of the extractions (and possibly database queries or calculations or
  PM> whatever) will not be done - at least in the current form of
  PM>   sort { $a->{"name"} cmp $b->{"name"} ||
  PM>            $a->{"age"} <=> $b->{"age"} || 
  PM>   ...

  PM> That is to say, I very much like the syntax you propose, but I'm
  PM> not sure if pre-generating *every* key-part is necessarily a
  PM> speed-up.

see the paper for more on that. the work you do in the callback, even if
you only do the first compare, is O(N log N). in large sorts that will
outweigh the O(N) work of extracting the keys one time.
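
to put rough numbers on that (my own figures, not from the paper): for
N = 1_000_000 records a comparison sort calls the callback about
N * log2(N), or roughly 20 million times, while the key extraction runs
exactly N, or 1 million times. so even a cheap callback runs about 20
times more often than the extractor.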

and as with all algorithms, real-world cases will differ. short sorts can
be faster with a plain comparison callback, even a complex one, because
of the GRT's initial key-extraction overhead. maybe the internal code can
decide to do that instead of a GRT based on the size of the array being
sorted. the key extraction code is the same in both cases, so that would
be an interesting optimization.
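
if someone wants to measure where that crossover is, here is a rough
perl5 benchmark sketch (the records and field names are made up for the
test, and the GRT side uses a fixed width key for simplicity):

    use Benchmark qw( cmpthese );

    my @unsorted = map { { name => sprintf( 'rec%06d', int rand 1_000_000 ),
                           age  => int rand 100 } } 1 .. 10_000;

    cmpthese( -3, {
        callback => sub {
            my @s = sort { $a->{name} cmp $b->{name}
                               || $a->{age} <=> $b->{age} } @unsorted;
        },
        grt => sub {
            my @s = map  { $unsorted[ unpack 'N', substr $_, -4 ] }
                    sort
                    map  { pack 'A16 N N',
                                $unsorted[$_]{name}, $unsorted[$_]{age}, $_ }
                    0 .. $#unsorted;
        },
    });

varying the element count shows where (or whether) the callback version
wins.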

  PM> If there are expensive calculations you can always cut them short
  PM> by pre-calculating them into a hash keyed by object, and just query
  PM> that hash in the sort.

again, the goal is to eliminate work in the comparison. but as i said,
this is an implementation problem. we should focus on the language-level
needs of the sort func and not how it is done internally. i shouldn't
have brought up the internal design here.

  PM> Also I fear that the amount of memory necessary to sort an array
  PM> of length N is not N*2 (unsorted, sorted), but more like N*3
  PM> (unsorted, keys, sorted), which could cause trouble on bigger
  PM> arrays ....

as larry said, virtual memory. and there are ways to reduce the copies
internally with pointers (something p5 couldn't do at the p5 code level).

other than copying the extracted keys, you could leave the original
array alone and sort it in place by indexing it based on the sorted
pointers. and since the records will all be the same PMC type (usually),
you could just swap the array's pointers and not move any data
around. so no extra copies would be made beyond the extracted keys.
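
at the perl5 level the closest you can get to that is sorting an index
list instead of the records (a common GRT variant anyway -- 'name' is
again a placeholder field):

    my @order =
        map  { unpack 'N', substr $_, -4 }               # sorted list of indices
        sort
        map  { pack 'A16 N', $unsorted[$_]{name}, $_ }   # key + index, records never copied
        0 .. $#unsorted;

    # read in sorted order via $unsorted[ $order[$i] ], or reorder in one go:
    @unsorted = @unsorted[ @order ];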

uri

-- 
Uri Guttman  ------  [EMAIL PROTECTED]  -------- http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs  ----------------------------  http://jobs.perl.org
