>>>>> "DC" == Damian Conway <[EMAIL PROTECTED]> writes:

  DC> Uri wrote:

  DC> @out = sort
  DC> [ { ~ %lookup{ .{remotekey} } },                                 #1
  >> if string cmp is the default, wouldn't that ~ be redundant?

  DC> How do you know that the values of %lookup are strings?
  DC> How would the optimizer know?

because string cmp would be the default comparison, and the extracted
key value would be stringified unless some other marker is used. most
sorts are on strings so this would be useful huffman coding and would
remove a redundancy.

  DC> { + substr( 0, 10 ) },                                         #3
  DC> { int /foo(\d+)bar/ },                                         #4
  >> i would also expect int to be a default over float as it will be used
  >> more often. + is needed there since the regex returns a string. in the
  >> #3 case that would be an int as well. so we need a 'float' cast
  >> thingy.

  DC> Unary C<+> *is* the "float cast thingy"!

hmm, so + is float but int is needed otherwise? int is more commonly a
sort key than float. it just feels asymmetrical with one having a symbol
and the other a named operator.

  DC> If you want to force numeric comparison of keys you explicitly cast
  DC> each key to number using unary C<+> or C<int>. If you want to force
  DC> stringific comparison you explicitly cast the key to string using
  DC> unary C<~>.

or ~ is the default if nothing else is specified. it matches up with cmp
being the default comparison as you agreed above.

  DC> If you don't explicitly cast either way, C<sort> just DWIMs by looking
  DC> at the actual type of the keys returned by the extractor. If any of
  DC> them isn't a number, it defaults to C<cmp>.

that doesn't work for me. the GRT can't scan all keys before it
decides on what comparison operator is needed. the GRT needs to merge
keys as they are extracted and it needs to do the correct conversion to
a byte string then and not later. you can't DWIM this away if you want
the optimization. the ST can get away with it since you are still using
a compare block even if it is generated internally by the sort function.
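
to make the GRT point concrete, here is a minimal sketch in python (not
perl6 syntax). the names grt_sort and extract_int are made up for
illustration, and it assumes non-negative integer keys that fit in 64
bits. each key is packed into a fixed-width byte string the moment it is
extracted, so the pack format has to be chosen before the first key is
ever seen:

```python
import struct

def grt_sort(records, extract_int):
    """Sketch of a GRT-style sort (assumes non-negative 64-bit int keys)."""
    # Pack each key as a big-endian unsigned 64-bit value so that plain
    # byte-string comparison orders the keys numerically. The record index
    # is appended so the original record can be recovered afterwards.
    packed = [
        struct.pack(">QI", extract_int(rec), i)
        for i, rec in enumerate(records)
    ]
    packed.sort()  # no comparison callback, just a byte-string sort
    # Unpack the trailing index and map it back to the record.
    return [records[struct.unpack(">QI", p)[1]] for p in packed]

records = ["foo10bar", "foo9bar", "foo100bar"]
result = grt_sort(records, lambda r: int(r[3:-3]))
```

if the keys were packed as strings instead, "foo100bar" would sort before
"foo9bar", and there is no chance to go back and repack once the merged
strings exist.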


  DC> { just_guess $^b, $^a },                                       #7
  >> is that a reverse order sort? why not skip the args and do this:
  >> { &just_guess is descending },                                 #7

  DC> Because I wanted to show a plain old two-parameter block being used as
  DC> a *comparator* (not a one-parameter block being used as a key
  DC> extractor).

that seems like extra redundant code just to mark a callback vs an
extract block. there should be a cleaner way to do that. maybe a null
extract key in the pair?

        { '' => &just_guess }
        { undef => &just_guess }        # will => autoquote undef there?

then the records are full keys with no special extraction but there is a
callback comparator. no need to declare any arguments to the callback
sub here since you know it is that and not key extract code.

  >> but what about this odd case,
  >> sort [...], [...], [...]
  >> now that is stupid code but it could be trying to sort the refs by
  >> their address in string mode.

  DC> In which case we probably should have written it:

  DC>   sort <== [...], [...], [...]

i did ask in another post whether <== or ==> would fit in here. so that
line forces it all to be data and my silly example has the first anon
list of criteria and the rest as data. works for me so far.

  >> or it could be a sort criteria list followed by
  >> 2 refs to input records.

  DC> Only if the first array ref contains nothing but Criterion objects.

but what defines them as criteria objects? they look just like records
which could be sorted as well. i don't see any syntactical markers that
{} means criteria block vs a hash ref. DWIM guessing isn't good enough
for me here. sort in p5 already had some issues with that IIRC.

  >> as i pointed out above, i don't see why
  >> you even need to show the $^a and $^b args?

  DC> So the block knows it has two parameters.

but the callback sub knows it has two params since it has to be written
that way. sort always calls the code block with 2 params.

  >> they will be passed into just_guess that way. let 'is descending'
  >> handle the sort ordering.

  DC> But you *can't* apply C<is descending> to a Code reference.

then how did you apply 'is insensitive'? aside from how it is done
(traits and such), we need a syntax that conveys the semantics of
insensitive and descending into the sort func. it will use that info to
reverse the key order to comparison subs or modify the key merging of
the GRT to effect those flags. what i am saying is i think that you need
to go back to the drawing board to find a clean syntax to mark those
flags. note that neither the code block nor the callback needs to see
them, only the sort guts ever needs to see them. so we are communicating
info to the sort about this key. the code block/callback only ever sees
two arguments to compare and nothing else. maybe this will clarify for
you the intentions of those flags and why they have nothing to do with
the code block but rather describe the behavior of this key.

  DC> Nor are we sure that the order *is* descending. Maybe the
  DC> C<just_guess> predicate is invariant to argument order and there
  DC> were other reasons to pass the args in that order. Or maybe we
  DC> reversed the order because we know that C<just_guess> never
  DC> returns zero, but defaults to its second argument being smaller,
  DC> in which case we needed to reverse the args so that the C<sort>
  DC> remained stable.

i just don't like the reverse args concept at all. it is not
semantically useful to sorting. sorts care about ascending and
descending, not a vs b in args.

  DC> The point is that I wanted to show a vanilla two-parameter compare
  DC> block.  (And, boy, am I ever sorry I whimsically reversed the args
  DC> to indicate generality ;-)

but i am glad you did since it brings up this issue. i don't think we
should allow any args there at all as they are not needed. compare
blocks get called with 2 args. some sort of descending marker reverses
the args before the call. also in the GRT, descending causes special data
munging so the keys will sort in descending order so that has to be
passed on to the guts somehow.
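
a rough python sketch of the two ways a descending flag could be handled
inside the sort guts. all names here (sort_with, grt_int_key) are
hypothetical, and the GRT half assumes fixed-width non-negative integer
keys. in both cases the comparator or extractor never sees the flag:

```python
import struct
from functools import cmp_to_key

def compare(a, b):
    """Plain two-argument comparator returning -1, 0 or 1."""
    return (a > b) - (a < b)

def sort_with(records, comparator, descending=False):
    """Comparator path: the sort swaps the arguments before the call."""
    if descending:
        cmp = lambda a, b: comparator(b, a)
    else:
        cmp = comparator
    return sorted(records, key=cmp_to_key(cmp))

def grt_int_key(n, descending=False):
    """GRT path: munge the packed key so byte order runs backwards."""
    raw = struct.pack(">Q", n)  # fixed-width big-endian key
    if descending:
        # Complementing every byte of a fixed-width key reverses its order.
        raw = bytes(255 - b for b in raw)
    return raw

by_callback = sort_with([3, 1, 2], compare, descending=True)
by_munging = sorted([3, 1, 2], key=lambda n: grt_int_key(n, descending=True))
```

note that the byte complement only works as-is for fixed-width keys.
variable-length string keys would need a terminator or padding on top of
it, which is left out of this sketch.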

  DC> @sorted = sort {-M} @unsorted;
  >> that still wants to be cached somehow as -M is expensive.

  DC> It *is* cached. It's a one-parameter block. So it's a key extractor. So
  DC> it automagically caches the keys it extracts.

??? who and what caches it? it will get called again and again for the
same record. where is the definition that 1 param code blocks do
caching? do they all do that?

  >> assuming no internal caching

  DC> Key extractors will always cache.

so this is a key extraction feature about caching. the ST and GRT don't
need caching so that is a waste if those are used. only an orcish sort
needs caching.
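
here is a small python sketch of that difference, assuming hashable
records. orcish_sort, schwartzian_sort and counted are illustrative
names, not anything from the proposal. the orcish sort extracts keys from
inside the comparator, so it needs a cache to avoid repeated extraction.
the ST extracts every key exactly once up front, so a cache buys it
nothing:

```python
from functools import cmp_to_key

def counted(extract):
    """Wrap an extractor so we can see how often it is called."""
    calls = []
    def wrapper(rec):
        calls.append(rec)
        return extract(rec)
    return wrapper, calls

def orcish_sort(records, extract):
    cache = {}  # needed: the comparator sees each record many times
    def key_of(rec):
        if rec not in cache:
            cache[rec] = extract(rec)
        return cache[rec]
    def cmp(a, b):
        ka, kb = key_of(a), key_of(b)
        return (ka > kb) - (ka < kb)
    return sorted(records, key=cmp_to_key(cmp))

def schwartzian_sort(records, extract):
    # Decorate once, sort, undecorate. The index breaks ties so the
    # records themselves are never compared.
    decorated = [(extract(rec), i, rec) for i, rec in enumerate(records)]
    decorated.sort()
    return [rec for _, _, rec in decorated]

files = ["ccc", "a", "bb", "dddd"]
orc_extract, orc_calls = counted(len)
orc_result = orcish_sort(files, orc_extract)
st_extract, st_calls = counted(len)
st_result = schwartzian_sort(files, st_extract)
```

both versions call the extractor once per record here, but only the
orcish one had to carry a cache to get there.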

  >> @sorted = sort {%M{$_} //= -M} @unsorted;
  >> i assume //= will be optimized and -M won't be called if it is
  >> cached.
  >> also where does %M get declared and/or cleared before this?

  DC> Exactly the problem. That's why key extractors always cache.

but they can't always cache. it depends on the implementation and
possibly on a runtime decision (selecting orcish, ST or GRT based on
some other criteria).

  >> can it be
  >> done in the block:
  >> @sorted = sort {my %M ; %M{$_} //= -M} @unsorted;

  DC> If you'd gone insane and particularly wanted to do it that way, you'd
  DC> need something like:

  DC>           @sorted = sort {state %M ; %M{$_} //= -M} @unsorted;

  DC> to ensure the cache persisted between calls to the key extractor.

and will that get cleared before each sort?
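
to show why i ask, a python sketch of what happens if the cache is NOT
cleared between sorts. the ages dict is a made-up stand-in for -M
results and make_cached_extractor is a hypothetical helper playing the
role of the state %M block:

```python
def make_cached_extractor(extract):
    cache = {}  # persists across calls and across sorts, like a state var
    def cached(rec):
        if rec not in cache:
            cache[rec] = extract(rec)
        return cache[rec]
    return cached

ages = {"a.txt": 3.0, "b.txt": 1.0}  # hypothetical file ages in days
by_age = make_cached_extractor(lambda name: ages[name])

first = sorted(ages, key=by_age)

ages["a.txt"] = 0.5  # a.txt gets modified between the two sorts

second = sorted(ages, key=by_age)                  # reuses the old 3.0
fresh = sorted(ages, key=lambda name: ages[name])  # sees the new 0.5
```

the second sort still puts a.txt last because it never looks at the file
again, while an uncached sort puts it first.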

  >> another -M problem is that it currently returns a float so that must be
  >> marked/cast as a float.
  >> @sorted = sort {float -M} @unsorted;

  DC> No. *Because* -M returns a number, C<sort> automatically knows to use
  DC> numeric comparison on those keys.

  >> maybe the fact that the compiler knows -M returns a float can be used to
  >> mark it internally and the explicit float isn't needed here.

  DC> Exactly.

ok for this case but not for data from a record.

  >> but data
  >> from a user record will need to be marked as float as the compiler can't
  >> tell.

  DC> It *can* tell if the elements are typed. But, yes, most of the time if
  DC> you want to ensure numeric comparison you will explicitly prefix with
  DC> a C<+> to give the compiler a hint. Otherwise C<sort> will have to
  DC> fall back on looking at the keys that are extracted and working out at
  DC> run-time which type of comparison to use (kinda like the smartmatch
  DC> operator does).

only in the ST. the orcish does its compares on the fly without a full
prescan. and the GRT can't prescan as it mungs and merges keys on the
fly.
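
a python sketch of the run-time guess described above, under my
assumption that it works by looking at every extracted key first
(dwim_sort is a made-up name). the point is that the decision needs the
complete key list in hand, which is what the ST has and what the orcish
sort and the GRT do not:

```python
def dwim_sort(records, extract):
    # Full prescan: every key must be extracted before we can decide.
    keys = [extract(rec) for rec in records]
    all_numeric = all(
        isinstance(k, (int, float)) and not isinstance(k, bool)
        for k in keys
    )
    if not all_numeric:
        # One non-number is enough to fall back to string comparison.
        keys = [str(k) for k in keys]
    order = sorted(range(len(records)), key=lambda i: keys[i])
    return [records[i] for i in order]

numeric_result = dwim_sort([10, 9, 100], lambda r: r)
mixed_result = dwim_sort([10, "9", 100], lambda r: r)
```

a single string key in the second call flips the whole sort to string
order, and that can only be known after the last key has been extracted.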

  >> anyhow, i am glad i invoked your name and summoned you into this
  >> thread. :)

  DC> Well, that makes *one* of us.

  DC> ;-)

when are you going to get your life sorted out? :)

uri

-- 
Uri Guttman  ------  [EMAIL PROTECTED]  -------- http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs  ----------------------------  http://jobs.perl.org
