David Mitchell <[EMAIL PROTECTED]> writes:
>Nick Ing-Simmons <[EMAIL PROTECTED]> wrote:
>> What are string functions in your view?
>> m//
>> s///
>> join()
>> substr
>> index
>> lc, lcfirst, ...
>> & | ~
>> ++
>> vec
>> '.'
>> '.='
>>
>> It rapidly gets out of hand.
>
>Perhaps, but consider that somewhere within the perl internals there
>have to be functions which implement all these ops anyway. If we
>provide vtable slots for all these functions and just fill most of the
>slots with pointers to the 'default' Perl implementation, we haven't
>really lost anything, except possibly a slight delay due to the extra
>indirection (which may be compensated for elsewhere). On the other
>hand, we have gained the ability to replace the default implementation
>with something more efficient where it suits us.
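A minimal C sketch of the arrangement described above. It assumes a hypothetical Str type and StrVtable, which are not Perl's real SV structures. Every type gets a full table, most slots point at a shared default, and a "counted" type overrides only length() to avoid an O(n) strlen().

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical string value; NOT Perl's real SV layout. */
typedef struct Str Str;

/* One slot per operation; every type supplies a complete table. */
typedef struct {
    size_t (*length)(const Str *);
    long   (*index)(const Str *, const char *needle);
} StrVtable;

struct Str {
    const StrVtable *vt;  /* per-type method table */
    const char *buf;      /* NUL-terminated bytes */
    size_t len;           /* only meaningful for the counted type */
};

/* Default length: works for any NUL-terminated buffer, O(n). */
static size_t default_length(const Str *s) { return strlen(s->buf); }

/* Default index: generic byte search; asks the vtable for the length. */
static long default_index(const Str *s, const char *needle) {
    size_t hay = s->vt->length(s);
    size_t n = strlen(needle);
    for (size_t i = 0; i + n <= hay; i++)
        if (memcmp(s->buf + i, needle, n) == 0)
            return (long)i;
    return -1;
}

/* Counted type: overrides only length (O(1)); index stays the default. */
static size_t counted_length(const Str *s) { return s->len; }

static const StrVtable plain_vtable   = { default_length, default_index };
static const StrVtable counted_vtable = { counted_length, default_index };

/* Constructors; callers always dispatch through s->vt. */
Str make_plain(const char *buf) {
    Str s = { &plain_vtable, buf, 0 };
    return s;
}

Str make_counted(const char *buf) {
    Str s = { &counted_vtable, buf, strlen(buf) };
    return s;
}
```

default_index() calls s->vt->length(s), so it picks up the faster override without being rewritten. That only works because both implementations agree on what buf means, which is exactly the argument-representation question raised in the reply.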
I have just been through exactly that process with the PerlIO stuff.
So I hope you will not take offence when I say that your observation above
is simplistic. The problem is "what are the types of the arguments passed
to the functions?" - the existing code will be expecting its args in
a particular form. So your wondrous new function must accept exactly
those args and types - and convert them as necessary before becoming
more efficient. So to get any win, the args/types of all the functions
have to be designed with pluggability in mind from the outset.
At best this means taking an indirection hit for all the args as well
as the function (this is what PerlIO does - PerlIO is now essentially
a FILE ** rather than a FILE *).
At worst we have to write a "worst case" override entry for each op and
then work back to what each implementation needs. This is exemplified by
PerlIO_getpos(): the "position" arg had to stop being an Fpos_t and become
an SV * so that stdio could stuff an Fpos_t in it, while a transcoding
layer could put in the Fpos_t plus the escape state and any partial
characters.
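To make the PerlIO point concrete, here is a hedged C sketch. Handle, Layer, Position and the functions below are hypothetical names, not the real PerlIO API, and a fixed-size Position record stands in for the SV * that PerlIO_getpos() actually uses. The handle is a pointer to the top-layer pointer, so pushing a layer changes what every holder of the handle sees. The transcoding layer's getpos() lets the layer below record the byte offset and then appends its own escape state.

```c
#include <stddef.h>

typedef struct Layer Layer;

/* Hypothetical position record, standing in for PerlIO's SV *: the bottom
   layer stores a byte offset, upper layers may append their own state. */
typedef struct {
    long byte_offset;
    unsigned char extra[16];  /* layer-specific state, e.g. escape state */
    size_t extra_len;
} Position;

struct Layer {
    Layer *below;
    int (*getpos)(Layer *self, Position *pos);
    long offset;          /* used by the bottom "raw" layer */
    unsigned char state;  /* used by the transcoding layer */
};

/* Like a FILE ** rather than a FILE *: callers hold the address of the
   top-layer pointer, so pushing a layer is visible to all of them. */
typedef Layer **Handle;

static int raw_getpos(Layer *self, Position *pos) {
    pos->byte_offset = self->offset;
    pos->extra_len = 0;
    return 0;
}

/* Transcoding layer: let the layer below fill in the offset, then append
   its own state so a later setpos could restore it. */
static int xcode_getpos(Layer *self, Position *pos) {
    if (self->below->getpos(self->below, pos) != 0)
        return -1;
    if (pos->extra_len >= sizeof pos->extra)
        return -1;
    pos->extra[pos->extra_len++] = self->state;
    return 0;
}

int handle_getpos(Handle h, Position *pos) {
    Layer *top = *h;  /* the extra indirection */
    return top->getpos(top, pos);
}

void handle_push(Handle h, Layer *layer) {
    layer->below = *h;
    *h = layer;
}
```

The cost is the one described above: every call goes through one more pointer, and the position type has to be big enough for the most demanding layer from the start.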
>
>Take the example of substr() - if this is a standalone function, then
>it has to work without reference to any of the internals of its args,
>and thus has to rely on extracting a 'standard' representation of the
>string value from the SV in order to operate upon it. This then implies
>messiness of coding and inefficiency, with all the Unicode hell that
>infects perl5 re-appearing. If substr() were a per-type op, then the
>messy details of UTF8 would lie almost completely within the internal
>implementation of that datatype.
True, but the messy details would now occur multiple times:
as soon as substr_utf8 exists, _ALL_ the other string ops
_must_ be overridden as well, because nothing but the string_utf8 "class"
knows what is going on.
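A small C sketch of why a lone substr_utf8 is not enough. It assumes NUL-terminated UTF-8 strings, and the helper names are hypothetical. The "UTF-8 class" overrides substr, but if length is left at the byte-counting default, a Perl-level idiom like substr($s, length($s) - 1, 1) asks for a character index that does not exist.

```c
#include <stddef.h>
#include <string.h>

/* Default length: counts bytes, which is right only for 8-bit strings. */
size_t default_length(const char *s) { return strlen(s); }

/* Byte offset of the n-th character of a UTF-8 string. Continuation
   bytes look like 10xxxxxx. Stops at the terminating NUL. */
static size_t utf8_offset(const char *s, size_t n) {
    size_t i = 0;
    while (s[i] != '\0' && n > 0) {
        i++;
        while (((unsigned char)s[i] & 0xC0) == 0x80)
            i++;
        n--;
    }
    return i;
}

/* The override: copy `count` characters starting at character `start`
   into out, which must be large enough. */
void substr_utf8(const char *s, size_t start, size_t count, char *out) {
    size_t from = utf8_offset(s, start);
    size_t to = from + utf8_offset(s + from, count);
    memcpy(out, s + from, to - from);
    out[to - from] = '\0';
}

/* What the UTF-8 class must ALSO supply for substr to be usable:
   count every byte that is not a continuation byte. */
size_t utf8_length(const char *s) {
    size_t n = 0;
    for (size_t i = 0; s[i] != '\0'; i++)
        if (((unsigned char)s[i] & 0xC0) != 0x80)
            n++;
    return n;
}
```

For "h\xC3\xA9llo" ("héllo"), default_length() gives 6 while utf8_length() gives 5. So substr_utf8(s, default_length(s) - 1, 1, out) produces an empty string, and only the matching utf8_length() yields "o". The same pairing problem recurs for index, lc, vec and the rest.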
>
>In fact, I would argue that in general most if not all the operations currently
>performed by pp_* should have vtable equivalents, both for numeric and string
>types (including unary ops, mutators, binops etc etc).
Hmm - that is indeed a logical position.
>
>> Seriously - I think we need to consider the original question
>> "What is the representation" based on perl5 hindsight, then think what
>> operations we want to perform on it, then divide those into the ones
>> which make sense to be "methods" (vtable entries) of string,
>> those that are part of string API, and those which are just ops messing
>> with strings.
>
>If an "op messing with strings" might be able to do a faster job given
>access to the internals of that string type, then I'd argue that that op
>should be in the vtable too.
I can see your position.
perl6 = Union_of(I32_perl, I64_perl, float_perl, double_perl, long_double_perl,
ASCII_perl, UTF8_perl, ShiftJis_perl,
Complex_rational_perl, right_to_left_perl,
)
or
class perl
{
virtual SV *add(SV *,SV *);
...
virtual SV *y(SV *,SV *);
};
The snag here is that the volume of code explodes and gets splattered
all over the sub-classes. So to fix a bug in the '+' operator (pp_plus)
one has to go visit lots of places - but, presumably, the bug will
only be in one of them.
If this is to fly (and I am not saying it cannot), then the
"multiple despatch" issue needs to have a clean process so that
it is clear what happens if someone writes:
my $complex_rational = $urdu_string / sqrt(-$big_integer);
The string needs to get converted to a number knowing which characters
are digits and what the Urdu for 'i' is. The big integer needs to get
negated (no sweat), then someone's sqrt() gets called and had better not
barf on the negative value, and then complex_rational can do the right thing.
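One possible "clean process" for multiple dispatch, sketched in C under heavy simplifying assumptions. It uses a hypothetical two-type numeric tower (T_INT below T_DOUBLE), a table of explicitly written (left, right) methods, and a single fallback rule that promotes both operands to the wider type and retries on the diagonal. It deliberately ignores the hard parts of the example above (string-to-number conversion, square roots of negatives).

```c
/* Hypothetical types, ordered by "width": promotion only goes upward. */
typedef enum { T_INT, T_DOUBLE, T_NTYPES } Type;

typedef struct {
    Type type;
    union { long i; double d; } u;
} Num;

typedef Num (*BinOp)(Num, Num);

static Num add_int(Num a, Num b) {
    Num r = { T_INT, { .i = a.u.i + b.u.i } };
    return r;
}

static Num add_double(Num a, Num b) {
    Num r = { T_DOUBLE, { .d = a.u.d + b.u.d } };
    return r;
}

/* The only conversion this toy tower needs: INT -> DOUBLE. */
static Num promote(Num a, Type to) {
    if (a.type == to)
        return a;
    Num r = { T_DOUBLE, { .d = (double)a.u.i } };
    return r;
}

/* Entries for the pairs someone actually wrote; a NULL entry means
   "no specific method: promote both sides and use the diagonal". */
static const BinOp add_table[T_NTYPES][T_NTYPES] = {
    [T_INT]    = { [T_INT] = add_int },
    [T_DOUBLE] = { [T_DOUBLE] = add_double },
};

Num dispatch_add(Num a, Num b) {
    BinOp op = add_table[a.type][b.type];
    if (op)
        return op(a, b);
    Type wider = a.type > b.type ? a.type : b.type;
    a = promote(a, wider);
    b = promote(b, wider);
    return add_table[wider][wider](a, b);  /* diagonal is always filled */
}
```

Even in this toy the table has T_NTYPES x T_NTYPES entries, so every new type either adds a row and a column of methods or must fit somewhere in the promotion order. Strings in various encodings have no obvious place in such an order.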
In other words - string ops on strings of uniform type, math ops on
well-understood hierarchies etc. are all easy enough - it is the
combinations that get very messy very very quickly.
--
Nick Ing-Simmons