Re: String representation

2000-12-21 Thread Nicholas Clark
On Thu, Dec 21, 2000 at 05:36:05PM +, Nick Ing-Simmons wrote: > Nicholas Clark <[EMAIL PROTECTED]> writes: > >> > >> where it is possible to get "smart" when one arg is a "special case" of > >> the other. > > > >> And similarly numbers must be convertible to "complex long double" or > >> wha

Re: String representation

2000-12-21 Thread Nick Ing-Simmons
Nicholas Clark <[EMAIL PROTECTED]> writes: >> >> where it is possible to get "smart" when one arg is a "special case" of >> the other. > >> And similarly numbers must be convertible to "complex long double" or >> whatever is the top of the built-in tree? (NV I guess - complex is >> overkill.)

Re: String representation

2000-12-21 Thread Nicholas Clark
On Wed, Dec 20, 2000 at 11:07:39PM +, Nick Ing-Simmons wrote: > The snag is that there are common pairs > e.g. concat(utf8,ascii) / concat(ascii,utf8) > or > plus(NV,IV) / plus(IV,NV) > > where it is possible to get "smart" when one arg is a "special case" of > the other. >
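
The "special case" remark about concat(utf8,ascii) / concat(ascii,utf8) can be illustrated with a minimal sketch in C. It relies on one fact: every ASCII byte sequence is already valid UTF-8, so a mixed pair needs no transcoding and only the result tag changes. The names `str_kind`, `str_t` and `str_concat` are hypothetical and are not taken from the thread or from any Perl source.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical encoding tags; ASCII is treated as a special case of UTF-8. */
typedef enum { STR_ASCII, STR_UTF8 } str_kind;

/* Hypothetical tagged string: encoding tag, byte length, byte buffer. */
typedef struct {
    str_kind kind;
    size_t   len;
    char    *bytes;
} str_t;

/*
 * Concatenate a and b into out. Because ASCII bytes are valid UTF-8 as they
 * stand, both mixed orders (ascii,utf8) and (utf8,ascii) share one code path:
 * copy the bytes and tag the result with the wider of the two kinds.
 * Returns 0 on success, -1 if allocation fails.
 */
static int str_concat(const str_t *a, const str_t *b, str_t *out)
{
    assert(a != NULL && b != NULL && out != NULL);

    char *buf = malloc(a->len + b->len + 1);
    if (buf == NULL)
        return -1;

    memcpy(buf, a->bytes, a->len);
    memcpy(buf + a->len, b->bytes, b->len);
    buf[a->len + b->len] = '\0';

    out->kind  = (a->kind == STR_UTF8 || b->kind == STR_UTF8) ? STR_UTF8
                                                              : STR_ASCII;
    out->len   = a->len + b->len;
    out->bytes = buf;
    return 0;
}
```

A pair of unrelated encodings would not get this shortcut; at least one side would have to be transcoded first, which is where the decision-making cost discussed in the thread comes in.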

Re: String representation

2000-12-21 Thread Philip Newton
On 18 Dec 00, at 15:21, Nick Ing-Simmons wrote: > There needs to be a hierarchy of _repertoires_ such that: > > ASCII is subset of Native is subset of wchar_t is subset of UNICODE. But we can't even rely on that. I can imagine a couple of Native encodings around that fiddle with ASCII (for exam

Re: String representation

2000-12-21 Thread Nick Ing-Simmons
Philip Newton <[EMAIL PROTECTED]> writes: >On 18 Dec 00, at 15:21, Nick Ing-Simmons wrote: > >> There needs to be a hierarchy of _repertoires_ such that: >> >> ASCII is subset of Native is subset of wchar_t is subset of UNICODE. > >But we can't even rely on that. I can imagine a couple of Native

Re: String representation

2000-12-20 Thread Nick Ing-Simmons
David Mitchell <[EMAIL PROTECTED]> writes: >The problem is "what are the (types of) the arguments passed > >I don't really see why types of args are (in general) a problem. Hmm, you may be right at the level of your example, which may indeed be typical of pp_(). Perhaps PerlIO is so bother so

Re: String representation

2000-12-19 Thread Nicholas Clark
On Tue, Dec 19, 2000 at 06:11:06PM +, David Mitchell wrote: > Since in real life the types of args are often the same, this will usually > be a win. I found that you have to make an effort to make them the same, else generally enough of them aren't that decision making code outweighs speed ga

Re: String representation

2000-12-19 Thread David Mitchell
Nick Ing-Simmons <[EMAIL PROTECTED]> wrote: > David Mitchell <[EMAIL PROTECTED]> writes: > >Nick Ing-Simmons <[EMAIL PROTECTED]> wrote: > >> What are string functions in your view? > >> m// > >> s/// > >> join() > >> substr > >> index > >> lc, lcfirst, ... > >> & | ~ > >> ++ > >>

Re: String representation

2000-12-18 Thread Kai Henningsen
p of that, and we can design > > other stuff in parallel with coding it. (A lot of it will be grunt work.) > > > > So, before we start even thinking about what we need, it's time to look at > > the vexed question of string representation. How do we do Unicode without > &

Re: String representation

2000-12-18 Thread Jarkko Hietaniemi
> >> As I pointed out on p5p even EBCDIC machines can use that model - but > >> the downside is that ord('A') == 65 which will break backward compatibility > >> with EBCDIC scripts. > > > >Maybe we need $ENV{PERL_ENCODING} to control ord() and chr(), too? > > That was my suggestion last week

Re: String representation

2000-12-18 Thread Jarkko Hietaniemi
> At worst we have to write a "worst case" override entry for each op and > then work what it needs back - this is exemplified by PerlIO_getpos() > the "position" arg had to stop being an Fpos_t and become an SV * > so that stdio could stuff an Fpos_t in it, but a transcoding layer > could put th

Re: String representation

2000-12-18 Thread Nick Ing-Simmons
Jarkko Hietaniemi <[EMAIL PROTECTED]> writes: >On Mon, Dec 18, 2000 at 03:21:05PM +, Nick Ing-Simmons wrote: >> Simon Cozens <[EMAIL PROTECTED]> writes: >> > >> >So, before we start even thinking about what we need, it's time to look at the >>

Re: String representation

2000-12-18 Thread Nick Ing-Simmons
David Mitchell <[EMAIL PROTECTED]> writes: >Nick Ing-Simmons <[EMAIL PROTECTED]> wrote: >> What are string functions in your view? >> m// >> s/// >> join() >> substr >> index >> lc, lcfirst, ... >> & | ~ >> ++ >> vec >> '.' >> '.=' >> >> It rapidly gets out of hand. > >Per

Re: String representation

2000-12-18 Thread Nick Ing-Simmons
Nicholas Clark <[EMAIL PROTECTED]> writes: >On Fri, Dec 15, 2000 at 11:18:00AM -0600, Jarkko Hietaniemi wrote: > >> As painful as it may sound (codingwise) I would urge to spare some >> thought to using (internally) UTF-32 for those encodings for which >> UTF-8 would be *longer* than the UTF-32 (m

Re: String representation

2000-12-18 Thread Nick Ing-Simmons
David Mitchell <[EMAIL PROTECTED]> writes: >> Personally I would not use such a beast > >But with different encodings implemented by different SV types - each with their >own vtable - surely most of this will "come out in the wash", by the correct >method automatically being called. I thought tha

Re: String representation

2000-12-18 Thread Jarkko Hietaniemi
On Mon, Dec 18, 2000 at 03:21:05PM +, Nick Ing-Simmons wrote: > Simon Cozens <[EMAIL PROTECTED]> writes: > > > >So, before we start even thinking about what we need, it's time to look at the > >vexed question of string representation. How do we do Uni

Re: String representation

2000-12-18 Thread David Mitchell
Nick Ing-Simmons <[EMAIL PROTECTED]> wrote: > e.g. > >if (SvENCODING(sv_a) != SvENCODING(sv_b)) > { > if (SvENCODING(sv_a)->is_superset_of(SvENCODING(sv_b))) > { >sv_upgrade_to(sv_b,SvENCODING(sv_a)); > } > else if (SvENCODING(sv_b)->is_superset_of(SvENCODIN
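
The quoted fragment is cut off, so the following is only a minimal sketch in C of the idea it appears to describe: when two values differ in encoding, upgrade the one with the smaller repertoire to the other's encoding. It assumes the strict chain ASCII, Native, wchar_t, UNICODE proposed elsewhere in the thread (an assumption that other posters dispute), and it models an encoding as a simple rank. `encoding_t`, `sv_t` and `unify_encodings` are hypothetical names; `is_superset_of` and `sv_upgrade_to` borrow their names from the quoted pseudocode but are stand-ins, not the real Perl API.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical repertoire ranks. The assumption (not guaranteed in practice)
 * is that each repertoire contains every one listed before it.
 */
typedef enum {
    ENC_ASCII   = 0,
    ENC_NATIVE  = 1,
    ENC_WCHAR   = 2,
    ENC_UNICODE = 3
} encoding_t;

/* Hypothetical scalar value: only the encoding tag is modelled here. */
typedef struct {
    encoding_t encoding;
} sv_t;

/* Under the linear-chain assumption, "superset" reduces to a rank test. */
static int is_superset_of(encoding_t a, encoding_t b)
{
    return a >= b;
}

/* Stand-in for transcoding: a real version would rewrite the buffer too. */
static void sv_upgrade_to(sv_t *sv, encoding_t target)
{
    assert(sv != NULL);
    assert(is_superset_of(target, sv->encoding)); /* never narrow */
    sv->encoding = target;
}

/* Bring both values to a common encoding and return that encoding. */
static encoding_t unify_encodings(sv_t *a, sv_t *b)
{
    assert(a != NULL && b != NULL);
    if (a->encoding != b->encoding) {
        if (is_superset_of(a->encoding, b->encoding))
            sv_upgrade_to(b, a->encoding);
        else
            sv_upgrade_to(a, b->encoding);
    }
    return a->encoding;
}
```

With a strict chain, one side is always a superset of the other, so no third branch is needed. If repertoires are only partially ordered, a further case would be required in which both values are upgraded to some common superset.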

Re: String representation

2000-12-18 Thread Nicholas Clark
On Fri, Dec 15, 2000 at 11:18:00AM -0600, Jarkko Hietaniemi wrote: > As painful as it may sound (codingwise) I would urge to spare some > thought to using (internally) UTF-32 for those encodings for which > UTF-8 would be *longer* than the UTF-32 (mainly the Asian scripts). most CPUs can load a

Re: String representation

2000-12-18 Thread Jarkko Hietaniemi
On Mon, Dec 18, 2000 at 10:30:53AM -0500, Philip Newton wrote: > On Sat, 16 Dec 2000, Jarkko Hietaniemi wrote: > > > On Fri, Dec 15, 2000 at 03:10:16PM -0500, Dan Sugalski wrote: > > > At 11:18 AM 12/15/00 -0600, Jarkko Hietaniemi wrote: > > > > > > > >As painful as it may sound (codingwise) I wo

Re: String representation

2000-12-18 Thread David Mitchell
having split it into separate components, I might then make the case that certain of those components could be implemented as vtable ops (eg those components that are sensitive to the string representation). My dream would be that all knowledge related to utf8 (say) is contained in a file called

Re: String representation

2000-12-18 Thread Philip Newton
On Sun, 17 Dec 2000, Dan Sugalski wrote: > I'm thinking for speed that binary and UTF-32 should be our internal > representations, at least for the data that gets handed to the regex > engine. Or at least we use a constant-width character that's 8 and 32 bits, > if I'm misusing UTF-32. (UTF-8

Re: String representation

2000-12-18 Thread Philip Newton
On Sat, 16 Dec 2000, Jarkko Hietaniemi wrote: > On Fri, Dec 15, 2000 at 03:10:16PM -0500, Dan Sugalski wrote: > > At 11:18 AM 12/15/00 -0600, Jarkko Hietaniemi wrote: > > > > > >As painful as it may sound (codingwise) I would urge to spare some > > >thought to using (internally) UTF-32 for those

Re: String representation

2000-12-18 Thread Nicholas Clark
On Mon, Dec 18, 2000 at 02:43:14PM +, Nick Ing-Simmons wrote: > David Mitchell <[EMAIL PROTECTED]> writes: > > > >Personally I feel that the string part of the SV API should include most > >(if not all) string functions, including regex matching and substitution. [list of potential string op

Re: String representation

2000-12-18 Thread Nick Ing-Simmons
Simon Cozens <[EMAIL PROTECTED]> writes: > >So, before we start even thinking about what we need, it's time to look at the >vexed question of string representation. How do we do Unicode without getting >into the horrendous non-Latin1 cockups we're seeing on p5p right

Re: String representation

2000-12-18 Thread Nick Ing-Simmons
David Mitchell <[EMAIL PROTECTED]> writes: > >Personally I feel that the string part of the SV API should include most >(if not all) string functions, including regex matching and substitution. What are string functions in your view? m// s/// join() substr index lc, lcfirst, ... &

Re: String representation

2000-12-18 Thread David Mitchell
Simon Cozens <[EMAIL PROTECTED]> > IMHO, the first thing we need to design and code is the API and runtime > library, since everything else builds on top of that, and we can design other > stuff in parallel with coding it. (A lot of it will be grunt work.) Personally I feel that the string part

Re: String representation

2000-12-17 Thread Dan Sugalski
> So, before we start even thinking about what we need, it's time to > look > > > at the > > > > vexed question of string representation. How do we do Unicode without > > > getting > > > > into the horrendous non-Latin1 cockups we're seeing on p5

Re: String representation

2000-12-16 Thread Jarkko Hietaniemi
rary, since everything else builds on top of that, and we can design > > other > > > stuff in parallel with coding it. (A lot of it will be grunt work.) > > > > > > So, before we start even thinking about what we need, it's time to look > > at the >

Re: String representation

2000-12-15 Thread Dan Sugalski
other > > stuff in parallel with coding it. (A lot of it will be grunt work.) > > > > So, before we start even thinking about what we need, it's time to look > at the > > vexed question of string representation. How do we do Unicode without > getting > > into the

Re: String representation

2000-12-15 Thread Jarkko Hietaniemi
an design other > > stuff in parallel with coding it. (A lot of it will be grunt work.) > > > > So, before we start even thinking about what we need, it's time to look at the > > vexed question of string representation. How do we do Unicode without getting > > into the

Re: String representation

2000-12-15 Thread Jarkko Hietaniemi
) > > So, before we start even thinking about what we need, it's time to look at the > vexed question of string representation. How do we do Unicode without getting > into the horrendous non-Latin1 cockups we're seeing on p5p right now? Larry As painful as it may sound (codingw

String representation

2000-12-15 Thread Simon Cozens
t the vexed question of string representation. How do we do Unicode without getting into the horrendous non-Latin1 cockups we're seeing on p5p right now? Larry suggested aeons ago that everything is an array of numbers, and Perl shouldn't care what those numbers represent. But at some p