Re: UTF-8 and Unicode FAQ, demos
Damian Conway wrote:
> Larry Wall wrote:
>> That suggests to me that the circumlocution could be >>*<<.
>
> A five character multiple symbol??? I guess that's the penalty for not upgrading to something that can handle unicode.

Unless this is subtle humor, the Huffman encoding idea is getting seriously out of hand. That 5 char ASCII sequence is *identically* encoded when read by the human eye. Humans can probably type the 5 char sequence faster too. How does Unicode win here?

I know I'm just another sample point in a sea of samples, but my embedded symbol parser seems optimized for alphabetic symbols. The cool non-alphabetic Unicode symbols are beautiful to look at, but they don't help me read or write faster. There are rare exceptions (like grouping) where I strongly prefer non-alphabetics, but otherwise alphabetics help me get past the "what is this code?" phase and into the "what does this code do?" phase as quickly as possible.

(I just noticed that all the non-alphabetic symbols (except '?') in the previous paragraph are used for grouping. Weird.)

- Ken
RE: UTF-8 and Unicode FAQ, demos
Ken Fox wrote: > Damian Conway wrote: > > Larry Wall wrote: > >> That suggests to me that the circumlocution could be >>*<<. > > > > A five character multiple symbol??? I guess that's the > > penalty for not upgrading to something that can handle > > unicode. > > Unless this is subtle humor, the Huffman encoding idea is > getting seriously out of hand. That 5 char ASCII sequence > is *identically* encoded when read by the human eye. Humans > can probably type the 5 char sequence faster too. How does > Unicode win here? > > I know I'm just another sample point in a sea of samples Can't we have our cake and eat it too? Give ASCII digraph or trigraph alternatives for the incoming tide of Perl6 Unicode? Allow both >>*<< and »*«? Or something similar '>>*'<<, [>*<], etc... -- Garrett Goebel IS Development Specialist ScriptPro Direct: 913.403.5261 5828 Reeds Road Main: 913.384.1008 Mission, KS 66202 Fax: 913.384.2180 www.scriptpro.com [EMAIL PROTECTED]
RE: UTF-8 and Unicode FAQ, demos
Garrett Goebel: # Ken Fox wrote: # > Unless this is subtle humor, the Huffman encoding idea is getting # > seriously out of hand. That 5 char ASCII sequence is *identically* # > encoded when read by the human eye. Humans can probably type the 5 # > char sequence faster too. How does Unicode win here? # # Can't we have our cake and eat it too? Give ASCII digraph or # trigraph alternatives for the incoming tide of Perl6 Unicode? The Unicode version is more typing than the non-Unicode version, so what's the advantage? It's prettier? --Brent Dax <[EMAIL PROTECTED]> @roles=map {"Parrot $_"} qw(embedding regexen Configure) Wire telegraph is a kind of a very, very long cat. You pull his tail in New York and his head is meowing in Los Angeles. And radio operates exactly the same way. The only difference is that there is no cat. --Albert Einstein (explaining radio)
How to set your Windows keyboard to ¶erl-mode
This > ¶ < is a pilcrow, which shows up for me as one of those paragraph-sign looking backwards P's with two vertical bars. Sorry if it doesn't come out for you.

--- Brent Dax <[EMAIL PROTECTED]> wrote:
> The Unicode version is more typing than the non-Unicode version, so
> what's the advantage? It's prettier?

If you're a Mac user, you already know from the myriad responses pointing at option-whatever how to generate the sequences.

If you're a PC/DOS user, you deserve whatever you get. But your text-editor may support macros. I used Multi-Edit when I was DOSsing, and it did. Ask your manufacturer.

If you're using OS/2, I'm sorry. The 850 codepage supports the characters « and », but I don't know how to help you generate them other than alt-123.

If you're using X, and you don't already know how to generate these characters, RTFM: xmodmap. If you can't make it work, your license to use Linux will be revoked. Nobody who can't use xmodmap should be allowed to own a keyboard.

If you're using Windows, specific reference is made to the following URL: http://support.microsoft.com/default.aspx?scid=kb;en-us;Q306560

I quote (using alt-[ and alt-], which now do « and » for me, respectively):

«« Adding the United States-International Keyboard Layout

To add the United States-International keyboard layout, follow these steps: Click Start, and then click Control Panel. Under Pick a category, click Date, Time, Language, and Regional Options. The Regional and Language Options dialog box appears. On the Languages tab, click Details. The Text Services and Input Languages dialog box appears. Under Installed services, click Add. The Add Input language dialog box appears. In the Input language list, click the language that you want. For example, English (United States). NOTE: When you use the United States-International keyboard layout, you should also use an English language setting. In the Keyboard layout/IME list, click United States-International, and then click OK.
In the Select one of the installed input languages to use when you start your computer list, click Language name - United States-International (where Language name is the language that you selected in step 6), and then click OK. In the Regional and Language Options dialog box, click OK. Notice that the Language bar appears on the taskbar. When you position the mouse pointer over it, a ToolTip appears that describes the active keyboard layout. For example, United States-International. Click the Language bar, and then click United States-International on the shortcut menu that appears. The United States-International keyboard layout is selected. »» NOW: At this point, Meestaire ISO-phobic Amairecain Programmaire, you have achieved keyboard parity with the average Swiss six-year-old child. Don't let this happen again. We can't afford a keyboard-gap! Yes, living below the earth in salt mines, with a properly chosen ratio of nubile females to fertile males, say ten for every one, it will be possible to ... erk. sorry. (*) The URL goes on to list all the cool keys you can generate, but two things: 1- You've probably got a little [EN] icon in your taskbar. Make sure to switch to the international flavor (either using leftALT-Shift in a window, or by mousing the [EN] taskbar icon) before pounding away at your new ¶erlified keyboard. 2- The IME only uses rightALT for key composition. Those of you whose right thumbs have been amputated in freak mine accidents will have to pursue more accessibility features. =áµßþíñ (International Man of Mystery) * -- Speaking of Dr. Strangelove, happy 50th birthday to "Mike." November 1, 1952, saw the detonation of "Mike", the worlds first hydrogen bomb. "Elugelab, the Pacific island on which Mike exploded, was erased by the blast. When told that Elugelab was 'missing', America's president-elect, Dwight Eisenhower, visibly paled." (Economist) Here's to 50 years of unemployment. So far. __ Do you Yahoo!? 
HotJobs - Search new jobs daily now http://hotjobs.yahoo.com/
Re: UTF-8 and Unicode FAQ, demos
On Monday, November 4, 2002, at 08:55 AM, Brent Dax wrote:
> # Can't we have our cake and eat it too? Give ASCII digraph or
> # trigraph alternatives for the incoming tide of Perl6 Unicode?
>
> The Unicode version is more typing than the non-Unicode version, so what's the advantage? It's prettier?

Well, yes! :-)... but also because they are unique characters compared to all the other existing prefix/postfix/binary/quotelike operators, so there's pretty much zero chance of ambiguity. Using just a few Unicode symbols would seriously open up the range of possible "sensible" operators, without causing the kind of mind-numbing ambiguities and subtle no-not-this-I-mean-that we've seen in the whole xor/hyper discussions.

UTF-8 «op» representations have the advantage of trivially not conflicting with _any_ existing operators, and being visually distinct from all of them. There may be a few other things in easy-to-find-and-type Latin1, like one or two of these:

® © § Ω ¶ ± ¿

That could maybe fill in for ';' in the cases where ';' has been given a sneaky meaning, or represent some infrequent but terrifically useful unary or binary op, etc.

C'mon, everybody's doing it! First one's free, kid... ;-)

MikeL
Re: UTF-8 and Unicode FAQ, demos
On Sun, Nov 03, 2002 at 09:41:44AM -, Rafael Garcia-Suarez wrote:
> Matthew Zimmerman wrote in perl.perl6.language :
> >
> > So let me make my original question a little more
> > general: are Perl 6 source files encoded in Latin-1,
> > UTF-8, or will Perl 6 provide some sort of translation
> > mechanism, like specifying the charset on the command
> > line?
>
> I expect probably something similar to Perl 5's encoding
> pragma. (But hopefully lexically scoped.)

Okay, but what will the default be? UTF-8? iso-8859-1? My current locale? Am I going to have to put

    use encoding 'utf8'; # or whatever the P6 syntax will be

at the beginning of every program that might get distributed outside of my home country to make sure it'll run? Are we going to tell newbies to make sure they have '-w' and 'use strict' *and* 'use encoding' at the beginning of their programs?

I'm just worried about the possibility of writing Perl 6 programs and then sending them to friends in other parts of the world and having them fail in subtle ways because my Perl 6 expects 0xAB and theirs expects 0xC2AB (or vice versa). Or if I post a code sample to CLPM that runs on my machine that doesn't compile from the posting because my news client automatically converts charsets.

Undoubtedly the Perl 6 parser will be smart enough to figure out all of this, and I'm making a mountain out of a molehill. But I just want to make sure that one of the people in authority here either is or will be thinking about this.

--
Matt
Matthew Zimmerman
Interdisciplinary Biophysics, University of Virginia
http://www.people.virginia.edu/~mdz4c/
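The 0xAB versus 0xC2AB mismatch described above can be seen at the byte level. What follows is a minimal sketch in Python (used only because the Perl 6 syntax was still unsettled; the variable names are illustrative and nothing here is Perl 6 behaviour). It shows that « is one byte in ISO-8859-1 and two in UTF-8, and that reading either form under the wrong assumption garbles it or fails outright.

```python
# The left guillemet used in the proposed «op» syntax.
LAQUO = "\u00ab"

latin1_bytes = LAQUO.encode("iso-8859-1")  # a single byte, 0xAB
utf8_bytes = LAQUO.encode("utf-8")         # two bytes, 0xC2 0xAB

print(latin1_bytes.hex(), utf8_bytes.hex())

# UTF-8 bytes read as Latin-1 decode without error, but become two characters.
misread = utf8_bytes.decode("iso-8859-1")
print(repr(misread))

# Latin-1 bytes read as UTF-8 are not valid input at all.
try:
    latin1_bytes.decode("utf-8")
    latin1_as_utf8_ok = True
except UnicodeDecodeError:
    latin1_as_utf8_ok = False
print(latin1_as_utf8_ok)
```

The first failure is the quiet one: nothing complains, the operator just stops being the operator.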
Re: UTF-8 and Unicode FAQ, demos
--- Matthew Zimmerman <[EMAIL PROTECTED]> wrote:
> On Sun, Nov 03, 2002 at 09:41:44AM -, Rafael Garcia-Suarez wrote:
> > Matthew Zimmerman wrote in perl.perl6.language :
> > >
> > > So let me make my original question a little more
> > > general: are Perl 6 source files encoded in Latin-1,
> > > UTF-8, or will Perl 6 provide some sort of translation
> > > mechanism, like specifying the charset on the command
> > > line?
> >
> > I expect probably something similar to Perl 5's encoding
> > pragma. (But hopefully lexically scoped.)
>
> Okay, but what will the default be? UTF-8? iso-8859-1? My
> current locale? Am I going to have put
>
> use encoding 'utf8'; # or whatever the P6 syntax will be
>
> at the beginning of every program that might get distributed
> outside of my home country to make sure it'll run?

8859-1 will be the default.

If you want "trigraph" support, you'll have to put

    use encoding 'ugly-american';

at the top of your files. ;-) ;-) ;-)

Otherwise, it'll be one-character «fancyops» all the way.

=Austin
Re: vectorization (union and intersection operators)
> I'm probably opening up a whole new can of worms here, but if we said > that the following were both vector operators: > > ^ == intersection operator > v == union operator > > then these could have potentially useful meanings on their *own* as set > operators, as well as modifying other operators. For example: > > @a = (1,2,3); > @b = (4,1,3); > > @a = @a ^ @b; # @a = (1,3); > @a = @a v @b; # @a = (1,2,3,4); Or is @a = (1,2,3,4,1,3) ? No... I'm assuming two things: 1) that v (union) uniqifies the elements in its array, as does ^. 2) that the v (union) modifier includes elements that exist in either the first set or the second set, and that ^ includes elements that exist in both sets. These are pragmatic operators; I've done this type of thing *exceedingly* often. The behaviour you describe is the concatenation of the two sets. I can get that for free by saying push(@a, @b), or @a = (@a, @b); And since admittedly they are pragmatic, people are welcome to overload them. But right now as it stands '@a v= @b' is nonsense; it might as well be put to work. Ed ( ps - as an aside, are the apocalypses going to be backdated as changes to the design come up? Or are the apocalypses just a first draft for more enduring documentation? )
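The two assumptions Ed states (both operators uniquify, union keeps elements found in either list, intersection keeps elements found in both) can be sketched as follows. This is Python rather than Perl 6, the function names are hypothetical, and keeping first-seen order is an assumption of the sketch that the message does not specify.

```python
def intersection(a, b):
    """Elements present in both lists, in first-seen order, without duplicates."""
    in_b = set(b)
    seen = set()
    out = []
    for x in a:
        if x in in_b and x not in seen:
            seen.add(x)
            out.append(x)
    return out


def union(a, b):
    """Elements present in either list, in first-seen order, without duplicates."""
    seen = set()
    out = []
    for x in list(a) + list(b):
        if x not in seen:
            seen.add(x)
            out.append(x)
    return out


a = [1, 2, 3]
b = [4, 1, 3]

print(intersection(a, b))  # [1, 3]
print(union(a, b))         # [1, 2, 3, 4]
print(a + b)               # plain concatenation: [1, 2, 3, 4, 1, 3]
```

Both results are computed from the original lists, which matches the values given in the quoted comments.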
Re: vectorization (union and intersection operators)
Ed Peschko asked:
> ps - as an aside, are the apocalypses going to be backdated as changes to the design come up?

Yes.

> Or are the apocalypses just a first draft for more enduring documentation?

Yes.

;-)

Damian
Re: UTF-8 and Unicode FAQ, demos
On Mon, Nov 04, 2002 at 10:19:55AM -0800, Michael Lazzaro wrote:
> UTF-8 «op» representations have the advantage of trivially not
> conflicting with _any_ existing operators, and being visually distinct
> from all of them. There may be a few other things in
> easy-to-find-and-type Latin1, like one or two of these:
>
> ® © § Ω ¶ ± ¿

I've actually got my eye on ≈ (U+2248 ALMOST EQUAL TO) as a replacement for ~~ someday in the distant future. I suppose it could be argued that we should use ≅ (U+2245 APPROXIMATELY EQUAL TO) instead. That's what =~ was supposed to represent, after all...

> That could maybe fill in for ';' in the cases where ';' has been given
> a sneaky meaning, or represent some infrequent but terrifically useful
> unary or binary op, etc.

You know, separate streams in a for loop are not going to be that common in practice, so maybe we should look around a little harder for a supercomma that isn't a semicolon. Now *that* would be a big step in reducing ambiguity...

Even if we limit ourselves to Latin1 for now, there's things like the broken pipe ¦ and logical not ¬ and such that look useful. I'd avoid using standard signs like multiply × and divide ÷ for non-standard purposes though. (Not that we can exactly use multiply even for its standard purpose--there's an awfully heavy resemblance between × and x, at least in the typical sans serif font.)

It would be really funny to use cent ¢, pound £, or yen ¥ as a sigil, though...

> C'mon, everybody's doing it! First one's free, kid... ;-)

People who believe slippery slope arguments should never go skiing.

On the other hand, even the useful slippery slopes have "beginner" slopes. I think one advantage of using Unicode for advanced features is that it *looks* scary. So in general we should try to keep the basic features in ASCII, and only use Unicode where there be dragons.

It will certainly be possible to write APL in Perl, but if you do, you'll get what you deserve.
In fact, the problem with APL is not that it's possible to write APL in it, but that it is impossible not to... :-) Larry
What is the order of evaluation for separate streams in a loop?
Something from [EMAIL PROTECTED] about the relative frequency made me wonder:

What's the "order of evaluation" or "nestedness" for separate streams in a for loop?

That is, can I meaningfully say:

    for my $i; $j -> 0 .. @array.length - 1; $i + 1 .. @array.length { .. }

And get the equivalent of two nested loops?

=Austin
Re: UTF-8 and Unicode FAQ, demos
On Mon, Nov 04, 2002 at 11:27:16AM -0800, Austin Hastings wrote: > --- Matthew Zimmerman <[EMAIL PROTECTED]> wrote: > > On Sun, Nov 03, 2002 at 09:41:44AM -, Rafael Garcia-Suarez wrote: > > > Matthew Zimmerman wrote in perl.perl6.language : > > > > > > > > So let me make my original question a little more > > > > general: are Perl 6 source files encoded in Latin-1, > > > > UTF-8, or will Perl 6 provide some sort of translation > > > > mechanism, like specifying the charset on the command > > > > line? > > > > > > I expect probably something similar to Perl 5's encoding > > > pragma. (But hopefully lexically scoped.) > > > > Okay, but what will the default be? UTF-8? iso-8859-1? My > > current locale? Am I going to have put > > > > use encoding 'utf8'; # or whatever the P6 syntax will be > > > > at the beginning of every program that might get distributed > > outside of my home country to make sure it'll run? > > 8859-1 will be the default. Actually, Unicode will be the default. 8859-1 can probably also be handled without declaration. > If you want "trigraph" support, you'll have to put > > use encoding 'ugly-american'; > > at the top of your files. ;-) ;-) ;-) > > Otherwise, it'll be one-character ?fancyops? all the way. Mmm, I view one-character Unicode operators as more of an escape hatch for the future, not as something to be made mandatory. But then, I'm one of those ugly Americans. Of course, I also think I'm allowed to be a little inconsistent in forcing things like »op« on people. After all, there's gotta be some advantage to being the Fearless Leader... Larry
Re: What is the order of evaluation for separate streams in a loop?
> Date: Mon, 4 Nov 2002 12:09:12 -0800 (PST)
> From: Austin Hastings <[EMAIL PROTECTED]>
>
> Something from [EMAIL PROTECTED] about the relative frequency made me
> wonder:
>
> What's the "order of evaluation" or "nestedness" for separate streams
> in a for loop?
>
> That is, can I meaningfully say:
>
> for my $i; $j -> 0 .. @array.length - 1; $i + 1 .. @array.length
> {
> ..
> }

I don't know where to correct you first... <:)

I'll start by saying your variables are on the wrong side of the pointy sub. Also, presuming you switched the order, that C<my> shouldn't be there and would be an error. Also, you don't need those C<.length>s, but I guess they don't hurt.

Finally, multi-stream C<for> iterates I<in parallel>. So:

    for 0..@array-1; $i+1..@array -> $i; $j { ... }

Would.. um.. I think that's an error. Or, it is if C<use strict> is on. Otherwise it would use the undefined C<$i> (unless, of course, it was defined in an enclosing lexical scope).

Luke
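The difference between the two readings of the question, streams walked side by side versus one loop inside another, can be sketched outside Perl 6. This is Python, with an illustrative three-element array. The second parallel stream is made independent of the first, because a stream that refers to the other stream's loop variable is exactly what the reply above flags as a problem.

```python
array = ["a", "b", "c"]
n = len(array)

# Parallel iteration: each step takes one value from every stream.
# The streams are 0 .. n-1 and 1 .. n (both inclusive).
parallel = list(zip(range(0, n), range(1, n + 1)))

# Nested iteration: for each i, j runs over i+1 .. n (inclusive).
nested = [(i, j) for i in range(0, n) for j in range(i + 1, n + 1)]

print(parallel)  # [(0, 1), (1, 2), (2, 3)]
print(nested)    # [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
```

The parallel form yields one pair per step, so it cannot stand in for two nested loops.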
Re: UTF-8 and Unicode FAQ, demos
--- [EMAIL PROTECTED], UNEXPECTED_DATA_AFTER_ADDRESS@.SYNTAX-ERROR. wrote:
> Mmm, I view one-character Unicode operators as more of an escape
> hatch
> for the future, not as something to be made mandatory. But then,
> I'm one of those ugly Americans.

EBCDIC didn't support brackets, originally, so ANSI included trigraphs called ??( and ??) for [ and ], respectively.

But the fact of the matter is that about epsilon (which is to say, really close to zero) people wrote trigraphs.

So, yeah, include trigraph sequences if it will make happy the people on the list who can't be bothered to read the documentation for their own keyboard IO system.

But don't expect the rest of us to use them.

In short:

1- « and » are really useful in my context.
2- I can make my work environment generate them in one (modified) keystroke.
3- I can make my home environment do likewise.
4- The "ascii-only" version isn't faster and easier, nor more morally pure.
5- There is no "differently keyboard abled" market out there which has engaged my sympathy, ascii-operator wise.

Ergo,

6- my @a = @b «+» @c;

> Of course, I also think I'm allowed to be a little inconsistent in
> forcing things like »op« on people. After all, there's gotta be
> some advantage to being the Fearless Leader...

Which kind of begs the question: Who are you? And can you authenticate that which you just implicitly claimed? (See quote header, above, if you don't understand my question)

> Larry

=Austin
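For readers who have never met them, the C trigraphs mentioned above work as a plain textual substitution applied before anything else sees the source. A minimal sketch in Python follows. The table is the standard set of nine C trigraphs, while the function name and the left-to-right scanning loop are illustrative rather than taken from any compiler.

```python
# The nine trigraph sequences defined by ANSI/ISO C.
TRIGRAPHS = {
    "??=": "#",
    "??(": "[",
    "??/": "\\",
    "??)": "]",
    "??'": "^",
    "??<": "{",
    "??!": "|",
    "??>": "}",
    "??-": "~",
}


def expand_trigraphs(source: str) -> str:
    """Replace each trigraph with its single character, scanning left to right."""
    out = []
    i = 0
    while i < len(source):
        candidate = source[i:i + 3]
        if candidate in TRIGRAPHS:
            out.append(TRIGRAPHS[candidate])
            i += 3
        else:
            out.append(source[i])
            i += 1
    return "".join(out)


print(expand_trigraphs("int a??(3??);"))  # int a[3];
```

An ASCII spelling such as >>*<< would presumably need the same kind of pre-pass or an extra lexer rule; that is a guess about a design that had not been settled in this thread.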
Re: What is the order of evaluation for separate streams in a loop?
--- Luke Palmer <[EMAIL PROTECTED]> wrote:
> I don't know where to correct you first... <:)

Is that a dunce-hat? Is there an ISO version I could use instead? :->

> I'll start by saying your variables are on the wrong side of the
> pointy sub. Also, presuming you switched the order, that C<my>
> shouldn't be there and would be an error. Also, you don't need those
> C<.length>s, but I guess they don't hurt.

Cost of switching languages in midstream. My bad.

> Finally, multi-stream C<for> iterates I<in parallel>. So:
>
> for 0..@array-1; $i+1..@array -> $i; $j {
> ...
> }
>

Wow. I knew that. Duh. Nevermind.

=Austin
Re: UTF-8 and Unicode FAQ, demos
On 2002-11-04 at 12:26:56, Austin Hastings wrote: > 1- ? and ? are really useful in my context. Okay. Now can you get your mailer to send them properly? :)
Re: How to set your Windows keyboard to ¶erl-mode
Austin Hastings wrote: At this point, Meestaire ISO-phobic Amairecain Programmaire, you have achieved keyboard parity with the average Swiss six-year-old child. The question is not about being ISO-phobic or pro-English. ** The question is whether we want a pictographic language. I like the size of the English alphabet. It produces fairly short words, but the words are very robust (people can read words in all orientations, backwards, upside down, in crazy fonts, hand-written, etc.) This is the opposite of Huffman encoding, but just as useful IMHO. I've had the unpleasant job of turning math into software. Hand written formulae can be very difficult to read because mathematics worships Huffman encoding. Multiplication is specified by *nothing*. Exponents are just written a bit smaller and a bit raised. Is this what we want in the core? Does anyone have any references for reading and comprehension rates for different types of languages? I'm ignorant on the subject and this seems like something a Perl programmer should know. - Ken ** I'm probably both. ISO-phobic because I actually represented my company on an ISO standard committee. Pro-English because it's what I use -- being pro-English doesn't make me against everything else. A language would have to be pretty bad to have its native speakers advocate something else!
Re: UTF-8 and Unicode FAQ, demos
> After all, there's gotta be some advantage to > being the Fearless Leader... > > Larry Thousands will cry for the blood of the Perl 6 design team. As Leader, you can draw their ire. Because you are Fearless, you won't mind... -- ralph
Re: UTF-8 and Unicode FAQ, demos
Ken Fox wrote:
> I know I'm just another sample point in a sea of samples, but my embedded symbol parser seems optimized for alphabetic symbols. The cool non-alphabetic Unicode symbols are beautiful to look at, but they don't help me read or write faster.

Once again: we're only talking about « and ».

> There are rare exceptions (like grouping)

E.g. « and » ;-)

> where I strongly prefer non-alphabetics, but otherwise alphabetics help me get past the "what is this code?" phase and into the "what does this code do?" phase as quickly as possible.

Interestingly, I find it just the opposite. The use of symbolic operators makes it easier for me to differentiate the "nouns", "verbs", and "punctuation" of a piece of code.

Damian
Re: How to set your Windows keyboard to ¶erl-mode
--- Ken Fox <[EMAIL PROTECTED]> wrote: > Austin Hastings wrote: > > The question is not about being ISO-phobic or pro-English. ** The two gripes I've heard have been: 1- It's hard to type. 2- I don't know how to type it on platform X. With combo gripe "It'll be hard to remember how to type it across multiple platforms X, Y, Z, etc." coming in third. So I solved that problem. I know it's easy to type on Mac, I know how to MAKE it easy to type on WinPC, and I know how to MAKE it easy to type on an X terminal. In all cases, [OPTION] or [ALT] plus some matching set of punctuation [(slashes) or (brackets)]. Now it's easy to type (easier, for me at least, than typing two backticks, since the modifier level is the same and the hand-contortion on a PC type keyboard [with ` and ~ in the top left corner] is much lower), and not too difficult to remember, even across N platforms. So I'll treat your objection, below, as a new one. > The question is whether we want a pictographic language. I like > the size of the English alphabet. It produces fairly short words, > but the words are very robust (people can read words in all > orientations, backwards, upside down, in crazy fonts, hand-written, > etc.) This is the opposite of Huffman encoding, but just > as useful IMHO. The << and >> (rendered thus for Mr. Reed) are just as pictographic (or not) as [ and ]. They look the same from top or bottom, and are unmistakable in direction when looked at from either side. Likewise, they are probably MORE clear, as has been mentioned, than the difference between ' (apostrophe) and ` (tick) in many standard fonts, especially the variable-width variety sometimes invoked for 8-bit messages. But in this context, we've got a pair of balanced, unmistakable characters which have no other uses (compare, say, %hash and $a %= $b; same character '%', different usages) being proposed to serve as the marker for a new class of operation. > ... > Exponents are just written a bit smaller and a bit raised. 
> Is this what we want in the core?

If every keyboard and operating system had the ability to simply generate arbitrary expressions of the form (expr-a) ** (expr-b), ad infinitum (a ** b ** c ** d ** e) then we'd be remiss not to use it. But they can't, so we don't.

> ...
> ** I'm probably both. ISO-phobic because I actually represented my
> company on an ISO standard committee.

You have my sympathy.

=Austin
Re: UTF-8 and Unicode FAQ, demos
Garrett Goebel wrote:
> Can't we have our cake and eat it too? Give ASCII digraph or trigraph alternatives for the incoming tide of Perl6 Unicode? Allow both >>*<< and »*«?

I'd really prefer we didn't. I'd much rather keep << and >> for other things.

> Or something similar '>>*'<<, [>*<], etc...

Much as I hate the notion of di- and trigraphs, this is a possibility. Though I'd much rather we just allowed POD escapes (e.g. E<laquo> and E<raquo>) in code.

And, yes, I'm aware that makes E<raquo>*E<laquo> incredibly ugly. I'm rather *counting* on it, in fact ;-)

Damian
Re: UTF-8 and Unicode FAQ, demos
> people on the list who can't be bothered to read > the documentation for their own keyboard IO system. Most of this discussion seems to focus on keyboarding. But that's of little consequence. This will always be spotted before it does much harm and will affect just one person and their software at a time. Errors in encoding during transmission is a whole lot more problematic. This will almost always be spotted after the fact, and may affect many people at a time and require fixes to multiple systems not controlled by the sender or receiver. -- ralph
utf and ebcdic
On 04/11/02 12:12 -0800, [EMAIL PROTECTED] wrote: > > If you want "trigraph" support, you'll have to put > > > > use encoding 'ugly-american'; > > > > at the top of your files. ;-) ;-) ;-) > > > > Otherwise, it'll be one-character ?fancyops? all the way. > > Mmm, I view one-character Unicode operators as more of an escape hatch > for the future, not as something to be made mandatory. But then, > I'm one of those ugly Americans. > > Of course, I also think I'm allowed to be a little inconsistent in > forcing things like »op« on people. After all, there's gotta be > some advantage to being the Fearless Leader... On one hand I really respect your fearlessness to go where no language author has gone before. No matter what happens, I pretty sure you'll be "remembered" for it. ;) On the other hand I'm wondering what happens to the ebcdic platforms and the like. Will it even work to have core modules written in non ascii and expect them to translate to ebcdic? I suppose you'll have to convert them to trigraphs as part of the installation. Just wondering if you've thought through the support issues for platforms that by their definition won't be using utf ever. FWIW, ebcdic *does* have the cent sign! Cheers, Brian
Re: UTF-8 and Unicode FAQ, demos
--- Me <[EMAIL PROTECTED]> wrote:
> > people on the list who can't be bothered to read
> > the documentation for their own keyboard IO system.
>
> Most of this discussion seems to focus on keyboarding.
> But that's of little consequence. This will always be
> spotted before it does much harm and will affect just
> one person and their software at a time.

Good. Counting Damian, that makes three of us. Welcome aboard, ralph. :-)

> Errors in encoding during transmission is a whole lot
> more problematic. This will almost always be spotted
> after the fact, and may affect many people at a time
> and require fixes to multiple systems not controlled
> by the sender or receiver.

I disagree (slightly). I get emailed powerpoint files, jpeg images, and tens of other binary formats every day, and they consistently come through correctly. The transmission network is working fine.

What we've got is an encoding problem at the MUA level. Mark Reed says my mailer (Yahoo!) tagged a message containing high-bit characters as US-ASCII. Several people the other day reported on the differences in UTF8 vs. Latin-1 handling among pine, elm, and other mailers.

There are problems, and this kind of change will create a demand to get them fixed. Those products that satisfy the demand will survive. The others won't.

Up until now, though, everyone's been lax about making the encoding stuff strack. But this is a language widely regarded as a huge player, and when a huge player says "You need to take care of (something)", then it gets done.

Perl6 will do more to address the real technical issues of electronic communication between Americans and French-speakers than anything else. (Primarily because Perl hackers want to talk to each other, but no French-speaker wants to talk to an American ;-)

=Austin
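The MUA-level failure described above can be reproduced in a few lines. This is a Python sketch of two plausible mechanisms, not a diagnosis of what any particular mailer did: a step that can only emit US-ASCII substitutes '?' for each guillemet, and UTF-8 bytes displayed as Latin-1 gain a stray character before each one. The sample line is the one from earlier in the thread.

```python
line = "my @a = @b \u00ab+\u00bb @c;"

# Mechanism 1: a step restricted to US-ASCII replaces what it cannot encode.
ascii_only = line.encode("ascii", errors="replace").decode("ascii")
print(ascii_only)  # my @a = @b ?+? @c;

# Mechanism 2: UTF-8 bytes shown by a reader that assumes ISO-8859-1.
mislabelled = line.encode("utf-8").decode("iso-8859-1")
print(mislabelled)

# Only a correctly labelled round trip preserves the operator.
round_trip = line.encode("utf-8").decode("utf-8")
print(round_trip == line)  # True
```

In the first case the information is gone for good, while in the second it can still be recovered by re-decoding with the right charset.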
Re: utf and ebcdic
--- Brian Ingerson <[EMAIL PROTECTED]> wrote: > FWIW, ebcdic *does* have the cent sign! And the "not" sign. Damian may force us to abandon ASCII entirely... =Austin __ Yahoo! - We Remember 9-11: A tribute to the more than 3,000 lives lost http://dir.remember.yahoo.com/tribute
Re: UTF-8 and Unicode FAQ, demos
I'm having trouble believing this is even being considered. At all. And especially for these operators...

> So, yeah, include trigraph sequences if it will make happy the people
> on the list who can't be bothered to read the documentation for their
> own keyboard IO system.
>
> But don't expect the rest of us to use them.

So you're one of the very few people who bothered to set up unicode, and now you want to force the rest of us into your own little "leet" group. Given the choice between learning how to reconfigure their keyboard, editor, terminal, fonts, and everything else, or just not learning perl6, I bet you'd have a LOT of people who get scared away. Face it, too many people think perl is linenoise heavy and random already.

Which brings me to my real question: why these operators? It's not as if they're even particularly intuitive for this context. They're quotes. They don't mean "vector" anything, and never have. I could almost see if the characters in question just screamed the function in question (sqrt, not equals, not, sum, almost anything like that), but these are just sort of random.

Given how crazy this is all getting, is it absolutely certain that we're better off not just making vector operations work without modifiers? I reread the apocalypse just now, and I don't really see the problem. The main argument against seems to be "perl5 people expect it to be scalar", but perl5 people will have to get used to a lot. I think the operators should just be list based, and if you want otherwise you can specify "scalar:op" or convert both sides to scalars manually (preferably with .length, so it's absolutely clear what's meant).

--
Adam Lopresto ([EMAIL PROTECTED])
http://cec.wustl.edu/~adam/

Who are you and what have you done with reality?
--Jamin Gray
[ANNOUNCE] Perl6 Docs, an initial "Chapter".
There is a (partial) book-style chapter describing Perl6 values, variables, and primitive/promoted types at: http://cog.cognitivity.com/perl6/val.html The entire thing is one page, for easy printing. It works out to about 15-20 pages, depending on your printer. There is *much* more coming soon. This is the "expanded" version of what I posted earlier: it is much more accurate and detailed. It represents the beginnings of a "Chapter 1" of detailed (tho unofficial) Perl6 documentation. My working approach is to document Perl6 as if Perl5 never existed: in other words, don't focus on what's "new" or "changed": just document all aspects of Perl6 completely, from the ground up, written for people who may not have worked with Perl5 at all. There are some assumptions in the more detailed descriptions: specifically, I have documented explicit radix, bool, and a few other things as if they will exist, though they haven't been decided yet. That's why I'm writing it -- to get at all the nooks and crannies, and make sure we're not including, missing, or assuming anything we shouldn't. There is much more coming (another 100 pages or so to finish up, edit & post for this chapter and the next): I'll post the rest as I get it polished, but I need some initial feedback on this first part, especially: -- Is the writing style acceptable? Should it be less formal? More formal? Geared to dumber people? Smarter people? Just right? -- Are things being presented in the correct order? Are there any obvious gaps where I need more explanation? -- Syntax and accuracy goofs, obviously. I realize the first page or two are pretty dull, because they're more "abstract", and stuff so basic that it's hard to explain properly at all. But it assuredly needs some edits, so I will be indebted to anyone who slogs through it. :-) The farther you go, the more interesting/fun/scary it gets. I can't wait to show the beginnings of an "Operators" chapter. :-) MikeL
Re: UTF-8 and Unicode FAQ, demos
On Mon, Nov 04, 2002 at 12:26:56PM -0800, Austin Hastings wrote: > In short: > > 1- ? and ? are really useful in my context. > 2- I can make my work environment generate them in one (modified) > keystroke. > 3- I can make my home environment do likewise. > 4- The "ascii-only" version isn't faster and easier, nor more morally > pure. > 5- There is no "differently keyboard abled" market out there which has > engaged my sympathy, ascii-operator wise. > > Ergo, > > 6- my @a = @b ?+? @c; It's a great argument. I know how to type "funny" characters too. I can even read some of the ones some people send. Just don't expect me to be able to understand any Perl 6 you mail me. Whether the problem is at your end, my end or somewhere in the middle is moot. On the other hand, maybe all these issues will be sorted out before we can start writing Perl 6 in earnest. In one way I hope that is true. In another I hope it isn't ;-) -- Paul Johnson - [EMAIL PROTECTED] http://www.pjcj.net
Re: Unifying invocant and topic naming syntax
On Sun, Nov 03, 2002 at 11:17:32PM -0600, Me wrote:
>
> I started with a simple thought:
>
>     is given($foo)
>
> seems to jar with
>
>     given $foo { ... }
>
> One pulls in the topic from outside and
> calls it $foo, the other does the reverse --
> it pulls in $foo from the outside and makes
> it the topic.

It comes from 'is given' using 'given' as a noun (meaning the same thing as 'topic') instead of a verb. So, this:

    given $foo { ... }

means "make $foo become the given" or "Given-ify $foo". While:

    is given($foo)

means "$foo takes the value of the given".

There may be room for a better parameter name. We considered quite a few before picking this one, though, and I'm pretty happy with it for now.

> On its own this was no big deal, but it got
> me thinking.
>
> The key thing I realized was that (naming)
> the invocant of a method involves something
> very like (naming) the topic of a method,
> and ultimately a sub and other constructs.

The similarity is that both are implicit parameters, i.e. they're accessible to the sub/method but aren't explicitly passed when it's called. They're not quite the same though, as the 'is given' parameter is entirely out-of-band (in P5 terms, it wouldn't appear in @_). Also, both may be the topic under certain circumstances. But then, any variable can be the topic.

Generally, there's no conceptual link between the invocant of a method and the topic in the caller's scope.

> Thus it seems that whatever syntax you pick
> for the former is likely to work well for
> the latter.
>
> Afaik, the syntax for invocant naming is:
>
>     method f ($self : $a, $b) { ... }
>
> But whatever it is, I think one can build
> on it for topic transfer / naming too in a
> wide range of contexts.
>
> With apologies for talking about Larry's
> colon, something that really does sound
> like it is taboo for good reason, I'll
> assume the above invocant naming syntax
> for the rest of this email.
>
> So, perhaps:
>
>     sub f ($a, $b) is given($c) { ... }
>     sub f ($a, $b) is given($c is topic) { ... }
>     sub f ($a, $b) is given($_) { ... }
>
> could be something like:
>
>     sub f ($c : $a is topic, $b) { ... }
>     sub f ($c : $a, $b) { ... }
>     sub f ($_ : $a, $b) { ... }
>
> where the first arg to be mentioned is the
> topic unless otherwise specified.
>
> (The first line of the alternates is not
> semantically the same as the line it is a
> suggested replacement for, in that the
> current scheme would not set the topic --
> its value would be the value of $_ in
> the lexical block surrounding the sub
> definition. It's not obvious to me why
> the current scheme has it that way and
> what would best be done about it in the
> new scheme I suggest, so I'll just move on.)

Even though both features have something to do with topic, they're really independent. There are times when it's useful to access the caller's topic without setting the current topic and times when it's useful to just set the current topic.

    sub f ($a, $b) { ... }                        # use neither feature
    sub f ($a is topic, $b) { ... }               # topic setting
    sub f ($a, $b) is given($c) { ... }           # access caller's topic
    sub f ($a, $b) is given($c is topic) { ... }  # combine

When you have a system with two independent but interacting features, it's far more efficient to define two independent flags than to define 4 flags to represent the 4 combinations. It's also easier to learn.

> The obvious (to me) thing to do for methods
> is to have /two/ colon separated prefixes of
> the arg list. So one ends up with either one,
> two, or three sections of the arg list:
>
>     # $_ is invocant:
>     method f ($a, $b) { ... }
>
>     # $_ and $self are both invocant:
>     method f ($self : $a, $b) { ... }
>
>     # $_/$self are invocant, $c caller's topic
>     method f ($self : $c : $a, $b) { ... }

Any two constructs with the same syntax but an entirely different meaning exponentially increase the chance of confusion. Confusion increases the likelihood of bugs. Not to mention frustrating the programmer.

> One question is what happens if one writes:
>
>     method f (: $c : $a, $b) { ... }
>
> Is the invocant the topic, or $c, ie what
> does a missing invocant field signify?

The invocant would be the topic still. It is now with:

    method f (: $a, $b) { ... }

> Jumping to a different topic for one moment,
> I think it would be nice to provide some
> punctuation instead of (or as an alternate
> to) a property for setting 'is topic'. Maybe:
>
>     method f ($self : $c : $a*, $b) { ... }
>
> or maybe something like:
>
>     method f ($self : $c : $a™, $b) { ... }
>
> (Unicode TM for Topic Marker? Apologies if I
> screwed up and the TM character comes through
> as something else.)

Huffman encoding comes to mind. Is this really common enough to merit a single punctuation character?

> Anyhow, a further plausible tweak that builds
> on the above colon stuff is that one could
> plausibly do:
>
>     sub f ($bar
Re: How to set your Windows keyboard to ¶erl-mode
[EMAIL PROTECTED] (Ken Fox) writes: > The question is whether we want a pictographic language. So far we've managed to avoid turning Perl into APL. :-) -- Larry Wall in <[EMAIL PROTECTED]> Although that was some time ago... :) -- The FSF is not overly concerned about security. - FSF
Re: UTF-8 and Unicode FAQ, demos
Austin Hastings wrote in perl.perl6.language : > > What we've got is an encoding problem at the MUA level. Mark Reed says > my mailer (Yahoo!) tagged a message containing high-bit characters as > US-ASCII. Several people the other day reported on the differences in > UTF8 vs. Latin-1 handling among pine, elm, and other mailers. Not only the MUA level. Usually source code is written in a lowest common denominator of ascii, even for languages that allow unicode identifiers (Java) or markup. That's because source code is handled by parsers, documentation extractors, pretty printers, diff(1), patch(1), version control software, and (you said it) various internet clients. That's why some people may still prefer to continue using pure ascii even though they think that unicode operators are cool. (Esp. if they are under the influence of FUD : "use PHP ! it's ascii compliant !") > Perl6 will do more to address the real technical issues of electronic > communication between Americans and French-speakers than anything else. > (Primarily because Perl hackers want to talk to each other, but no > French-speaker wants to talk to an American ;-) You're Italian, aren't you ?
Re: UTF-8 and Unicode FAQ, demos
[EMAIL PROTECTED] (Damian Conway) writes: > > Or something similar '>>*'<<, [>*<], etc... > > Much as I hate the notion of di- and trigraphs, this is a possibility. I do like this too, because it reminds me of C trigraphs, which had precisely the same purpose - allow people with old-fashioned sub-standard character sets to come and play with the big boys. And eventually, the old trigraphs died out because everyone caught up with the decent (for the era) character sets. That's assuming we have to have Unicode operators. I would, however, like to hear a passionate argument in favour of this, because we've seen plenty of arguments against (encoding, transmission, keyboarding, etc.) but not all that many in favour, so a nice definitive one would be helpful. -- I often think I'd get better throughput yelling at the modem.
Re: UTF-8 and Unicode FAQ, demos
--- Rafael Garcia-Suarez <[EMAIL PROTECTED]> wrote: > Austin Hastings wrote in perl.perl6.language : > > > > What we've got is an encoding problem at the MUA level. Mark Reed > says > > my mailer (Yahoo!) tagged a message containing high-bit characters > as > > US-ASCII. Several people the other day reported on the differences > in > > UTF8 vs. Latin-1 handling among pine, elm, and other mailers. > > Not only the MUA level. Usually source code is written in a lowest > common denominator of ascii, even for languages that allow unicode > identifiers (Java) or markup. That's because source code is handled > by > parsers, documentation extractors, pretty printers, diff(1), > patch(1), > version control software, and (you said it) various internet clients. > That's why some people may still prefer to continue using pure ascii > even though then think that unicode operators are cool. (Esp. if they > are under the influence of FUD : "use PHP ! it's ascii compliant !") Yeah, but ActiveState does Perl, and Microsoft owns ActiveState, so we've got the kings of FUD on our side for a change. Joy. > > Perl6 will do more to address the real technical issues of > electronic > > communication between Americans and French-speakers than anything > else. > > (Primarily because Perl hackers want to talk to each other, but no > > French-speaker wants to talk to an American ;-) > > You're Italian, aren't you ? Actually, an American who's been ignored in many places. :-) =Austin __ Do you Yahoo!? HotJobs - Search new jobs daily now http://hotjobs.yahoo.com/
Re: UTF-8 and Unicode FAQ, demos
[EMAIL PROTECTED] (Austin Hastings) writes: > Yeah, but ActiveState does Perl, and Microsoft owns ActiveState To what extent are *either* of those statements true? :) -- All the good ones are taken.
Re: How to set your Windows keyboard to ¶erl-mode
Austin Hastings wrote:
> The << and >> ... are just as pictographic (or not) as [ and ].

I'm not particularly fond of << or >> either. ;) Damian just wrote that he prefers non-alphabetic operators to help differentiate nouns and verbs. I find it helpful when people explain their biases like that. What's yours?

> They look the same from top or bottom, and are unmistakable in
> direction when looked at from either side.

Well, anything can look like itself, that wasn't the point. The goal is to not look like anything else in any orientation. The chars O and 0 fail badly, but A and T are excellent. I'm not sure where << and >> fall because I don't have any experience with them.

Programming languages probably get away with more because most programmers don't spray paint algorithms on the side of a bridge. (Well, Lisp programmers maybe. ;)

My three points against arbitrary punctuation as symbols are (1) it's impossible to identify symbol boundaries when reading punctuation -- you just have to guess, (2) it's harder to work with punctuation in non-digital communication, and (3) my memory doesn't work well on punctuation symbols!

Perl has some nice features like sigils that clue people in on how to read a sentence. But... the difference between ' (apostrophe) and ` (tick) is a horrible abomination. ;)

> If every keyboard and operating system had the ability to simply
> generate arbitrary expressions of the form (expr-a) ** (expr-b), ad
> infinitum (a ** b ** c ** d ** e) then we'd be remiss not to use it.
> But they can't, so we don't.

Non sequitur. Written language prior to the printing press had no technological reason to limit alphabet size. Some languages developed very large pictographic representations, others developed small alphabets with word formation rules. I have no idea what the design pressures were that caused these different solutions. Do you? What are the strengths and weaknesses of the approaches? Why should we select one over the other?

- Ken
Supercomma! (was Re: UTF-8 and Unicode FAQ, demos)
On Monday, November 4, 2002, at 11:58 AM, Larry Wall wrote:
> You know, separate streams in a for loop are not going to be that
> common in practice, so maybe we should look around a little harder for
> a supercomma that isn't a semicolon. Now *that* would be a big step
> in reducing ambiguity...

Or more than one type of supercomma, e.g:

    for @x ∫ @y ∫ @z -> $x ∫ $y ∫ $z { ... }

to mean:

    for @x ; @y ; @z -> $x ; $y ; $z { ... }

- vs -

    for @x § @y § @z -> $x § $y § $z { ... }

to mean:

    for @x -> $x {
        for @y -> $y {
            for @z -> $z {
                ...
            }
        }
    }

;-)

MikeL
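For anyone trying to keep the two readings apart, here is a minimal sketch in Python rather than Perl 6 (none of the Perl 6 syntax above had been settled). It assumes the first form means stepping through the streams in lockstep and the second means ordinary nested loops. The names `lockstep` and `nested` are made up for illustration, and stopping at the shortest stream is an assumption the thread does not settle.

```python
from itertools import product

def lockstep(*streams):
    """One tuple per step, taking one element from each stream.

    Assumption: iteration stops when the shortest stream runs out.
    """
    return list(zip(*streams))

def nested(*streams):
    """Every combination, as nested loops would produce them
    (the last stream varies fastest)."""
    return list(product(*streams))

xs = [1, 2]
ys = ["a", "b"]
zs = [True, False]

parallel_steps = lockstep(xs, ys, zs)   # 2 iterations of the loop body
nested_steps = nested(xs, ys, zs)       # 2 * 2 * 2 = 8 iterations
```

The loop body runs once per tuple in either case; only the number and order of the tuples differ.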
Re: UTF-8 and Unicode FAQ, demos
--- Simon Cozens <[EMAIL PROTECTED]> wrote: > [EMAIL PROTECTED] (Austin Hastings) writes: > > Yeah, but ActiveState does Perl, and Microsoft owns ActiveState > > To what extent are *either* of those statements true? :) Hmm. Well, last time I checked you could still download a perl binary from ActiveState.com. And, in fact, check out the motivation behind the "agreement": http://www.dnjonline.com/newsreel/index.html Microsoft buys into Perl Aug 6 - Microsoft has hired ActiveState Tool Corporation to improve the Windows functionality of the Open Source scripting language Perl. This agreement reinforces a long-term relationship between Microsoft and Perl, stemming from 1993 when Microsoft funded the first port of Perl 5 to the Windows platform. ActiveState develops and distributes the popular Windows version, called ActivePerl. "Our mission is to make Perl as popular as possible," said Dick Hardt, chief executive of ActiveState. The monetary details of the deal weren't revealed - in fact there was no mention of it anywhere on the Microsoft web site. Instead the impetus seems to have come from Microsoft India where the main aim is to improve Perl's support for non-Roman character-sets through Unicode. As part of the agreement, ActiveState will add features previously missing from Windows ports of Perl, as well as full support for Unicode - a key feature to users dealing with Asian character sets. blah blah blah ... =Austin
Re: UTF-8 and Unicode FAQ, demos
--- "Adam D. Lopresto" <[EMAIL PROTECTED]> wrote: > I'm having trouble this is even being considered. At all. And > especially for these operators. Heute vektoren, morgen das welt! Uniperl, Uniperl uber alles, Uber alles in der welt! With hyper-states through choose and true(); Masterfully golf scorin' script, Von der << bis an die all(), Von der any() bis an den >> - Uniperl, Uniperl uber alles, Uber alles in der welt! > So you're one of the very few people who bothered to set up unicode, > and now you want to force the rest of us into your own little > "leet" group. Nerp. Hadn't given it a second thought until the whining started about "It's so hard..." I had actually figured that I'd be able to set a keystroke in my editor and that would be the end of it. But then, for no good reason that I can think of, I tried microsoft's help site and found it in about thirty seconds. No need to set up a keyboard macro -- it's part of the OS. I did BBS, though not as a "warez d00d". It's "L33t". > Given the choice between learning how to reconfigure their keyboard, > editor, terminal, fonts, and everything else, or just not learning > perl6, I bet you'd have a LOT of people who get scared away. That sounds a lot like what I said (and to a certain extent still fear) back when -> was first going away. It didn't work then, either. > Face it, too many people think perl is linenoise heavy and random > already. Which is why adding a single character with a single meaning that can be covered in chapter 14 instead of chapter 3 is a workable idea, and why creating an operator called "Jesus, it looks like an ASCII-art version of a dancing penguin in high heels" isn't. "Bow-tie operator", indeed! If @a [>*=<] @b; doesn't scan like rats chewing their way into your cable, what does? > Which brings me to my real question: why these operators? It's not > as if they're even particularly intuitive for this context. They're > quotes. They don't mean "vector" anything, and never have. 
I could > almost see if the characters in question just screamed the function > in question (sqrt, not equals, not, sum, almost anything like that), > but these are just sort of random. Simple answer: Larry suggested them. And was willing to sacrifice qw functionality to this. Also, I suppose, because of the map() suggestion a while back -- this "operation" is going to wind up taking a huge range of parameters in some not-too-distant future. And @a = @b <> @c; will read a lot better, when <> is 8 lines long. > Given how crazy this is all getting, is it absolutely certain that > we're better > off not just making vector operations work without modifiers? I > reread the > apocalypse just now, and I don't really see the problem. The main > argument > against seems to be "perl5 people expect it to be scalar", but perl5 > people > will have to get used to a lot. I think the operators should just be > list > based, and if you want otherwise you can specify "scalar:op" or > convert both > sides to scalars manually (preferably with .length, so it's > absolutely clear > what's meant). It's not absolutely certain. But this discussion was destined to happen, since we're just about out of line noise, but we're nowhere close to being out of clever ideas. =Austin
Re: UTF-8 and Unicode FAQ, demos
On 04/11/02 14:09 -0800, Austin Hastings wrote: > > --- Rafael Garcia-Suarez <[EMAIL PROTECTED]> wrote: > > Austin Hastings wrote in perl.perl6.language : > > > > > > What we've got is an encoding problem at the MUA level. Mark Reed > > says > > > my mailer (Yahoo!) tagged a message containing high-bit characters > > as > > > US-ASCII. Several people the other day reported on the differences > > in > > > UTF8 vs. Latin-1 handling among pine, elm, and other mailers. > > > > Not only the MUA level. Usually source code is written in a lowest > > common denominator of ascii, even for languages that allow unicode > > identifiers (Java) or markup. That's because source code is handled > > by > > parsers, documentation extractors, pretty printers, diff(1), > > patch(1), > > version control software, and (you said it) various internet clients. > > > That's why some people may still prefer to continue using pure ascii > > even though they think that unicode operators are cool. (Esp. if they > > are under the influence of FUD : "use PHP ! it's ascii compliant !") > > Yeah, but ActiveState does Perl, and Microsoft owns ActiveState, so > we've got the kings of FUD on our side for a change. Joy. Speaking of FUD, that's simply not true, nor tasteful IMO. AS has done a handful of short-term contracts for MS, and that's the extent of their relationship. FWIW, AS also does as much or more Unix development as Windows development. They also employ some people who have individually advanced Perl more than you'll ever know. > > > Perl6 will do more to address the real technical issues of > > electronic > > > communication between Americans and French-speakers than anything > > else. > > > (Primarily because Perl hackers want to talk to each other, but no > > > French-speaker wants to talk to an American ;-) > > > > You're Italian, aren't you ? > > Actually, an American who's been ignored in many places. :-) > > =Austin
Re: UTF-8 and Unicode FAQ, demos
[EMAIL PROTECTED] (Austin Hastings) writes: > If @a [>*=<] @b; doesn't scan like rats chewing their way into your > cable, what does? This is why God gave us functions as well as operators. -- I _am_ pragmatic. That which works, works, and theory can go screw itself. - Linus Torvalds
Possible Vector Operator Notations
The many recent suggestions for denoting vector operators all seem to have problems, with some having significant impact elsewhere in the language. After reading a few hundred mails on the subject I'm no longer sure what I prefer, but thought I'd be in a better position to have an opinion if I at least knew what the options were.

In case anybody else feels similarly, I've listed the main contenders below together with salient points pertaining to them gleaned from this list. (Credit to those who originally made them, and apologies for omitting attributions from this compendium.)

The suggested syntax is demonstrated with the hypothetical C<op> operator (presumably making it a hyper-hypo op):

* the "as we were" option: ^op
  » Tilde is used for xor, underscore remains string concatenation.
  » Consistent with previously published Perl 6 code samples.
  » No character left for eating whitespace.
  » No way of distinguishing precedence of vector assignment op.
  » Slightly embarrassing to have so much discussion, go round in circles a couple of times, and end up right where we started.

* the "xor is a double-sided not" option: ^op
  » Exclamation mark is used for xor (in its various forms), leaving tilde for concatenation and underscore for eating whitespace.
  » Vector ops remain consistent with previous code samples.
  » No way of distinguishing precedence of vector assignment op.

* the "looks like an array" option: [op]
  » Seemed a nice idea, but doesn't work with other use of square brackets.

* the "guillemot" option
  » Inconvenient for those who don't live near the sea, and messy even for those who do.

* the "guillemet" option: «op»
  » Looks nice.
  » Awkward to type for some people.
  » May not be transmitted correctly in mailing lists and similar.
  » May not be in the character set used by some people in the world.
  » Has distinct characters for vector ops that have no other meaning in Perl.
  » Uses 'special' looking symbols for 'special' ops.
  » Looks really nice.
  » Looks really likely to cause confusion in mailing lists.

* the "mélange" option: ^[op]
  » Xor continues to use caret.
  » Overcomes problems with using just caret or square brackets.
  » Easy enough to type.
  » Looks ugly and/or unbalanced.
  » Potential for confusion with overloading symbols used individually elsewhere in Perl.

* the "either of the above" option: «op» and ^[op]
  » Can choose the elegance of the guillemet or the type-ability and encoding-neutral convenience of the mélange.
  » The non-Ascii characters still cause hassle when they are transmitted.
  » The two variants don't look anything like each other.
  » Everybody will have to learn both versions to be able to cope with others' code.

* the "generic extensibility" option: «op» and `<<op`>>
  » Backticks and two-character mnemonics from RFC1345 are used to provide an Ascii-only alternative for some symbols.
  » The Perl 5 use of backticks has to be provided some other way.
  » It's neat to have a consistent way of typing special symbols.
  » Possibly overkill if Perl doesn't introduce any standard non-Ascii operators other than vector.
  » Having more power in cryptic symbols rather than words may lead to increased claims of Perl being unreadable.
  » Still suffers from needing to be able to view and transmit non-Ascii characters.

* the "temelliug" option: »op«
  » Guillemets the conventional way round are used for qw().
  » The same, unusual, symbols are used for two very different purposes.
  » Still the non-Ascii problems.

* the "how many characters?" option: <<op>>
  » Bitwise shift operators are preceded with a plus (like other bitwise ops are).
  » Here-docs require the tag to be quoted.
  » Vector operators take up five or six characters.
  » Probably faster for many people to type than guillemets are. (The second of each doubled character is very fast to type.)
  » Vector bitwise shifts may look a little odd.

* the "single chevrons" option: <op>
  » The Perl 5 'read a chunk' use of angled brackets needs to be provided some other way.
  » Not much to type.
  » Potential (human, if not parser) confusion from same characters being used elsewhere in Perl for comparisons and bitwise shifts.

* the "suggested but not much discussed" options:
  ~op ~~op `op `op` *[op] .[op] =[op] ![op] _[op] :[op] '[op] ~[op] (>op<) <)op(> >)op(< [>op<] [)op(]

Phew! I'm slightly concerned at this list making Piers's job too easy, but have tried to minimize that effect by posting on a Monday (meaning that this mail is ineligible for inclusion in the next summary and is likely to be out of date by the time of the following one).

Smylers
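As a reference point for what all of these notations are meant to denote, a minimal Python sketch of an element-wise ("vector") operation follows. It is an illustration, not Perl 6: `hyper` is a hypothetical helper name, and rejecting operands of unequal length is an assumption, since the messages here do not say how that case would be treated. The last assignment shows the scalar reading mentioned earlier in the thread, where each array contributes only its length.

```python
import operator

def hyper(op, left, right):
    """Apply the binary function `op` to corresponding elements of two lists."""
    if len(left) != len(right):
        # Assumption for this sketch: mismatched lengths are an error.
        raise ValueError("hyper: operands must have the same length")
    return [op(a, b) for a, b in zip(left, right)]

b = [1, 2, 3]
c = [10, 20, 30]

vector_sum = hyper(operator.add, b, c)      # element-wise addition
vector_product = hyper(operator.mul, b, c)  # element-wise multiplication

# The scalar reading: the arrays are reduced to their lengths first.
scalar_sum = len(b) + len(c)
```

Whichever spelling is chosen only decides how a programmer asks for the first behaviour instead of the second.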
Re: Possible Vector Operator Notations
[EMAIL PROTECTED] (Smylers) writes: Thank you very, very much for this; this is supremely helpful. > » No character left for eating whitespace. That's a feature, not a bug! The space-eater alternately worries, confuses and scares me. -- I want you to know that I create nice things like this because it pleases the Author of my story. If this bothers you, then your notion of Authorship needs some revision. But you can use perl anyway. :-) - Larry Wall
Re: Unifying invocant and topic naming syntax
> > (naming) the invocant of a method involves > > something very like (naming) the topic > > Generally, there's no conceptual link... other than > The similarity is that both are implicit > parameters which was my point. Almost the entirety of what I see as relevant in the context of deciding syntax for naming the invocant is that it's an especially important implicit argument to methods that may also be the topic. Almost the exact same thing is true of the "topic", with the exception being that it applies to subs as well as methods. It's clear you could have come up with something like one of these: method f ($a, $b) is invoked_by($self) method f ($a, $b) is invoked_by($self is topic) method f ($a, $b) is invoked_by($_) but you didn't. Any idea why not? > There are times when it's useful to access > the caller's topic without setting the current > topic and times when it's useful to just set > the current topic. > ... > When you have a system with two independent > but interacting features, it's far more > efficient to define two independent flags > than to define 4 flags Of course. The general scheme I suggested doesn't impinge on this one way or the other. But the specifics I suggested made a different choice for the default situation. As you said of the current scheme: The following example will either print nothing, or else print a stray $_ that is in lexical scope wherever the sub is defined. sub eddy ($space, $time) { print; } In the scheme I suggest the first arg is the topic by default (just as is the case for the other topicalizers such as pointy sub, for, etc in the current scheme). I think this choice makes sense, but then, as I implied, maybe I'm missing something. > > method f ($self : $c : $a*, $b) { ... } (where * is short for 'is topic'.) > Is this really common enough to merit a single > punctuation character? If I didn't think so I wouldn't have suggested the shortcut. 
;> > > Anyhow, a further plausible tweak that builds > > on the above colon stuff is that one could > > plausibly do: > > So, in this system, the colon is used for both > explicit parameters and implicit parameters. > ... > I'd prefer more consistency. If by "system" you mean the scheme I suggested prior to this "plausible tweak", then no. The colon would only be used in this way if one introduced the tweak. Rejecting this tweak has no impact on the value of the general scheme I suggested. The colon in my scheme (not the tweak) can optionally be used to be explicit in sub *defs* about otherwise implicit args. The tweak I am suggesting is that one could, optionally, use the exact same syntax to be explicit in sub *calls* about otherwise implicit args. I think this has a clean consistency. > > given $c : $foo { > > # $c = outer $_ > > } > > It would be much more transparent to simply > name the outer topic. How is this so different to method f ($c : $foo) { # $c = invocant } ? > > Allison -- ralph
Re: Supercomma! (was Re: UTF-8 and Unicode FAQ, demos)
[Note to all: yes, this is me, despite the weirdities of the quoting and headers. This is how it looks when I'm using mutt out of the box, because I haven't yet customized it like I have pine. But I do like being able to see my own Unicode characters, not to mention everyone else's. If you don't believe this is me, well, I'll just tell you that I live on a tropical island near Antarctica, my social security number is 987-65-4321, and my mother's maiden name was the same as my maternal grandfather's maiden name. Or something like that... --Ed]

On Mon, Nov 04, 2002 at 02:25:08PM -0800, Michael Lazzaro wrote:
> On Monday, November 4, 2002, at 11:58 AM, Larry Wall wrote:
> >You know, separate streams in a for loop are not going to be that
> >common in practice, so maybe we should look around a little harder for
> >a supercomma that isn't a semicolon. Now *that* would be a big step
> >in reducing ambiguity...
>
> Or more than one type of supercomma, e.g:
>
>     for @x ∫ @y ∫ @z -> $x ∫ $y ∫ $z { ... }
>
> to mean:
>     for @x ; @y ; @z -> $x ; $y ; $z { ... }

That almost works visually.

> - vs -
>
>     for @x § @y § @z -> $x § $y § $z { ... }
>
> to mean:
>
>     for @x -> $x {
>         for @y -> $y {
>             for @z -> $z {
>                 ...
>             }
>         }
>     }
>
> ;-)

Glad you put the smiley. I think the latter is much clearer.

But at the moment I'm thinking there's something wrong about any approach that requires a special character on the signature side. I'm starting to think that all the convolving should be specified on the left. So in this:

    for parallel(@x, @y, @z) -> $x, $y, $z { ... }

the signature specifies that we are expecting 3 scalars to the sub, and conveys no information as to whether they are generated in parallel or serially. That's entirely specified on the left. The natural processing of lists says that serial is specified like this:

    for @a, @b, @c -> $x, $y, $z { ... }

Of course, parallel() is a rotten thing to have to say unless you're into readability. So we could still have some kind of parallelizing supercomma, maybe even ∥ (U+2225 PARALLEL TO). But let's keep it out of the signature, I think. In other words, if something like

    for @x ∥ @y ∥ @z -> $x, $y, $z { ... }

is to work, then

    @result = @x ∥ @y ∥ @z;

has to interleave @x, @y, and @z. It's not special to the C<for>. In the case of C<for>, of course, the compiler should feel free to optimize out the actual construction of an interleaved array.

I suppose it could be argued that ∥ is really spelled »,« or some such. However,

    @result = @x »,« @y »,« @z;

just doesn't read quite as well for some reason. A slightly better case could be made for

    @result = @x `|| @y `|| @z;

The reason we originally munged with the signature was so that we could do weird things with differing numbers of streams on the left and the right. But if you really want a way to take 3 from @x, then 3 from @y, then 3 from @z, there should be something equivalent to:

    for round_robin_by_3s(@x, @y, @z) -> $x, $y, $z { ... }

Fooling around with signature syntax for that rare case is not worth it. This way, the C<for> won't have to know anything about the signature other than that it expects 3 scalar arguments. And Simon will be happ(y|ier) that we've removed an exception.

Ed, er, Larry
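The point that everything is decided on the left, with the block simply taking three values at a time from a flat list, can be sketched in Python (an illustration only, not Perl 6). The helper names `interleave`, `serial`, `round_robin_by` and `take` are hypothetical, and equal-length arrays are assumed, since the message does not say what happens otherwise.

```python
from itertools import chain

def interleave(*arrays):
    """Flat list x0, y0, z0, x1, y1, z1, ... (assumes equal lengths)."""
    return [item for group in zip(*arrays) for item in group]

def serial(*arrays):
    """Flat list: all of the first array, then the second, and so on."""
    return list(chain(*arrays))

def round_robin_by(n, *arrays):
    """Flat list: n items from each array in turn, then the next n from each."""
    longest = max((len(a) for a in arrays), default=0)
    out = []
    for start in range(0, longest, n):
        for a in arrays:
            out.extend(a[start:start + n])
    return out

def take(flat, n):
    """Consume a flat list n items at a time, as a block with n parameters would."""
    return [tuple(flat[i:i + n]) for i in range(0, len(flat), n)]

x, y, z = [1, 2], [10, 20], [100, 200]

parallel_calls = take(interleave(x, y, z), 3)
serial_calls = take(serial(x, y, z), 3)
by_threes = round_robin_by(3, [1, 2, 3, 4, 5, 6], [11, 12, 13, 14, 15, 16])
```

In all three cases `take` is unchanged; only the function that builds the flat list differs, which is the separation argued for above.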
Re: Unifying invocant and topic naming syntax
ralph wrote:
> It's clear you could have come up with something like one of these:
>
>     method f ($a, $b) is invoked_by($self)
>     method f ($a, $b) is invoked_by($self is topic)
>     method f ($a, $b) is invoked_by($_)
>
> but you didn't. Any idea why not?

Because most methods need some kind of access to their own invocant, but relatively few subroutines need access to the upscope topic. So the syntax for invocant specification should be compact, and the syntax for external topic access should be less so.

Moreover, because the latter mechanism compromises the lexical scoping of the upscope topic, it ought to be marked very clearly (i.e. with a keyword, rather than a single colon).

>     method f ($self : $c : $a*, $b) { ... }
>
> (where * is short for 'is topic'.)

Too short, IMHO. For such a non-standard method definition, I'd *very* much rather maintain a syntax like:

    method f ($self : $a is topic, $b) is given($c) { ... }

which clearly marks all the irregularities. Not to mention that this more explicit syntax doesn't inject a spurious pseudoparameter into the parameter list.

Damian
RE: Supercomma! (was Re: UTF-8 and Unicode FAQ, demos)
Larry Wall: # for @x ∥ @y ∥ @z -> $x, $y, $z { ... } Even if you decide to use UTF-8 operators (which I am Officially Recommending Against), *please* don't use this one. This shows up as a box in the Outlook UTF-8 font. --Brent Dax <[EMAIL PROTECTED]> @roles=map {"Parrot $_"} qw(embedding regexen Configure) Wire telegraph is a kind of a very, very long cat. You pull his tail in New York and his head is meowing in Los Angeles. And radio operates exactly the same way. The only difference is that there is no cat. --Albert Einstein (explaining radio)
Re: Supercomma! (was Re: UTF-8 and Unicode FAQ, demos)
On 04/11/02 17:52 -0800, [EMAIL PROTECTED] wrote: > [Note to all: yes, this is me, despite the weirdities of the quoting > and headers. This is how it looks when I'm using mutt out of the box, > because I haven't yet customized it like I have pine. But I do like > being able to see my own Unicode characters, not to mention everyone > else's. If you don't believe this is me, well, I'll just tell you that > I live on a tropical island near Antarctica, my social security number > is 987-65-4321, and my mother's maiden name was the same as my maternal > grandfather's maiden name. Or something like that... --Ed] Mutt? I'm using mutt and I still haven't had the privilege of correctly viewing one of these unicode characters yet. I'm gonna be really mad if you say you're also using an OS X terminal. I suspect that it's my horrific OS X termcap that's misbehaving here. Aargh! Brian > > On Mon, Nov 04, 2002 at 02:25:08PM -0800, Michael Lazzaro wrote: > > On Monday, November 4, 2002, at 11:58 AM, Larry Wall wrote: > > >You know, separate streams in a for loop are not going to be that > > >common in practice, so maybe we should look around a little harder for > > >a supercomma that isn't a semicolon. Now *that* would be a big step > > >in reducing ambiguity... > > > > Or more than one type of supercomma, e.g: > > > >for @x I @y I @z -> $x I $y I $z { ... } > > > > to mean: > >for @x ; @y ; @z -> $x ; $y ; $z { ... } > > That almost works visually. > > > - vs - > > > >for @x § @y § @z -> $x § $y § $z { ... } > > > > to mean: > > > >for @x -> $x { > > for @y -> $y { > >for @z -> $z { > > ... > >} > > } > >} > > > > ;-) > > Glad you put the smiley. I think the latter is much clearer. > > But at the moment I'm thinking there's something wrong about any > approach that requires a special character on the signature side. > I'm starting to think that all the convolving should be specified > on the left. So in this: > > for parallel(@x, @y, @z) -> $x, $y, $z { ... 
} > > the signature specifies that we are expecting 3 scalars to the sub, > and conveys no information as to whether they are generated in parallel > or serially. That's entirely specified on the left. The natural > processing of lists says that serial is specified like this: > > for @a, @b, @c -> $x, $y, $z { ... } > > Of course, parallel() is a rotten thing to have to say unless you're > into readability. So we could still have some kind of parallelizing > supercomma, maybe even ∥ (U+2225 PARALLEL TO). But let's keep it > out of the signature, I think. In other words, if something like > > for @x ∥ @y ∥ @z -> $x, $y, $z { ... } > > is to work, then > > @result = @x ∥ @y ∥ @z; > > has to interleave @x, @y, and @z. It's not special to the C<for>. > In the case of C<for>, of course, the compiler should feel free to > optimize out the actual construction of an interleaved array. > > I suppose it could be argued that ∥ is really spelled »,« or some such. > However, > > @result = @x »,« @y »,« @z; > > just doesn't read quite as well for some reason. A slightly better > case could be made for > > @result = @x `|| @y `|| @z; > > The reason we originally munged with the signature was so that we > could do weird things with differing numbers of streams on the left > and the right. But if you really want a way to take 3 from @x, then > 3 from @y, then 3 from @z, there should be something equivalent to: > > for round_robin_by_3s(@x, @y, @z) -> $x, $y, $z { ... } > > Fooling around with signature syntax for that rare case is not worth it. > This way, the C<for> won't have to know anything about the signature other > than that it expects 3 scalar arguments. And Simon will be happ(y|ier) > that we've removed an exception. > > Ed, er, Larry
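The quoted exchange distinguishes three ways of feeding several arrays to one loop: serially, in parallel, and nested. A small Python analogy may make the difference concrete; it is only a sketch, the helper by_threes is hypothetical, and the sample data is made up.

```python
from itertools import chain, islice, product

xs, ys, zs = ["a", "b"], [1, 2], ["p", "q"]

def by_threes(flat):
    """Consume a flat sequence three items at a time, like a 3-scalar signature."""
    it = iter(flat)
    return list(iter(lambda: tuple(islice(it, 3)), ()))

# Serial (for @a, @b, @c -> $x, $y, $z): flatten, then take three at a time.
serial = by_threes(chain(xs, ys, zs))

# Parallel (the semicolon or supercomma form): one element from each array per pass.
parallel = list(zip(xs, ys, zs))

# Nested (the jokingly proposed § form): every combination, innermost array varying fastest.
nested = list(product(xs, ys, zs))

print(serial)    # [('a', 'b', 1), (2, 'p', 'q')]
print(parallel)  # [('a', 1, 'p'), ('b', 2, 'q')]
print(len(nested))  # 8
```

In all three cases the loop body receives three scalars; only the expression on the left decides which three.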
Re: UTF-8 and Unicode FAQ, demos
Larry wrote: I've actually got my eye on ≈ (U+2248 ALMOST EQUAL TO) as a replacement for ~~ someday in the distant future. I suppose it could be argued that we should use ≅ (U+2245 APPROXIMATELY EQUAL TO) instead. That's what =~ was supposed to represent, after all... Yeah, either of those work. But neither is entirely satisfactory, since there's nothing "almost" or "approximate" about the matching the operator does. We obviously need a unicode "IS LIKE UNTO" codepoint. ;-) You know, separate streams in a for loop are not going to be that common in practice, so maybe we should look around a little harder for a supercomma that isn't a semicolon. Now *that* would be a big step in reducing ambiguity... Amen. Even if we limit ourselves to Latin1 for now, Which I suspect we should seriously consider. Maybe leave 9+ bit operators to Perl 7. ;-) I'd avoid using standard signs like multiply × and divide ÷ for non-standard purposes though. (Not that we can exactly use multiply even for its standard purpose--there's an awfully heavy resemblance between × and x, at least in the typical sans serif font.) That's why I semi-seriously suggested replacing C<x> by C<×>. For some reason alphabetic operators (at least, those that are pretending to be symbols) really bug me. It would be really funny to use cent ¢, pound £, or yen ¥ as a sigil, though... H. Given that a pound is worth more than a dollar, maybe £ is the sigil for pairs. ;-) Damian
Re: Possible Vector Operator Notations
Smylers summarized (beautifully, thank you): * the "looks like an array" option: [op] » Seemed a nice idea, but doesn't work with other use of square brackets. Could be made to work. Suppose that every operator definition (explicit or implicit) automagically also defined a variant with square brackets around it. No ambiguity for any defined operator. Of course you lose the ability to apply an arbitrary alphabetic function across a vector of arguments, but maybe that's not such a terrible price. Especially if we allow rvalue C<for> multi-stream loops, which give the same functionality. Damian
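To illustrate the idea that defining an operator could also define a bracketed, element-wise variant of it, here is a minimal Python sketch. It is an analogy only: vector_variant is a hypothetical helper, and Python's operator module stands in for the table of defined operators.

```python
import operator

def vector_variant(op):
    """Given a binary operator, return the element-wise ("[op]") version of it."""
    def apply(left, right):
        return [op(a, b) for a, b in zip(left, right)]
    return apply

vec_add = vector_variant(operator.add)   # plays the role of [+]
vec_mul = vector_variant(operator.mul)   # plays the role of [*]

print(vec_add([1, 2, 3], [10, 20, 30]))  # [11, 22, 33]
print(vec_mul([1, 2, 3], [10, 20, 30]))  # [10, 40, 90]

# An arbitrary named function has no bracketed variant under this scheme,
# but a multi-stream loop used as a value covers that case:
def label(a, b):
    return f"{a}:{b}"

labels = [label(a, b) for a, b in zip(["x", "y"], [1, 2])]
print(labels)  # ['x:1', 'y:2']
```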
Re: Supercomma! (was Re: UTF-8 and Unicode FAQ, demos)
Larry wrote: But at the moment I'm thinking there's something wrong about any approach that requires a special character on the signature side. I'm starting to think that all the convolving should be specified on the left. So in this: for parallel(@x, @y, @z) -> $x, $y, $z { ... } the signature specifies that we are expecting 3 scalars to the sub, and conveys no information as to whether they are generated in parallel or serially. That's entirely specified on the left. The natural processing of lists says that serial is specified like this: for @a, @b, @c -> $x, $y, $z { ... } Of course, parallel() is a rotten thing to have to say unless you're into readability. So we could still have some kind of parallelizing supercomma, maybe even ∥ (U+2225 PARALLEL TO). I'd rather we not use that. I found it surprisingly hard to distinguish ∥ from ||. May I suggest that this might be the opportunity to deploy ¦ (i.e. E). But let's keep it out of the signature, I think. In other words, if something like for @x ∥ @y ∥ @z -> $x, $y, $z { ... } is to work, then @result = @x ∥ @y ∥ @z; has to interleave @x, @y, and @z. It's not special to the C<for>. Very nice. The n-ary "zip" operator. I suppose it could be argued that ∥ is really spelled »,« or some such. However, @result = @x »,« @y »,« @z; just doesn't read quite as well for some reason. Agreed. A slightly better case could be made for @result = @x `|| @y `|| @z; Except by those who suffer FIABCB (font-induced apostrophe/backtick character blindness). The reason we originally munged with the signature was so that we could do weird things with differing numbers of streams on the left and the right. But if you really want a way to take 3 from @x, then 3 from @y, then 3 from @z, there should be something equivalent to: for round_robin_by_3s(@x, @y, @z) -> $x, $y, $z { ... 
} Or perhaps just: sub take(int $n, *@from) { yield splice @from, 0, $n while @from > $n; return ( @from, undef xx ($n-@from) ) } &three = &take.assuming(n=>3); for three(@x), three(@y), three(@z) -> $x, $y, $z { ... } ??? Fooling around with signature syntax for that rare case is not worth it. This way, the C<for> won't have to know anything about the signature other than that it expects 3 scalar arguments. And Simon will be happ(y|ier) that we've removed an exception. and reinstituted the previous exception that a semicolon in a parameter list marks the start of optional parameters! :-) Damian
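For readers who want to see what the take/assuming idea amounts to, here is a loose Python rendering. It is a sketch under assumptions, not a translation: a Python generator stands in for the proposed yield, functools.partial stands in for .assuming, None stands in for undef, and each group is yielded as a list rather than flattened.

```python
from functools import partial

def take(n, source):
    """Yield successive groups of n items; pad the final group with None.
    Mirrors the loop condition in the message: full groups are yielded
    while more than n items remain, then the remainder is padded out."""
    items = list(source)
    while len(items) > n:
        yield items[:n]
        items = items[n:]
    yield items + [None] * (n - len(items))

# Fix the group size, analogous to &take.assuming(n=>3).
three = partial(take, 3)

print(list(three([1, 2, 3, 4, 5])))  # [[1, 2, 3], [4, 5, None]]
print(list(three([1, 2, 3])))        # [[1, 2, 3]]
```

One consequence of following the original's shape is that an empty source still produces a single all-None group; whether that was intended is not stated in the thread.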