> 147-+
>
> rurban, can this =item be deleted?
>
> $ grep -in -A2 -B2 aio config/init/hints/dec_osf.pm
> 28-$libs .= ' -lpthread';
> 29-}
> 30:if ( $libs !~ /-laio/ ) {
> 31:$libs .= ' -laio';
> 32-}
> 33-$conf->data->set( libs => $libs );
>
> Jarkko, are
chromatic via RT wrote:
> On Wednesday 03 December 2008 18:00:32 Jarkko Hietaniemi wrote:
>
>> First we get a couple of warnings from some files, but then one file
>> refuses to compile (see below). I didn't notice any other warnings or
>> failures during Configure
chromatic via RT wrote:
> On Tuesday 23 December 2008 14:53:15 Jarkko Hietaniemi wrote:
>
>> I am seeing some new warnings, if I find the time I'll file a new bug on
>> those. An easy quick one to fix would be this:
>>
>> cc: Info: ./include/parrot/sub.
On Tue, May 29, 2001 at 04:36:51PM -0600, Nathan Torkington wrote:
> Dan Sugalski writes:
> > Okay--Parrot Object Code. (If I was feeling cleverer at the moment, I'd
> > come up with a name that has a snappier acronym. Alas I'm not... :)
>
> p-code. The p stands for Parrot :-)
No, it stands fo
On Wed, May 30, 2001 at 03:51:11PM +0200, H.Merijn Brand wrote:
> On Tue 29 May 2001 19:25, Dave Mitchell <[EMAIL PROTECTED]> wrote:
> > =head2 Portability
> >
> > Related to extensibility is portability. Perl runs on many, many
> > platforms, and will no doubt be ported to ever more bizarre and
On Wed, May 30, 2001 at 06:27:39PM -0500, Jarkko Hietaniemi wrote:
> On Wed, May 30, 2001 at 03:51:11PM +0200, H.Merijn Brand wrote:
> > On Tue 29 May 2001 19:25, Dave Mitchell <[EMAIL PROTECTED]> wrote:
> > > =head2 Portability
> > >
> > > Related to
> The fact that Perl 5's regex engine is a royal pain to deal with should
> be a warning to us.
>
> Much of the pain of dealing with the regex engine in Perl 5 has to do
> with allocation of opcodes and temporary values in a non-standard
> fashion, and dealing with the resultant non-reentrancy on
On Mon, Jun 04, 2001 at 03:43:43PM -0400, Dan Sugalski wrote:
> At 08:34 PM 6/4/2001 +0100, Simon Cozens wrote:
> >On Mon, Jun 04, 2001 at 02:26:26PM -0500, David L. Nicol wrote:
> > > Does anyone have on-their-shelves a regex-into-non-regex-perl translator?
> >
> >Does anyone have on-their-shelve
> : Though whether being able to
> : yank out the RE engine and treat it as a standalone library is important
> : enough to warrant being treated as a design goal or not is a separate
> : issue. (I think so, as it also means I can treat it as a black box for the
> : moment so there's less to t
> Well, other languages have explored that option, and I think that makes
> for an unnatural interface. If you think of regexes as part of a
> larger language, you really want them to be as incestuous as possible,
These days we can be that without feeling that guilty since pcre exists.
> just a
On Tue, Jun 05, 2001 at 04:44:46PM -0700, Russ Allbery wrote:
> NeonEdge <[EMAIL PROTECTED]> writes:
>
> > This is evident in the "Musical Symbols" and even "Byzantine Musical
> > Symbols". Are these character sets more important than the actual
> > language character sets being denied to the ot
An interesting article in the July DDJ, in the Algorithm Alley:
"Fast and Small Resizable Arrays", presents a datastructure that
promises just what the subject says. Appending elements has the worst
case of O(sqrt(N)), as is the space wastage (which is the optimum, as
opposed to the usual wastage
> I can't really believe that this would be a problem, but if they're
> integrated alphabets from different locales, will there be issues
> with sorting (if we're not planning to use the locale)? Are there
> instances where like characters were combined that will affect the
> sort orders?
Yes, it
> > If this is the case, how would a regex like "^[a-zA-Z]" work (or other, more
> > sensitive characters)? If just about anything can come between A and Z, and
> > letters that might be there in a particular locale aren't in another locale,
> > then how will regex engine make the distinctio
> The A-Z syntax is really a shorthand for "All the uppercase letters".
> (Originally at least) I won't argue the problems with sorting various sets
> of characters in various locales, but for regexes at least it's not an
> issue, because the point isn't sorting or ordering, it's identifying
>
On Mon, Jun 11, 2001 at 01:05:43PM -0700, Russ Allbery wrote:
> Dan Sugalski <[EMAIL PROTECTED]> writes:
>
> > Should perl's regexes and other character comparison bits have an option
> > to consider different characters for the same thing as identical beasts?
> > I'm thinking in particular of t
On Tue, Jun 12, 2001 at 05:41:40PM -0700, Hong Zhang wrote:
>
> We should let an external collator handle all these fancy features.
> People can always normalize/canonicalize/do-whatever-you-want
> and send the result text/binary to regex. All the features we
> argue about here can be easily done
> Perl came from ASCII-centric roots, so it's likely that most of our
> biases are ASCII-centric. And for a couple of reasons, it's going to
> be hard to deal with that:
>
> 1. Backwards compatibility with existing Perl practice,
>
> and
>
> 2. To do language-neutral right is -really- hard; lo
> I think, following my line of thought, that [a-\N{KATAKANA LETTER KI}]
> should be equivalent to [\x{0061}-\x{30AD}], which would match any of
I think it should be an error. If you mean the code points write the
code points. Mixing symbolic names (KATAKANA LETTER KI) and native
characters (th
> RE Feature    Override    Create New
>
> switches      'i' only    yes
> anchors       no          no
(I would call them assertions.) Bzzt.
> - Anchors. ^,$,\A,\Z,\z,\b, \G. Since the definition of a line (see 'm'
> and 's' above) isn't
For reference, here's how Perl 5.8 will define \p{IsFoo} character
classes:
# 005F: SPACING UNDERSCORE
['IsWord', '$cat =~ /^[LMN]/ or $code eq "005F"', ''],
['IsAlnum', '$cat =~ /^[LMN]/',''],
['IsAlpha', '$cat =~ /^[LM]/', ''],
# 0009: HORIZONTAL TABULATION
#
> (ftp://ftp.iki.fi/pub/perl/snap), compile it first so you get a Makefile
ftp://ftp.funet.fi/pub/languages/perl/snap/
--
$jhi++; # http://www.iki.fi/jhi/
# There is this special biologist word we use for 'stable'.
# It is 'dead'. -- Jack Cohen
> I think you misunderstand my point. It is not "a property of the code region",
> but "a property of the context in which the code is running". For
> example,
> Taiwanese read traditional Chinese characters, but PRC people read
> simplified Chinese. Even if we take the same data, and same program (code
> Gah. I thought (and I use the word loosely here) that locales generally
> specified how a particular character should be interpreted when there's
> some ambiguity--the high bit ASCII characters spring to mind, given there's
> a dozen or more different interpretations with them. I was under th
> Taiwanese read traditional Chinese characters, but PRC people read
> simplified Chinese. Even if we take the same data, and same program (code),
> people just read differently. As an end user, I want to make the decision.
> It will drive me crazy if Perl renders/displays the text file using
> tradition
http://oss.software.ibm.com/icu/press.html
On Fri, Jun 29, 2001 at 02:52:03AM +0200, Bart Lateur wrote:
> On Tue, 19 Jun 2001 14:51:43 -0500, Jarkko Hietaniemi wrote:
>
> >But a locale is a collection of user preferences. How I want
> >my dates to be formatted, how I want my strings to be sorted.
>
> That
Silly stylistic nit:
> DS> struct perl_string {
> DS> void *string_buffer;
buffer
> DS> UV allocated;
> DS> UV byte_length;
bytes
> DS> UV flags;
> DS> UV character_length;
characters
> DS> UV encoding;
> DS> UV type;
> DS> UV unused;
Goo
> I'm not sure we need a separate flag for optimization level and
> assumptions, but if we do something more like what Compaq C does:
>
>-optimize=(level=5,inline) -assume=(nosub_redefinition,notype_change)
The () will unleash the hell in UNIX shells.
> Yes, I know it's wordier, but the ab
Would it make sense / be useful to have also distinct "between
statements" callbacks?
(Which reminds me of a clever hack Abigail once concocted to have code
executed at *block* exits...)
On Sat, Jul 07, 2001 at 03:07:52PM -0400, Dan Sugalski wrote:
> At 01:00 PM 7/7/2001 -0500, Jarkko Hietaniemi wrote:
> >Would it make sense / be useful to have also distinct "between
> >statements" callbacks?
>
> Yup. For the debugger if nothing else, and it's
On Sat, Jul 07, 2001 at 03:35:04PM -0400, Dan Sugalski wrote:
> At 02:23 PM 7/7/2001 -0500, Jarkko Hietaniemi wrote:
> >On Sat, Jul 07, 2001 at 03:07:52PM -0400, Dan Sugalski wrote:
> > > At 01:00 PM 7/7/2001 -0500, Jarkko Hietaniemi wrote:
> > > >Would it make
> For example, in a code coverage tool a callback would be desirable not
> only at the exit (or entry) of a block, or more accurately a linear code
> sequence, but also at various points throughout a conditional, so that
> it is possible to determine not only the truth value of the conditional,
>
On Sat, Jul 07, 2001 at 11:23:07PM +0100, Simon Cozens wrote:
> On Sat, Jul 07, 2001 at 05:10:03PM -0500, Jarkko Hietaniemi wrote:
> > BLB = block begin
> > BBB = basic block begin
>
> enter
>
> > SE = statement end
>
> nextstate
>
> > BBE = b
> Not that innovative, really. :) Will basic blocks ever be different
> from scopes?
The Book of the Red Dragon sayeth, p 528 in my copy:
A basic block is a sequence of consecutive statements
in which flow of control enters at the beginning and
leaves at the end without h
On Sun, Jul 08, 2001 at 12:13:27AM +0100, Simon Cozens wrote:
> On Sat, Jul 07, 2001 at 05:51:00PM -0500, Jarkko Hietaniemi wrote:
> > A basic block is a sequence of consecutive statements
> > in which flow of control enters at the beginning and
> > leaves at th
(To use Simon's nomenclature)
The long term goal of the Parrot build system (of which configuring is
the major part) is to bootstrap itself from ground zero, a la Parrot
von Münchausen.
Ground zero is here defined as a simple C file (possibly augmented by
one or more simple C headers) and a C co
> I think we should use int32_t instead of IV for all code related
> data. The IV is 64-bit on a 64-bit machine, which is a significant waste.
I always see this claim ("why would you use 64 bits unless you really
need them big, they must be such a waste") being bandied around, without
much hard numbe
On Thu, Sep 13, 2001 at 09:54:35AM -0500, Brian Wheeler wrote:
> I caught it trying to use inc_i_ic instead of inc_i in a test program I
> was running. this patch fixes it.
>
> Brian
>
>
> Index: assemble.pl
> ===
> RCS file: /hom
> > > Not, mind, that I'm proposing prepending parrot_ to all the filenames,
> > > though that's an option certainly.
> >
> >That would be fun on 8.3 filesystems :-).
>
> I'm seriously considering not going out of our way to support what could be
> reasonably considered antique systems. That'd i
On Thu, Sep 13, 2001 at 05:50:06PM +0100, Simon Cozens wrote:
> On Thu, Sep 13, 2001 at 09:43:06AM -0700, Damien Neil wrote:
> > The language lawyer in me insists that I point out that this is
> > inherently nonportable.
>
> That as may be, Perl 5 runs on nearly 80 platforms and uses this
> tric
On Fri, Sep 14, 2001 at 10:33:19AM -0500, Brian Wheeler wrote:
> On Fri, 2001-09-14 at 10:20, Dan Sugalski wrote:
> > Okay, we've had a number of people in favor of a good macro assembler for
> > Parrot. Given that, do we have anyone who'll volunteer to define, maintain,
> > and extend the thing
Just a quick note (since I have no time for more commentary...):
for inspiration on the data storage you might want to look at how
Storable has chosen to do things.
> Have you taken a look at the old Amiga IFF format? It consisted mainly of
> "chunks" identified by a 32-bit type code and a chunk-length code. While
> most implementations were for specific multi-media applications (chunks
> defining sound formats, chunks defining image formats, etc), the
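A minimal sketch of reading one such chunk header in C. It assumes the classic IFF layout (a four-character type code followed by a big-endian 32-bit length, with odd-length payloads padded by one byte); the names `chunk_header` and `read_chunk_header` are made up for illustration and do not come from Parrot or any IFF library.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory form of one chunk header. */
struct chunk_header {
    char id[5];            /* four-character type code plus a NUL */
    unsigned long length;  /* payload length in bytes */
};

/* Parse the 8-byte header at buf and return the offset (relative to
 * buf) at which the next chunk starts. Assembling the length byte by
 * byte avoids depending on the host's byte order or word size. */
size_t read_chunk_header(const unsigned char *buf, struct chunk_header *h)
{
    memcpy(h->id, buf, 4);
    h->id[4] = '\0';
    h->length = ((unsigned long)buf[4] << 24)
              | ((unsigned long)buf[5] << 16)
              | ((unsigned long)buf[6] << 8)
              |  (unsigned long)buf[7];
    /* Skip the header, the payload, and the pad byte if the length is odd. */
    return (size_t)(8 + h->length + (h->length & 1));
}
```

A reader that does not recognise a type code can skip the chunk by jumping to the returned offset, which is what makes the format easy to extend with optional sections.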
> > I believe that Microsoft is using a derivative of that format for some of
> > its files, and I think that TIFF files are another instantiation.
>
> To avoid using Redmondian references :-) I think IFF was one
> of the strongest inspirations for the PNG.
...and here's a link that explains t
On Fri, Sep 14, 2001 at 02:26:37PM -0700, Hong Zhang wrote:
> > We can't do that. There are platforms on both ends that
> > have _no_ native 32-bit data formats (Crays, some 16-bit
> > CPUs?). They still need to be able to load and generate
> > bytecode without ridiculous CPU penalties (your Palm
> It will be hard to use one format for both native and portable.
Not one format, but a set of closely related formats with well-defined
transformations between them.
I might try tonight doing *very* simple code to portably write and
read the bytecode header since I have access to 32/64 little/big
endian boxes and to the resident gremlin of a platform, UNICOS.
(Well, it's really UNICOS/mk, so it's LE, not BE, as real UNICOS,
but it still has the funky integer s
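A rough sketch of what "portably write and read" could look like for a single 32-bit header field. The function names are hypothetical, and little-endian on-disk order is an arbitrary assumption made for the example. It uses `unsigned long` (guaranteed to hold at least 32 bits) rather than an exact-width type, since platforms without a native 32-bit integer may not provide one.

```c
/* Store the low 32 bits of v into buf, least significant byte first.
 * Working byte by byte makes the result independent of the host's
 * endianness and of how wide its integers really are. */
void put_u32_le(unsigned char *buf, unsigned long v)
{
    buf[0] = (unsigned char)( v        & 0xFFUL);
    buf[1] = (unsigned char)((v >> 8)  & 0xFFUL);
    buf[2] = (unsigned char)((v >> 16) & 0xFFUL);
    buf[3] = (unsigned char)((v >> 24) & 0xFFUL);
}

/* Read back a value written by put_u32_le. */
unsigned long get_u32_le(const unsigned char *buf)
{
    return  ((unsigned long)(buf[0] & 0xFFU))
          | ((unsigned long)(buf[1] & 0xFFU) << 8)
          | ((unsigned long)(buf[2] & 0xFFU) << 16)
          | ((unsigned long)(buf[3] & 0xFFU) << 24);
}
```

This is slower than reading a native word directly, which is why it would suit a portable header better than the opcode stream itself.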
On Fri, Sep 14, 2001 at 03:01:11PM -0700, Damien Neil wrote:
> On Fri, Sep 14, 2001 at 04:42:21PM -0400, Dan Sugalski wrote:
> > Where all word values are as big as the word size says they are.
>
> What should the byteloader do when it encounters data in a word that
> cannot fit in a native word?
On Fri, Sep 14, 2001 at 06:37:33PM -0400, Dan Sugalski wrote:
> At 03:29 PM 9/14/2001 -0700, Damien Neil wrote:
> >On Sat, Sep 15, 2001 at 12:39:39AM +0300, Jarkko Hietaniemi wrote:
> > > > It will be hard to use one format for both native and portable.
> > >
>
/*
* Copyright 2001 Jarkko Hietaniemi. All rights reserved.
* This is free software. It may be used, redistributed
* and/or modified under the same terms as Perl itself. */
#include "EXTERN.h"
#include "perl.h"
On Fri, Sep 14, 2001 at 04:42:21PM -0400, Dan Sugalski wrote:
> At 03:10 PM 9/14/2001 -0500, Brian Wheeler wrote:
> >I've been thinking a lot about the bytecode file format lately. It's
> >going to get really gross really fast when we start adding other
> >(optional) sections to the code.
> >
> >So
[just (Sat 11:15 EET) checked-out copy]
(After running Configure.pl) make test_prog halts at:
cc -std -fprm d -ieee -D_INTRINSICS -DLANGUAGE_C -I..-c -o interpreter.o
interpreter.c
cc: Error: interpreter.c, line 97: In this statement, "end" is not declared.
(undeclared)
BUILD_TABLE(foo
On Sat, Sep 15, 2001 at 04:17:34PM +0100, Nicholas Clark wrote:
> On Fri, Sep 14, 2001 at 06:11:35PM -0400, Dan Sugalski wrote:
> > What we're doing is making sure the common case, the bytecode on disk being
> > used by the platform that owns the drive, is as fast as possible. We're
> > also mak
Well, I must have checked out at a bad moment since Linux+gcc 2.95.2
is not faring much better:
make test_prog|&head
cc -fno-strict-aliasing -I/usr/local/include -D_LARGEFILE_SOURCE
-D_FILE_OFFSET_BITS=64 -I..-c -o interpreter.o interpreter.c
interpreter.c: In function `make_interpreter':
in
On Sat, Sep 15, 2001 at 04:35:49PM +0100, Simon Cozens wrote:
> On Sat, Sep 15, 2001 at 06:18:38PM +0300, Jarkko Hietaniemi wrote:
> > do { foo [ 0 ] = end ; foo [ 1 ] = set_i_ic ; foo [ 2 ] = set_i ; foo [ 3 ] =
>add_i ; foo [ 4 ] = sub_i ; foo [ 5 ] = mul_i ; foo [ 6 ] = div
On Sat, Sep 15, 2001 at 06:44:38PM +0300, Jarkko Hietaniemi wrote:
> On Sat, Sep 15, 2001 at 04:35:49PM +0100, Simon Cozens wrote:
> > On Sat, Sep 15, 2001 at 06:18:38PM +0300, Jarkko Hietaniemi wrote:
> > > do { foo [ 0 ] = end ; foo [ 1 ] = set_i_ic ; foo [ 2 ]
On Sat, Sep 15, 2001 at 04:44:47PM +0100, Simon Cozens wrote:
> On Sat, Sep 15, 2001 at 06:44:38PM +0300, Jarkko Hietaniemi wrote:
> > The question is why was it wrong after a fresh checkout? (also in Linux)
>
> No idea. Is make_op_header.pl run? Does op.h contain the #define
Don't have Blackdown installed but here are some timing and profiling
results (linux/x86, compiled with -pg, 'gprof test_prog' output,
remembering the caveat of profiling always skewing the measurements):
ugli:/tmp/jhi/parrot ; time ./test_prog j.pbc
r 119,14s u 119,14s s 0,00s 100% "./test_prog
On Sat, Sep 15, 2001 at 06:33:18PM +0100, Simon Cozens wrote:
> On Sat, Sep 15, 2001 at 06:32:57PM +0100, Philip Kendall wrote:
> > I posted a couple of bodge fixes from this, but I haven't done much in
> > the past couple of days... do we want to use a 32 bit type for reading
> > in bytecode or c
On Sat, Sep 15, 2001 at 08:16:36PM +0100, Simon Cozens wrote:
> On Sat, Sep 15, 2001 at 08:43:19PM +0300, Jarkko Hietaniemi wrote:
> > Never mind 'portable' for now, currently it's not even *working* on
> > 64-bit platforms
>
> That as may be, I'd li
On Sun, Sep 16, 2001 at 12:49:36PM -0700, [EMAIL PROTECTED] wrote:
> This should be done with an implicit rule or a pattern rule.
>
> By putting all the explicit lines in, it'll be harder to
> change later, and errors can crop up.
>
> The makefile needs a cleanup - we're not making good use of
>
On Mon, Sep 17, 2001 at 10:35:36AM -0400, Dan Sugalski wrote:
> At 04:41 AM 9/17/2001 -0700, Benjamin Stuhl wrote:
> >--- Simon Cozens <[EMAIL PROTECTED]> wrote:
> > > On Mon, Sep 17, 2001 at 09:33:56AM +0100, Tom Hughes
> > > wrote:
> > > > The attached patch adds string_nprintf, the last
> > > u
On Mon, Sep 17, 2001 at 02:18:16PM -0500, Gibbs Tanton - tgibbs wrote:
> The hourly should be fine...can you do me one other favor and run the
> following c snippet through Purify:
>
> int main() {
> char* c = (char*)malloc(0);
I can tell without Purify that malloc(0) is unportable.
(As is cal
On Mon, Sep 17, 2001 at 02:30:22PM -0500, Gibbs Tanton - tgibbs wrote:
> Well, that explains the last Purify issue. Whenever we substr out 0 bytes
> to a NULL register, we create a string by malloc(0). If we later print that
Not so. Depending on the implementation malloc(0) may do any of
the f
On Mon, Sep 17, 2001 at 02:33:53PM -0500, Gibbs Tanton - tgibbs wrote:
> Okey Dokey. With that being the case, it appears we should rethink
> string_grow/string_make. If we get a length of 0, we should allocate 1 byte
> and store '\0' in it (really storing '\0' is not necessary, but it is always
On Tue, Sep 18, 2001 at 12:06:32AM +0400, Timur Safin wrote:
> Hi Jarkko,
>
> Here is what the SUSv2 prescribes in this situation.
>
> The Single UNIX ® Specification, Version 2, Copyright © 1997 The Open Group
> "
> NAME
> malloc - a memory allocator
> ...
I'm reading the same pag
> return malloc(size ? size : 1);
>
> That's a constant pointer, and you can read 1 byte beyond it (the '\0')
The '\0'? You mean the pseudorandom byte that happens to be in the
heap at the beginning of the malloc block?
> Anything wrong with that as parrot's malloc wrapper?
That would keep bo
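For illustration, a small sketch of the wrapper under discussion together with the point about the terminator. `safe_malloc` and `empty_string` are hypothetical names, not functions from the Parrot sources.

```c
#include <stdlib.h>

/* malloc(0) may return NULL or a unique pointer depending on the
 * implementation, so always ask for at least one byte. */
void *safe_malloc(size_t size)
{
    return malloc(size ? size : 1);
}

/* The byte handed back by malloc is uninitialized, so an empty
 * C string needs its '\0' stored explicitly. */
char *empty_string(void)
{
    char *s = safe_malloc(0);
    if (s != NULL)
        s[0] = '\0';
    return s;
}
```

With the wrapper, a NULL return can only mean allocation failure, which keeps the error check at each call site unambiguous.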
On Mon, Sep 17, 2001 at 05:29:11PM -0400, Dan Sugalski wrote:
> Folks,
>
> Don't sweat system malloc behaviour all that much at the moment. We are
> going to be completely taking over memory allocation internally at some
> point reasonably soon, so as long as what you do doesn't crash we should
On Tue, Sep 18, 2001 at 12:38:01AM +0300, Jarkko Hietaniemi wrote:
> On Mon, Sep 17, 2001 at 05:29:11PM -0400, Dan Sugalski wrote:
> > Folks,
> >
> > Don't sweat system malloc behaviour all that much at the moment. We are
> > going to be completely taking over
> > Doug Lea's malloc is in the public domain:
> >
> > http://g.oswego.edu/dl/html/malloc.html
> >
> > I don't remember whether that's quadsafe code but the first person to
>
> It is.
>
> Further digging found this comparison discussing malloc and gc
> implementations:
>
> http://www.cs.color
On Mon, Sep 17, 2001 at 05:54:41PM -0400, Dan Sugalski wrote:
> At 12:51 AM 9/18/2001 +0300, Jarkko Hietaniemi wrote:
> > > > Doug Lea's malloc is in the public domain:
> > > >
> > > > http://g.oswego.edu/dl/html/malloc.html
> > > >
>
On Tue, Sep 18, 2001 at 10:48:25AM -0400, Andy Dougherty wrote:
> When trying to configure parrot with an IV = 'long long' (64-bit)
> but with int, long, and pointers only 32-bit, I get either
>
> This isn't Parrot bytecode!
>
> (on SPARC) or a segfault (on i686).
Segfault in alpha even wh
On Tue, Sep 18, 2001 at 09:53:23PM +0100, Simon Cozens wrote:
> On Tue, Sep 18, 2001 at 03:31:11PM -0500, Gibbs Tanton - tgibbs wrote:
> > 8. I would love someone to test it on Tru64 and Win32.
>
> Testing anything on Tru64 is currently impossible, as Jarkko has pointed
> out. I'm still trying t
On Tue, Sep 18, 2001 at 06:47:38PM -0700, Hong Zhang wrote:
>
> Do we want the opcode to be so complicated? I thought we are
> going to use this kind of thing for generic pointers. The "p"
> member of opcode does not make any sense to me.
Alignment.
> Hong
>
> > Earlier there was some discussi
> And, maybe even more important, not all the world has gcc!
Hear, hear.
On Sat, Sep 22, 2001 at 05:17:16PM +0100, Simon Cozens wrote:
> On Sat, Sep 22, 2001 at 04:40:46PM +0100, Simon Cozens wrote:
> > And now I know why! The branch-fixup section of the assembler's busted:
>
> No, that wasn't it. This is it:
>
> opcode_t *ne_nc_ic(opcode_t cur_opcode[], struct Parro
> On this machine, NVs are doubles; their pack_type is 'd', which is
> as it should be. So, a number (2.0) is inserted into the bytecode
> stream with pack('d', 2.0)
./perl -Ilib -MDevel::Peek -le 'Dump(pack("d",2))'
SV = PV(0x1d9ed4) at 0x1e4568
REFCNT = 1
FLAGS = (PADTMP,POK,pPOK)
PV = 0x
Tru64 cc is worried:
cc: Warning: packfile.c, line 1226: The scalar variable "encoding" is fetched but not
initialized. (uninit1)
new_string = string_make(data, size, encoding, flags, type);
-^
cc: Warning: packfile.c, line 1226: The scalar variable "t
> both ord and chr only work for characters that fit in the Basic
> Multilingual Plane (this shouldn't be a problem until around 2003, when
> Unicode 2.0 starts introducing characters above this)
Huh?
Characters beyond the BMP (aka Plane 0, 0x...0x) were
introduced in Unicode 3.1 (Ma
On Fri, Oct 19, 2001 at 07:21:45PM -0400, James Mastros wrote:
> On Sat, 20 Oct 2001, Jarkko Hietaniemi wrote:
> > Characters beyond the BMP (aka Plane 0, 0x...0x) were
> > introduced in Unicode 3.1 (March 2001), so you are 1.1 versions
> > and 2 years late. The
Please find attached a typescript log from a freshly rsynced parrot
in tru64 (sizer -v reports 4.0F for the os release, and cc version
is V5.9-011). A bit of whinage from the compiler, mostly of the kind
cc: Warning: perlint.c, line 78: Non-void function "Parrot_PerlInt_get_string_index"
does n
Developing
Last Modified:
PDD Format: 1
Language: English
=head2 History
None. First version. (Jarkko Hietaniemi)
=head1 CHANGES
None. First version.
=head1 ABSTRACT
This PDD specifies how Parrot should handle characters and text,
the character encoding model, and the bas
On Fri, Jan 18, 2002 at 04:51:07AM -0500, Bryan C. Warnock wrote:
> Thanks, Jarrko.
>
> On Thursday 17 January 2002 23:21, Jarkko Hietaniemi wrote:
> > The most important message is: give up on 8-bit bytes, already.
> > Time to move on, chop chop.
>
> Do you thin
> Since I seem to be the main regex hacker for Parrot, I'll respond to
> this as best I can.
>
> Currently, we are using bitmaps for character classes. Well, sort of.
> A Bitmap in Parrot is defined like this:
>
> typedef struct bitmap_t {
> char* bmp;
>
> I don't think UTF-32 will save you much. The Unicode case map is variable
> length; combining characters, canonical equivalence, and many other things
> will require variable length mapping. For example, if I only want to
This is true.
> parse /[0-9]+/, why you want to convert everything to UTF-
On Fri, Jan 18, 2002 at 11:44:00AM -0800, Hong Zhang wrote:
> > (1) There are 5.125 bytes in Unicode, not four.
> > (2) I think the above would suffer from the same problem as one common
> > suggestion, two-level bitmaps (though I think the above would suffer
> > less, being of finer granu
On Fri, Jan 18, 2002 at 12:20:53PM -0800, Hong Zhang wrote:
> > > My proposal is we should use a mixed method. The Unicode standard class,
> > > such as \p{IsLu}, can be handled by a standard splitbin table. Please
> > > see Java java.lang.Character or Python unicodedata_db.h. I did
> > > measurement
On Fri, Jan 18, 2002 at 01:40:26PM -0800, Steve Fink wrote:
> On Fri, Jan 18, 2002 at 10:08:40PM +0200, Jarkko Hietaniemi wrote:
> > ints, or 176 bytes. Searching for membership in an inversion list is
> > O(log N) (binary search). "Encoding the whole range" is a non
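A minimal sketch of the membership test on an inversion list, assuming the usual representation: a sorted array of code points at which membership flips, so the even-indexed entries open a range and the odd-indexed entries close it (exclusive). The name `invlist_contains` is made up for the example. A binary search over n entries takes O(log n) comparisons.

```c
#include <stddef.h>
#include <stdint.h>

/* Return 1 if cp belongs to the set described by list[0..n), else 0. */
int invlist_contains(const uint32_t *list, size_t n, uint32_t cp)
{
    size_t lo = 0, hi = n;

    /* Find how many entries are <= cp. */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (list[mid] <= cp)
            lo = mid + 1;
        else
            hi = mid;
    }
    /* An odd count means cp lies after a range start and before its end. */
    return (int)(lo & 1);
}
```

For example, the class [0-9a-z] is the four-entry list { 0x30, 0x3A, 0x61, 0x7B }.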
On Fri, Jan 18, 2002 at 02:22:49PM -0800, Steve Fink wrote:
> On Sat, Jan 19, 2002 at 12:11:06AM +0200, Jarkko Hietaniemi wrote:
> > Complement of an inversion list is neat: insert 0 at the beginning
> > (and append max+1), unless there already is one, in which case delete
>
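The complement trick quoted above can be sketched as follows. This assumes a convention in which an odd-length list means the last range runs to the maximum code point, so the "append max+1" step is not needed; `invlist_complement` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write the complement of in[0..n) to out, which must have room for
 * n + 1 entries. Returns the length of the result. */
size_t invlist_complement(const uint32_t *in, size_t n, uint32_t *out)
{
    if (n > 0 && in[0] == 0) {
        /* A leading 0 is already there: delete it. */
        if (n > 1)
            memcpy(out, in + 1, (n - 1) * sizeof *in);
        return n - 1;
    }
    /* Otherwise insert 0 at the beginning. */
    out[0] = 0;
    if (n > 0)
        memcpy(out + 1, in, n * sizeof *in);
    return n + 1;
}
```

Applying the function twice gives back the original list, and no entry other than the leading 0 is ever touched.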
> > > We *do* want to have (with some notation)
> > > [[:digit:]\p{FunkyLooking}aeiou except 7], right?
> >
> > Of course. But that is all resolvable in regex compile time.
> > No expression tree needed.
>
> My point was that if inversion lists are insufficient for describing
> all the characte
On Fri, Jan 18, 2002 at 11:40:17PM +, Nicholas Clark wrote:
> On Fri, Jan 18, 2002 at 05:24:00PM +0200, Jarkko Hietaniemi wrote:
>
> > > As for character encodings, we're forcing everything to UTF-32 in
> > > regular expressions. No exceptions. If you use a s
Honour where honour is due: I've got some questions about inversion
lists. Where I saw them mentioned by that name were some drafts of
this:
http://www.aw.com/catalog/academic/product/1,4096,0201700522,00.html
The book looks really promising-- unfortunately it's not yet published.
On Mon, Jan 21, 2002 at 04:37:46PM +, Dave Mitchell wrote:
> Jarkko Hietaniemi <[EMAIL PROTECTED]> wrote:
> > There is no string type built out of native eight-bit bytes.
>
> In the good ol'days, one could usefully use regexes on 8-bit binary data,
> eg
>
On Mon, Jan 21, 2002 at 05:09:06PM +, Dave Mitchell wrote:
> Jarkko Hietaniemi <[EMAIL PROTECTED]> wrote:
> > > In the good ol'days, one could usefully use regexes on 8-bit binary data,
> > > eg
> > >
> > > open G, 'myfile.gif
I think the following would work.
* At the beginning of each parrot source code file there must be at
least two Parrot-specific defines, e.g.
#define PARROT_SOURCE
#define PARROT_SOURCE_REGEXEC_C
These would declare both being part of Parrot, and being
a particular file.
If some ki
On Fri, May 31, 2002 at 06:18:55AM +0900, Dan Kogai wrote:
> On Friday, May 31, 2002, at 06:06 AM, George Rhoten wrote:
> > Hopefully you take the implicit information in the UCM files and put
> > that
> > into encode implementation too. For instance, in gb18030 there are
> > whole
> > ranges o