Patch to link with the maths library
Now that parrot has the advanced math ops in, it needs to link with the
maths library or you get lots of missing symbols. Patch as follows:

Index: Makefile
===
RCS file: /home/perlcvs/parrot/Makefile,v
retrieving revision 1.9
diff -c -r1.9 Makefile
*** Makefile 2001/09/13 07:22:36 1.9
--- Makefile 2001/09/13 08:20:54
***************
*** 12,18 ****
  all : $(O_FILES) test_prog
  test_prog: test_main$(O) $(O_FILES)
!     gcc -o test_prog $(O_FILES) test_main$(O)
  test_main$(O): $(H_FILES)
--- 12,18 ----
  all : $(O_FILES) test_prog
  test_prog: test_main$(O) $(O_FILES)
!     gcc -o test_prog $(O_FILES) test_main$(O) -lm
  test_main$(O): $(H_FILES)

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu
Patch to fix += on rvalue
The inc_n_nc op does this:

  (NV)NUM_REG(P1) += P2

Unfortunately the (NV) cast means that the LHS is not an lvalue and
therefore cannot be assigned to in ANSI C. It seems that gcc allows
you to get away with this, but other compilers don't.

The cast is also unnecessary as NUM_REG() gives an NV anyway, so this
patch removes the cast:

Index: basic_opcodes.ops
===
RCS file: /home/perlcvs/parrot/basic_opcodes.ops,v
retrieving revision 1.11
diff -u -r1.11 basic_opcodes.ops
--- basic_opcodes.ops 2001/09/13 07:27:46 1.11
+++ basic_opcodes.ops 2001/09/13 08:27:40
@@ -219,7 +219,7 @@
 // INC Nx, nnn
 AUTO_OP inc_n_nc {
-  (NV)NUM_REG(P1) += P2;
+  NUM_REG(P1) += P2;
 }
 // DEC Nx

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu
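The lvalue problem can be reproduced outside Parrot. The following is only a minimal sketch: `NV`, `num_reg` and this `NUM_REG()` macro are stand-ins assumed for illustration, not Parrot's real definitions. The rejected form is left in a comment because a strict ANSI C compiler will not accept it.

```c
/* Stand-ins for illustration only; Parrot's real NV and NUM_REG()
 * are defined elsewhere and may differ. */
typedef double NV;

NV num_reg[32];
#define NUM_REG(x) num_reg[x]

void inc_n_nc(int p1, NV p2)
{
    /* (NV)NUM_REG(p1) += p2;
     * The result of a cast is a value, not an lvalue, so there is
     * nothing for += to assign to. */
    NUM_REG(p1) += p2;      /* the array element itself is an lvalue */
}
```

Calling `inc_n_nc(3, 1.5)` and then `inc_n_nc(3, 2.0)` leaves `num_reg[3]` holding 3.5.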
Patch to fix C++ style comments
The parrot code is currently full of C++ style comments which cause many C compilers to barf. The attached patch changes these to C style comments to fix this problem. BTW I have had to resend this because my first attempt was bounced apparently for having the patch as a text/plain attachment rather than inline. Isn't that a bit OTT though? I can understand blocking HTML messages and attachments but I prefer to send patches as attachments as it ensures that trailing blank lines and such like are properly preserved and basically that the patch arrives completely intact. Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu diff -u parrot/basic_opcodes.ops parrot.fixed/basic_opcodes.ops --- parrot/basic_opcodes.opsThu Sep 13 08:27:46 2001 +++ parrot.fixed/basic_opcodes.ops Thu Sep 13 09:23:13 2001 @@ -7,47 +7,47 @@ #include "parrot.h" #include "math.h" -// SET Ix, CONSTANT +/* SET Ix, CONSTANT */ AUTO_OP set_i_ic { INT_REG(P1) = P2; } -// SET Ix, Ix +/* SET Ix, Ix */ AUTO_OP set_i { INT_REG(P1) = INT_REG(P2); } -// ADD Ix, Iy, Iz +/* ADD Ix, Iy, Iz */ AUTO_OP add_i { INT_REG(P1) = INT_REG(P2) + INT_REG(P3); } -// SUB Ix, Iy, Iz +/* SUB Ix, Iy, Iz */ AUTO_OP sub_i { INT_REG(P1) = INT_REG(P2) - INT_REG(P3); } -// MUL Ix, Iy, Iz +/* MUL Ix, Iy, Iz */ AUTO_OP mul_i { INT_REG(P1) = INT_REG(P2) * INT_REG(P3); } -// DIV Ix, Iy, Iz +/* DIV Ix, Iy, Iz */ AUTO_OP div_i { INT_REG(P1) = INT_REG(P2) / INT_REG(P3); } -// MOD Ix, Iy, Iz +/* MOD Ix, Iy, Iz */ AUTO_OP mod_i { INT_REG(P1) = INT_REG(P2) % INT_REG(P3); } -// EQ Ix, Iy, EQ_BRANCH, NE_BRANCH +/* EQ Ix, Iy, EQ_BRANCH, NE_BRANCH */ MANUAL_OP eq_i_ic { if (INT_REG(P1) == INT_REG(P2)) { RETURN(P3); @@ -56,7 +56,7 @@ } } -// NE Ix, Iy, NE_BRANCH, EQ_BRANCH +/* NE Ix, Iy, NE_BRANCH, EQ_BRANCH */ MANUAL_OP ne_i_ic { if (INT_REG(P1) != INT_REG(P2)) { RETURN(P3); @@ -65,7 +65,7 @@ } } -// LT Ix, Iy, LT_BRANCH, GE_BRANCH +/* LT Ix, Iy, LT_BRANCH, GE_BRANCH */ MANUAL_OP lt_i_ic { if (INT_REG(P1) < INT_REG(P2)) { RETURN(P3); @@ 
-74,7 +74,7 @@ } } -// LE Ix, Iy, LE_BRANCH, GT_BRANCH +/* LE Ix, Iy, LE_BRANCH, GT_BRANCH */ MANUAL_OP le_i_ic { if (INT_REG(P1) <= INT_REG(P2)) { RETURN(P3); @@ -83,7 +83,7 @@ } } -// GT Ix, Iy, GT_BRANCH, LE_BRANCH +/* GT Ix, Iy, GT_BRANCH, LE_BRANCH */ MANUAL_OP gt_i_ic { if (INT_REG(P1) > INT_REG(P2)) { RETURN(P3); @@ -92,7 +92,7 @@ } } -// GE Ix, Iy, GE_BRANCH, LT_BRANCH +/* GE Ix, Iy, GE_BRANCH, LT_BRANCH */ MANUAL_OP ge_i_ic { if (INT_REG(P1) >= INT_REG(P2)) { RETURN(P3); @@ -101,7 +101,7 @@ } } -// IF IXx, TRUE_BRANCH, FALSE_BRANCH +/* IF IXx, TRUE_BRANCH, FALSE_BRANCH */ MANUAL_OP if_i_ic { if (INT_REG(P1)) { RETURN(P2); @@ -110,81 +110,81 @@ } } -// TIME Ix +/* TIME Ix */ AUTO_OP time_i { INT_REG(P1) = time(NULL); } -// PRINT Ix +/* PRINT Ix */ AUTO_OP print_i { printf("I reg %li is %li\n", P1, INT_REG(P1)); } -// BRANCH CONSTANT +/* BRANCH CONSTANT */ MANUAL_OP branch_ic { RETURN(P1); } -// END +/* END */ MANUAL_OP end { RETURN(0); } -// INC Ix +/* INC Ix */ AUTO_OP inc_i { INT_REG(P1)++; } -// INC Ix, nnn +/* INC Ix, nnn */ AUTO_OP inc_i_ic { INT_REG(P1) += P2; } -// DEC Ix +/* DEC Ix */ AUTO_OP dec_i { INT_REG(P1)--; } -// DEC Ix, nnn +/* DEC Ix, nnn */ AUTO_OP dec_i_ic { INT_REG(P1) -= P2; } -// JUMP Ix +/* JUMP Ix */ MANUAL_OP jump_i { RETURN(INT_REG(P1)); } -// SET Nx, CONSTANT +/* SET Nx, CONSTANT */ AUTO_OP set_n_nc { NUM_REG(P1) = P2; } -// ADD Nx, Ny, Nz +/* ADD Nx, Ny, Nz */ AUTO_OP add_n { NUM_REG(P1) = NUM_REG(P2) + NUM_REG(P3); } -// SUB Nx, Ny, Iz +/* SUB Nx, Ny, Iz */ AUTO_OP sub_n { NUM_REG(P1) = NUM_REG(P2) - NUM_REG(P3); } -// MUL Nx, Ny, Iz +/* MUL Nx, Ny, Iz */ AUTO_OP mul_n { NUM_REG(P1) = NUM_REG(P2) * NUM_REG(P3); } -// DIV Nx, Ny, Iz +/* DIV Nx, Ny, Iz */ AUTO_OP div_n { NUM_REG(P1) = NUM_REG(P2) / NUM_REG(P3); } -// EQ Nx, Ny, EQ_BRANCH, NE_BRANCH +/* EQ Nx, Ny, EQ_BRANCH, NE_BRANCH */ MANUAL_OP eq_n_ic { if (NUM_REG(P1) == NUM_REG(P2)) { RETURN(P3); @@ -193,7 +193,7 @@ } } -// IF Nx, TRUE_BRANCH, FALSE_BRANCH +/* IF Nx, 
TRUE_BRANCH, FALSE_BRANCH */ MANUAL_OP if_n_ic { if (NUM_REG(P1)) { RETURN(P2); @@ -202,369 +202,369 @@ } } -// TIME Nx +/* TIME Nx */ AUTO_OP time_n { NUM_REG(P1) = time(NULL); } -// PRINT Nx +/* PRINT Nx */ AUTO_OP print_n { printf("N reg %li is %f\n", P1, NUM_REG(P1)); } -// INC Nx +/* INC Nx */ AUTO_OP inc_n { NUM_REG(P
Patch to remove use of structure constant/cast
Setting up the strnative vtable is being done by casting a {} delimited
list of values to a structure type, but this is a gcc extension, not
ANSI C, so the following patch reworks this to be ANSI compliant:

Index: strnative.c
===
RCS file: /home/perlcvs/parrot/strnative.c,v
retrieving revision 1.4
diff -u -r1.4 strnative.c
--- strnative.c 2001/09/13 07:14:24 1.4
+++ strnative.c 2001/09/13 08:36:34
@@ -55,7 +55,7 @@
 STRING_VTABLE
 string_native_vtable (void) {
-    return (STRING_VTABLE) {
+    STRING_VTABLE sv = {
     enc_native,
     string_native_compute_strlen,
     string_native_max_bytes,
@@ -63,4 +63,5 @@
     string_native_chopn,
     string_native_substr,
     };
+    return sv;
 }

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu
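The same rework can be shown on a much smaller structure. This is a sketch under assumptions: `DEMO_VTABLE`, `demo_max_bytes` and `demo_vtable` are hypothetical names invented for the example and are not part of strnative.c. The cast-of-a-brace-list form (later standardised in C99 as a compound literal) is shown only in a comment.

```c
/* Hypothetical two-member vtable, far smaller than STRING_VTABLE. */
typedef struct {
    int  encoding;
    long (*max_bytes)(long chars);
} DEMO_VTABLE;

static long demo_max_bytes(long chars)
{
    return chars;           /* one byte per character in this toy case */
}

DEMO_VTABLE demo_vtable(void)
{
    /* return (DEMO_VTABLE) { 0, demo_max_bytes };
     * Accepted by gcc, but not part of ANSI C (C89). */
    DEMO_VTABLE v = { 0, demo_max_bytes };  /* ordinary initialiser */
    return v;                               /* structs return by value */
}
```

The caller gets a copy of the structure, so the local variable going out of scope is not a problem.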
Re: Patch to fix C++ style comments
In message <[EMAIL PROTECTED]>
          Simon Cozens <[EMAIL PROTECTED]> wrote:

> On Thu, Sep 13, 2001 at 09:35:33AM +0100, Tom Hughes wrote:
> > BTW I have had to resend this because my first attempt was bounced
> > apparently for having the patch as a text/plain attachment rather than
> > inline. Isn't that a bit OTT though?
>
> Hrm, I think other people have managed...

Weird. Must be something to do with the MIME that Gnus created then.

> Both this, and the other patch, (struct in strnative.c) applied.

I just realised I missed one:

===
RCS file: /home/perlcvs/parrot/config.h.in,v
retrieving revision 1.1
diff -u -r1.1 config.h.in
--- config.h.in 2001/09/11 09:44:00 1.1
+++ config.h.in 2001/09/13 08:52:26
@@ -13,7 +13,7 @@
 typedef void DPOINTER;
 typedef void SYNC;
-//typedef IV *(*opcode_funcs)(void *, void *) OPFUNC;
+/*typedef IV *(*opcode_funcs)(void *, void *) OPFUNC; */
 #define FRAMES_PER_CHUNK 16

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu
Patch to fix arithmetic on void * pointers
This patch fixes a couple of cases where arithmetic is being done on
void * pointers, which isn't valid C although gcc seems to allow it.

Of course the memory.c code is broken anyway because it assumes a
pointer will fit in an IV, and I'm not sure that will always be true,
will it?

Anyway, with this patch and the others it now builds on a Unixware box
with the system compiler:

Index: memory.c
===
RCS file: /home/perlcvs/parrot/memory.c,v
retrieving revision 1.3
diff -u -r1.3 memory.c
--- memory.c 2001/09/12 17:58:55 1.3
+++ memory.c 2001/09/13 09:00:34
@@ -26,7 +26,7 @@
     mem = malloc(max_to_alloc);
     if (((IV)mem & mask) < (IV)mem) {
-        mem = (void *)((IV)mem & mask) + ~mask + 1;
+        mem = (void *)(((IV)mem & mask) + ~mask + 1);
     }
     return mem;
 }
Index: strnative.c
===
RCS file: /home/perlcvs/parrot/strnative.c,v
retrieving revision 1.5
diff -u -r1.5 strnative.c
--- strnative.c 2001/09/13 08:44:08 1.5
+++ strnative.c 2001/09/13 09:00:34
@@ -26,7 +26,7 @@
     /* b is now in native format */
     string_grow(a, a->strlen + b->strlen);
-    Sys_Memcopy(a->bufstart + a->strlen, b->bufstart, b->strlen);
+    Sys_Memcopy((char *)a->bufstart + a->strlen, b->bufstart, b->strlen);
     a->strlen = a->bufused = a->strlen + b->strlen;
     return a;
 }
@@ -47,7 +47,7 @@
     /* Offset and length have already been "normalized" */
     string_grow(dest, src->strlen - length);
-    Sys_Memcopy(dest->bufstart, src->bufstart + offset, length);
+    Sys_Memcopy(dest->bufstart, (char *)src->bufstart + offset, length);
     dest->strlen = dest->bufused = length;
     return dest;

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu
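The strnative.c part of the fix comes down to doing byte offsets through a char * rather than a void *. A minimal sketch follows; `copy_at_offset` is a hypothetical helper written for this note, and plain `memcpy` is used where the patch uses Sys_Memcopy.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy len bytes from src to the position
 * offset bytes past the start of dest. */
void copy_at_offset(void *dest, size_t offset, const void *src, size_t len)
{
    /* memcpy(dest + offset, src, len);
     * void has no size, so standard C gives "dest + offset" no
     * meaning; gcc treats the size as 1 as an extension. */
    memcpy((char *)dest + offset, src, len);
}
```

Given a buffer holding "abcdefg", `copy_at_offset(buf, 2, "XY", 2)` leaves it holding "abXYefg".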
Re: String API
In message <[EMAIL PROTECTED]>
          Benjamin Stuhl <[EMAIL PROTECTED]> wrote:

> Thus wrote the illustrious Simon Cozens:
> [severely trimmed]
> > STRING* string_make(void *buffer, IV buflen, IV
> > encoding, IV flags, IV type)
> > STRING* string_copy(STRING* s)
> > void string_destroy(STRING *s)
>
> *cough* Namespace pollution *cough*
>
> These should probably all be prefixed...

Especially since all function names starting with str followed by a
lowercase letter are, strictly speaking, reserved to ANSI/ISO for
future expansion of the string.h facilities ;-)

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/
Patch to fix not op
The not op seems to be doing a logical not rather than a bitwise
not. Patch to fix it is as follows:

Index: basic_opcodes.ops
===
RCS file: /home/perlcvs/parrot/basic_opcodes.ops,v
retrieving revision 1.17
diff -u -r1.17 basic_opcodes.ops
--- basic_opcodes.ops 2001/09/16 15:49:22 1.17
+++ basic_opcodes.ops 2001/09/16 16:27:30
@@ -564,7 +564,7 @@
 /* NOT_i */
 AUTO_OP not_i {
-  INT_REG(P1) = ! INT_REG(P2);
+  INT_REG(P1) = ~ INT_REG(P2);
 }
 /* OR_i */

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/
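For anyone unsure of the difference between the two operators, here is a small sketch. The function names are made up for the example, and the values in the comments assume a two's complement machine.

```c
/* Logical not: collapses any value to 0 or 1. */
long logical_not(long x)
{
    return !x;      /* logical_not(5) is 0, logical_not(0) is 1 */
}

/* Bitwise not: flips every bit of the operand. */
long bitwise_not(long x)
{
    return ~x;      /* bitwise_not(5) is -6, bitwise_not(0) is -1 */
}
```

Only the bitwise form pairs up with the neighbouring bitwise ops such as OR_i.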
Patch to add string_nprintf
The attached patch adds string_nprintf, the last unimplemented
function listed in strings.pod as far as I can see.

It should cope with both of the vsnprintf return value conventions
found in different versions of glibc, but there are still a few
platforms which may have problems as they have a vsnprintf which
exhibits a third form of behaviour in the return value, namely that
it returns the amount it did manage to produce on overflow. I'm not
sure there is a clean way to cope with that interface without a
configure test to detect it.

Equally, older systems may not have a vsnprintf at all, which leaves
us with a problem on those systems.

On a vaguely related note, string_substr takes a STRING** for the
destination, which seems redundant given that it returns the dest
string, and it doesn't even fill in the argument if it does do the
allocation itself. I would suggest making it a STRING*, which would
then be consistent with the nprintf interface.

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu

? xxx
Index: parrot.h
===
RCS file: /home/perlcvs/parrot/parrot.h,v
retrieving revision 1.8
diff -u -r1.8 parrot.h
--- parrot.h 2001/09/16 22:05:21 1.8
+++ parrot.h 2001/09/17 08:20:49
@@ -43,6 +43,7 @@
 #include
 #include
 #include
+#include
 #define NUM_REGISTERS 32
 #define PARROT_MAGIC 0x13155a1
Index: string.c
===
RCS file: /home/perlcvs/parrot/string.c,v
retrieving revision 1.7
diff -u -r1.7 string.c
--- string.c 2001/09/16 01:45:51 1.7
+++ string.c 2001/09/17 08:20:49
@@ -139,6 +139,21 @@
     return (ENC_VTABLE(s)->chopn)(s, n);
 }
+/*=for api string string_nprintf
+ * format output into a string.
+ */
+STRING*
+string_nprintf(STRING* dest, IV len, char* format, ...)
+{
+    va_list ap;
+    if (!dest) {
+        dest = string_make(NULL, 0, enc_native, 0, 0);
+    }
+    va_start(ap, format);
+    dest = (ENC_VTABLE(dest)->nprintf)(dest, len, format, ap);
+    va_end(ap);
+    return dest;
+}
+
 /*
  * Local variables:
  * c-indentation-style: bsd
Index: string.h
===
RCS file: /home/perlcvs/parrot/string.h,v
retrieving revision 1.6
diff -u -r1.6 string.h
--- string.h 2001/09/16 01:45:51 1.6
+++ string.h 2001/09/17 08:20:49
@@ -32,6 +32,7 @@
 typedef STRING* (*string_iv_to_string_t)(STRING *, IV);
 typedef STRING* (*two_strings_iv_to_string_t)(STRING *, STRING *, IV);
 typedef STRING* (*substr_t)(STRING*, IV, IV, STRING*);
+typedef STRING* (*nprintf_t)(STRING*, IV, char*, va_list);
 typedef IV (*iv_to_iv_t)(IV);
 struct string_vtable {
@@ -41,6 +42,7 @@
     two_strings_iv_to_string_t concat; /* Append string b to the end of string a */
     string_iv_to_string_t chopn; /* Remove n characters from the end of a string */
     substr_t substr; /* Substring operation */
+    nprintf_t nprintf; /* Formatted output operation */
 };
 struct parrot_string {
@@ -67,6 +69,8 @@
 string_chopn(STRING*, IV);
 STRING*
 string_substr(STRING*, IV, IV, STRING**);
+STRING*
+string_nprintf(STRING*, IV, char*, ...);
 /* Declarations of other functions */
 IV
Index: strnative.c
===
RCS file: /home/perlcvs/parrot/strnative.c,v
retrieving revision 1.10
diff -u -r1.10 strnative.c
--- strnative.c 2001/09/16 01:45:51 1.10
+++ strnative.c 2001/09/17 08:20:49
@@ -80,6 +80,36 @@
     return dest;
 }
+/*=for api string_native string_native_nprintf
+   format output into a string.
+*/
+static STRING*
+string_native_nprintf(STRING* dest, IV len, char* format, va_list ap) {
+    if (len > 0) {
+        string_grow(dest, len);
+        len = vsnprintf(dest->bufstart, len, format, ap);
+        if (len > dest->buflen) {
+            len = dest->buflen;
+        }
+    }
+    else {
+        while (len == 0 || len > dest->buflen)
+        {
+            if (len < 0) {
+                string_grow(dest, dest->buflen * 2);
+            }
+            else if (len > dest->buflen) {
+                string_grow(dest, len);
+            }
+            len = vsnprintf(dest->bufstart, dest->buflen, format, ap);
+        }
+    }
+
+    dest->strlen = dest->bufused = len;
+
+    return dest;
+}
+
 /*=for api string_native string_native_vtable
    return the vtable for the native string
 */
@@ -92,6 +122,7 @@
     string_native_concat,
     string_native_chopn,
     string_native_substr,
+    string_native_nprintf
 };
 return sv;
 }
Index: docs/strings.pod
===
RCS file: /home/perlcvs/parrot/docs/strings.pod,v
retrieving revision 1.3
diff -u -r1.3 strings.pod
--- docs/strings.pod 2001/09/13 08:39:49 1.3
+++ docs/strings.pod 2001/09/17
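The two vsnprintf conventions mentioned above (a negative return on overflow, or a return of the length that would have been needed) can both be handled by one grow-and-retry loop. What follows is a sketch only, not the code from the patch: `format_alloc` is a hypothetical helper working on a plain malloc'd buffer rather than a STRING, and the 64K cap is an arbitrary assumption to stop a persistent error from looping forever. It restarts the argument list on every pass, because a va_list that has already been handed to vsnprintf cannot portably be used again.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: format into a freshly allocated buffer,
 * growing it until the output fits. Caller frees the result. */
char *format_alloc(const char *format, ...)
{
    size_t size = 16;               /* deliberately small to start with */
    char *buf = malloc(size);

    while (buf != NULL) {
        va_list ap;
        char *bigger;
        int n;

        va_start(ap, format);       /* fresh argument list on each pass */
        n = vsnprintf(buf, size, format, ap);
        va_end(ap);

        if (n >= 0 && (size_t)n < size) {
            return buf;             /* fitted, terminating NUL included */
        }
        if (n < 0 && size > 65536) {
            free(buf);              /* assumed cap: treat as a real error */
            return NULL;
        }

        /* Non-negative n is the length needed; negative n only says
         * "too small", so double the buffer and try again. */
        size = (n >= 0) ? (size_t)n + 1 : size * 2;
        bigger = realloc(buf, size);
        if (bigger == NULL) {
            free(buf);
        }
        buf = bigger;
    }
    return NULL;
}
```

The third convention described in the message (returning the amount actually written) is not detected by this loop, which fits the point that a configure test would be needed for it.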
Re: Patch to fix C++ style comments
In message <[EMAIL PROTECTED]>
          Tom Hughes <[EMAIL PROTECTED]> wrote:

> In message <[EMAIL PROTECTED]>
>           Simon Cozens <[EMAIL PROTECTED]> wrote:
>
> > On Thu, Sep 13, 2001 at 09:35:33AM +0100, Tom Hughes wrote:
> > > BTW I have had to resend this because my first attempt was bounced
> > > apparently for having the patch as a text/plain attachment rather than
> > > inline. Isn't that a bit OTT though?
> >
> > Hrm, I think other people have managed...
>
> Weird. Must be something to do with the MIME that Gnus created then.

I think I've worked this out...

The problem seems to be that Gnus doesn't bother adding a Content-Type
header to the sections of the multipart message on the grounds that
text/plain is the default content type, but the filters on the mailing
list obviously don't know that text/plain is the default.

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu
Re: "Feature Freeze"
In message <[EMAIL PROTECTED]> Simon Cozens <[EMAIL PROTECTED]> wrote: > So, if you're running on one of the core platforms, please check out a > *clean* CVS copy, try and build and post the output of make test. Tests cleanly on linux/x86: perl t/harness t/op/basic..ok, 1/2 skipped: label constants unimplemented in assembler t/op/integerok t/op/number.ok, 2/23 skipped: various reasons t/op/string.ok, 1/5 skipped: I'm unable to write it! t/op/trans..ok All tests successful, 4 subtests skipped. Files=5, Tests=74, 45 wallclock secs (38.60 cusr + 6.28 csys = 44.88 CPU) Builds cleanly with -Wall with the exception of these warnings in packfile.c: packfile.c:964:3: warning: "/*" within comment packfile.c:967:3: warning: "/*" within comment packfile.c: In function `PackFile_unpack': packfile.c:323: warning: int format, IV arg (arg 3) packfile.c:344: warning: int format, IV arg (arg 3) packfile.c:287: warning: unused variable `byte_code_ptr' packfile.c:285: warning: unused variable `segment_ptr' packfile.c: In function `PackFile_dump': packfile.c:461: warning: unsigned int format, long unsigned int arg (arg 2) packfile.c:474: warning: unsigned int format, long unsigned int arg (arg 2) packfile.c:476: warning: unsigned int format, long unsigned int arg (arg 2) packfile.c: In function `PackFile_ConstTable_dump': packfile.c:938: warning: int format, IV arg (arg 2) packfile.c: In function `PackFile_Constant_unpack': packfile.c:1233: warning: unused variable `i' packfile.c: In function `PackFile_Constant_dump': packfile.c:1358: warning: unsigned int format, long unsigned int arg (arg 2) The attached patch will clean up those warnings. 
Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/ Index: packfile.c === RCS file: /home/perlcvs/parrot/packfile.c,v retrieving revision 1.4 diff -u -r1.4 packfile.c --- packfile.c 2001/09/20 21:41:40 1.4 +++ packfile.c 2001/09/20 22:41:46 @@ -180,7 +180,7 @@ ***/ -void +void PackFile_set_magic(PackFile * self, IV magic) { self->magic = magic; } @@ -282,9 +282,7 @@ IV PackFile_unpack(PackFile * self, char * packed, IV packed_size) { -IV * segment_ptr; IV segment_size; -char * byte_code_ptr; char * cursor; IV * iv_ptr; @@ -317,9 +315,9 @@ iv_ptr = (IV *)cursor; segment_size = *iv_ptr; cursor += sizeof(IV); - + if (segment_size % sizeof(IV)) { -fprintf(stderr, "PackFile_unpack: Illegal fixup table segment size %d (must be multiple of %d!\n", +fprintf(stderr, "PackFile_unpack: Illegal fixup table segment size %ld (must +be multiple of %d!\n", segment_size, sizeof(IV)); return 0; } @@ -338,13 +336,13 @@ iv_ptr = (IV *)cursor; segment_size = *iv_ptr; cursor += sizeof(IV); - + if (segment_size % sizeof(IV)) { -fprintf(stderr, "PackFile_unpack: Illegal constant table segment size %d (must be multiple of %d!\n", +fprintf(stderr, "PackFile_unpack: Illegal constant table segment size %ld +(must be multiple of %d!\n", segment_size, sizeof(IV)); return 0; } - + if (!PackFile_ConstTable_unpack(self->const_table, cursor, segment_size)) { fprintf(stderr, "PackFile_unpack: Error reading constant table segment!\n"); return 0; @@ -366,7 +364,7 @@ self->byte_code_size = 0; return 0; } - + mem_sys_memcopy(self->byte_code, cursor, self->byte_code_size); } @@ -432,7 +430,7 @@ iv_ptr = (IV *)cursor; *iv_ptr = const_table_size; cursor += sizeof(IV); - + PackFile_ConstTable_pack(self->const_table, cursor); cursor += const_table_size; @@ -458,7 +456,7 @@ PackFile_dump(PackFile * self) { IV i; -printf("MAGIC => 0x%08x,\n", self->magic); +printf("MAGIC => 0x%08lx,\n", self->magic); printf("FIXUP => {\n"); PackFile_FixupTable_dump(self->fixup_table); @@ -471,9 +469,9 @@ printf("BCODE 
=> ["); for (i = 0; i < self->byte_code_size / 4; i++) { if (i % 8 == 0) { -printf("\n%08x: ", i * 4); +printf("\n%08lx: ", i * 4); } -printf("%08x ", ((IV *)(self->byte_code))[i]); +printf("%08lx ", ((IV *)(self->byte_code))[i]); } printf("\n]\n"); @@ -837,7 +835,7 @@ iv_ptr = (IV *)cursor; self->const_count = *iv_ptr; cursor += sizeof(IV); - + if (self->const_count == 0) { return 1; } @@ -857,7 +855,7 @@ cursor += PackFile_Constant_pack_size(self->constants[i]); }
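The format warnings listed above all come from handing a long-sized value to an int-sized conversion. The sketch below shows the matching conversions; it assumes IV is a long, which is what the warnings suggest for this build, and the two function names are invented for the example. It also casts the result of sizeof, since that is a size_t and not an int.

```c
#include <stdio.h>

typedef long IV;    /* assumption: IV is long on this platform */

/* Print a magic number the way PackFile_dump does, but with the
 * length modifier that matches a long argument. */
int format_magic(char *out, size_t outlen, IV magic)
{
    return snprintf(out, outlen, "MAGIC => 0x%08lx", (unsigned long)magic);
}

/* sizeof yields a size_t, so cast it to match the conversion used. */
int format_size_error(char *out, size_t outlen, IV segment_size)
{
    return snprintf(out, outlen, "size %ld (must be multiple of %lu)",
                    (long)segment_size, (unsigned long)sizeof(IV));
}
```

With the PARROT_MAGIC value 0x13155a1, `format_magic` produces "MAGIC => 0x013155a1".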
Re: instructions per second benchmark (in parrot ;)
In message <[EMAIL PROTECTED]>
          Dan Sugalski <[EMAIL PROTECTED]> wrote:

> That's actually what test.pasm tests. :) I just checked in a new version
> that prints labels.
>
> FWIW, my 600MHz Alpha clocks in at around 23M ops/sec. Nyah! ;-P

I have test.pasm reporting 7.14M ops/sec on a 200MHz K6 running
linux with the interpreter compiled -O3. That's about twice the
speed that I get without any optimisation.

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/
Re: instructions per second benchmark (in parrot ;)
In message <20010920190703.S28291@blackrider>
          Michael G. Schwern <[EMAIL PROTECTED]> wrote:

> I'm getting 2.67 MIPS with -O3.
>
> Hmmm, why would a K6/200 come out so much faster than a G3/266? If
> anything it should be the other way around.

No idea I'm afraid. I've just clocked 42.86M on an Athlon/1333 though ;-)

At the other end of the scale a P5/90 manages 2.91M ops/sec.

Taken together (and with the K6/200 time) that is something fairly
close to linear scaling with clock speed on x86 machines, although
the K6/200 seems to be beating the odds a little.

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu
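The scaling claim can be checked by dividing each figure by the clock speed. The helper below is just that division; the function name is made up, and the figures in the comment are the ones quoted in this thread.

```c
/* Parrot ops executed per clock cycle, from M ops/sec and MHz. */
double ops_per_cycle(double mops_per_sec, double clock_mhz)
{
    return mops_per_sec / clock_mhz;
}

/* Using the figures quoted above:
 *   P5/90:        2.91 /   90  is about 0.0323
 *   K6/200:       7.14 /  200  is about 0.0357
 *   Athlon/1333: 42.86 / 1333  is about 0.0322
 */
```

The P5 and the Athlon land within a percent of each other, while the K6 comes out roughly a tenth higher.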
Re: Have I given the big "The Way Strings Should Work" talk?
In message <[EMAIL PROTECTED]>
          Dan Sugalski <[EMAIL PROTECTED]> wrote:

> I've given it a few places, but I don't know that I've sent it to
> perl6-internals. If not, or if I should do it again, let me know. I want to
> make sure we're all on the same page here.

Not that I recall. I thought that was what strings.pod was...

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/
Re: PMCs and how the opcode functions will work
In message <[EMAIL PROTECTED]>
          Simon Cozens <[EMAIL PROTECTED]> wrote:

> I've now changed the vtable structure to reflect this, but I'd like someone
> to confirm that the "variant" forms of the ops can be addressed the way I
> think they can. (ie. structure->base_element + 1 to get "thing after
> base_element")

Legally speaking they can't, as ISO C says that you can't do pointer
calculations and comparisons across object boundaries, and separate
members of a structure are different objects.

If you replace this:

  set_integer_method_t set_integer_1;
  set_integer_method_t set_integer_2;
  set_integer_method_t set_integer_3;
  set_integer_method_t set_integer_4;
  set_integer_method_t set_integer_5;

with this:

  set_integer_method_t set_integer[5];

then you would be able to, as an array is all one object.

Practically speaking I think it will work on every system that I
can think of at the moment, but who knows what weird things are
out there...

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/
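A small sketch of the array form in use. Only the member name set_integer and the typedef name come from the message; the method signature, `struct demo_vtable`, the two setter functions and `call_variant` are all assumptions invented for the example.

```c
/* Assumed signature; the real set_integer_method_t may differ. */
typedef void (*set_integer_method_t)(long *dest, long value);

struct demo_vtable {
    set_integer_method_t set_integer[5];    /* one object, five slots */
};

static void set_plain(long *dest, long value)   { *dest = value; }
static void set_negated(long *dest, long value) { *dest = -value; }

/* Only slots 0 and 1 are filled in for this example. */
struct demo_vtable demo = { { set_plain, set_negated } };

void call_variant(const struct demo_vtable *vt, int variant,
                  long *dest, long value)
{
    /* "base + variant" stays inside the one array object, so the
     * pointer arithmetic is well defined. */
    set_integer_method_t const *base = vt->set_integer;
    (*(base + variant))(dest, value);
}
```

`call_variant(&demo, 1, &r, 21)` stores -21 in r by way of the second slot.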
Re: [PATCH] Bugfix for push_generic_entry
In message <[EMAIL PROTECTED]>
          Jason Gloudon <[EMAIL PROTECTED]> wrote:

> The "stacktest" patch will fail on the current CVS source, due to a bug in
> push_generic_entry.

This looks good to me so I have committed it. Thanks for spotting it!

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/
Re: Resync your CVS...
In message <[EMAIL PROTECTED]>
          Dan Sugalski <[EMAIL PROTECTED]> wrote:

> On Mon, 22 Oct 2001, Sam Tregar wrote:
>
> > Fresh checkout won't compile on Redhat Linux 7.1:
>
> Damn. It compiled cleanly before I checked it in. I'll patch up again and
> see what I missed. Probably some odd dependency or timing issue
> somewhere. (It's emacs fault! Yeah, that's the ticket! :)

I'd already patched it up, so I've just committed my fix...

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/
Re: String rationale
In message <[EMAIL PROTECTED]>
          Dan Sugalski <[EMAIL PROTECTED]> wrote:

> =item type
>
> What the character set or type of data is encoded in the buffer. This
> includes things like ASCII, EBCDIC, Unicode, Chinese Traditional,
> Chinese Simplified, or Shift-JIS. (And yes, I know the latter's a
> combination of type and encoding. I'll update the doc as soon as I can
> reasonably separate the two)

Isn't this going to need to be a vtable pointer like encoding is? Only
some things (like character classification and at least some transcoding
tasks) will be character set based rather than encoding based.

Other than that it looked quite good and I'll probably start looking at
bending the existing code into the new model over the weekend.

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/
Re: Ooops, sorry for that blank log message.
In message <[EMAIL PROTECTED]>
          Brian Wheeler <[EMAIL PROTECTED]> wrote:

> Darn it, I fat fingered the log message.
>
> This is a fix which changes the way op variants are handled. The old
> method "forgot" the last variant, so thing(i,i|ic,i|ic) would
> generate:
> thing(i,i,i)
> thing(i,i,ic)
> thing(i,ic,i)
>
> but not
>
> thing(i,ic,ic)

It didn't forget it, it went to some considerable trouble to ignore
it on the grounds that such an opcode is pointless as all the operands
are constant.

I did describe the algorithm used and the logic behind it on the list
when I implemented it.

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/
Re: String rationale
In message <[EMAIL PROTECTED]> Tom Hughes <[EMAIL PROTECTED]> wrote: > Other than that it looked quite good and I'll probably start looking at > bending the existing code into the new model over the weekend. Attached is my first pass at this - it's not fully ready yet but is something for people to cast an eye over before I spend lots of time going down the wrong path ;-) The encoding_lookup() and chartype_lookup() routines will obviously need to load the relevant libraries on the fly when we have support for that. The packfile stuff is just a hack to make it work for now. Presumably we will have to modify the byte code format to record the string types as names or something so we can look them up properly? String comparison is not language sensitive here - as before it just compares based on character values. Other than that I think it's aiming in the right direction and it does pass all the tests... Please correct me if I'm wrong. Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/ # This is a patch for parrot to update it to parrot-ns # # To apply this patch: # STEP 1: Chdir to the source directory. # STEP 2: Run the 'applypatch' program with this patch file as input. # # If you do not have 'applypatch', it is part of the 'makepatch' package # that you can fetch from the Comprehensive Perl Archive Network: # http://www.perl.com/CPAN/authors/Johan_Vromans/makepatch-x.y.tar.gz # In the above URL, 'x' should be 2 or higher. # # To apply this patch without the use of 'applypatch': # STEP 1: Chdir to the source directory. # If you have a decent Bourne-type shell: # STEP 2: Run the shell with this file as input. # If you don't have such a shell, you may need to manually create/delete # the files/directories as shown below. # STEP 3: Run the 'patch' program with this file as input. 
# # These are the commands needed to create/delete files/directories: # mkdir 'chartypes' chmod 0755 'chartypes' mkdir 'encodings' chmod 0755 'encodings' rm -f 'transcode.c' rm -f 'strutf8.c' rm -f 'strutf32.c' rm -f 'strutf16.c' rm -f 'strnative.c' rm -f 'include/parrot/transcode.h' rm -f 'include/parrot/strutf8.h' rm -f 'include/parrot/strutf32.h' rm -f 'include/parrot/strutf16.h' rm -f 'include/parrot/strnative.h' touch 'chartype.c' chmod 0644 'chartype.c' touch 'chartypes/unicode.c' chmod 0644 'chartypes/unicode.c' touch 'chartypes/usascii.c' chmod 0644 'chartypes/usascii.c' touch 'encoding.c' chmod 0644 'encoding.c' touch 'encodings/singlebyte.c' chmod 0644 'encodings/singlebyte.c' touch 'encodings/utf16.c' chmod 0644 'encodings/utf16.c' touch 'encodings/utf32.c' chmod 0644 'encodings/utf32.c' touch 'encodings/utf8.c' chmod 0644 'encodings/utf8.c' touch 'include/parrot/chartype.h' chmod 0644 'include/parrot/chartype.h' touch 'include/parrot/encoding.h' chmod 0644 'include/parrot/encoding.h' # # This command terminates the shell and need not be executed manually. 
exit # End of Preamble Patch data follows diff -c 'parrot/MANIFEST' 'parrot-ns/MANIFEST' Index: ./MANIFEST *** ./MANIFEST Wed Oct 24 22:16:51 2001 --- ./MANIFEST Sat Oct 27 14:59:43 2001 *** *** 1,5 --- 1,8 assemble.pl ChangeLog + chartype.c + chartypes/unicode.c + chartypes/usascii.c classes/genclass.pl classes/intclass.c config_h.in *** *** 14,19 --- 17,27 docs/parrotbyte.pod docs/strings.pod docs/vtables.pod + encoding.c + encodings/singlebyte.c + encodings/utf8.c + encodings/utf16.c + encodings/utf32.c examples/assembly/bsr.pasm examples/assembly/call.pasm examples/assembly/euclid.pasm *** *** 29,34 --- 37,44 global_setup.c hints/mswin32.pl hints/vms.pl + include/parrot/chartype.h + include/parrot/encoding.h include/parrot/events.h include/parrot/exceptions.h include/parrot/global_setup.h *** *** 45,55 include/parrot/runops_cores.h include/parrot/stacks.h include/parrot/string.h - include/parrot/strnative.h - include/parrot/strutf16.h - include/parrot/strutf32.h - include/parrot/strutf8.h - include/parrot/transcode.h include/parrot/trace.h include/parrot/unicode.h interpreter.c --- 55,60 *** *** 107,116 runops_cores.c stacks.c string.c - strnative.c - strutf16.c - strutf32.c - strutf8.c test_c.in test_main.c Test/More.pm --- 112,117 *** *** 128,134 t/op/time.t t/op/trans.t trace.c - transcode.c Types_pm.in vtable_h.pl vtable.tbl --- 129,134 diff -c
Re: String rationale
In message <[EMAIL PROTECTED]>
          Tom Hughes <[EMAIL PROTECTED]> wrote:

> Attached is my first pass at this - it's not fully ready yet but
> is something for people to cast an eye over before I spend lots of
> time going down the wrong path ;-)

Before anybody else spots it, let me just add what I forgot to mention
in my original post, which is that transcoding isn't implemented yet
as I'm still thinking about the best way to do it.

There is a hook in place ready for it though.

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/
Re: Opcode complaints
In message <[EMAIL PROTECTED]>
          "Brent Dax" <[EMAIL PROTECTED]> wrote:

> 4. eq and friends: string variants
> One thing that seems to be missing is string and numeric variants on the
> comparison ops. While this isn't a problem now, it may be once we get
> PMCs.

Both string and numeric versions of the comparison ops exist...

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/
Re: String rationale
In message <[EMAIL PROTECTED]> Dan Sugalski <[EMAIL PROTECTED]> wrote: > At 04:23 PM 10/27/2001 +0100, Tom Hughes wrote: > > >Attached is my first pass at this - it's not fully ready yet but > >is something for people to cast an eye over before I spend lots of > >time going down the wrong path ;-) > > It looks pretty good on first glance. I've done a bit more work now, and the latest version is attached. This version can do transcoding. The intention is that there will be some sort of cache in chartype_lookup_transcoder to avoid repeating the expensive lookups by name too much. One interesting question is who is responsible for transcoding from character set A to character set B - is it A or B? and how about the other way? My code currently allows either set to provide the transform on the grounds that otherwise the unicode module would have to either know how to convert to everything else or from everything else. Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/ # This is a patch for parrot to update it to parrot-ns # # To apply this patch: # STEP 1: Chdir to the source directory. # STEP 2: Run the 'applypatch' program with this patch file as input. # # If you do not have 'applypatch', it is part of the 'makepatch' package # that you can fetch from the Comprehensive Perl Archive Network: # http://www.perl.com/CPAN/authors/Johan_Vromans/makepatch-x.y.tar.gz # In the above URL, 'x' should be 2 or higher. # # To apply this patch without the use of 'applypatch': # STEP 1: Chdir to the source directory. # If you have a decent Bourne-type shell: # STEP 2: Run the shell with this file as input. # If you don't have such a shell, you may need to manually create/delete # the files/directories as shown below. # STEP 3: Run the 'patch' program with this file as input. 
# # These are the commands needed to create/delete files/directories: # mkdir 'chartypes' chmod 0755 'chartypes' mkdir 'encodings' chmod 0755 'encodings' rm -f 'transcode.c' rm -f 'strutf8.c' rm -f 'strutf32.c' rm -f 'strutf16.c' rm -f 'strnative.c' rm -f 'include/parrot/transcode.h' rm -f 'include/parrot/strutf8.h' rm -f 'include/parrot/strutf32.h' rm -f 'include/parrot/strutf16.h' rm -f 'include/parrot/strnative.h' touch 'chartype.c' chmod 0644 'chartype.c' touch 'chartypes/unicode.c' chmod 0644 'chartypes/unicode.c' touch 'chartypes/usascii.c' chmod 0644 'chartypes/usascii.c' touch 'encoding.c' chmod 0644 'encoding.c' touch 'encodings/singlebyte.c' chmod 0644 'encodings/singlebyte.c' touch 'encodings/utf16.c' chmod 0644 'encodings/utf16.c' touch 'encodings/utf32.c' chmod 0644 'encodings/utf32.c' touch 'encodings/utf8.c' chmod 0644 'encodings/utf8.c' touch 'include/parrot/chartype.h' chmod 0644 'include/parrot/chartype.h' touch 'include/parrot/encoding.h' chmod 0644 'include/parrot/encoding.h' # # This command terminates the shell and need not be executed manually. 
exit # End of Preamble Patch data follows diff -c 'parrot/MANIFEST' 'parrot-ns/MANIFEST' Index: ./MANIFEST *** ./MANIFEST Sun Oct 28 17:11:21 2001 --- ./MANIFEST Sun Oct 28 17:11:07 2001 *** *** 1,5 --- 1,8 assemble.pl ChangeLog + chartype.c + chartypes/unicode.c + chartypes/usascii.c classes/genclass.pl classes/intclass.c classes/scalarclass.c *** *** 15,20 --- 18,28 docs/parrotbyte.pod docs/strings.pod docs/vtables.pod + encoding.c + encodings/singlebyte.c + encodings/utf8.c + encodings/utf16.c + encodings/utf32.c examples/assembly/bsr.pasm examples/assembly/call.pasm examples/assembly/euclid.pasm *** *** 30,35 --- 38,45 global_setup.c hints/mswin32.pl hints/vms.pl + include/parrot/chartype.h + include/parrot/encoding.h include/parrot/events.h include/parrot/exceptions.h include/parrot/global_setup.h *** *** 46,56 include/parrot/runops_cores.h include/parrot/stacks.h include/parrot/string.h - include/parrot/strnative.h - include/parrot/strutf16.h - include/parrot/strutf32.h - include/parrot/strutf8.h - include/parrot/transcode.h include/parrot/trace.h include/parrot/unicode.h interpreter.c --- 56,61 *** *** 108,117 runops_cores.c stacks.c string.c - strnative.c - strutf16.c - strutf32.c - strutf8.c test_c.in test_main.c Test/More.pm --- 113,118 *** *** 129,135 t/op/time.t t/op/trans.t trace.c - transcode.c Types_pm.in vtable_h.pl vtable.tbl --- 130,135 diff -c &
RE: String rationale
In message <[EMAIL PROTECTED]> "Stephen Howard" <[EMAIL PROTECTED]> wrote:

> right. I had just keyed in on this from Tom's message:
>
> "My code currently allows either set to provide the transform on the
> grounds that otherwise the unicode module would have to either know
> how to convert to everything else or from everything else."
>
> ...which seemed to posit that Unicode module could be responsible for
> all the transcodings to and from it's own character set, which seemed
> backwards to me.

I was only positing it long enough to acknowledge that such a rule was untenable. What it comes down to is that there are three possible rules, namely:

 1. Each character set defines transforms from itself to other character sets.
 2. Each character set defines transforms to itself from other character sets.
 3. Each character set defines transforms both from itself to other character sets and from other character sets to itself.

We have established that the first two will not work because of the unicode problem. That leaves the third, which is what I have implemented. When looking to transcode from A to B it will first ask A if it can transcode to B and if that fails then it will ask B if it can transcode from A. That way each character set can manage its own translations both to and from unicode as we require. The problem it raises is, who is responsible for transcoding from ASCII to Latin-1? and back again? If we're not careful both ends will implement both translations and we will have effective duplication.

Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/
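The lookup order for rule 3 can be sketched roughly as follows. This is only an illustration: the chartype structure, the transcoder_fn type and every function name here are hypothetical and are not taken from the actual patch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical types for illustration; the real patch may differ. */
typedef unsigned long (*transcoder_fn)(unsigned long codepoint);

typedef struct chartype {
    const char *name;
    /* Return a transform from this set to to_name, or NULL if none. */
    transcoder_fn (*transcode_to)(const char *to_name);
    /* Return a transform from from_name to this set, or NULL if none. */
    transcoder_fn (*transcode_from)(const char *from_name);
} chartype;

/* ASCII code points are a subset of Unicode, so widening is a no-op. */
unsigned long widen_from_ascii(unsigned long c) { return c; }

/* Assumed policy for the sketch: replace unmappable characters by '?'. */
unsigned long narrow_to_ascii(unsigned long c) { return c < 0x80 ? c : '?'; }

static transcoder_fn usascii_to(const char *to_name)
{
    return strcmp(to_name, "unicode") == 0 ? widen_from_ascii : NULL;
}

static transcoder_fn usascii_from(const char *from_name)
{
    return strcmp(from_name, "unicode") == 0 ? narrow_to_ascii : NULL;
}

/* usascii knows both directions; unicode itself implements nothing. */
const chartype usascii = { "usascii", usascii_to, usascii_from };
const chartype unicode = { "unicode", NULL, NULL };

/* Rule 3: ask the source set first, then fall back to the destination. */
transcoder_fn lookup_transcoder(const chartype *from, const chartype *to)
{
    transcoder_fn fn = NULL;

    if (from->transcode_to != NULL)
        fn = from->transcode_to(to->name);
    if (fn == NULL && to->transcode_from != NULL)
        fn = to->transcode_from(from->name);
    return fn;
}
```

With these two example sets, a lookup from usascii to unicode is answered by the source, while a lookup from unicode to usascii is answered by the destination, so the unicode module never needs to know about any other set.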
Re: String rationale
In message <[EMAIL PROTECTED]> James Mastros <[EMAIL PROTECTED]> wrote: > > That leaves the third, which is what I have implemented. When looking to > > transcode from A to B it will first ask A if it can transcode to B and > > if that fails then it will ask B if it can transcode from A. > I propose another variant on this: > If that fails, it asks A to transcode to Unicode, and B to transcode from > Unicode. (Not Unicode to transcode to B; Unicode implements no transcodings.) My code does that, though at a slightly higher level. If you look at string_transcode() you will see that if it can't find a direct mapping it will go via unicode. If C had closures then I'd have buried that down in the chartype_lookup_transcoder() layer, but it doesn't so I couldn't ;-) > > The problem it raises is, who is responsible for transcoding from ASCII to > > Latin-1? and back again? If we're not careful both ends will implement > > both translations and we will have effective duplication. > 1) Neither. Each must support transcoding to and from Unicode. Absolutely. > 2) But either can support converting directly if it wants. The danger is that everybody tries to be clever and support direct conversion to and from as many other character sets as possible, which leads to lots of duplication. > I also think that, for efficiency, we might want a "7-bit chars match ASCII" > flag, since most character sets do, and that means that we don't have to deal > with the overhead for strings that fit in 7 bits. This smells of premature > optimization, though, so somebody just file this away in their heads for > future reference. I have already been thinking about this although it does get more complicated as you have to consider the encoding as well - if you have a single byte encoded ASCII string then transcoding to a single byte encoded Latin-1 string is a no-op, but that may not be true for other encodings if such a thing makes sense for those character types. 
> (BTW, for those paying attention, I'm waiting on this discussion for my > chr/ord patch, since I want them in terms of charsets, not encodings.) I suspect that the encode and decode methods in the encoding vtable are enough for doing chr/ord aren't they? Surely chr() is just encoding the argument in the chosen encoding (which can be the default encoding for the char type if you want) and then setting the type and encoding of the resulting string appropriately. Equally ord() is decoding the first character of the string to get a number. Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/
Re: String rationale
In message <[EMAIL PROTECTED]> James Mastros <[EMAIL PROTECTED]> wrote: > On Mon, Oct 29, 2001 at 11:20:47PM +0000, Tom Hughes wrote: > > > I suspect that the encode and decode methods in the encoding vtable > > are enough for doing chr/ord aren't they? > > Hmm... come to think of it, yes. chr will always create a utf32-encoded > string with the given charset number (or unicode for the two-arg version), > ord will return the codepoint within the current charset. I hope it will create a string with the given charset number and using the default encoding for that charset. Asking for an ASCII character and getting it UTF-32 encoded would be more than a little bizarre. If I say chr(65,ASCII) then I would expect to get a single byte encoded string... > (This, BTW, means that only encodings that feel like it have to provide > either, but all encodings must be able to convert to utf32.) The way I've written it, any encoding can convert to any encoding at all, because there is no conversion at that level. I just decode a character from the source, transcode it at the character level, and then encode it to the destination. If an encoding cannot handle the full range of character values for a character set then you will get an exception when it tries to encode an out of range character. Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu
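The decode, transcode, encode loop described above might look something like this. It is a sketch under stated assumptions: the encoding structure, the two example encodings and convert() are invented for illustration, the UTF-32 data is taken to be in host byte order, and a return value of -1 stands in for the exception that the real code would throw.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical encoding vtable; the one in the patch may differ. */
typedef struct encoding {
    /* Decode one character from src, return the bytes consumed. */
    size_t (*decode)(const unsigned char *src, uint32_t *c);
    /* Encode c at dst, return the bytes written (0 = out of range). */
    size_t (*encode)(unsigned char *dst, uint32_t c);
} encoding;

static size_t singlebyte_decode(const unsigned char *src, uint32_t *c)
{
    *c = src[0];
    return 1;
}

static size_t singlebyte_encode(unsigned char *dst, uint32_t c)
{
    if (c > 0xFF)
        return 0;               /* cannot be represented in one byte */
    dst[0] = (unsigned char)c;
    return 1;
}

static size_t utf32_decode(const unsigned char *src, uint32_t *c)
{
    memcpy(c, src, 4);          /* host byte order assumed */
    return 4;
}

static size_t utf32_encode(unsigned char *dst, uint32_t c)
{
    memcpy(dst, &c, 4);
    return 4;
}

const encoding enc_singlebyte = { singlebyte_decode, singlebyte_encode };
const encoding enc_utf32 = { utf32_decode, utf32_encode };

typedef uint32_t (*transcoder_fn)(uint32_t codepoint);

/*
 * Convert srclen bytes of well-formed input (a whole number of
 * characters) from one encoding to another, one character at a time.
 * xform may be NULL when the character set does not change.
 * Returns the number of bytes written, or -1 on failure.
 */
long convert(const encoding *from, const unsigned char *src, size_t srclen,
             const encoding *to, unsigned char *dst, size_t dstcap,
             transcoder_fn xform)
{
    size_t in = 0, out = 0;

    while (in < srclen) {
        unsigned char buf[4];
        uint32_t c;
        size_t n;

        in += from->decode(src + in, &c);   /* 1. decode             */
        if (xform != NULL)
            c = xform(c);                   /* 2. transcode          */
        n = to->encode(buf, c);             /* 3. encode             */
        if (n == 0 || out + n > dstcap)
            return -1;                      /* out of range, or full */
        memcpy(dst + out, buf, n);
        out += n;
    }
    return (long)out;
}
```

Because the encodings never talk to each other directly, adding a new encoding only requires a decode and an encode routine, not a converter for every existing encoding.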
Re: String rationale
In message <[EMAIL PROTECTED]> Tom Hughes <[EMAIL PROTECTED]> wrote: > In message <[EMAIL PROTECTED]> > Dan Sugalski <[EMAIL PROTECTED]> wrote: > > > At 04:23 PM 10/27/2001 +0100, Tom Hughes wrote: > > > > >Attached is my first pass at this - it's not fully ready yet but > > >is something for people to cast an eye over before I spend lots of > > >time going down the wrong path ;-) > > > > It looks pretty good on first glance. > > I've done a bit more work now, and the latest version is attached. Unless anybody has objections I plan to commit this work shortly... Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/
Re: String rationale
In message <[EMAIL PROTECTED]> Simon Cozens <[EMAIL PROTECTED]> wrote: > On Sat, Oct 27, 2001 at 04:23:48PM +0100, Tom Hughes wrote: > > The encoding_lookup() and chartype_lookup() routines will obviously > > need to load the relevant libraries on the fly when we have support > > for that. > > Could you try rewriting them using an enum, like the vtable stuff and > the original string encoding stuff does? The intention is that when an encoding or character type is loaded it will be allocated a unique ID number that can be used internally to refer to it, but that the number will only be valid for the duration of that instance of parrot rather than being persistent. That's certainly the way Dan described it happening in his rationale which is what my code is based on. Allocating them globally is not possible if we're going to allow people to add arbitrary encodings and character sets - as things stand adding the foo encoding will be as simple as adding foo.so to the encodings directory. Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu
Re: String rationale
In message <[EMAIL PROTECTED]> Simon Cozens <[EMAIL PROTECTED]> wrote: > As things stand, that won't work, because you're doing a string lookup in one > of the core functions, and you still need some way of registering incoming > stuff. With an enum, you can keep hold of a fake encoding_max, and hand > encoding_max++ to the initialisation function for each encoding. Well there won't be any point in it being an enum rather than an integer unless some of them are going to be preallocated. I'm not sure if the encoding and character types will need to know their own index numbers but if we do then they can be told at initialisation time, yes. I absolutely intend that the current hard coded strings in the core will go away in due course though. When you look up an encoding or character type by name it will first check a hash table or something to see if it is already loaded and if not it will look for it on disk and load it in, allocate it a number, and add it to the hash table for future reference. Hence the current strcmp junk in the lookup functions will go away. In much the same way the byte code will have some sort of table of names which it will look up as it is loaded rather than the current hard coding of name to number mappings in the byte code. So all I need now to make all this work is hash tables and dynamic code loading ;-) Any volunteers... Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu
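A minimal sketch of that registration scheme, assuming a small fixed-size table searched linearly in place of the hash table and with the load from disk left out entirely. The function name encoding_id_for and the limits are hypothetical.

```c
#include <string.h>

#define MAX_ENCODINGS 32    /* arbitrary limit for the sketch */
#define MAX_NAME_LEN  32

/* Names of the encodings loaded so far; the index is the ID. */
static char loaded_names[MAX_ENCODINGS][MAX_NAME_LEN];
static int encoding_max = 0;

/*
 * Return the ID for the named encoding, allocating the next free
 * number the first time a name is seen. IDs are only meaningful
 * within this process. Returns -1 if the table is full or the name
 * is too long.
 */
int encoding_id_for(const char *name)
{
    int i;

    for (i = 0; i < encoding_max; i++) {
        if (strcmp(loaded_names[i], name) == 0)
            return i;                   /* already registered */
    }

    if (encoding_max == MAX_ENCODINGS || strlen(name) >= MAX_NAME_LEN)
        return -1;

    /* A real version would locate and load the encoding here. */
    strcpy(loaded_names[encoding_max], name);
    return encoding_max++;
}
```

Since the numbers are handed out in load order, two runs that load encodings in a different order will assign different IDs, which is why anything persistent such as byte code would have to refer to encodings by name.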
Re: [PATCH] Computed goto, super-fast dispatching.
In message <[EMAIL PROTECTED]> Daniel Grunblatt <[EMAIL PROTECTED]> wrote: > All: > Here's a list of the things I've been doing: > > * Added ops2cgc.pl which generates core_cg_ops.c and core_cg_ops.h from > core.ops, and modified Makefile.in to use it. In core_cg_ops.c resides > cg_core which has an array with the addresses of the label of each opcode > and starts the execution "jumping" to the address in array[*cur_opcode]. > > * Modified interpreter.c to include core_cg_ops.h > > * Modified runcore_ops.c to discard the actual dispatching method and call > cg_core, but left everything else untouched so that -b,-p and -t keep > working. > > * Modified pbc2c.pl to use computed goto when handling jump or ret, may be > I can modified this once again not to define the array with the addresses > if it's not going to be used but I don't think that in real life a program > won't use jump or ret, am I right? > > Hope some one find this usefull. I just tried it but I don't seem to be seeing anything like the speedups you are. All the times which follow are for a K6-200 running RedHat 7.2 and compiled -O6 with gcc 2.96. Without patch: gosford [~/src/parrot] % ./test_prog examples/assembly/mops.pbc Iterations:1 Estimated ops: 3 Elapsed time: 37.387179 M op/s:8.024141 gosford [~/src/parrot] % ./examples/assembly/mops Iterations:1 Estimated ops: 3 Elapsed time: 3.503482 M op/s:85.629098 With patch: gosford [~/src/parrot-cg] % ./test_prog examples/assembly/mops.pbc Iterations:1 Estimated ops: 3 Elapsed time: 29.850361 M op/s:10.050130 gosford [~/src/parrot-cg] % ./examples/assembly/mops Iterations:1 Estimated ops: 3 Elapsed time: 4.515596 M op/s:66.436413 So there is a small speed up for the interpreted version, but nothing like the three times speedup you had. The compiled version has actually managed to get slower... Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/
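For anyone who has not met the technique, the dispatch described in the quoted text (an array holding the address of each opcode's label, with execution jumping to array[*cur_opcode]) can be illustrated with a toy interpreter like the one below. This is not the generated core_cg_ops.c: the opcodes and run_ops() are made up, and the code relies on the "labels as values" extension of gcc, so it is not ISO C.

```c
/* Toy opcode set, invented for this sketch. */
enum { OP_END, OP_INC, OP_DEC, OP_ADD };

/*
 * Run a program and return the accumulator. OP_ADD takes one
 * operand, which follows it in the opcode stream.
 */
long run_ops(const long *pc)
{
    /* One label address per opcode, indexed by opcode number. */
    static void *dispatch[] = { &&op_end, &&op_inc, &&op_dec, &&op_add };
    long acc = 0;

    goto *dispatch[*pc];        /* start on the first opcode */

op_inc:
    acc++;
    pc++;
    goto *dispatch[*pc];        /* jump straight to the next op */

op_dec:
    acc--;
    pc++;
    goto *dispatch[*pc];

op_add:
    acc += pc[1];
    pc += 2;
    goto *dispatch[*pc];

op_end:
    return acc;
}
```

Each op ends with its own indirect jump instead of returning to a central loop, which is where any saving over a function-call or switch based core would come from.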
Re: [PATCH] Computed goto, super-fast dispatching.
In message <[EMAIL PROTECTED]> Daniel Grunblatt <[EMAIL PROTECTED]> wrote: > Yeap, I was right, using gcc 3.0.2 you can see the difference: I've just tried it with 3.0.1 and see much the same results as I did with 2.96 I'm afraid. I don't have 3.0.2 to hand without building it from source so I haven't tried that as yet. Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/
Re: [PATCH] Computed goto, super-fast dispatching.
In message <[EMAIL PROTECTED]> Daniel Grunblatt <[EMAIL PROTECTED]> wrote: > Do you want me to give you an account in my linux machine where I have > install gcc 3.0.2 so that you see it? I'm not sure that will achieve anything - it's not that I don't believe you, it's just that I'm not seeing the same thing. I have now tried on a number of other machines, and the results are summarised in the following table:

              Standard                  Computed Gotos
        Interpreted  Compiled     Interpreted       Compiled
   A        3.35       33.56       4.63 (+38%)     29.83 (-11%)
   B        5.69       85.24      14.08 (+147%)    78.60 (-8%)
   C       15.09      314.91      31.83 (+111%)   259.34 (-18%)
   D       45.87      774.73      62.37 (+36%)    795.30 (+3%)

Machine A is a 90MHz Pentium running RedHat 7.1 with gcc 2.96
Machine B is a Dual 200MHz Pentium-Pro running RedHat 6.1 with egcs 1.1.2
Machine C is a 733MHz Pentium III running FreeBSD 4.3-STABLE with gcc 2.95.3
Machine D is a 1333MHz Athlon running RedHat 7.1 with gcc 2.96

Clearly the speedup varies significantly between systems with some giving much greater improvements than others. One other thing that I did notice is that there is quite a bit of fluctuation between runs on some of the machines, possibly because we are measuring real time and not CPU time.

Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu
Re: [PATCHES] concat, read, substr, added 'ord' operator, and a SURPRISE
In message <[EMAIL PROTECTED]> Dan Sugalski <[EMAIL PROTECTED]> wrote: > At 03:35 AM 11/11/2001 -0500, James Mastros wrote: > > >No, it isn't. I'm not sure s->strlen is always guaranteed to be correct; > >string_length(s) is. (I found a case where it was wrong when coding my > >version of ord() once, though that ended up being a problem with my > >version of chr(). The point is that string_length is an API, but the > >contents of the struct are not.) > > We shouldn't cheat--the string length field should be considered a black > box until we need the speed, at which point we play Macro Games and change > string_length into a direct fetch. As far as I know the strlen member should always be correct. I was certainly trying to make sure it was because strings.pod explicitly says that it will be and that it can be used directly instead of calling string_length(). Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/
Re: [PATCHES] ord(i,s|sc(,i|ic)?) operator committed, fixed bug in concat()
In message <[EMAIL PROTECTED]> Jeff <[EMAIL PROTECTED]> wrote: > string.c - Added string_ord() and a _string_index() helper function to > help making accommodating different encodings easier. Patched concat() > to deal with null strings. I have just committed an amendment to this to make string_index use the encoding routines instead of assuming a single byte encoding. I have also renamed _string_index to string_index as function names that start with an underscore are reserved to implementors by the C standard. Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/
Re: Butt-ugliness reduction
In message <[EMAIL PROTECTED]> Michael L Maraist <[EMAIL PROTECTED]> wrote: > inlined c-functions.. Hmm, gcc has some support for this, but what about > other architectures.. For function-inlining to work with GCC, you have to > define the function in the header.. That's definitely not portable. I guess > you're saying that the inlined functions would be the same .c file as it's > being used.. Well, I thought these classes might span multiple files, making > that rather difficult. You only need to define it in the header if it needs to be visible across more than one file - if it is only needed in the file that is implementing the scalar class then it can be put there. In fact many compilers will inline small static functions anyway even without an explicit hint in the source. Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/
Re: [PATCH] Moving NV constants to the constant table
In message <[EMAIL PROTECTED]> "Gregor N. Purdy" <[EMAIL PROTECTED]> wrote: > Let me know how this works for you... There seems to be a lot of the patch missing: gosford [~/src/parrot-nvconst] % patch -N < /tmp/nvconst.patch patching file Makefile.in patching file Types_pm.in patching file assemble.pl patch: unexpected end of file in patch Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/
Re: [PATCH] (AGAIN) NV constants in constant table
In message <[EMAIL PROTECTED]> "Gregor N. Purdy" <[EMAIL PROTECTED]> wrote: > There was trouble with the attachment on my last post, so here it > comes again... That patches and builds OK but the added files are not in the patch so Parrot/Assembler.pm at least is missing and thus I can't run any tests. Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/
Re: [PATCH] (AGAIN) NV constants in constant table
In message <[EMAIL PROTECTED]> "Gregor N. Purdy" <[EMAIL PROTECTED]> wrote: > Sorry about that, Tom. I really need to add -N to my .cvsrc... > I just sent the (hopefully) complete patch to the list. Please try > it out against a fresh checkout and let me know how it works for > you... It builds and tests cleanly for me now (linux/x86). Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/
RE: [PATCH] non-init var possibility
In message <[EMAIL PROTECTED]> Gibbs Tanton - tgibbs <[EMAIL PROTECTED]> wrote: > No, the behavior of malloc(0) is implementation defined. It is, yes, but there are only two legal results according to the ISO C standard: "If the size of the space requested is zero, the behavior is implementation-defined: either a null pointer is returned, or the behavior is as if the size were some nonzero value, except that the returned pointer shall not be used to access an object." In other words it can't crash or do anything else undesirable, and the result will always be something that can't be dereferenced, but can be freed (given that the standard requires free(NULL) to work). Given that, although we can't say the behaviour is strictly speaking consistent it is true that as far as performing normal operations on the pointer go you are unlikely to notice which behaviour a given platform has chosen. Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/
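In practice the one place where the two behaviours do show through is error checking, because a null result from a zero-sized request is not a failure. A small sketch of a wrapper that copes with either behaviour follows; checked_alloc is a hypothetical name, not an existing Parrot routine.

```c
#include <stdlib.h>

/*
 * Allocate size bytes. *failed is set to 1 only for a genuine
 * allocation failure: a null pointer from a zero-sized request is
 * one of the two results the standard allows, so it is not an error.
 * The result must never be dereferenced when size is 0, but it can
 * always be passed to free().
 */
void *checked_alloc(size_t size, int *failed)
{
    void *p = malloc(size);

    *failed = (p == NULL && size != 0);
    return p;
}
```

A caller can then write the same code for both kinds of platform: allocate, test the flag, and later free() the pointer without caring whether it is null.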
Transcoding patch
The attached patch is a first stab at implementing string transcoding and the unicode string types. The transcoder will currently only map one UTF type to another - there is no attempt to implement mapping to or from native strings as I wasn't sure what the plan was for that. Presumably we will have to determine what the native character set is at configure time and then generate some code to map between that and unicode somehow? There are currently no proper tests because there is no way to generate anything other than a native string using the current assembler. There is a small C test harness (trans-test.c) which I have used to validate the transcoder to a certain extent. This patch also fixes a bug in the existing native strings where string_native_compute_strlen was returning the number of bytes that had been allocated rather than the number that were in use. Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/ diff -urNw --exclude CVS parrot/Makefile.in parrot-transcode/Makefile.in --- parrot/Makefile.in Sun Oct 7 15:58:56 2001 +++ parrot-transcode/Makefile.inSun Oct 7 16:08:49 2001 @@ -4,7 +4,7 @@ INC=include/parrot H_FILES = $(INC)/config.h $(INC)/exceptions.h $(INC)/io.h $(INC)/op.h $(INC)/register.h $(INC)/string.h $(INC)/events.h $(INC)/interpreter.h $(INC)/memory.h $(INC)/parrot.h $(INC)/stacks.h $(INC)/packfile.h $(INC)/global_setup.h $(INC)/vtable.h -O_FILES = global_setup$(O) interpreter$(O) parrot$(O) register$(O) basic_opcodes$(O) memory$(O) packfile$(O) string$(O) strnative$(O) +O_FILES = global_setup$(O) interpreter$(O) parrot$(O) register$(O) basic_opcodes$(O) +memory$(O) packfile$(O) string$(O) strnative$(O) strutf8$(O) strutf16$(O) +strutf32$(O) transcode$(O) #DO NOT ADD C COMPILER FLAGS HERE #Add them in Configure.pl--look for the @@ -32,8 +32,8 @@ $(TEST_PROG): test_main$(O) $(O_FILES) interp_guts$(O) op_info$(O) $(CC) $(CFLAGS) -o $(TEST_PROG) $(O_FILES) interp_guts$(O) op_info$(O) test_main$(O) $(C_LIBS) -$(PDUMP): pdump$(O) 
packfile$(O) memory$(O) global_setup$(O) string$(O) strnative$(O) - $(CC) $(CFLAGS) -o $(PDUMP) pdump$(O) packfile$(O) memory$(O) global_setup$(O) string$(O) strnative$(O) $(C_LIBS) +$(PDUMP): pdump$(O) packfile$(O) memory$(O) global_setup$(O) string$(O) strnative$(O) +strutf8$(O) strutf16$(O) strutf32$(O) transcode$(O) + $(CC) $(CFLAGS) -o $(PDUMP) pdump$(O) packfile$(O) memory$(O) global_setup$(O) +string$(O) strnative$(O) strutf8$(O) strutf16$(O) strutf32$(O) transcode$(O) $(C_LIBS) test_main$(O): $(H_FILES) $(INC)/interp_guts.h @@ -42,6 +42,14 @@ string$(O): $(H_FILES) strnative$(O): $(H_FILES) + +strutf8$(O): $(H_FILES) + +strutf16$(O): $(H_FILES) + +strutf32$(O): $(H_FILES) + +transcode$(O): $(H_FILES) $(INC)/interp_guts.h interp_guts.c $(INC)/op_info.h op_info.c: opcode_table build_interp_starter.pl $(PERL) build_interp_starter.pl diff -urNw --exclude CVS parrot/global_setup.c parrot-transcode/global_setup.c --- parrot/global_setup.c Sun Sep 16 12:32:21 2001 +++ parrot-transcode/global_setup.c Sat Oct 6 15:43:20 2001 @@ -17,6 +17,7 @@ void init_world() { string_init(); /* Set up the string subsystem */ +transcode_init(); /* Set up the transcoding subsystem */ } /* diff -urNw --exclude CVS parrot/include/parrot/exceptions.h parrot-transcode/include/parrot/exceptions.h --- parrot/include/parrot/exceptions.h Mon Sep 24 22:40:32 2001 +++ parrot-transcode/include/parrot/exceptions.hSun Oct 7 15:36:46 2001 @@ -17,6 +17,9 @@ #define NO_REG_FRAMES 1 #define SUBSTR_OUT_OF_STRING 1 +#define MALFORMED_UTF8 1 +#define MALFORMED_UTF16 1 +#define MALFORMED_UTF32 1 #endif diff -urNw --exclude CVS parrot/include/parrot/parrot.h parrot-transcode/include/parrot/parrot.h --- parrot/include/parrot/parrot.h Sat Oct 6 15:10:50 2001 +++ parrot-transcode/include/parrot/parrot.hSun Oct 7 15:21:57 2001 @@ -66,6 +66,7 @@ #include "parrot/global_setup.h" #include "parrot/string.h" +#include "parrot/transcode.h" #include "parrot/vtable.h" #include "parrot/interpreter.h" #include 
"parrot/register.h" diff -urNw --exclude CVS parrot/include/parrot/string.h parrot-transcode/include/parrot/string.h --- parrot/include/parrot/string.h Tue Oct 2 22:02:00 2001 +++ parrot-transcode/include/parrot/string.hSun Oct 7 15:21:46 2001 @@ -85,6 +85,9 @@ VAR_SCOPE STRING_VTABLE Parrot_string_vtable[enc_max]; #include "parrot/strnative.h" +#include "parrot/strutf8.h" +#include "parrot/strutf16.h" +#include "parrot/strutf32.h" #endif /* diff -urNw --exclude CVS parrot/include/parrot/strutf16.h parrot-transcode/include/parrot/strutf16.h --- parrot/include/parrot/strutf16.hThu Jan 1 01:00:00 1970 +++ parrot-transcode/include/parrot/strutf16.h Sun Oct 7 15:21:02 2001 @@ -0,0 +1,29 @@ +/* strutf16.h + * Copyri
RE: Transcoding patch
In message <[EMAIL PROTECTED]> Gibbs Tanton - tgibbs <[EMAIL PROTECTED]> wrote: > This is good, unless someone has objections I'll commit this. However, we > also need the ability to do unicode in the assembler (I'll do this later > today if no one beats me to it), and we need some way to communicate the > encoding number between the C and the Perl code. It probably does still need some cleaning up but that can be done incrementally. One of the main things that I wasn't sure about but forgot to mention in the original message is what we want to do about malformed strings. Are we going to assume strings are well formed and go hell for leather in handling them or do we want to move to the paranoid end of the spectrum and check everything we do and throw exceptions when something odd is spotted? Currently the code does a bit of both - sometimes it checks things and sometimes it doesn't. > I guess the question with native strings is will it always be ASCII or will > it be Shift-JIS etc...? And the follow up to that is can, for the short > term, we assume it will be ASCII and then improve our native string > transcoding over time? Well according to string.pod native will always be a single byte per character encoding and never a wide character or shifted encoding so that rules out Shift-JIS and most other far eastern encodings. BTW the claim in string.pod that UTF-8 needs a maximum of 3 bytes per character is wrong, at least if you allow U+ to U+10 as your character space which is what I did - any character over U+ needs four bytes. Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/
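To make the point about UTF-8 lengths concrete, here is a small sketch that computes how many bytes a code point needs. The function name is hypothetical; the ranges are those of standard UTF-8 restricted to the Unicode code space, which ends at U+10FFFF.

```c
/*
 * Number of bytes needed to encode code point c in UTF-8, or 0 if c
 * lies outside the Unicode code space. Surrogate values are not
 * rejected here; a stricter version would treat them as errors.
 */
int utf8_bytes_needed(unsigned long c)
{
    if (c < 0x80)
        return 1;       /* U+0000  .. U+007F   */
    if (c < 0x800)
        return 2;       /* U+0080  .. U+07FF   */
    if (c < 0x10000)
        return 3;       /* U+0800  .. U+FFFF   */
    if (c < 0x110000)
        return 4;       /* U+10000 .. U+10FFFF */
    return 0;
}
```

So three bytes are only enough for code points up to U+FFFF, and a buffer sized at three bytes per character can overflow as soon as a string contains anything above that.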
Re: Transcoding patch
In message <[EMAIL PROTECTED]> Gibbs Tanton <[EMAIL PROTECTED]> wrote: > I've applied this patch. I just did an update and noticed the new files had appeared about two seconds before your mail arrived ;-) > I realize that we have a ways to go before we can fully support unicode, but > I felt that this patch was a big step in the right direction; with it > committed we can now start incrementally cleaning it up and making it work > correctly. Since it doesn't affect anything we are working on it shouldn't > get in the way at all. Absolutely. A few other issues that I remembered last night are:

 - The current code assumes that the string data will be two byte aligned for UTF-16 and four byte aligned for UTF-32 which is probably reasonable but maybe not.

 - The utf8_t, utf16_t and utf32_t types will need to be determined by configure as they will currently break on some machines. Plus machines without native 8, 16 and 32 bit types will be a problem.

 - There are byte ordering issues for UTF-16 and UTF-32 strings. The current code assumes host byte ordering but should we be spotting byte order markers in the strings and adjusting to cope?

> We do need to figure out how to change from unicode to native. We also need > to make sure that we don't hardcode the encoding in the assembler, the > assembler should be able to get what encoding to use from a file. A fundamental question (which I think Simon was hinting at with his cryptic comment) is whether the native encoding is fixed when parrot is built or can change on the fly as the user changes their locale settings. If it's the latter then conversion to and from native will have to work by loading an appropriate conversion table at run time.

Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu
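On the byte ordering question, spotting a byte order marker at the start of a UTF-16 string could be done along these lines. This is a sketch of one possible approach, not what the current code does, and all of the names are hypothetical. Reading the data a byte at a time also sidesteps the alignment assumption mentioned in the first item.

```c
#include <stddef.h>

typedef enum { ORDER_UNKNOWN, ORDER_BIG, ORDER_LITTLE } byte_order;

/*
 * Look for a UTF-16 byte order marker (U+FEFF) in the first two
 * bytes. ORDER_UNKNOWN means no marker was found, in which case the
 * caller has to fall back on some default such as host order.
 */
byte_order utf16_detect_bom(const unsigned char *data, size_t len)
{
    if (len >= 2) {
        if (data[0] == 0xFE && data[1] == 0xFF)
            return ORDER_BIG;
        if (data[0] == 0xFF && data[1] == 0xFE)
            return ORDER_LITTLE;
    }
    return ORDER_UNKNOWN;
}

/* Read one 16 bit code unit in the given order, one byte at a time. */
unsigned int utf16_read_unit(const unsigned char *p, byte_order order)
{
    if (order == ORDER_LITTLE)
        return (unsigned int)p[0] | ((unsigned int)p[1] << 8);
    return ((unsigned int)p[0] << 8) | (unsigned int)p[1];
}
```

The cost is a shift and an or per code unit instead of a direct 16 bit load, so whether it is worth doing everywhere or only when importing data is a separate decision.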
Re: Transcoding patch
In message <[EMAIL PROTECTED]> Gibbs Tanton <[EMAIL PROTECTED]> wrote: > > - The utf8_t, utf16_t and utf32_t types will need to be determined > >by configure as they will currently break on some machines. Plus > >machines without native 8, 16 and 32 bit types will be a problem. > > Almost all hardware should have char as an 8 bit type so that shouldn't be a > problem. However, finding a 16 bit or 32 bit type might be a problem on > some hardware. We might want to think about using arrays of 8 bit types or > using bit fields. The Cray was the canonical example of a problem machine that I had in mind - if I recall correctly even char is 8 bytes there isn't it? Bit fields are no use as you can't have a pointer to a bit field. Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu
RE: Transcoding patch
In message <[EMAIL PROTECTED]> Gibbs Tanton - tgibbs <[EMAIL PROTECTED]> wrote: > This is good, unless someone has objections I'll commit this. However, we > also need the ability to do unicode in the assembler (I'll do this later > today if no one beats me to it), and we need some way to communicate the > encoding number between the C and the Perl code. The attached patch solves the assembler issue by allowing quoted strings to be prefixed with U8, U16 or U32 to indicate a unicode string of the appropriate type, so: set_s_sc S1, U8"Hello World" creates a UTF-8 string in S1 containing the specified data. I don't particularly like that syntax so if anybody has any better ideas then please say... Most of the patch is useful whatever the syntax though - it will just need tweaking to recognise the appropriate syntax. The patch also adds support for \x escapes in strings as it is difficult to write unicode string constants without that. Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/ Index: Assembler.pm === RCS file: /home/perlcvs/parrot/Parrot/Assembler.pm,v retrieving revision 1.7 diff -u -w -r1.7 Assembler.pm --- Assembler.pm2001/10/06 05:21:16 1.7 +++ Assembler.pm2001/10/08 23:46:07 @@ -270,6 +270,17 @@ '__LINE__' => sub { return $line }, '__FILE__' => sub { return "\"$file\"" }); + +### + +=head2 %encodings + +maps string prefixes to encodings. + +=cut + +my %encodings=('' => 0, 'U8' => 1, 'U16' => 2, 'U32' => 3); + my %opcodes = Parrot::Opcode::read_ops( -f "../opcode_table" ? 
"../opcode_table" : "opcode_table" ); @@ -487,7 +498,7 @@ # now emit each constant my $counter = 0; for( @constants ) { -my ($type, $value) = @$_; +my ($type, $value, $encoding) = @$_; add_line_to_listing( sprintf( "\t%04x %s [[%s]]\n", $counter, $type, $value ) ); $counter++; @@ -497,7 +508,7 @@ } elsif ($type eq 'n') { $const_table->add(Parrot::PackFile::Constant->new_number($value)); } elsif ($type eq 's') { - $const_table->add(Parrot::PackFile::Constant->new_string(0, 0, 0, length($value), $value)); + $const_table->add(Parrot::PackFile::Constant->new_string(0, $encoding, 0, +length($value), $value)); } else { die; # TODO: Be more specific } @@ -651,7 +662,7 @@ sub replace_string_constants { my $code = shift; - $code =~ s/\"([^\\\"]*(?:\\.[^\\\"]*)*)\"/constantize_string($1)/eg; + $code =~ +s/(U(?:8|16|32))?\"([^\\\"]*(?:\\.[^\\\"]*)*)\"/constantize_string($2,$1)/eg; return $code; } @@ -1283,14 +1294,17 @@ sub constantize_string { my $s = shift; +my $p = shift || ""; +my $e = $encodings{$p}; # handle \ characters in the constant my %escape = ('a'=>"\a",'n'=>"\n",'r'=>"\r",'t'=>"\t",'\\'=>'\\',); $s=~s/\\([anrt\\])/$escape{$1}/g; -if(!exists($constants{$s}{s})) { - push(@constants, ['s', $s]); - $constants{$s}{s}=$#constants; +$s=~s/\\x([0-9a-fA-F]{1,2})/chr(hex($1))/ge; +if(!exists($constants{$s}{s}{$e})) { + push(@constants, ['s', $s, $e]); + $constants{$s}{s}{$e}=$#constants; } -return "[sc:".$constants{$s}{s}."]"; +return "[sc:".$constants{$s}{s}{$e}."]"; }
RE: Transcoding patch
In message <[EMAIL PROTECTED]> Dan Sugalski <[EMAIL PROTECTED]> wrote: > At 07:03 PM 10/8/2001 -0500, Gibbs Tanton - tgibbs wrote: > >This looks good. > > > >Also, WRT the utf8_t, utf16_t, and utf32_t can we not just use utf32_t and > >then mask off the lower 8 or 16 bits? We can still have utf8_t be defined > >as char to allow sizeof to work right and we can do sizeof(utf8_t)*2 to get > >the utf16_t's size. > > utf8 and utf16 are both variable length encodings for space reasons. > There's not much reason to space-compact something then expand the heck out > of it. I think he was just referring to the internal type used to hold a character during processing, not to expanding the whole string. > On the other hand, I'd really, *really* rather not have Unicode > constants in anything other than UTF-32, so I'd as soon we chopped out the > utf-8 and utf-16 constant support from this. > > A should be the prefix for US-ASCII characters. > U should be the prefix for Unicode characters > N should be the prefix for the native character set (and the default) > > Beyond that I'm not sure what, if anything, we should accommodate in the > assembler. What does US-ASCII correspond to internally - we don't have an encoding for that, unless you're planning to mark it as UTF-8 and rely on US-ASCII being a subset of UTF-8 of course ;-) The only other thing is that writing tests for UTF-8 and UTF-16 strings and the transcoder is going to be quite tricky if we can't generate them using the assembler. Other than that I'll sort out a patch for this later today. Moving on, my next target is to get string comparison working. 
That's not too difficult until you have to compare strings whose encodings are different - comparing two unicode strings is OK as we can always transcode the second to the same type as the first, but if we're comparing a native string with a unicode string we will have to do a transcode from native to unicode even if the native string is first, so the transcoding will have to be done at the string layer rather than the strnative/strutfn layers I think. Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/
RE: Transcoding patch
In message <[EMAIL PROTECTED]> Dan Sugalski <[EMAIL PROTECTED]> wrote:

> utf8 and utf16 are both variable length encodings for space reasons.
> There's not much reason to space-compact something then expand the heck out
> of it. On the other hand, I'd really, *really* rather not have Unicode
> constants in anything other than UTF-32, so I'd as soon we chopped out the
> utf-8 and utf-16 constant support from this.
>
> A should be the prefix for US-ASCII characters.
> U should be the prefix for Unicode characters
> N should be the prefix for the native character set (and the default)
>
> Beyond that I'm not sure what, if anything, we should accommodate in the
> assembler.

Attached is a patch to drop the U8, U16 and U32 prefixes and add U
and N prefixes.

I haven't added the A prefix because I'm still not clear what encoding
those are supposed to map to. I can understand the following mappings:

  N => enc_native
  U => enc_utf32

but what is A supposed to map to exactly? Or is the assembler supposed
to mangle an A string into an N or U string and then put it in the
bytecode in one of those formats?

Tom

--
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/

Index: Assembler.pm
===
RCS file: /home/perlcvs/parrot/Parrot/Assembler.pm,v
retrieving revision 1.8
diff -u -w -r1.8 Assembler.pm
--- Assembler.pm 2001/10/09 02:45:36 1.8
+++ Assembler.pm 2001/10/09 21:25:28
@@ -279,7 +279,7 @@
 =cut
-my %encodings=('' => 0, 'U8' => 1, 'U16' => 2, 'U32' => 3);
+my %encodings=('' => 0, 'N' => 0, 'U' => 3);
 my %opcodes = Parrot::Opcode::read_ops( -f "../opcode_table" ? "../opcode_table" : "opcode_table" );
@@ -662,7 +662,7 @@
 sub replace_string_constants {
     my $code = shift;
-    $code =~ s/(U(?:8|16|32))?\"([^\\\"]*(?:\\.[^\\\"]*)*)\"/constantize_string($2,$1)/eg;
+    $code =~ s/([NU])?\"([^\\\"]*(?:\\.[^\\\"]*)*)\"/constantize_string($2,$1)/eg;
     return $code;
 }
String comparison ops
Attached is a patch to add string comparison ops, along with the necessary infrastructure in the string code. The current behaviour is that if the two strings do not have the same encoding then both are promoted to UTF-32 before comparison as that should generally preserve information. Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/ ? t/tom.pasm Index: basic_opcodes.ops === RCS file: /home/perlcvs/parrot/basic_opcodes.ops,v retrieving revision 1.36 diff -u -w -r1.36 basic_opcodes.ops --- basic_opcodes.ops 2001/10/08 14:04:20 1.36 +++ basic_opcodes.ops 2001/10/09 23:46:56 @@ -604,6 +604,90 @@ AUTO_OP concat_s { STRING *s = string_concat(STR_REG(P1), STR_REG(P2), 1); STR_REG(P1) = s; +} + +/* EQ Sx, Sy, EQ_BRANCH */ +MANUAL_OP eq_s_ic { + if (string_compare(STR_REG(P1), STR_REG(P2)) == 0) { +RETURN(INT_CONST(P3)); + } +} + +/* EQ Sx, CONSTANT, EQ_BRANCH */ +MANUAL_OP eq_sc_ic { + if (string_compare(STR_REG(P1), STR_CONST(P2)) == 0) { +RETURN(INT_CONST(P3)); + } +} + +/* NE Sx, Sy, NE_BRANCH */ +MANUAL_OP ne_s_ic { + if (string_compare(STR_REG(P1), STR_REG(P2)) != 0) { +RETURN(INT_CONST(P3)); + } +} + +/* NE Sx, CONSTANT, NE_BRANCH */ +MANUAL_OP ne_sc_ic { + if (string_compare(STR_REG(P1), STR_CONST(P2)) != 0) { +RETURN(INT_CONST(P3)); + } +} + +/* LT Sx, Sy, LT_BRANCH */ +MANUAL_OP lt_s_ic { + if (string_compare(STR_REG(P1), STR_REG(P2)) < 0) { +RETURN(INT_CONST(P3)); + } +} + +/* LT Sx, CONSTANT, LT_BRANCH */ +MANUAL_OP lt_sc_ic { + if (string_compare(STR_REG(P1), STR_CONST(P2)) < 0) { +RETURN(INT_CONST(P3)); + } +} + +/* LE Sx, Sy, LE_BRANCH */ +MANUAL_OP le_s_ic { + if (string_compare(STR_REG(P1), STR_REG(P2)) <= 0) { +RETURN(INT_CONST(P3)); + } +} + +/* LE Sx, CONSTANT, LE_BRANCH */ +MANUAL_OP le_sc_ic { + if (string_compare(STR_REG(P1), STR_CONST(P2)) <= 0) { +RETURN(INT_CONST(P3)); + } +} + +/* GT Sx, Sy, GT_BRANCH */ +MANUAL_OP gt_s_ic { + if (string_compare(STR_REG(P1), STR_REG(P2)) > 0) { +RETURN(INT_CONST(P3)); + } +} + +/* GT Sx, CONSTANT, 
GT_BRANCH */ +MANUAL_OP gt_sc_ic { + if (string_compare(STR_REG(P1), STR_CONST(P2)) > 0) { +RETURN(INT_CONST(P3)); + } +} + +/* GE Sx, Sy, GE_BRANCH */ +MANUAL_OP ge_s_ic { + if (string_compare(STR_REG(P1), STR_REG(P2)) >= 0) { +RETURN(INT_CONST(P3)); + } +} + +/* GE Sx, CONSTANT, GE_BRANCH */ +MANUAL_OP ge_sc_ic { + if (string_compare(STR_REG(P1), STR_CONST(P2)) >= 0) { +RETURN(INT_CONST(P3)); + } } /* NOOP */ Index: opcode_table === RCS file: /home/perlcvs/parrot/opcode_table,v retrieving revision 1.24 diff -u -w -r1.24 opcode_table --- opcode_table2001/10/08 13:45:21 1.24 +++ opcode_table2001/10/09 23:46:57 @@ -67,7 +67,7 @@ substr_s_s_i 4 S S I I concat_s 2 S S -# Comparators (TODO: String comparators) +# Comparators eq_i_ic3 I I D eq_ic_ic 3 I i D @@ -94,6 +94,19 @@ gt_nc_ic 3 N n D ge_n_ic3 N N D ge_nc_ic 3 N n D + +eq_s_ic3 S S D +eq_sc_ic 3 S s D +ne_s_ic3 S S D +ne_sc_ic 3 S s D +lt_s_ic3 S S D +lt_sc_ic 3 S s D +le_s_ic3 S S D +le_sc_ic 3 S s D +gt_s_ic3 S S D +gt_sc_ic 3 S s D +ge_s_ic3 S S D +ge_sc_ic 3 S s D # Flow control Index: string.c === RCS file: /home/perlcvs/parrot/string.c,v retrieving revision 1.12 diff -u -w -r1.12 string.c --- string.c2001/10/08 07:49:10 1.12 +++ string.c2001/10/09 23:46:57 @@ -152,6 +152,23 @@ return (ENC_VTABLE(s)->chopn)(s, n); } +/*=for api string string_compare + * compare two strings. 
+ */ +INTVAL +string_compare(STRING* s1, STRING* s2) { +if (s1->encoding != s2->encoding) { +if (s1->encoding->which != enc_utf32) { +s1 = Parrot_transcode_table[s1->encoding->which][enc_utf32](s1, NULL); +} +if (s2->encoding->which != enc_utf32) { +s2 = Parrot_transcode_table[s2->encoding->which][enc_utf32](s2, NULL); +} +} + +return (ENC_VTABLE(s1)->compare)(s1, s2); +} + /* * Local variables: * c-indentation-style: bsd Index: strnative.c === RCS file: /home/perlcvs/parrot/strnative.c,v retrieving revision 1.15 diff -u -w -r1.15 strnative.c --- strnative.c 2001/10/08 07:49:10 1.15 +++ strnative.c 2001/10/09 23:46:58 @@ -82,6 +82,25 @@ return dest; } +/*=for api string_native string_native_compare + compare two strings +*/ +static INTVAL +string_native_compare(STRING* s1, STRING* s2) { +INTVAL cmp; + +if (s1->bufused < s
Re: String comparison ops
In message <[EMAIL PROTECTED]> Simon Cozens <[EMAIL PROTECTED]> wrote:

> On Wed, Oct 10, 2001 at 12:49:50AM +0100, Tom Hughes wrote:
> > Attached is a patch to add string comparison ops, along with the
> > necessary infrastructure in the string code.
>
> I see no tests *or* documentation. Come on, Tom, you should know
> better than that. :)

Tests are next on my list... One reason for writing the comparison
stuff was to make writing tests for the transcoder etc. possible.

I'll sort out a documentation patch in a moment.

Tom

--
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/
RE: String comparison ops
In message <[EMAIL PROTECTED]> Gibbs Tanton - tgibbs <[EMAIL PROTECTED]> wrote:

> Does the call to the transcode function create a new string or change the
> string in place. I don't think we want to pass in a native string only to
> find out it is unicode after we get done comparing it.

It creates a new string if the second argument is null, and overwrites
the second argument otherwise, so in this case it will create a new
string.

Tom

--
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/
Re: String comparison ops
In message <[EMAIL PROTECTED]> Simon Cozens <[EMAIL PROTECTED]> wrote:

> I see no tests *or* documentation. Come on, Tom, you should know
> better than that. :)

Here's the doc patch:

Index: strings.pod
===
RCS file: /home/perlcvs/parrot/docs/strings.pod,v
retrieving revision 1.4
diff -u -w -r1.4 strings.pod
--- strings.pod 2001/10/02 14:01:31 1.4
+++ strings.pod 2001/10/10 07:55:40
@@ -89,6 +89,17 @@
 C<*dest> is a null pointer, a new string structure is created with the
 same encoding as C.)
 
+To compare two strings, use:
+
+    INTVAL string_compare(STRING* s1, STRING* s2)
+
+The value returned will be less than, equal to, or greater than zero
+depending on whether C<s1> is less than, equal to, or greater than C<s2>.
+
+Strings whose encodings are not the same can be compared - in this
+case a UTF-32 copy will be made of each string and these copies will
+be compared.
+
 B: To format output into a string, use

Tom

--
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu
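Editor's sketch: the comparison rule documented in the patch above (negative, zero or positive result, with UTF-32 copies made when the encodings differ) can be illustrated with a small toy model. Everything below is hypothetical: `toy_string`, `toy_encoding` and the `toy_*` functions are stand-ins and not Parrot's real `STRING` or transcoding table, and the sketch assumes native bytes map one-to-one onto code points. Unlike the patch, it frees its temporary copies explicitly.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical toy types; not Parrot's real STRING or encoding vtables. */
enum toy_encoding { TOY_NATIVE, TOY_UTF32 };

struct toy_string {
    enum toy_encoding encoding;
    size_t length;      /* length in characters */
    const void *data;   /* unsigned char[] for TOY_NATIVE, uint32_t[] for TOY_UTF32 */
};

/* Fetch character i as a code point.
 * Assumption: a native byte maps directly onto the same code point. */
static uint32_t toy_char_at(const struct toy_string *s, size_t i)
{
    if (s->encoding == TOY_UTF32)
        return ((const uint32_t *)s->data)[i];
    return ((const unsigned char *)s->data)[i];
}

/* Make a heap-allocated UTF-32 copy of the characters; the caller frees it. */
static uint32_t *toy_utf32_copy(const struct toy_string *s)
{
    uint32_t *out = malloc((s->length + 1) * sizeof *out);
    size_t i;

    if (out == NULL)
        abort();
    for (i = 0; i < s->length; i++)
        out[i] = toy_char_at(s, i);
    out[s->length] = 0;
    return out;
}

/* Compare two strings that share an encoding: -1, 0 or 1. */
static int toy_compare_same(const struct toy_string *a, const struct toy_string *b)
{
    size_t n = a->length < b->length ? a->length : b->length;
    size_t i;

    for (i = 0; i < n; i++) {
        uint32_t ca = toy_char_at(a, i), cb = toy_char_at(b, i);
        if (ca != cb)
            return ca < cb ? -1 : 1;
    }
    if (a->length == b->length)
        return 0;
    return a->length < b->length ? -1 : 1;   /* a proper prefix sorts first */
}

/* Compare s1 and s2, promoting to UTF-32 copies only when encodings differ. */
int toy_string_compare(const struct toy_string *s1, const struct toy_string *s2)
{
    struct toy_string t1 = *s1, t2 = *s2;
    uint32_t *c1 = NULL, *c2 = NULL;
    int result;

    if (s1->encoding != s2->encoding) {
        if (s1->encoding != TOY_UTF32) {
            c1 = toy_utf32_copy(s1);
            t1.encoding = TOY_UTF32;
            t1.data = c1;
        }
        if (s2->encoding != TOY_UTF32) {
            c2 = toy_utf32_copy(s2);
            t2.encoding = TOY_UTF32;
            t2.data = c2;
        }
    }
    result = toy_compare_same(&t1, &t2);
    free(c1);   /* free(NULL) is a no-op */
    free(c2);
    return result;
}
```

The inputs are never modified; only the temporaries are UTF-32, which matches the "creates a new string when the destination is null" behaviour described earlier in the thread.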
Re: String comparison ops
In message <00b001c15166$a3b88ee0$7f03ef12@MLAMBERT> Michel Lambert <[EMAIL PROTECTED]> wrote:

> Am I missing something here, or does this code not properly free transcoded
> s1's and s2's after it's done comparing them?

You're quite right that it doesn't, but neither does anything else
that creates temporary strings in a different encoding ;-)

As we're using garbage collection we shouldn't need to do an explicit
free though surely - in fact I'm not quite sure why string_destroy
even exists...

It's easy enough to add some frees if they are needed though.

Tom

--
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu
Re: String comparison ops
Index: string.t === RCS file: /home/perlcvs/parrot/t/op/string.t,v retrieving revision 1.8 diff -u -w -r1.8 string.t --- string.t 2001/10/05 11:46:47 1.8 +++ string.t 2001/10/10 08:42:55 @@ -1,6 +1,6 @@ #! perl -w -use Parrot::Test tests => 11; +use Parrot::Test tests => 23; output_is( <<'CODE', <
Re: String comparison ops
In message <001d01c1516a$98c07ee0$7f03ef12@MLAMBERT> Michel Lambert <[EMAIL PROTECTED]> wrote:

> > You're quite right that it doesn't, but neither does anything else
> > that creates temporary strings in a different encoding ;-)
>
> In my day-or-two-old parrot copy, the only other code that uses the
> transcoding table only uses it with the second param != null (ie, save into
> existing string).

That's true, but if you look they've only just allocated the string
on the previous line... Which is actually silly but still.

Thinking about it though, that is my code as well so it doesn't
really prove anything very much ;-)

So the question is, are strings subject to GC or not? If they aren't
then I'll knock up a patch to add the string_destroy calls.

Tom

--
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu
Re: String comparison ops
In message <[EMAIL PROTECTED]> Simon Cozens <[EMAIL PROTECTED]> wrote:

> On Wed, Oct 10, 2001 at 12:49:50AM +0100, Tom Hughes wrote:
> > Attached is a patch to add string comparison ops, along with the
> > necessary infrastructure in the string code.
>
> I see no tests *or* documentation. Come on, Tom, you should know
> better than that. :)

I have just committed the string comparison changes, along with the
related doc and test patches that I posted earlier.

Tom

--
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/
Re: [PATCH] strnative.c typo
In message <[EMAIL PROTECTED]> Bryan C. Warnock <[EMAIL PROTECTED]> wrote:

> Assignment, not comparison. (Plus formatted for coding standards)

Committed. The tests should really have caught this, so I'm going to
do some work on them to make them more comprehensive...

Tom

--
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/
Re: [PATCH] strnative.c typo
In message <[EMAIL PROTECTED]> Tom Hughes <[EMAIL PROTECTED]> wrote: > In message <[EMAIL PROTECTED]> > Bryan C. Warnock <[EMAIL PROTECTED]> wrote: > > > Assignment, not comparison. (Plus formatted for coding standards) > > Committed. The tests should really have caught this, so I'm going to > do some work on them to make them more comprehensive... Attached is a patch to string.t to extend the testing of the comparison ops - there is now a list of pairs of strings and each of the twelve comparison ops is tried with each pair of strings from the list. I'll commit this tomorrow unless somebody spots a problem. Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/ Index: string.t === RCS file: /home/perlcvs/parrot/t/op/string.t,v retrieving revision 1.9 diff -u -w -r1.9 string.t --- string.t2001/10/10 18:21:05 1.9 +++ string.t2001/10/11 23:07:03 @@ -150,320 +150,150 @@ done OUTPUT +my @strings = ( + "hello", "hello", + "hello", "world", + "world", "hello", + "hello", "hellooo", + "hellooo", "hello", + "hello", "hella", + "hella", "hello", + "hella", "hellooo", + "hellooo", "hella", + "hElLo", "HeLlO", + "hElLo", "hElLo" +); + output_is(<
Re: Simple sub support's now in!
In message <[EMAIL PROTECTED]> Dan Sugalski <[EMAIL PROTECTED]> wrote:

> I see we don't have push-with-copy ops for the various register files. I
> think I'll go fix that.

Bryan Warnock posted a patch to add those on Monday but it doesn't
seem to have been committed... The message is <[EMAIL PROTECTED]>.

Tom

--
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/
Re: Hmmm.
In message <[EMAIL PROTECTED]> Simon Cozens <[EMAIL PROTECTED]> wrote:

> opcheck.pl: Found 39 errors.
>
> Is opcheck.pl wrong, or is the optable wrong? Would like a volunteer
> to fix up which it is.

I'll take a look...

Tom

--
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/
Re: Hmmm.
In message <[EMAIL PROTECTED]> Simon Cozens <[EMAIL PROTECTED]> wrote: > opcheck.pl: Found 39 errors. > > Is opcheck.pl wrong, or is the optable wrong? Would like a volunteer > to fix up which it is. Well as far as I can tell the rules it enforces are essentially arbitrary and not documented anywhere other than at the top of the script itself so it is hard to be sure which is wrong. That said the attached patch fixes up the opcode names to match the rules enforced by opcheck.pl, and fixes a small number of tests which were using the old names. It also fixes a 'use of undefined value' warning from opcheck.pl when no errors are found, and makes process_opfunc.pl abort with an error if opcode_table and basic_ops.ops don't match. Finally it tidies up the comments in basic_ops.ops so that they are all in the same form. Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/ Index: basic_opcodes.ops === RCS file: /home/perlcvs/parrot/basic_opcodes.ops,v retrieving revision 1.38 diff -u -w -r1.38 basic_opcodes.ops --- basic_opcodes.ops 2001/10/12 19:56:35 1.38 +++ basic_opcodes.ops 2001/10/13 10:55:28 @@ -13,7 +13,7 @@ INT_REG(P1) = INT_CONST(P2); } -/* SET Ix, Ix */ +/* SET Ix, Iy */ AUTO_OP set_i { INT_REG(P1) = INT_REG(P2); } @@ -139,7 +139,7 @@ } /* EQ Ix, CONSTANT, EQ_BRANCH */ -MANUAL_OP eq_ic_ic { +MANUAL_OP eq_i_ic_ic { if (INT_REG(P1) == INT_CONST(P2)) { RETURN(INT_CONST(P3)); } @@ -153,7 +153,7 @@ } /* NE Ix, CONSTANT, NE_BRANCH */ -MANUAL_OP ne_ic_ic { +MANUAL_OP ne_i_ic_ic { if (INT_REG(P1) != INT_CONST(P2)) { RETURN(INT_CONST(P3)); } @@ -167,7 +167,7 @@ } /* LT Ix, CONSTANT, LT_BRANCH */ -MANUAL_OP lt_ic_ic { +MANUAL_OP lt_i_ic_ic { if (INT_REG(P1) < INT_CONST(P2)) { RETURN(INT_CONST(P3)); } @@ -181,7 +181,7 @@ } /* LE Ix, CONSTANT, LE_BRANCH */ -MANUAL_OP le_ic_ic { +MANUAL_OP le_i_ic_ic { if (INT_REG(P1) <= INT_CONST(P2)) { RETURN(INT_CONST(P3)); } @@ -195,7 +195,7 @@ } /* GT Ix, CONSTANT, GT_BRANCH */ -MANUAL_OP gt_ic_ic { +MANUAL_OP gt_i_ic_ic { if 
(INT_REG(P1) > INT_CONST(P2)) { RETURN(INT_CONST(P3)); } @@ -209,13 +209,13 @@ } /* GE Ix, CONSTANT, GE_BRANCH */ -MANUAL_OP ge_ic_ic { +MANUAL_OP ge_i_ic_ic { if (INT_REG(P1) >= INT_CONST(P2)) { RETURN(INT_CONST(P3)); } } -/* IF IXx, TRUE_BRANCH */ +/* IF Ix, TRUE_BRANCH */ MANUAL_OP if_i_ic { if (INT_REG(P1)) { RETURN(INT_CONST(P2)); @@ -232,7 +232,7 @@ printf("%li", (long) INT_REG(P1)); } -/* PRINT ic */ +/* PRINT CONSTANT */ AUTO_OP print_ic { printf("%li", (long) INT_CONST(P1)); } @@ -253,7 +253,7 @@ INT_REG(P1)++; } -/* INC Ix, nnn */ +/* INC Ix, CONSTANT */ AUTO_OP inc_i_ic { INT_REG(P1) += INT_CONST(P2); } @@ -263,7 +263,7 @@ INT_REG(P1)--; } -/* DEC Ix, nnn */ +/* DEC Ix, CONSTANT */ AUTO_OP dec_i_ic { INT_REG(P1) -= INT_CONST(P2); } @@ -278,7 +278,7 @@ NUM_REG(P1) = NUM_CONST(P2); } -/* SET Nx, Nx */ +/* SET Nx, Ny */ AUTO_OP set_n { NUM_REG(P1) = NUM_REG(P2); } @@ -289,19 +289,19 @@ NUM_REG(P3); } -/* SUB Nx, Ny, Iz */ +/* SUB Nx, Ny, Nz */ AUTO_OP sub_n { NUM_REG(P1) = NUM_REG(P2) - NUM_REG(P3); } -/* MUL Nx, Ny, Iz */ +/* MUL Nx, Ny, Nz */ AUTO_OP mul_n { NUM_REG(P1) = NUM_REG(P2) * NUM_REG(P3); } -/* DIV Nx, Ny, Iz */ +/* DIV Nx, Ny, Nz */ AUTO_OP div_n { NUM_REG(P1) = NUM_REG(P2) / NUM_REG(P3); @@ -373,7 +373,7 @@ } /* EQ Nx, CONSTANT, EQ_BRANCH */ -MANUAL_OP eq_nc_ic { +MANUAL_OP eq_n_nc_ic { if (NUM_REG(P1) == NUM_CONST(P2)) { RETURN(INT_CONST(P3)); } @@ -387,7 +387,7 @@ } /* NE Nx, CONSTANT, NE_BRANCH */ -MANUAL_OP ne_nc_ic { +MANUAL_OP ne_n_nc_ic { if (NUM_REG(P1) != NUM_CONST(P2)) { RETURN(INT_CONST(P3)); } @@ -401,7 +401,7 @@ } /* LT Nx, CONSTANT, LT_BRANCH */ -MANUAL_OP lt_nc_ic { +MANUAL_OP lt_n_nc_ic { if (NUM_REG(P1) < NUM_CONST(P2)) { RETURN(INT_CONST(P3)); } @@ -415,7 +415,7 @@ } /* LE Nx, CONSTANT, LE_BRANCH */ -MANUAL_OP le_nc_ic { +MANUAL_OP le_n_nc_ic { if (NUM_REG(P1) <= NUM_CONST(P2)) { RETURN(INT_CONST(P3)); } @@ -429,7 +429,7 @@ } /* GT Nx, CONSTANT, GT_BRANCH */ -MANUAL_OP gt_nc_ic { +MANUAL_OP gt_n_nc_ic { if (NUM_REG(P1) > 
NUM_CONST(P2)) { RETURN(INT_CONST(P3)); } @@ -443,7 +443,7 @@ } /* GE Nx, CONSTANT, GE_BRANCH */ -MANUAL_OP ge_nc_ic { +MANUAL_OP ge_n_nc_ic { if (NUM_REG(P1) >= NUM_CONST(P2)) { RETURN(INT_CONST(P3)); } @@ -466,7 +466,7 @@ printf("%f", NUM_REG(P1)); } -/* PRINT nc */ +/* PRINT CONSTANT */ AUTO_OP print_nc { printf("%f", NUM_CONST(P1)); } @@ -476,7 +476,7 @@ NUM_REG(P1) += 1; } -/* INC Nx,
Re: [HELP NEEDED] moby.patch platform reports
In message <[EMAIL PROTECTED]> "Gregor N. Purdy" <[EMAIL PROTECTED]> wrote:

> It looks like moby.patch is going to go in, but I *really* need help
> from people on various platforms looking at the floating point
> problems. I'm hoping that someone else's compiler will complain about
> whatever it is I've done that flakes it out. Barring that, I'm hoping
> that among a group of folks checking it out, one of you will send me
> a "what were you thinking here?" message that helps me find and fix
> the problem.

I think I've solved it. You're going to kick yourself...

The answer is that you're not including math.h in core_ops.c, which
means that floor() is not prototyped, which means the compiler assumes
it returns an int, hence the screwed up results. You must always
prototype any function that returns a double.

With an #include of math.h added, floortest.pasm is now OK and trans.t
almost passes - it is certainly better than before. There's still one
failure in each of number.t and trans.t, although that might just be
my rather mangled checkout.

Tom

--
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/
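Editor's sketch: the prototype rule described above can be shown in isolation. In C89 a call to an undeclared function implicitly declares it as returning `int`, so a `double` result is misread; C99 and later reject the implicit declaration outright. The names `toy_floor` and `fractional_part` are hypothetical; `toy_floor` merely stands in for `floor()` from `<math.h>` so that the sketch has no library dependency, and it is only valid for values that fit in a `long`.

```c
/* Prototype: tells the compiler that toy_floor() takes and returns a double.
 * For the real floor(), this declaration comes from #include <math.h>. */
double toy_floor(double x);

/* Caller that appears before the definition of toy_floor().
 * Without the prototype above, a C89 compiler would assume that
 * toy_floor() returns int and would misinterpret the result. */
double fractional_part(double x)
{
    return x - toy_floor(x);
}

/* Hypothetical stand-in for floor(): largest whole number <= x.
 * Only valid when x fits in a long. */
double toy_floor(double x)
{
    double truncated = (double)(long)x;   /* the cast truncates toward zero */

    return (truncated > x) ? truncated - 1.0 : truncated;
}
```

With the real library function the fix is simply the one reported in the message: add the missing `#include <math.h>` to the file that calls `floor()`.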
Re: [PATCH] Build system tweaks.
In message <[EMAIL PROTECTED]> Andy Dougherty <[EMAIL PROTECTED]> wrote:

> ops2c and ops2pm need to make sure the directory for the output file
> exists before trying to create any files in that directory.

Well actually cvs should have created the directories when you
updated, so long as you gave it the -d switch.

Tom

--
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/
Re: [PATCH] Build system tweaks.
In message <[EMAIL PROTECTED]> Andy Dougherty <[EMAIL PROTECTED]> wrote:

> Yes, but they are empty, and there are no relevant entries in MANIFEST.
> Thus, if you try to make a copy of the parrot source tree in another
> directory based on the contents of the MANIFEST file, you'll get a
> copy without those empty directories, and the build will fail.

Actually they contain .cvsignore files, but those aren't in the
manifest. Plus you probably don't want them if you're making a copy
based on the manifest that therefore doesn't include the CVS control
files.

Tom

--
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/
Re: [PATCH] "missing" opcodes
In message <[EMAIL PROTECTED]> Simon Cozens <[EMAIL PROTECTED]> wrote:

> Commit it for now, but I'd really, really love it if we could
> automate this sort of thing.

I have a plan to semi-automate it which I nearly implemented the other
day but didn't get around to. Basically the idea is to extend things
so an ops file can contain this:

  AUTO_OP add(i, i, i|ic) {
    $1 = $2 + $3;
  }

and the opcode reading module would expand the i|ic to create two
separate versions of the op. Obviously if two arguments had variants
you would get four versions and so on.

If people think that's a good solution to the problem then I'll have
a go at working up a patch.

Tom

--
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/
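Editor's sketch: the expansion proposed above is a cartesian product over the alternatives of each argument. The real implementation was planned for the Perl opcode reading module; the C below only illustrates the counting and ordering, and every name in it (`toy_expand_variants`, `split_alts`, the `TOY_*` limits) is hypothetical.

```c
#include <string.h>

#define TOY_MAX_ARGS 8    /* most arguments per op in this sketch */
#define TOY_MAX_ALTS 4    /* most alternatives per argument; extras are ignored */
#define TOY_MAX_LEN  64   /* room for 8 names of up to 7 chars plus commas */

/* Split a spec such as "i|ic" into {"i", "ic"}; returns the count. */
static int split_alts(const char *spec, char alts[TOY_MAX_ALTS][8])
{
    int n = 0;
    const char *p = spec;

    while (*p != '\0' && n < TOY_MAX_ALTS) {
        size_t len = strcspn(p, "|");
        size_t keep = len > 7 ? 7 : len;   /* truncate over-long names */

        memcpy(alts[n], p, keep);
        alts[n][keep] = '\0';
        n++;
        p += len;
        if (*p == '|')
            p++;
    }
    return n;
}

/* Write every combination as a comma-separated signature into out[].
 * Returns the number of variants, or -1 on bad input or lack of space. */
int toy_expand_variants(const char *const specs[], int nargs,
                        char out[][TOY_MAX_LEN], int max_out)
{
    char alts[TOY_MAX_ARGS][TOY_MAX_ALTS][8];
    int count[TOY_MAX_ARGS];
    int idx[TOY_MAX_ARGS] = { 0 };
    int produced = 0;
    int i;

    if (nargs < 0 || nargs > TOY_MAX_ARGS)
        return -1;
    for (i = 0; i < nargs; i++) {
        count[i] = split_alts(specs[i], alts[i]);
        if (count[i] == 0)
            return -1;
    }
    for (;;) {
        if (produced == max_out)
            return -1;
        out[produced][0] = '\0';
        for (i = 0; i < nargs; i++) {
            if (i > 0)
                strcat(out[produced], ",");
            strcat(out[produced], alts[i][idx[i]]);
        }
        produced++;
        /* Advance like an odometer; the rightmost argument changes fastest. */
        for (i = nargs - 1; i >= 0; i--) {
            if (++idx[i] < count[i])
                break;
            idx[i] = 0;
        }
        if (i < 0)
            break;   /* every combination has been emitted */
    }
    return produced;
}
```

For `add(i, i, i|ic)` this yields the two signatures `i,i,i` and `i,i,ic`; with two alternated arguments, as in `(i, i|ic, i|ic)`, it yields four, which is the doubling described in the message.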
Re: [PATCH] "missing" opcodes
In message <[EMAIL PROTECTED]> Dan Sugalski <[EMAIL PROTECTED]> wrote: > At 05:21 PM 10/16/2001 +0100, Tom Hughes wrote: > >I have a plan to semi-automate it which I nearly implemted the other > >day but didn't get around to. Basically the idea is to extend things > >so an ops file can contain this: > > It sounds interesting, certainly. Give it a go and we'll see how it looks. > (As long as it doesn't interfere with generating the switch statement or > function table the oploop needs...) As Gregor said, the expansion code is in the OpsFile module so anything which uses that to read the .ops file will never know anything about it. I have knocked up a first pass at a patch, which is attached for comments. I did discover one limitation of my scheme when I started using it to eliminate redundancy, namely cases like this: sub(i, i, i) sub(i, i, ic) sub(i, ic, i) If I rewrite that using my scheme as: sub(i, i|ic, i|ic) Then we wind up with a fourth variant that subtracts one constant from another. I am wondering whether I should add an extra rule that says that any expansion where there are more than two arguments and all bar the first are constants is ignored, which would allow the above and a number of other cases to be rewritten. Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/ ? t/test.pbc ? t/test1.c ? t/test1 Index: core.ops === RCS file: /home/perlcvs/parrot/core.ops,v retrieving revision 1.10 diff -u -w -r1.10 core.ops --- core.ops2001/10/16 18:35:04 1.10 +++ core.ops2001/10/16 23:35:32 @@ -136,30 +136,18 @@ =cut -AUTO_OP set(i, i) { +AUTO_OP set(i, i|ic) { $1 = $2; } -AUTO_OP set(i, ic) { +AUTO_OP set(n, n|nc) { $1 = $2; } -AUTO_OP set(n, nc) { - $1 = $2; -} - -AUTO_OP set(n, n) { - $1 = $2; -} - -AUTO_OP set(s, sc) { +AUTO_OP set(s, s|sc) { $1 = string_copy($2); } -AUTO_OP set(s, s) { - $1 = string_copy($2); -} - =back =cut @@ -239,38 +227,20 @@ Branch if $1 is equal to $2. 
=cut - -AUTO_OP eq(i, i, ic) { - if ($1 == $2) { -RETREL($3); - } -} - -AUTO_OP eq(i, ic, ic) { - if ($1 == $2) { -RETREL($3); - } -} -AUTO_OP eq(n, n, ic) { +AUTO_OP eq(i, i|ic, ic) { if ($1 == $2) { RETREL($3); } } -AUTO_OP eq(n, nc, ic) { +AUTO_OP eq(n, n|nc, ic) { if ($1 == $2) { RETREL($3); } } - -AUTO_OP eq(s, s, ic) { - if (string_compare($1, $2) == 0) { -RETREL($3); - } -} -AUTO_OP eq(s, sc, ic) { +AUTO_OP eq(s, s|sc, ic) { if (string_compare($1, $2) == 0) { RETREL($3); } @@ -295,37 +265,19 @@ =cut -AUTO_OP ne(i, i, ic) { +AUTO_OP ne(i, i|ic, ic) { if ($1 != $2) { RETREL($3); } } -AUTO_OP ne(i, ic, ic) { +AUTO_OP ne(n, n|nc, ic) { if ($1 != $2) { RETREL($3); } } -AUTO_OP ne(n, n, ic) { - if ($1 != $2) { -RETREL($3); - } -} - -AUTO_OP ne(n, nc, ic) { - if ($1 != $2) { -RETREL($3); - } -} - -AUTO_OP ne(s, s, ic) { - if (string_compare($1, $2) != 0) { -RETREL($3); - } -} - -AUTO_OP ne(s, sc, ic) { +AUTO_OP ne(s, s|sc, ic) { if (string_compare($1, $2) != 0) { RETREL($3); } @@ -349,38 +301,20 @@ Branch if $1 is less than $2. =cut - -AUTO_OP lt(i, i, ic) { - if ($1 < $2) { -RETREL($3); - } -} - -AUTO_OP lt(i, ic, ic) { - if ($1 < $2) { -RETREL($3); - } -} -AUTO_OP lt(n, n, ic) { +AUTO_OP lt(i, i|ic, ic) { if ($1 < $2) { RETREL($3); } } -AUTO_OP lt(n, nc, ic) { +AUTO_OP lt(n, n|nc, ic) { if ($1 < $2) { RETREL($3); } } - -AUTO_OP lt(s, s, ic) { - if (string_compare($1, $2) < 0) { -RETREL($3); - } -} -AUTO_OP lt(s, sc, ic) { +AUTO_OP lt(s, s|sc, ic) { if (string_compare($1, $2) < 0) { RETREL($3); } @@ -404,38 +338,20 @@ Branch if $1 is less than or equal to $2. 
=cut - -AUTO_OP le(i, i, ic) { - if ($1 <= $2) { -RETREL($3); - } -} - -AUTO_OP le(i, ic, ic) { - if ($1 <= $2) { -RETREL($3); - } -} -AUTO_OP le(n, n, ic) { +AUTO_OP le(i, i|ic, ic) { if ($1 <= $2) { RETREL($3); } } -AUTO_OP le(n, nc, ic) { +AUTO_OP le(n, n|nc, ic) { if ($1 <= $2) { RETREL($3); } } - -AUTO_OP le(s, s, ic) { - if (string_compare($1, $2) <= 0) { -RETREL($3); - } -} -AUTO_OP le(s, sc, ic) { +AUTO_OP le(s, s|sc, ic) { if (string_compare($1, $2) <= 0) { RETREL($3); } @@ -459,38 +375,20 @@ Branch if $1 is greater than $2. =cut - -AUTO_OP gt(i, i, ic) { - if ($1 > $2) { -RETREL($3); - } -} - -AUTO_OP gt(i, ic, ic) { - if ($1 > $2) { -RETREL($3); - } -} -AUTO_OP gt(n, n, ic) { +AUTO_OP gt(i, i|ic, ic) { if ($1 > $2) { RETREL($3); } } -AUTO_OP gt(n, nc, ic) { +AUTO_OP gt(n, n|nc, ic) { if ($1 > $2) { RETREL($3); } } - -AUTO_OP gt(s, s, ic) { - if (string_compare($1, $2) > 0) { -RETREL($3); - } -} -AUTO_OP g
Re: Missing transcoding functions?
In message <[EMAIL PROTECTED]> James Mastros <[EMAIL PROTECTED]> wrote:

> I'm working on implementing the ord(i,s) and chr(s,i) opcodes I talked about
> earlier, and I noticed what I consider a bug: there exist no transcode
> functions to or from native.

That's because we haven't worked out the necessary logistics for that
yet - it requires some means of determining what the native character
set is based on the locale, which can then be used to load an
appropriate transcoding table.

> Also, the diagonals (identity transforms) don't exist. This means that you
> have to explicitly check that you aren't transcoding from an encoding to the
> same encoding.

That is as per Dan's spec. I have thought about adding a wrapper
routine to do the check you refer to.

Tom

--
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/
Re: [PATCH] "missing" opcodes
In message <[EMAIL PROTECTED]> Simon Cozens <[EMAIL PROTECTED]> wrote:

> On Wed, Oct 17, 2001 at 12:40:36AM +0100, Tom Hughes wrote:
> > I have knocked up a first pass at a patch, which is attached for
> > comments.
>
> I've committed this. Thanks, that should *greatly* help maintainability.

I've got an extended version now that handles the other case that I
was talking about. I had to change the rule a bit though so that it
now ignores any expansion which has more than one expanded argument
and has all the expanded arguments as constants.

With that version of the patch core.ops has 25% fewer lines in it.

I'll commit the updated version shortly unless somebody screams...

Tom

--
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu
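Editor's sketch: the revised rule can be written as a small predicate over one generated variant. The sketch assumes, as in the ops shown in this thread, that constant argument types are the ones ending in `c` (`ic`, `nc`, `sc`); the function name `toy_skip_variant` and the `was_expanded` flags are hypothetical and do not come from the actual patch.

```c
#include <string.h>

/* Assumption: a type code names a constant when it ends in 'c',
 * as with "ic", "nc" and "sc" in the ops files quoted in this thread. */
static int is_constant_type(const char *type)
{
    size_t len = strlen(type);

    return len > 1 && type[len - 1] == 'c';
}

/* chosen[i] is the type picked for argument i in this variant;
 * was_expanded[i] is non-zero if that argument was written as "a|b".
 * Returns 1 if the variant should be dropped, 0 if it should be kept. */
int toy_skip_variant(const char *const chosen[], const int was_expanded[],
                     int nargs)
{
    int expanded = 0;
    int expanded_constants = 0;
    int i;

    for (i = 0; i < nargs; i++) {
        if (was_expanded[i]) {
            expanded++;
            if (is_constant_type(chosen[i]))
                expanded_constants++;
        }
    }
    /* Drop only when more than one argument was expanded and every
     * expanded argument ended up as a constant. */
    return expanded > 1 && expanded_constants == expanded;
}
```

Under this predicate `sub(i, i|ic, i|ic)` loses only its `(i, ic, ic)` variant, while `eq(i, i|ic, ic)` keeps `(i, ic, ic)` because only one of its arguments was expanded.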
Re: Moving string -> number conversions to string libs
In message <[EMAIL PROTECTED]> Simon Cozens <[EMAIL PROTECTED]> wrote:

> On Mon, Dec 03, 2001 at 05:42:15PM +, Alex Gough wrote:
> > The string to number conversion stuff should really be done by the
> > string encodings... I think this is the right way to get this
> > happening, comments?
>
> Looks like the right way to me. Could you commit it?

It's completely wrong I would have thought - the encoding layer
cannot know that a given code point is a digit so it can't possibly
do string to number conversion.

You need to use the encoding layer to fetch each character and
then the character set layer to determine what digit it represents.

Tom

--
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/
Re: Moving string -> number conversions to string libs
In message <[EMAIL PROTECTED]> James Mastros <[EMAIL PROTECTED]> wrote:

> On Mon, 3 Dec 2001, Tom Hughes wrote:
> > It's completely wrong I would have thought - the encoding layer
> > cannot know that a given code point is a digit so it can't possibly
> > do string to number conversion.
> >
> > You need to use the encoding layer to fetch each character and
> > then the character set layer to determine what digit it represents.
> Right. And then you need to apply some unified logic to get from this
> vector of digits (and other such symbols) to a value.

Indeed, and that logic needs to be in the string layer where it can
use both the encoding routines and the character type routines. I have
just rearranged things to reflect that.

> I'm just having nightmares of subtly different definitions of what a
> numeric constant looks like depending on the string encoding, because of
> different bits o' code not being quite in sync. Code duplication bad,
> code sharing good.

Absolutely. That code is now in one place.

> (The charset layer should still be involved somewhere, because Unicode
> (for ex) has a "digit value" property. This makes, say, Arabic numerals
> (which don't look at all like what a normal person calls Arabic numerals,
> BTW) work properly. (OTOH, it might also do strange things with ex
> Hebrew, where the letters are also numbers (Aleph is also 1, Bet is also
> 2, etc.))

So far I have added an is_digit() call to the character type layer
to replace the existing isdigit() calls.

To do things completely right we need to extend that with calls to
get the digit value, check for sign characters etc., rather than
assuming ASCIIish like it does now.

Tom

--
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/
Re: Moving string -> number conversions to string libs
In message <[EMAIL PROTECTED]> James Mastros <[EMAIL PROTECTED]> wrote:

> Right. Unfortunately, after starting on this, I realized that that's the easy
> part. Unicode has a fairly-well defined way of figuring out if a character
> is a digit (see if its category is Nd (Number/digit), and if so what its
> value is (the value of the "decimal" property.)

Can it also tell you the base used for digit strings in that character
set...

Actually I don't know if there are any modern writing systems that
don't use base ten but certainly if you were dealing with some ancient
scripts that used sexagesimal numbers that might be a problem ;-)

> However, there appears to be no good way of determining if something is a
> decimal point, a sign indicator, or an E/e (exponent signifier).

I suspected there wouldn't be.

> The attached patch will let the chartype layer decide if a character is a
> digit, and what its value is.

The patch seems to be missing though...

> Note also that is_digit should now return the value of the digit if it is a
> digit, or 42 if it isn't. (I had to use something, and ~0 sometimes wanted
> to be (char)~0, and sometimes (INTVAL)~0, so I decided not to use ~0. 0, of
> course, can't be used for not-a-digit, since is_digit('0')==0.

I was assuming there would be a separate digit_value() routine to
avoid that problem. Apart from anything else there will doubtless be
many other is_xxx() routines in due course which will be simple
boolean tests.

Tom

--
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu
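Editor's sketch: the split suggested above, a boolean `is_digit()` plus a separate `digit_value()`, avoids needing a sentinel such as 42 for "not a digit". The toy character type table below is ASCII-only and hypothetical throughout (`toy_chartype`, `toy_ascii`, `toy_string_to_int`); it assumes base ten and does no sign or overflow handling, which the thread notes would still be needed.

```c
#include <stddef.h>

typedef unsigned long toy_codepoint;   /* hypothetical code point type */

/* Hypothetical character type layer: one table of routines per character set. */
struct toy_chartype {
    int (*is_digit)(toy_codepoint c);      /* simple boolean test */
    int (*digit_value)(toy_codepoint c);   /* only meaningful if is_digit(c) */
};

static int ascii_is_digit(toy_codepoint c)
{
    return c >= '0' && c <= '9';
}

static int ascii_digit_value(toy_codepoint c)
{
    return (int)(c - '0');
}

const struct toy_chartype toy_ascii = { ascii_is_digit, ascii_digit_value };

/* String layer: turn the leading run of digits into a value.
 * The characters are assumed to have been fetched already by the
 * encoding layer; base ten is assumed and overflow is not checked. */
long toy_string_to_int(const struct toy_chartype *ct,
                       const toy_codepoint *chars, size_t len)
{
    long value = 0;
    size_t i;

    for (i = 0; i < len && ct->is_digit(chars[i]); i++)
        value = value * 10 + ct->digit_value(chars[i]);
    return value;
}
```

Because the test and the value are separate calls, `'0'` is reported as a digit whose value happens to be zero, and other `is_xxx()` routines can share the same boolean shape.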
Re: Moving string -> number conversions to string libs
In message <[EMAIL PROTECTED]> Bart Lateur <[EMAIL PROTECTED]> wrote:

> On Thu, 06 Dec 2001 00:16:34 GMT, Tom Hughes wrote:
>
> >So far I have added an is_digit() call to the character type layer
> >to replace the existing isdigit() calls.
>
> There seems to be an overlap with the /\d/ character class in regexes.
> Can't you use the same test? Can't you use the definition of that
> character class, whatever form it may be in?

Well presumably the regex code should use the character type of the
string it is matching against when processing \d.

There isn't any regex code in yet though is there?

Tom

--
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/
Re: Bytecode portability
In message <20011210011601$[EMAIL PROTECTED]> "Bryan C. Warnock" <[EMAIL PROTECTED]> wrote:

> - Endianness. The three major types are Big, Little, and Vaxian.
> Supporting these three should handle the majority of cases.

Actually VAXes have perfectly ordinary endianness - it was PDPs that
had the middle endian layout.

> - Floating point representations. The four major types are IEEE(ish),
> Vaxian, Cray's CRI, and the IBM/370 hexadecimal format. There are some
> minor variations among these, particularly with how much of the
> IEEE-754 standard floating point operations adhere to. However,
> adherence falls more into Portability Layer Three, and we will solely
> address representation.

Of course there are also about five variants of floating point format
on the VAX although only two are 64 bits in size. Some of those exist
(or are emulated) on Alpha as well although that also has IEEE types.

> - I've code that currently converts 32, 64, 96, and 128 bit floating
> point representations among all but the IBM format (for which I have
> the algorithms on paper, but nowhere to test), optimized for both 32
> bit and 64 bit support. Although 96 and 128 bit handling is currently
> hardcoded specifically for conversions between long doubles on x86
> machines and 64 bit processors, I've got alpha code for casting among
> arbitrary types. (For casting to and from 32 bit floats on machines
> that have no such type, for instance.) IEEE semantics are *not*
> supported, and are still a matter for discussion. The implementation
> of over- and underflow conversion to BigFloat is missing, for obvious
> reasons. I'm still trying to come up with a better interface and
> implementation, however.

Presumably that's G_Floating that you're converting to/from for
the VAX rather than D_Floating?

Tom

--
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/
Re: Bytecode portability
In message <20011210133529.EYKY11472.femail13.sdc1.sfba.home.com@there> Bryan C. Warnock <[EMAIL PROTECTED]> wrote: > On Monday 10 December 2001 03:06 am, Tom Hughes wrote: > > In message <20011210011601$[EMAIL PROTECTED]> > > > > Actually VAXes have perfectly ordinary endianness - it was PDPs that > > had the middle endian layout. > > Who's got the 16 bittish little endian layout ("21436587")? (Perhaps it's > wrong to categorize that as endianness.) I always believed it to be one or more of the PDP machines - most unix systems call it PDP endian in their header files. That said the jargon file lists the PDP 10 as big endian and the PDP 11 as little endian, and has this to say about the third form: middle-endian adj. Not big-endian or little-endian. Used of perverse byte orders such as 3-4-1-2 or 2-1-4-3, occasionally found in the packed-decimal formats of minicomputer manufacturers who shall remain nameless. Certainly the VAX is a perfectly ordinary little endian system. > > Presumably that's G_Floating that you're converting to/from for > > the VAX rather than D_Floating? > > Yes. Is that going to be a problem? (The sum of programs I've written on > a VAX can be represented with 1 digit. In base 2.) Well VAXC defaults to using D_Floating for doubles but can be made to use G_Floating instead with a switch to the compiler. I'm not sure whether that makes it a problem or not. > I've paper code for converting to and from D_Floating (for general data > migration), but it's range is too restrictive for my liking for floating > point constants inside of bytecode. If this is bumpkis, someone clue me > in, por favor. As you say the exponent is more restricted (it has the same size as in F_Floating which is the single precision format) but the trade off is that the mantissa is larger so you get greater precision at the expense of less range. Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu
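To make the byte orders discussed in this thread concrete, here is a minimal C sketch. The function names are made up for illustration and are not part of Parrot. It stores a 32-bit value in big-endian order, little-endian order and the 2-1-4-3 "PDP-endian" order (high 16-bit word first, each word little-endian), numbering bytes from 1 for the most significant.

```c
#include <stdint.h>

/* Big-endian: most significant byte first (1-2-3-4). */
void store_big(uint32_t v, unsigned char out[4])
{
    out[0] = (unsigned char)(v >> 24);
    out[1] = (unsigned char)(v >> 16);
    out[2] = (unsigned char)(v >> 8);
    out[3] = (unsigned char)v;
}

/* Little-endian: least significant byte first (4-3-2-1), as on the VAX. */
void store_little(uint32_t v, unsigned char out[4])
{
    out[0] = (unsigned char)v;
    out[1] = (unsigned char)(v >> 8);
    out[2] = (unsigned char)(v >> 16);
    out[3] = (unsigned char)(v >> 24);
}

/* "PDP-endian" (2-1-4-3): the high 16-bit word comes first,
 * but each 16-bit word is itself stored little-endian. */
void store_pdp(uint32_t v, unsigned char out[4])
{
    out[0] = (unsigned char)(v >> 16);
    out[1] = (unsigned char)(v >> 24);
    out[2] = (unsigned char)v;
    out[3] = (unsigned char)(v >> 8);
}
```

Storing 0x01020304 with each function gives the byte sequences 01 02 03 04, 04 03 02 01 and 02 01 04 03 respectively.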
Re: JIT me some speed!
In message <[EMAIL PROTECTED]> Dan Sugalski <[EMAIL PROTECTED]> wrote: > To run a program with the JIT, pass test_parrot the -j flag and watch it > scream. Well, scream if you're on x86 Linux or BSD (I get a speedup on > mops.pbc of 35x) but it's a darned good place to start. It does seem to be quite impressively fast. Faster even than the compiled version of mops on my machine... It looks like it is going to need some work before it can work for other instruction sets though, at least for RISC systems where the operands are typically encoded with the opcode as part of a single word and the range of immediate constants is often restricted. I'm thinking it will need some way of indicating field widths and shifts for the operands and opcode so they can be merged into an instruction word and also some way of handling a constant pool so that arbitrary addresses can be loaded using PC relative loads. I suspect it is also rather questionable to call system calls directly rather than going via their C library veneers - that is even more true when you come to things (like socket calls) which are system calls on some machines and functions on others. Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/
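As a rough illustration of the "field widths and shifts" idea above, here is a small C sketch. The struct and function names are hypothetical and nothing here is taken from the actual JIT. Each operand is described by a shift and a width, and a value that does not fit its field is rejected, which is how a restricted immediate range would show up. The example fields follow the ARM data-processing immediate layout, with the rotate field left at zero for simplicity.

```c
#include <stdint.h>

/* Hypothetical description of one field in a 32-bit instruction word. */
struct field {
    unsigned shift;   /* bit position of the field's least significant bit */
    unsigned width;   /* field width in bits, 1..31 */
};

/* Merge value into *word at field f.
 * Returns 1 on success, 0 if the value does not fit (word is unchanged). */
int put_field(uint32_t *word, struct field f, uint32_t value)
{
    uint32_t mask = ((uint32_t)1 << f.width) - 1;

    if (value > mask)
        return 0;
    *word = (*word & ~(mask << f.shift)) | (value << f.shift);
    return 1;
}

/* Example fields: ARM data-processing instruction with an immediate. */
static const struct field ARM_COND   = { 28, 4 };  /* condition code   */
static const struct field ARM_IMMBIT = { 25, 1 };  /* operand 2 is imm */
static const struct field ARM_OPCODE = { 21, 4 };  /* 0xD is MOV       */
static const struct field ARM_RD     = { 12, 4 };  /* destination reg  */
static const struct field ARM_IMM8   = {  0, 8 };  /* 8-bit immediate  */

/* Encode "MOV Rd, #imm" with no rotation.
 * Returns 0 if imm is out of range for the 8-bit field. */
int arm_mov_imm(unsigned rd, uint32_t imm, uint32_t *out)
{
    uint32_t w = 0;

    if (!put_field(&w, ARM_COND, 0xE) ||      /* "always" condition */
        !put_field(&w, ARM_IMMBIT, 1) ||
        !put_field(&w, ARM_OPCODE, 0xD) ||
        !put_field(&w, ARM_RD, rd) ||
        !put_field(&w, ARM_IMM8, imm))
        return 0;
    *out = w;
    return 1;
}
```

A constant that is rejected here is the case where a constant pool and a PC-relative load would be needed instead.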
Re: JIT me some speed!
In message <[EMAIL PROTECTED]> Daniel Grunblatt <[EMAIL PROTECTED]> wrote: > On Fri, 21 Dec 2001, Tom Hughes wrote: > > > I suspect it is also rather questionable to call system calls > > directly rather than going via their C library veneers - that is > > even more true when you come to things (like socket calls) which > > are system calls on some machines and functions on others. > > We are not always calling system calls directly, we can use the C library > when ever we need it, check out the .jit syntax. I did have a brief look last night but I must have missed that. No problem on that front then. Incidentally the JIT times are definitely impressive... Times for a 1.33 GHz Athlon are like this:

dutton [~/src/parrot] % ./test_parrot ./examples/assembly/mops.pbc
Iterations:1 Estimated ops: 2 Elapsed time: 4.806858 M op/s:41.607220

dutton [~/src/parrot] % ./test_parrot -j ./examples/assembly/mops.pbc
Iterations:1 Estimated ops: 2 Elapsed time: 0.300258 M op/s:666.093736

dutton [~/src/parrot] % ./examples/assembly/mops
Iterations:1 Estimated ops: 2 Elapsed time: 0.324787 M op/s:615.788117

Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu
Re: [PATCH] string_transcode
In message <007f01c1930c$9d326220$[EMAIL PROTECTED]> "Peter Gibbs" <[EMAIL PROTECTED]> wrote: > Another correction to string_transcode; this function now seems to work okay > (tested using a dummy 'encode' op added to my local copy of core.ops) Applied, thanks. Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/
Re: TODOs for STRINGs
In message <20020102054642$[EMAIL PROTECTED]> "David & Lisa Jacobs" <[EMAIL PROTECTED]> wrote: > Here is a short list of TODOs that I came up with for STRINGs. First, do > these look good to people? And second, what is the preferred method for > keeping track of these (patch to the TODO file, entries in bug tracking > system, mailing list, etc. > > * Add set ops that are encoding aware (e.g., set S0, "something", "unicode", > "utf-8")? You can already have Unicode constants by prefixing the string with a U character. I seem to recall Dan saying that he didn't want to allow constants in arbitrary encodings but instead would prefer just to have native and unicode. > * Add transcoding ops (this might be a specific case of the previous e.g., > set S0, S1, "unicode", "utf-16") I'm not sure whether this is needed. I think the idea is that in general transcoding will happen at I/O time, presumably by pushing a transcoding module on the I/O stack. > * Move like encoded string comparison into encodings (i.e., the STRING > comparison function gets the strings into the same encoding and then calls > out to the encodings comparison function - This will allow each encoding to > optimize its comparison. The problem with this is that string comparison depends on both the encoding and the character set so in general you can't do this. If the character set was the same for both strings then you could do so though. What I did think about was having a flag on each encoding that specified whether or not comparisons for that encoding could be done using memcmp() when the character sets were the same. That is true for things like the single byte encoding, but probably not for the unicode encodings due to canonicalisation issues. > * Add size of string termination to encodings (i.e., how many 0 bytes) Certainly. Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/
Re: Proposal: Naming conventions
In message <20020110201559$[EMAIL PROTECTED]> "Melvin Smith" <[EMAIL PROTECTED]> wrote: > > Foo foo = (Foo) malloc(sizeof(*foo)); > >? Does ANSI allow using sizeof on a variable declared on the > > same line? > > Wouldn't sizeof(Foo) be safer here? At the logical time of the > call *foo points to undefined. Technically its not a deref but > still looks scary. In C++ it might be confusing if you were to > cast it as: Well sizeof(Foo) and sizeof(*foo) are not actually the same thing at all there because Foo is presumably a typedef for a pointer type so sizeof(Foo) will be the size of a pointer and sizeof(*foo) will be the size of the thing it points to. You're quite right that it isn't technically a deref, as sizeof() is only interested in the static type of the object and is evaluated at compile time (if we ignore VLAs in C99 that is). In general it is safer to use sizeof() on the variable you are working with than on its type, as that way the sizeof() will still work if somebody changes the type of the variable. > // If it were really C++ we would probably be using new() > Foo foo = (FooBar) malloc(sizeof(*foo)); > > What type is *foo then? Should be Foo, but what if FooBar > was of different size, it might not be an obvious bug to someone > that just came along and tweaked your code. The type of *foo is whatever Foo has been typedef'd as a pointer to, and FooBar is a red herring. > >If people have visceral objections to typedef'ing pointers, I'm > >fine with dropping that part of the proposal. I'd just like to see > > I've always been uncomfortable with that practice, its one part of > the whole Win32 world I hate. If you stick with the practice then > you either end up making a new typedef for every level of indirection > or you drop to using * (some typedef), etc. Now if it were C++ and we > were using a smart pointer class I don't mind the practice. 
I will agree that hiding pointers inside typedefs is not a very good idea, if only because it makes it impossible to const qualify the pointer without creating a second parallel typedef. Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/
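The points in this message can be shown with a short C sketch. The struct layout and the function names are invented for illustration; only the Foo typedef pattern comes from the thread. It shows that sizeof(Foo) and sizeof(*foo) differ when Foo is a pointer typedef, and that const applied to such a typedef qualifies the pointer and not the thing pointed to.

```c
#include <stdlib.h>

/* Hypothetical payload; the real Foo in the thread is not specified. */
struct foo_data {
    double values[16];
};

typedef struct foo_data *Foo;   /* pointer hidden inside a typedef */

/* sizeof(*foo) is the size of struct foo_data. It is computed from the
 * static type at compile time, so foo is never actually dereferenced. */
Foo foo_new(void)
{
    Foo foo = malloc(sizeof(*foo));
    return foo;
}

size_t foo_pointer_size(void)
{
    return sizeof(Foo);         /* size of a pointer, not of the struct */
}

size_t foo_object_size(void)
{
    Foo foo = NULL;
    return sizeof(*foo);        /* still fine: no dereference happens */
}

/* "const Foo" means "struct foo_data *const": the pointer is const,
 * but the struct it points to can still be modified. */
void foo_clear_first(const Foo f)
{
    f->values[0] = 0.0;
}
```

Getting a pointer to a const struct would need a second typedef along the lines of `typedef const struct foo_data *ConstFoo;`, which is the parallel typedef mentioned above.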
Re: [COMMIT] Embedding enhancements
In message <[EMAIL PROTECTED]> Nicholas Clark <[EMAIL PROTECTED]> wrote: > On Sat, Feb 16, 2002 at 01:46:56AM -0800, Brent Dax wrote: > > NEW CONVENTIONS FOR DATA EXPOSED TO EMBEDDERS: > > > > -All structs should have a name of the form parrot_system_t. This name > > should never be directly used outside the subsystem in question. > > > > struct parrot_foo_t { > > ... > > }; > > Am I right in thinking that I could paraphrase that statement as > "All structs should trample in ANSI's reserved namespace"? I don't think so... As far as I can find in the standard, only certain type names ending in _t are reserved, namely: [#1] Type names beginning with int or uint and ending with _t may be added to the types defined in the header. Macro names beginning with INT or UINT and ending with _MAX or _MIN, or macro names beginning with PRI or SCN followed by any lower case letter or X may be added to the macros defined in the header. So struct x_t should be fine because that's a structure tag and not a type name. Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/
Re: I submit for your aproval . . .
In message <a05101503b8da6ead2821@[63.120.19.221]> Dan Sugalski <[EMAIL PROTECTED]> wrote: > At 6:29 PM -0400 4/10/02, Roman Hunt wrote: > > >also I think > >encoding_lookup() should accept an argument of "native". > > Good point, they should. OTOH, that makes some of this interesting, > since which characters you use for various things depend on the > encoding and charset. We already have string_native_type which points to the CHARTYPE structure for the native character type and that structure includes default_encoding which is the name of the default encoding for the native character type. I guess string_init could also set up string_native_encoding by looking up the name of the default encoding for the native character type. Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/
Re: TODO additions
In message <[EMAIL PROTECTED]> Steve Fink <[EMAIL PROTECTED]> wrote: > +Stability > +- > +Purify and other memory badness detectors One thing that may be useful here is valgrind, which can be found at http://developer.kde.org/~sewardj/ and does Purify-type things on Linux. I just hacked the parrot test suite to run parrot under valgrind and it has only come up with one problem in t/op/hacks1, the details of which are as follows:

valgrind-20020329, a memory error detector for x86 GNU/Linux.
Copyright (C) 2000-2002, and GNU GPL'd, by Julian Seward.
For more details, rerun with: -v

Syscall param open(pathname) contains uninitialised or unaddressable byte(s)
   at 0x403F1892: __libc_open (__libc_open:31)
   by 0x403829C3: _IO_fopen@@GLIBC_2.1 (iofopen.c:67)
   by 0x809B287: cg_core (core.ops:138)
   by 0x80955E0: runops_fast_core (runops_cores.c:34)
   Address 0x4104051D is 3201 bytes inside a block of size 32824 alloc'd
   at 0x4003DCC2: malloc (vg_clientmalloc.c:618)
   by 0x8092E11: mem_sys_allocate (memory.c:74)
   by 0x8098DAD: Parrot_alloc_new_block (resources.c:830)
   by 0x8092EC0: mem_setup_allocator (memory.c:108)

ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
malloc/free: in use at exit: 249652 bytes in 54 blocks.
malloc/free: 58 allocs, 4 frees, 381692 bytes allocated.
For a detailed leak analysis, rerun with: --leak-check=yes
For counts of detected errors, rerun with: -v

I haven't attempted to look at this and see what is causing it. Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/
Re: TODO additions
In message <[EMAIL PROTECTED]> Tom Hughes <[EMAIL PROTECTED]> wrote: > Syscall param open(pathname) contains uninitialised or unaddressable byte(s) > at 0x403F1892: __libc_open (__libc_open:31) > by 0x403829C3: _IO_fopen@@GLIBC_2.1 (iofopen.c:67) > by 0x809B287: cg_core (core.ops:138) > by 0x80955E0: runops_fast_core (runops_cores.c:34) > Address 0x4104051D is 3201 bytes inside a block of size 32824 alloc'd > at 0x4003DCC2: malloc (vg_clientmalloc.c:618) > by 0x8092E11: mem_sys_allocate (memory.c:74) > by 0x8098DAD: Parrot_alloc_new_block (resources.c:830) > by 0x8092EC0: mem_setup_allocator (memory.c:108) > > ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0) > malloc/free: in use at exit: 249652 bytes in 54 blocks. > malloc/free: 58 allocs, 4 frees, 381692 bytes allocated. > For a detailed leak analysis, rerun with: --leak-check=yes > For counts of detected errors, rerun with: -v > > I haven't attempted to look at this and see what is causing it. I've had a look at it now. The problem is that we are passing s->bufstart to fopen but there is no guarantee that there is a nul byte at the end of the buffer as parrot strings are not nul terminated. I have developed a patch for this in the form of a new routine which returns a nul terminated C style string given a parrot string as argument. It does this by making sure buflen is at least one greater than bufused and then stuffing a nul in that byte. This isn't a particularly brilliant fix so I'm attaching it here for comments before I commit it. Of course we also need to think about encoding/charset issues when passing strings to system calls... 
Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/ Index: core.ops === RCS file: /home/perlcvs/parrot/core.ops,v retrieving revision 1.119 diff -u -w -r1.119 core.ops --- core.ops3 Apr 2002 23:03:37 - 1.119 +++ core.ops13 Apr 2002 14:11:11 - @@ -135,7 +135,7 @@ =cut inline op open(out INT, in STR) { - $1 = (INTVAL)fopen(($2)->bufstart, "r+"); + $1 = (INTVAL)fopen(string_to_cstring(interpreter, ($2)), "r+"); if (!$1) { perror("Can't open"); exit(1); @@ -145,7 +145,7 @@ } inline op open(out INT, in STR, in STR) { - $1 = (INTVAL)fopen(($2)->bufstart, ($3)->bufstart); + $1 = (INTVAL)fopen(string_to_cstring(interpreter, ($2)), +string_to_cstring(interpreter, ($3))); goto NEXT(); } @@ -246,7 +246,7 @@ op print(in STR) { STRING *s = $1; if (s && string_length(s)) { -printf("%.*s", (int)string_length(s), (char *) s->bufstart); +printf("%s", string_to_cstring(interpreter, (s))); } goto NEXT(); } @@ -255,7 +255,7 @@ PMC *p = $1; STRING *s = (p->vtable->get_string(interpreter, p)); if (s) { -printf("%.*s",(int)string_length(s),(char *) s->bufstart); +printf("%s", string_to_cstring(interpreter, (s))); } goto NEXT(); } @@ -304,7 +304,7 @@ default: file = (FILE *)$1; } if (s && string_length(s)) { -fprintf(file, "%.*s",(int)string_length(s),(char *) s->bufstart); +fprintf(file, "%s", string_to_cstring(interpreter, (s))); } goto NEXT(); } @@ -323,7 +323,7 @@ default: file = (FILE *)$1; } if (s) { -fprintf(file, "%.*s",(int)string_length(s),(char *) s->bufstart); +fprintf(file, "%s", string_to_cstring(interpreter, (s))); } goto NEXT(); } Index: string.c === RCS file: /home/perlcvs/parrot/string.c,v retrieving revision 1.68 diff -u -w -r1.68 string.c --- string.c12 Apr 2002 01:40:28 - 1.68 +++ string.c13 Apr 2002 14:11:12 - @@ -802,6 +802,21 @@ NULL, 0, NULL); } +const char * +string_to_cstring(struct Parrot_Interp * interpreter, STRING * s) +{ +char *cstring; + +if (s->buflen == s->bufused) +string_grow(interpreter, s, 1); + +cstring = s->bufstart; + +cstring[s->bufused] 
= 0; + +return cstring; +} + /* * Local variables: Index: include/parrot/string_funcs.h === RCS file: /home/perlcvs/parrot/include/parrot/string_funcs.h,v retrieving revision 1.6 diff -u -w -r1.6 string_funcs.h --- include/parrot/string_funcs.h 22 Mar 2002 04:11:57 - 1.6 +++ include/parrot/string_funcs.h 13 Apr 2002 14:11:12 - @@ -27,6 +27,7 @@ const STRING *, STRING **); INTVAL Parrot_string_compare(Parrot, const STRING *, const STRING *); Parrot_Bool Parrot_string_bool(const STRING *); +const char *Parrot_string_cstring(const S
Re: TODO additions
In message <[EMAIL PROTECTED]> Roman Hunt <[EMAIL PROTECTED]> wrote: > why dont we default to null terminating strings of type native? > if "native" is what we get when LANG=C it only seems natural to do so. > else we are forced to use wrapper functions a that grow and manipulate > string data any time we need to pass it to standard C functions that > wont accept a string_length parameter, this list unfortunately contains > several syscalls. Well that is what perl 5 does certainly. I thought it had been decided not to do that in perl 6 though due to issues about what it meant to nul terminate in various different character sets. We can't assume that US-ASCII will be native everywhere though as some platforms may use some form of unicode as the native character set (and accept unicode arguments to system calls). It does need some thought though, to determine how best to handle this issue. Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/
Re: TODO additions
In message <[EMAIL PROTECTED]> Tom Hughes <[EMAIL PROTECTED]> wrote: > I have developed patch for this in the form of a new routine > which returns a nul terminated C style string given a parrot > string as argument. It does this by making sure buflen is at > least one greater than bufused and then stuffing a nul in that > byte. > > This isn't a particularly brilliant fix so I'm attaching it here > for comments before I commit it. I haven't seen any major objections to this so I have committed it. It will at least ensure that file opening is stable for the upcoming release. Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/
Re: transcode addition
In message <[EMAIL PROTECTED]> Roman Hunt <[EMAIL PROTECTED]> wrote: > I'm not too sure if this is necessary but it seems logical to get things > into charsets our compilers can handle. Hopefully this is the correct > approach . . . . also this should NULL terminate in the event that the > entire buffer had not yet been filled. This is wrong - you need to worry about the character set as well as the encoding, and at the very least you should compare the encoding to the default encoding for the native charset and not assume that it will always be singlebyte. Your buffer termination code is also wrong - bufused is the end of the string. You are null terminating the buffer not the string, and the buffer may have extra space. Plus you have created a buffer overrun. Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/
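The distinction between terminating the string and terminating the buffer can be sketched as follows. This is not Parrot's STRING type: the struct below is a hypothetical stand-in that only borrows the field names bufstart, buflen and bufused from the thread, and it uses realloc where Parrot would use its own allocator. The terminator goes at index bufused, and the buffer is grown first if there is no spare byte, since writing at index buflen would be one past the end.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a counted string buffer. */
struct sketch_string {
    char  *bufstart;  /* start of the allocated buffer          */
    size_t buflen;    /* bytes allocated                        */
    size_t bufused;   /* bytes occupied by the string itself    */
};

/* Return a nul-terminated view of s, or NULL if growing fails.
 * The nul is written at bufused (the end of the string), never at
 * buflen (which would be outside the buffer). */
const char *sketch_to_cstring(struct sketch_string *s)
{
    if (s->bufused >= s->buflen) {
        /* No spare byte after the string: grow by one. */
        char *grown = realloc(s->bufstart, s->bufused + 1);
        if (grown == NULL)
            return NULL;
        s->bufstart = grown;
        s->buflen = s->bufused + 1;
    }
    s->bufstart[s->bufused] = '\0';
    return s->bufstart;
}
```

With a buffer of 8 bytes holding a 3-byte string, the terminator lands at index 3, so any stale bytes in the unused tail are never seen by the C library.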
Re: x86 linux memory leak checker (and JIT ideas)
In message <[EMAIL PROTECTED]> Nicholas Clark <[EMAIL PROTECTED]> wrote: > Jarkko mailed this URL to p5p: > > http://developer.kde.org/~sewardj/ > > It describes a free (GPL) memory leak checker for x86 Linux > > 1: This may be of use for parrot hackers Which is why I mentioned it a week or two ago ;-) I also ran it over the test suite and fixed the only bug that it found at that time... Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/
Re: Dynaloading
In message <a05111b2fb92c9ba1ac83@[63.120.19.221]> Dan Sugalski <[EMAIL PROTECTED]> wrote: > The exported name should be the MD5 checksum of a string that > represents the actual routine name we're looking for. This, I think, > should be specified somewhere external to the library, in some sort > of metadata file, I think. (Not sure, I'm waffling here. But we need > this to be unique) Why does it need to be unique if it's not going to be linked against anything? If you're just finding the name with dlsym() or equivalent then you can just use the same name in all the libraries and it won't clash. Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/
Stack performance issue
There is a performance issue in the stack code, which the attached patch attempts to address. The problem revolves around what happens when you are close to the boundary between two chunks. When this happens you can find that you are in a loop where something is pushed on the stack, causing a new chunk to be allocated. That item is then popped causing the new chunk to be discarded only for it to have to be allocated again on the next iteration of the loop. This is a well known problem with chunked stacks - it is certainly a known issue on ARM based machines which use the chunked stack variant of the ARM procedure call standard. The solution there is to always keep one chunk in reserve - when you move back out of a chunk you don't free it. Instead you wait until you move back another chunk and then free the chunk after the one that has just emptied. Even this can go wrong if your loop pushes more than one chunk's worth of data on the stack and then pops it again, but that is far rarer than the general case of pushing one or two items which happens to take it over a chunk boundary. The attached patch implements this one-behind logic, both for the generic stack and the integer stack. If nobody has any objections then I'll commit it tomorrow sometime. Some figures from my test programs, running on a K6-200 linux box. 
The test programs push and pop 65536 times with the first column being when that loop doesn't cross a chunk boundary and the second being when it does cross a chunk boundary:

                              No overflow   Overflow
Integer stack, before patch   0.065505s     16.589480s
Integer stack, after patch    0.062732s     0.068460s
Generic stack, before patch   0.161202s     5.475367s
Generic stack, after patch    0.166938s     0.168390s

Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/ Index: rxstacks.c === RCS file: /cvs/public/parrot/rxstacks.c,v retrieving revision 1.5 diff -u -r1.5 rxstacks.c --- rxstacks.c 17 May 2002 21:38:20 - 1.5 +++ rxstacks.c 30 Jun 2002 17:42:02 - @@ -46,13 +46,20 @@ /* Register the new entry */ if (++chunk->used == STACK_CHUNK_DEPTH) { -/* Need to add a new chunk */ -IntStack_Chunk new_chunk = mem_allocate_aligned(sizeof(*new_chunk)); -new_chunk->used = 0; -new_chunk->next = stack; -new_chunk->prev = chunk; -chunk->next = new_chunk; -stack->prev = new_chunk; +if (chunk->next == stack) { +/* Need to add a new chunk */ +IntStack_Chunk new_chunk = mem_allocate_aligned(sizeof(*new_chunk)); +new_chunk->used = 0; +new_chunk->next = stack; +new_chunk->prev = chunk; +chunk->next = new_chunk; +stack->prev = new_chunk; +} +else { +/* Reuse the spare chunk we kept */ +chunk = chunk->next; +stack->prev = chunk; +} } } @@ -67,11 +74,17 @@ /* That chunk != stack check is just to allow the empty stack case * to fall through to the following exception throwing code. */ -/* Need to pop off the last entry */ -stack->prev = chunk->prev; -stack->prev->next = stack; -/* Relying on GC feels dirty... */ -chunk = stack->prev; +/* If the chunk that has just become empty is not the last chunk + * on the stack then we make it the last chunk - the GC will clean + * up any chunks that are discarded by this operation. */ +if (chunk->next != stack) { +chunk->next = stack; +} + +/* Now back to the previous chunk - we'll keep the one we have + * just emptied around for now in case we need it again. 
*/ +chunk = chunk->prev; +stack->prev = chunk; } /* Quick sanity check */ Index: stacks.c === RCS file: /cvs/public/parrot/stacks.c,v retrieving revision 1.34 diff -u -r1.34 stacks.c --- stacks.c25 Jun 2002 23:50:51 - 1.34 +++ stacks.c30 Jun 2002 17:42:02 - @@ -208,22 +208,29 @@ /* Do we need a new chunk? */ if (chunk->used == STACK_CHUNK_DEPTH) { -/* Need to add a new chunk */ -Stack_Chunk_t *new_chunk = mem_allocate_aligned(sizeof(Stack_Chunk_t)); - -new_chunk->used = 0; -new_chunk->next = stack_base; -new_chunk->prev = chunk; -chunk->next = new_chunk; -stack_base->prev = new_chunk; -chunk = new_chunk; - -/* Need to initialize this pointer before the collector sees it */ -chunk->buffer = NULL; -chunk->buffer = new_buffer_header(interpreter); - -Parrot_allocate(interpreter, chunk->
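The keep-one-chunk-in-reserve scheme from this message can be sketched in standalone C as below. This is a simplified illustration and not the patch itself: it uses a plain (non-circular) doubly linked list, frees discarded chunks with free() instead of leaving them to the GC, and all of the names, the tiny CHUNK_DEPTH and the allocs counter are assumptions made for the example.

```c
#include <stdlib.h>

#define CHUNK_DEPTH 4               /* deliberately tiny for the example */

struct chunk {
    struct chunk *prev, *next;
    int used;
    int items[CHUNK_DEPTH];
};

struct stack {
    struct chunk *top;              /* chunk currently in use            */
    unsigned long allocs;           /* chunks allocated so far           */
};

static struct chunk *chunk_new(struct stack *s, struct chunk *prev)
{
    struct chunk *c = calloc(1, sizeof(*c));
    if (c == NULL)
        abort();
    c->prev = prev;
    s->allocs++;
    return c;
}

void stack_init(struct stack *s)
{
    s->allocs = 0;
    s->top = chunk_new(s, NULL);
}

void stack_push(struct stack *s, int v)
{
    struct chunk *c = s->top;

    if (c->used == CHUNK_DEPTH) {
        if (c->next == NULL)
            c->next = chunk_new(s, c);  /* no spare chunk: allocate one */
        c = s->top = c->next;           /* otherwise reuse the spare    */
    }
    c->items[c->used++] = v;
}

/* Returns 1 and stores the popped value in *out, or 0 if empty. */
int stack_pop(struct stack *s, int *out)
{
    struct chunk *c = s->top;

    if (c->used == 0) {
        if (c->prev == NULL)
            return 0;                   /* stack is empty */
        /* c has just emptied and becomes the spare, so any older
         * spare beyond it can now be discarded. */
        if (c->next != NULL) {
            free(c->next);
            c->next = NULL;
        }
        c = s->top = c->prev;
    }
    *out = c->items[--c->used];
    return 1;
}

void stack_destroy(struct stack *s)
{
    struct chunk *c = s->top->next ? s->top->next : s->top;

    while (c != NULL) {
        struct chunk *prev = c->prev;
        free(c);
        c = prev;
    }
    s->top = NULL;
}
```

With the first chunk exactly full, a loop that pushes and pops one item repeatedly allocates the second chunk once and then keeps reusing it, which is the behaviour the "after patch" timings reflect.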
Re: Stack performance issue
In message <[EMAIL PROTECTED]> Melvin Smith <[EMAIL PROTECTED]> wrote: > You might want to modify register stacks too. I currently have a > band-aid on it that just doesn't free stack chunks which works in > all but the weirdest cases. I've done that now. I also just realised that the stacks are allocating their chunks directly from the system, which presumably means the GC won't pick them up so they need to be freed directly. I've done that for the register stacks, and I'll do the same for the other stacks unless somebody spots a flaw in my logic and points out that the GC will catch it... Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu
Re: Adding the system stack to the root set
In message <[EMAIL PROTECTED]> Nicholas Clark <[EMAIL PROTECTED]> wrote: > On Wed, Jul 10, 2002 at 06:49:06PM -0400, Dan Sugalski wrote: > > Yes, this is an issue for systems with a chunked stack. As far as I > > know that only applies to the various ARM OSes, and for those we'll > > have to have some different system specific code to deal with the > > stack. (Which is fine) > > Sorry, I wasn't clear in my previous reply to your private message. > ARM Linux doesn't use a chunked stack. It's contiguous, and (for example) > the Bohem garbage collector does work on it. I would expect NetBSD ARM > doesn't either. (There is a FreeBSD port to StrongARM, but its mailing > list is very very quiet). So I don't think those two will pose undue > problems. As far as I know all the ARM unixes use a contiguous stack - it's just RISC OS that uses the chunked stack I believe. I believe you can always tell by looking at where sl points and seeing if there is a valid chunk descriptor there and then following its prev pointer to get the previous chunk if there is one. Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/
Re: [netlabs #789] [PATCH] Squish some warnings
In message <20020712010920$[EMAIL PROTECTED]> Simon Glover (via RT) <[EMAIL PROTECTED]> wrote: > # New Ticket Created by Simon Glover > # Please include the string: [netlabs #789] > # in the subject line of all future correspondence about this issue. > # http://bugs6.perl.org/rt2/Ticket/Display.html?id=789 > > > > > stack_chunk is now Stack_Chunk... Applied. Somebody update the ticket please... Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/
Re: [netlabs #790] [PATCH] MANIFEST update
In message <20020712005836$[EMAIL PROTECTED]> Simon Glover (via RT) <[EMAIL PROTECTED]> wrote: > # New Ticket Created by Simon Glover > # Please include the string: [netlabs #790] > # in the subject line of all future correspondence about this issue. > # http://bugs6.perl.org/rt2/Ticket/Display.html?id=790 > > > > > Self-explanatory. Applied. Somebody please update the ticket... Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/
Re: [netlabs #788] [PATCH] Array fixes (and tests)
In message <20020711221132$[EMAIL PROTECTED]> Simon Glover (via RT) <[EMAIL PROTECTED]> wrote: > # New Ticket Created by Simon Glover > # Please include the string: [netlabs #788] > # in the subject line of all future correspondence about this issue. > # http://bugs6.perl.org/rt2/Ticket/Display.html?id=788 > > > > > This patch fixes a number of off-by-one errors in array.pmc, and adds a > few more tests. Applied. Somebody please update the ticket... Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/
Re: [netlabs #757] Problem mixing labels, comments and quote-marks
In message <20020703012231$[EMAIL PROTECTED]> Simon Glover (via RT) <[EMAIL PROTECTED]> wrote: > This code: > > A:# prints "a" > print "a" > end > > doesn't assemble; the assembler dies with the error message: > > Use of uninitialized value in hash element at assemble.pl line 844. > Couldn't find operator '' on line 1. > > If you remove the ""s from the comment, it works fine. Likewise, if > you put the label, op and comment on the same line, ie: > >A: print "a" # prints "a" > end > > then it assembles and runs OK. Here's a patch that will fix this. I haven't committed it because I'm not sure why the assembler wasn't dropping comments that included quotes so I'm giving people who know more about the assembler than me a chance to comment first... Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/ Index: assemble.pl === RCS file: /cvs/public/parrot/assemble.pl,v retrieving revision 1.77 diff -u -r1.77 assemble.pl --- assemble.pl 4 Jul 2002 18:36:17 - 1.77 +++ assemble.pl 13 Jul 2002 17:30:48 - @@ -433,7 +433,7 @@ $self->{pc}++; return if $line=~/^\s*$/ or $line=~/^\s*#/; # Filter out the comments and blank lines - $line=~s/#[^'"]+$//; # Remove trailing comments + $line=~s/#.*$//; # Remove trailing comments $line=~s/(^\s+|\s+$)//g; # Remove leading and trailing whitespace # # Accumulate lines that only have labels until an instruction is found..
Re: [netlabs #758] [PATCH] Fixes for example programs
In message <20020703015823$[EMAIL PROTECTED]> Simon Glover (via RT) <[EMAIL PROTECTED]> wrote: > # New Ticket Created by Simon Glover > # Please include the string: [netlabs #758] > # in the subject line of all future correspondence about this issue. > # http://bugs6.perl.org/rt2/Ticket/Display.html?id=758 > > > > > Fixes to various of the PASM examples in light of recent changes in the > assembler. Applied. Somebody please update the ticket... Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/
Re: [netlabs #757] Problem mixing labels, comments and quote-marks
In message <20020713174114$[EMAIL PROTECTED]> brian wheeler <[EMAIL PROTECTED]> wrote: > On Sat, 2002-07-13 at 12:32, Tom Hughes wrote: > > In message <20020703012231$[EMAIL PROTECTED]> > > Here's a patch that will fix this. I havn't committed it because I'm > > not sure why the assember wasn't dropping comments that included quotes > > so I'm giving people who know more about the assembler than me a chance > > to comment first... > > I believe it wasn't dropping the comments with quotes as a side effect > of not wanting to break things like: > print "#" > > which breaks with the included patch. I basically had the same patch > you do, but wasn't able to figure out how to handle the above case *and* > do the right thing with # prints "a" Of course... The attached patch should handle that I think... Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/ Index: assemble.pl === RCS file: /cvs/public/parrot/assemble.pl,v retrieving revision 1.77 diff -u -r1.77 assemble.pl --- assemble.pl 4 Jul 2002 18:36:17 - 1.77 +++ assemble.pl 13 Jul 2002 17:49:58 - @@ -430,10 +430,13 @@ sub _annotate_contents { my ($self,$line) = @_; + my $str_re = qr(\"(?:[^\\\"]*(?:\\.[^\\\"]*)*)\" | + \'(?:[^\\\']*(?:\\.[^\\\']*)*)\' + )x; $self->{pc}++; return if $line=~/^\s*$/ or $line=~/^\s*#/; # Filter out the comments and blank lines - $line=~s/#[^'"]+$//; # Remove trailing comments + $line=~s/^((?:[^'"]+|$str_re)*)#.*$/$1/; # Remove trailing comments $line=~s/(^\s+|\s+$)//g; # Remove leading and trailing whitespace # # Accumulate lines that only have labels until an instruction is found..
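The rule that the regex in the patch above encodes (a '#' starts a comment only when it is outside a quoted string, and a backslash escapes the next character inside a string) can also be written as a small state machine. The C function below is only an illustration of that rule under those assumptions; it is not a translation of assemble.pl and the function name is made up.

```c
/* Truncate line at the first '#' that is not inside a single- or
 * double-quoted string. Inside a string a backslash escapes the
 * following character, so an escaped quote does not end the string. */
void strip_trailing_comment(char *line)
{
    char quote = 0;     /* 0 when outside a string, else the open quote */
    char *p;

    for (p = line; *p != '\0'; p++) {
        if (quote != 0) {
            if (*p == '\\' && p[1] != '\0')
                p++;                    /* skip the escaped character */
            else if (*p == quote)
                quote = 0;              /* closing quote */
        }
        else if (*p == '"' || *p == '\'') {
            quote = *p;                 /* opening quote */
        }
        else if (*p == '#') {
            *p = '\0';                  /* comment starts here */
            return;
        }
    }
}
```

Both cases from the thread behave as wanted: `print "#"` is left alone, while `A:# prints "a"` is cut down to `A:`. Whitespace left in front of the removed comment is not trimmed here; in the Perl code the following substitution takes care of that.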
Re: Parrot_open_i_sc_sc
In message <[EMAIL PROTECTED]> Bryan Logan <[EMAIL PROTECTED]> wrote: > Here's the code I have: > > open I0, "test.txt", "<" > open I1, "testdtxt", "<" > end > > I assemble and load it into pdb and get this: > > Parrot Debugger 0.0.1 > > (pdb) list > 1 open_i_sc_sc I0,"test.txt<","<" > 2 open_i_sc_sc I1,"testdtxt","<" > 3 end This is a bug in the debugger (and also in the opcode tracing) where it is assuming that constant strings in the byte code are zero terminated when they aren't, and it is therefore overrunning and printing bits of the next string or whatever. I have just committed a fix. Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/
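The class of bug described here is easy to reproduce in isolation. The sketch below assumes, purely for illustration, that two string constants sit back to back in memory with their lengths stored separately; the struct and function names are hypothetical and say nothing about the real bytecode layout or the committed fix. Printing with a bounded "%.*s" stops at the recorded length, whereas a plain "%s" would run on into the next constant.

```c
#include <stdio.h>

/* Hypothetical counted string: data is NOT nul terminated. */
struct counted_str {
    const char *data;
    size_t      len;
};

/* Format s in double quotes into out, reading exactly s->len bytes.
 * Returns the value of snprintf. */
int format_counted(char *out, size_t outsize, const struct counted_str *s)
{
    return snprintf(out, outsize, "\"%.*s\"", (int)s->len, s->data);
}
```

For example, with the bytes `test.txt<` in memory, a constant of length 8 starting at the first byte formats as "test.txt", and a constant of length 1 starting at the ninth byte formats as "<".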
PARROT QUESTIONS: Keyed access
I've been trying to make sense of the current status of keyed access
at all levels, from the assembler through the ops to the vtables, and
it has to be said that the harder I look the more confused I seem to
become...

It all seems to be a bit of a mess at the moment, and I'd like to have
a go at cleaning it up, but first of all I need to work out how it is
all supposed to work.

It is clear that the encoding currently used by the assembler does not
match that specified by PDD 8, as the following examples show:

  Instruction           PDD 8 Encoding   Actual Current Encoding
  set P1["hi"], 1234    set_p_kc_ic      set_keyed_p_sc_ic
  set P1[S1], 1234      set_p_r_ic       set_keyed_p_s_ic
  set P1[1], 1234       set_p_kc_ic      set_keyed_integer_p_ic_ic
  set P1[I1], 1234      set_p_r_ic       set_keyed_integer_p_k_ic
  set P1[S1], P2[S2]    set_p_r_p_r      set_keyed_p_s_p_s
  set P1[I1], P2[S2]    set_p_kc_p_r     set_keyed_keyed_integer_p_i_p_s

Obviously this is complete nonsense. To be honest I suspect that both
encodings have problems. The PDD 8 encoding uses kc and r (why not kc
and k?) to encode the keys regardless of their type, so the op has no
way of knowing what sort of argument it is dealing with.

The currently implemented system distinguishes the operand types OK
but tries to differentiate between ops with an integer key and those
with other types of keys, which all falls apart when you have a
combination of integer and non-integer keys in the same instruction.

Once we get to multi-component keys things just get even worse. If we
believe PDD 8 then the syntax should be:

  set P1[I1;I2], I3

But what is currently implemented is this:

  set P1[k;I1;I2], I3

In addition it appears that the current implementation would turn that
instruction into this encoding:

  set_keyed_integer_p_k_k_i

Where each component of the key becomes a separate argument, thereby
requiring an infinite number of ops to cope with an infinite number of
possible key components. There is a suggestion in PDD 8 that this
should be encoded as this:

  set_p_kc_i

With the key constant actually referring to an entry in the constant
table that encodes the key.

Moving on from the assembler, I'm not sure how the recent addition of
the _keyed_int vtable methods interacts with all this - they appear to
be at odds with PDD 8 anyway, which appears to want to avoid the kind
of vtable explosion that they promote.

Anyhow, that's probably enough for now... If anybody can enlighten me
about how all this is supposed to work then I'll try and knock it all
into shape, starting with making sure that PDD 8 is accurate.

Tom

--
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/
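To put rough numbers on the op explosion mentioned above, here is a small Python sketch. It assumes, purely for illustration, four possible argument types per key component and an op naming scheme invented here (set_p_<components>_i); neither is taken from PDD 8 or from the assembler, and COMPONENT_TYPES, per_component_ops and key_constant_ops are hypothetical names.

```python
from itertools import product

# Assumed argument types for a single key component (illustrative only).
COMPONENT_TYPES = ("i", "ic", "s", "sc")


def per_component_ops(max_components):
    """Op names needed if every key component is a separate argument."""
    names = []
    for count in range(1, max_components + 1):
        for combo in product(COMPONENT_TYPES, repeat=count):
            names.append("set_p_" + "_".join(combo) + "_i")
    return names


def key_constant_ops():
    """Op names needed if the whole key is a single constant-table entry."""
    return ["set_p_kc_i"]


if __name__ == "__main__":
    for depth in (1, 2, 3):
        print(depth, len(per_component_ops(depth)), len(key_constant_ops()))
```

Under these assumptions the per-component scheme needs 4, 20 and 84 op variants for keys of up to one, two and three components, while the key-constant scheme stays at one however deep the key gets.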
Re: PARROT QUESTIONS: Keyed access
In message <[EMAIL PROTECTED]>
          Melvin Smith <[EMAIL PROTECTED]> wrote:

> At 03:54 PM 7/14/2002 +0100, Tom Hughes wrote:
> >I've been trying to make sense of the current status of keyed access
> >at all levels, from the assembler through the ops to the vtables and
> >it has to be said that the harder I look the more confused I seem to
> >become...
>
> FWIW, I have a large patch from Sean O'Rourke in response to my
> request for someone to cleanup the set/set_keyed stuff. I'll commit
> it later today, it does clean it up a bit, and removes some of the
> older versions of set (3 arg). It at least reduces the noise.

I was going to do some work on that request, but I reached the point
where I decided there was no point trying to do anything until it was
clear what the target was that I was trying to reach...

Tom

--
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/
RE: [PATCH] MANIFEST update
In message <[EMAIL PROTECTED]>
          Andy Dougherty <[EMAIL PROTECTED]> wrote:

> On Wed, 17 Jul 2002, Brent Dax wrote:
>
> > There should be no Makefile.in's left in the source--they've been tossed
> > in favor of config/gen/makefiles.
>
> Fair enough. I just took what cvs handed me. It was a fresh checkout as
> of yesterday, updated this morning. Whoever removes those files from the
> repository ought to adjust MANIFEST accordingly.

I have removed the files and updated the MANIFEST to reflect that.

Tom

--
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/
Re: [netlabs #757] Problem mixing labels, comments and quote-marks
In message <[EMAIL PROTECTED]>
          "David M. Lloyd" <[EMAIL PROTECTED]> wrote:

> On Sat, 13 Jul 2002, Tom Hughes wrote:
>
> > Of course... The attached patch should handle that I think...
>
> This patch is breaking several Solaris 32-bit tests. The following
> assembly (from t/pmc/perlarray1.pbc):

I've just tried that test on a Solaris 7 machine and it ran fine and
produced the correct bytecode.

I can't honestly see how that patch could cause it to generate
completely the wrong op...

Tom

--
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/