from:"Tom Hughes"

Patch to link with the maths library

2001-09-13 Thread Tom Hughes


Now that parrot has the advanced math ops in, it needs to link with
the maths library or you get lots of missing symbols. Patch as follows:

Index: Makefile
===
RCS file: /home/perlcvs/parrot/Makefile,v
retrieving revision 1.9
diff -c -r1.9 Makefile
*** Makefile    2001/09/13 07:22:36 1.9
--- Makefile    2001/09/13 08:20:54
***
*** 12,18 
  all : $(O_FILES) test_prog
  
  test_prog: test_main$(O) $(O_FILES)
!   gcc -o test_prog $(O_FILES) test_main$(O)
  
  test_main$(O): $(H_FILES)
  
--- 12,18 
  all : $(O_FILES) test_prog
  
  test_prog: test_main$(O) $(O_FILES)
!   gcc -o test_prog $(O_FILES) test_main$(O) -lm
  
  test_main$(O): $(H_FILES)
  
Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu

Patch to fix += on rvalue

2001-09-13 Thread Tom Hughes


The inc_n_nc op does this:

  (NV)NUM_REG(P1) += P2

Unfortunately the (NV) cast means that the LHS is not an lvalue and
cannot therefore be assigned to in ANSI C. It seems that gcc allows
you to get away with this, but other compilers don't.

The cast is also unnecessary as NUM_REG() gives an NV anyway, so this
patch removes the cast:

Index: basic_opcodes.ops
===
RCS file: /home/perlcvs/parrot/basic_opcodes.ops,v
retrieving revision 1.11
diff -u -r1.11 basic_opcodes.ops
--- basic_opcodes.ops   2001/09/13 07:27:46 1.11
+++ basic_opcodes.ops   2001/09/13 08:27:40
@@ -219,7 +219,7 @@
 
 // INC Nx, nnn
 AUTO_OP inc_n_nc {
-  (NV)NUM_REG(P1) += P2;
+  NUM_REG(P1) += P2;
 }
 
 // DEC Nx

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu
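A minimal sketch of the point about casts and lvalues in the message above. The NV typedef, the num_reg array and this NUM_REG macro are stand-ins assumed for illustration, not the real Parrot definitions:

```c
#include <assert.h>

typedef double NV;                   /* assumption: NV is a floating type */

#define NUM_REGISTERS 32
static NV num_reg[NUM_REGISTERS];    /* hypothetical register file */
#define NUM_REG(i) (num_reg[(i)])    /* hypothetical macro; expands to an lvalue */

/* Adds `amount` to numeric register `reg` and returns the new value. */
NV inc_n_nc(int reg, NV amount) {
    assert(reg >= 0 && reg < NUM_REGISTERS);
    /* (NV)NUM_REG(reg) += amount;   rejected by ANSI C: a cast yields a value, not an lvalue */
    NUM_REG(reg) += amount;          /* the array element itself is assignable */
    return NUM_REG(reg);
}
```

Calling inc_n_nc(3, 1.5) on a zeroed register file leaves register 3 holding 1.5. The commented-out line is the form that stricter compilers refuse.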

Patch to fix C++ style comments

2001-09-13 Thread Tom Hughes


The parrot code is currently full of C++ style comments which cause
many C compilers to barf. The attached patch changes these to C style
comments to fix this problem.

BTW I have had to resend this because my first attempt was bounced
apparently for having the patch as a text/plain attachment rather than
inline. Isn't that a bit OTT though? I can understand blocking HTML
messages and attachments but I prefer to send patches as attachments
as it ensures that trailing blank lines and such like are properly
preserved and basically that the patch arrives completely intact.

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu

diff -u parrot/basic_opcodes.ops parrot.fixed/basic_opcodes.ops
--- parrot/basic_opcodes.ops  Thu Sep 13 08:27:46 2001
+++ parrot.fixed/basic_opcodes.ops  Thu Sep 13 09:23:13 2001
@@ -7,47 +7,47 @@
 #include "parrot.h"
 #include "math.h"
 
-// SET Ix, CONSTANT
+/* SET Ix, CONSTANT */
 AUTO_OP set_i_ic {
   INT_REG(P1) = P2;
 }
   
-// SET Ix, Ix
+/* SET Ix, Ix */
 AUTO_OP set_i {
   INT_REG(P1) = INT_REG(P2);
 }
   
-// ADD Ix, Iy, Iz  
+/* ADD Ix, Iy, Iz   */
 AUTO_OP add_i {
   INT_REG(P1) = INT_REG(P2) +
INT_REG(P3);
 }
 
-// SUB Ix, Iy, Iz  
+/* SUB Ix, Iy, Iz   */
 AUTO_OP sub_i {
   INT_REG(P1) = INT_REG(P2) -
INT_REG(P3);
 }
 
-// MUL Ix, Iy, Iz  
+/* MUL Ix, Iy, Iz   */
 AUTO_OP mul_i {
   INT_REG(P1) = INT_REG(P2) *
INT_REG(P3);
 }
 
-// DIV Ix, Iy, Iz  
+/* DIV Ix, Iy, Iz   */
 AUTO_OP div_i {
   INT_REG(P1) = INT_REG(P2) /
INT_REG(P3);
 }
 
-// MOD Ix, Iy, Iz  
+/* MOD Ix, Iy, Iz   */
 AUTO_OP mod_i {
   INT_REG(P1) = INT_REG(P2) %
INT_REG(P3);
 }
 
-// EQ Ix, Iy, EQ_BRANCH, NE_BRANCH
+/* EQ Ix, Iy, EQ_BRANCH, NE_BRANCH */
 MANUAL_OP eq_i_ic {
   if (INT_REG(P1) == INT_REG(P2)) {
 RETURN(P3);
@@ -56,7 +56,7 @@
   }
 }
 
-// NE Ix, Iy, NE_BRANCH, EQ_BRANCH
+/* NE Ix, Iy, NE_BRANCH, EQ_BRANCH */
 MANUAL_OP ne_i_ic {
   if (INT_REG(P1) != INT_REG(P2)) {
 RETURN(P3);
@@ -65,7 +65,7 @@
   }
 }
 
-// LT Ix, Iy, LT_BRANCH, GE_BRANCH
+/* LT Ix, Iy, LT_BRANCH, GE_BRANCH */
 MANUAL_OP lt_i_ic {
   if (INT_REG(P1) < INT_REG(P2)) {
 RETURN(P3);
@@ -74,7 +74,7 @@
   }
 }
 
-// LE Ix, Iy, LE_BRANCH, GT_BRANCH
+/* LE Ix, Iy, LE_BRANCH, GT_BRANCH */
 MANUAL_OP le_i_ic {
   if (INT_REG(P1) <= INT_REG(P2)) {
 RETURN(P3);
@@ -83,7 +83,7 @@
   }
 }
 
-// GT Ix, Iy, GT_BRANCH, LE_BRANCH
+/* GT Ix, Iy, GT_BRANCH, LE_BRANCH */
 MANUAL_OP gt_i_ic {
   if (INT_REG(P1) > INT_REG(P2)) {
 RETURN(P3);
@@ -92,7 +92,7 @@
   }
 }
 
-// GE Ix, Iy, GE_BRANCH, LT_BRANCH
+/* GE Ix, Iy, GE_BRANCH, LT_BRANCH */
 MANUAL_OP ge_i_ic {
   if (INT_REG(P1) >= INT_REG(P2)) {
 RETURN(P3);
@@ -101,7 +101,7 @@
   }
 }
 
-// IF IXx, TRUE_BRANCH, FALSE_BRANCH
+/* IF IXx, TRUE_BRANCH, FALSE_BRANCH */
 MANUAL_OP if_i_ic {
   if (INT_REG(P1)) {
 RETURN(P2);
@@ -110,81 +110,81 @@
   }
 }
 
-// TIME Ix
+/* TIME Ix */
 AUTO_OP time_i {
   INT_REG(P1) = time(NULL);
 }
 
-// PRINT Ix
+/* PRINT Ix */
 AUTO_OP print_i {
   printf("I reg %li is %li\n", P1, INT_REG(P1));
 }
  
-// BRANCH CONSTANT
+/* BRANCH CONSTANT */
 MANUAL_OP branch_ic {
   RETURN(P1);
 }
 
-// END
+/* END */
 MANUAL_OP end {
RETURN(0);
 }
 
-// INC Ix
+/* INC Ix */
 AUTO_OP inc_i {
   INT_REG(P1)++;
 }
 
-// INC Ix, nnn
+/* INC Ix, nnn */
 AUTO_OP inc_i_ic {
   INT_REG(P1) += P2;
 }
 
-// DEC Ix
+/* DEC Ix */
 AUTO_OP dec_i {
   INT_REG(P1)--;
 }
 
-// DEC Ix, nnn
+/* DEC Ix, nnn */
 AUTO_OP dec_i_ic {
   INT_REG(P1) -= P2;
 }
 
-// JUMP Ix
+/* JUMP Ix */
 MANUAL_OP jump_i {
   RETURN(INT_REG(P1));
 }
 
-// SET Nx, CONSTANT
+/* SET Nx, CONSTANT */
 AUTO_OP set_n_nc {
   NUM_REG(P1) = P2;
 }
   
-// ADD Nx, Ny, Nz  
+/* ADD Nx, Ny, Nz   */
 AUTO_OP add_n {
   NUM_REG(P1) = NUM_REG(P2) +
NUM_REG(P3);
 }
 
-// SUB Nx, Ny, Iz  
+/* SUB Nx, Ny, Iz   */
 AUTO_OP sub_n {
   NUM_REG(P1) = NUM_REG(P2) -
NUM_REG(P3);
 }
 
-// MUL Nx, Ny, Iz  
+/* MUL Nx, Ny, Iz   */
 AUTO_OP mul_n {
   NUM_REG(P1) = NUM_REG(P2) *
NUM_REG(P3);
 }
 
-// DIV Nx, Ny, Iz  
+/* DIV Nx, Ny, Iz   */
 AUTO_OP div_n {
   NUM_REG(P1) = NUM_REG(P2) /
NUM_REG(P3);
 }
 
-// EQ Nx, Ny, EQ_BRANCH, NE_BRANCH
+/* EQ Nx, Ny, EQ_BRANCH, NE_BRANCH */
 MANUAL_OP eq_n_ic {
   if (NUM_REG(P1) == NUM_REG(P2)) {
 RETURN(P3);
@@ -193,7 +193,7 @@
   }
 }
 
-// IF Nx, TRUE_BRANCH, FALSE_BRANCH
+/* IF Nx, TRUE_BRANCH, FALSE_BRANCH */
 MANUAL_OP if_n_ic {
   if (NUM_REG(P1)) {
 RETURN(P2);
@@ -202,369 +202,369 @@
   }
 }
 
-// TIME Nx
+/* TIME Nx */
 AUTO_OP time_n {
   NUM_REG(P1) = time(NULL);
 }
 
-// PRINT Nx
+/* PRINT Nx */
 AUTO_OP print_n {
   printf("N reg %li is %f\n", P1, NUM_REG(P1));
 }
  
-// INC Nx
+/* INC Nx */
 AUTO_OP inc_n {
   NUM_REG(P

Patch to remove use of structure constant/cast

2001-09-13 Thread Tom Hughes


Setting up the strnative vtable is being done by casting a {}
delimited list of values to a structure type but this is a gcc
extension not ANSI C, so the following patch reworks this to
be ANSI compliant:

Index: strnative.c
===
RCS file: /home/perlcvs/parrot/strnative.c,v
retrieving revision 1.4
diff -u -r1.4 strnative.c
--- strnative.c 2001/09/13 07:14:24 1.4
+++ strnative.c 2001/09/13 08:36:34
@@ -55,7 +55,7 @@
 
 STRING_VTABLE 
 string_native_vtable (void) {
-return (STRING_VTABLE) {
+STRING_VTABLE sv = {
enc_native,
string_native_compute_strlen,
string_native_max_bytes,
@@ -63,4 +63,5 @@
string_native_chopn,
string_native_substr,
};
+return sv;
 }

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu
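The rework in the message above can be sketched on a much smaller structure. DEMO_VTABLE and its two slots are hypothetical and only illustrate the pattern. The "(TYPE){ ... }" form became standard in C99 as a compound literal, but it is not part of C89:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical two-slot vtable, far smaller than the real STRING_VTABLE. */
typedef struct {
    int    encoding;
    size_t (*compute_strlen)(const char *);
} DEMO_VTABLE;

static size_t demo_strlen(const char *s) {
    assert(s != NULL);
    return strlen(s);
}

DEMO_VTABLE demo_vtable(void) {
    /* C89 has no "(TYPE){ ... }" expression, so initialise a named local
       with an ordinary initialiser list and return it by value. */
    DEMO_VTABLE v = { 0, demo_strlen };
    return v;
}
```

The caller sees no difference: demo_vtable().compute_strlen("parrot") still gives 6.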

Re: Patch to fix C++ style comments

2001-09-13 Thread Tom Hughes


In message <[EMAIL PROTECTED]>
Simon Cozens <[EMAIL PROTECTED]> wrote:

> On Thu, Sep 13, 2001 at 09:35:33AM +0100, Tom Hughes wrote:
> > BTW I have had to resend this because my first attempt was bounced
> > apparently for having the patch as a text/plain attachment rather than
> > inline. Isn't that a bit OTT though?
> 
> Hrm, I think other people have managed...

Weird. Must be something to do with the MIME that Gnus created then.

> Both this, and the other patch, (struct in strnative.c) applied.

I just realised I missed one:

===
RCS file: /home/perlcvs/parrot/config.h.in,v
retrieving revision 1.1
diff -u -r1.1 config.h.in
--- config.h.in 2001/09/11 09:44:00 1.1
+++ config.h.in 2001/09/13 08:52:26
@@ -13,7 +13,7 @@
 typedef void DPOINTER;
 typedef void SYNC;
 
-//typedef IV *(*opcode_funcs)(void *, void *) OPFUNC;
+/*typedef IV *(*opcode_funcs)(void *, void *) OPFUNC; */
 
 #define FRAMES_PER_CHUNK 16
 

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu

Patch to fix arithmetic on void * pointers

2001-09-13 Thread Tom Hughes


This patch fixes a couple of cases where arithmetic on void * pointers
is being done, which isn't valid although gcc seems to allow it.

Of course the memory.c code is broken anyway because it assumes a 
pointer will fit in an IV and I'm not sure that will always be true,
will it? Anyway, with this patch and the others it now builds on
a Unixware box with the system compiler:

Index: memory.c
===
RCS file: /home/perlcvs/parrot/memory.c,v
retrieving revision 1.3
diff -u -r1.3 memory.c
--- memory.c    2001/09/12 17:58:55 1.3
+++ memory.c    2001/09/13 09:00:34
@@ -26,7 +26,7 @@
 
   mem = malloc(max_to_alloc);
   if (((IV)mem & mask) < (IV)mem) {
-mem = (void *)((IV)mem & mask) + ~mask + 1;
+mem = (void *)(((IV)mem & mask) + ~mask + 1);
   } 
   return mem;
 }
Index: strnative.c
===
RCS file: /home/perlcvs/parrot/strnative.c,v
retrieving revision 1.5
diff -u -r1.5 strnative.c
--- strnative.c 2001/09/13 08:44:08 1.5
+++ strnative.c 2001/09/13 09:00:34
@@ -26,7 +26,7 @@
 
 /* b is now in native format */
 string_grow(a, a->strlen + b->strlen);
-Sys_Memcopy(a->bufstart + a->strlen, b->bufstart, b->strlen);
+Sys_Memcopy((char *)a->bufstart + a->strlen, b->bufstart, b->strlen);
 a->strlen = a->bufused = a->strlen + b->strlen;
 return a;
 }
@@ -47,7 +47,7 @@
 
 /* Offset and length have already been "normalized" */
 string_grow(dest, src->strlen - length);
-Sys_Memcopy(dest->bufstart, src->bufstart + offset, length);
+Sys_Memcopy(dest->bufstart, (char *)src->bufstart + offset, length);
 dest->strlen = dest->bufused = length;
 
 return dest;

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu
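A small sketch of both points from the message above, with hypothetical helper names. The first function does its offset arithmetic on a char pointer rather than a void pointer. The second rounds an address up through uintptr_t, a C99 type that (where it exists) is wide enough for a pointer, instead of assuming a pointer fits in an IV. It is not the memory.c routine, just one portable way to express the same idea:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copies `len` bytes starting `offset` bytes into `src`. */
void *copy_from_offset(void *dest, const void *src, size_t offset, size_t len) {
    assert(dest != NULL && src != NULL);
    /* void * has no element size, so convert to char * before adding */
    return memcpy(dest, (const char *)src + offset, len);
}

/* Rounds `p` up to the next multiple of `align`, which must be a power of two. */
void *align_up(void *p, uintptr_t align) {
    uintptr_t mask = align - 1;
    assert(align != 0 && (align & mask) == 0);
    /* the whole sum is computed as an integer, then converted back once */
    return (void *)(((uintptr_t)p + mask) & ~mask);
}
```

Note that the parentheses matter in the same way as in the memory.c fix: the cast back to a pointer has to apply to the finished integer expression, not to its first term.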

Re: String API

2001-09-13 Thread Tom Hughes


In message <[EMAIL PROTECTED]>
  Benjamin Stuhl <[EMAIL PROTECTED]> wrote:

> Thus wrote the illustrious Simon Cozens:
> [severely trimmed]
> > STRING* string_make(void *buffer, IV buflen, IV
> > encoding, IV flags, IV type)
> > STRING* string_copy(STRING* s)
> > void string_destroy(STRING *s)
>
> *cough* Namespace pollution *cough*
>
> These should proably all be prefixed...

Especially since all function names starting with str are strictly
speaking reserved to ANSI/ISO for future expansion of the string.h
facilities ;-)

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/

Patch to fix not op

2001-09-16 Thread Tom Hughes


The not op seems to be doing a logical not rather than a bitwise
not. Patch to fix it is as follows:

Index: basic_opcodes.ops
===
RCS file: /home/perlcvs/parrot/basic_opcodes.ops,v
retrieving revision 1.17
diff -u -r1.17 basic_opcodes.ops
--- basic_opcodes.ops   2001/09/16 15:49:22 1.17
+++ basic_opcodes.ops   2001/09/16 16:27:30
@@ -564,7 +564,7 @@

 /* NOT_i */
 AUTO_OP not_i {
-  INT_REG(P1) = ! INT_REG(P2);
+  INT_REG(P1) = ~ INT_REG(P2);
 }

 /* OR_i */

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/
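For anyone unsure of the difference, a minimal sketch follows. The IV typedef is an assumption and the function names are made up for the example. Logical not collapses any non-zero value to 0, while bitwise not flips every bit:

```c
#include <assert.h>

typedef long IV;   /* assumption: IV is a signed integer type */

IV logical_not(IV x) { return !x; }   /* 1 if x is zero, otherwise 0 */
IV bitwise_not(IV x) { return ~x; }   /* every bit inverted */

/* Spells out the difference for one value; aborts if an expectation fails. */
void not_demo(void) {
    assert(logical_not(5) == 0);
    assert(bitwise_not(5) == -6);     /* on a two's complement machine */
}
```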

Patch to add string_nprintf

2001-09-17 Thread Tom Hughes


The attached patch adds string_nprintf, the last unimplemented
function listed in strings.pod as far as I can see.

It should cope with both of the vsnprintf return-value conventions
used by different versions of glibc, but a few platforms may still
have problems because their vsnprintf exhibits a third form of
behaviour in the return value, namely that it returns the amount it
did manage to produce on overflow.

I'm not sure there is a clean way to cope with that interface without
a configure test to detect it. Equally, older systems may not have a
vsnprintf at all, which leaves us with a problem on those systems.
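One way to cope with all three return conventions without a configure test is sketched below. This is not the code in the attached patch: format_alloc and FORMAT_ALLOC_MAX are names invented for the example, and it works on a plain malloc'd buffer rather than a STRING. It treats a result of exactly size - 1 as ambiguous and simply retries with a larger buffer, at the cost of an occasional unnecessary retry:

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

#define FORMAT_ALLOC_MAX ((size_t)1 << 20)   /* arbitrary cap for this sketch */

/* Formats into a malloc'd buffer, growing it until the output fits.
   Returns NULL on allocation failure or if the cap is exceeded.
   The caller frees the result. */
char *format_alloc(const char *format, ...) {
    size_t size = 64;
    char *buf = NULL;

    assert(format != NULL);
    while (size <= FORMAT_ALLOC_MAX) {
        char *grown = realloc(buf, size);
        va_list ap;
        int n;

        if (grown == NULL) {
            break;
        }
        buf = grown;

        /* A va_list must not be reused once vsnprintf has consumed it,
           so it is restarted for every attempt. */
        va_start(ap, format);
        n = vsnprintf(buf, size, format, ap);
        va_end(ap);

        if (n >= 0 && (size_t)n < size - 1) {
            return buf;                  /* fitted with room to spare */
        }
        if (n >= 0 && (size_t)n >= size) {
            size = (size_t)n + 2;        /* C99 style: n is the length needed */
        } else {
            size *= 2;                   /* -1, or a possibly truncated count */
        }
    }
    free(buf);
    return NULL;
}
```

With a C99 vsnprintf, format_alloc("%0100d", 7) needs one retry: the first call reports 100, the buffer grows to 102 bytes, and the second call fits.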

On a vaguely related note string_substr takes a STRING** for the
destination which seems redundant given that it returns the dest
string, and doesn't even fill in the argument if it does do the
allocation itself. I would suggest making it a STRING* which would
then be consistent with the nprintf interface.

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu



? xxx
Index: parrot.h
===
RCS file: /home/perlcvs/parrot/parrot.h,v
retrieving revision 1.8
diff -u -r1.8 parrot.h
--- parrot.h    2001/09/16 22:05:21 1.8
+++ parrot.h    2001/09/17 08:20:49
@@ -43,6 +43,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define NUM_REGISTERS 32
 #define PARROT_MAGIC 0x13155a1
Index: string.c
===
RCS file: /home/perlcvs/parrot/string.c,v
retrieving revision 1.7
diff -u -r1.7 string.c
--- string.c    2001/09/16 01:45:51 1.7
+++ string.c    2001/09/17 08:20:49
@@ -139,6 +139,21 @@
 return (ENC_VTABLE(s)->chopn)(s, n);
 }
 
+/*=for api string string_nprintf
+ * format output into a string.
+ */
+STRING*
+string_nprintf(STRING* dest, IV len, char* format, ...) {
+va_list ap;
+if (!dest) {
+dest = string_make(NULL, 0, enc_native, 0, 0);
+}
+va_start(ap, format);
+dest = (ENC_VTABLE(dest)->nprintf)(dest, len, format, ap);
+va_end(ap);
+return dest;
+}
+
 /*
  * Local variables:
  * c-indentation-style: bsd
Index: string.h
===
RCS file: /home/perlcvs/parrot/string.h,v
retrieving revision 1.6
diff -u -r1.6 string.h
--- string.h    2001/09/16 01:45:51 1.6
+++ string.h    2001/09/17 08:20:49
@@ -32,6 +32,7 @@
 typedef STRING* (*string_iv_to_string_t)(STRING *, IV);
 typedef STRING* (*two_strings_iv_to_string_t)(STRING *, STRING *, IV);
 typedef STRING* (*substr_t)(STRING*, IV, IV, STRING*);
+typedef STRING* (*nprintf_t)(STRING*, IV, char*, va_list);
 typedef IV (*iv_to_iv_t)(IV);
 
 struct string_vtable {
@@ -41,6 +42,7 @@
 two_strings_iv_to_string_t concat;  /* Append string b to the end of string a */
 string_iv_to_string_t chopn;/* Remove n characters from the end of a 
string */
 substr_t substr;/* Substring operation */
+nprintf_t nprintf;  /* Formatted output operation */
 };
 
 struct parrot_string {
@@ -67,6 +69,8 @@
 string_chopn(STRING*, IV);
 STRING*
 string_substr(STRING*, IV, IV, STRING**);
+STRING*
+string_nprintf(STRING*, IV, char*, ...);
 
 /* Declarations of other functions */
 IV
Index: strnative.c
===
RCS file: /home/perlcvs/parrot/strnative.c,v
retrieving revision 1.10
diff -u -r1.10 strnative.c
--- strnative.c 2001/09/16 01:45:51 1.10
+++ strnative.c 2001/09/17 08:20:49
@@ -80,6 +80,36 @@
 return dest;
 }
 
+/*=for api string_native string_native_nprintf
+   format output into a string.
+*/
+static STRING*
+string_native_nprintf(STRING* dest, IV len, char* format, va_list ap) {
+if (len > 0) {
+string_grow(dest, len);
+len = vsnprintf(dest->bufstart, len, format, ap);
+if (len > dest->buflen) {
+len = dest->buflen;
+}
+}
+else {
+while (len == 0 || len > dest->buflen)
+{
+if (len < 0) {
+string_grow(dest, dest->buflen * 2);
+}
+else if (len > dest->buflen) {
+string_grow(dest, len);
+}
+len = vsnprintf(dest->bufstart, dest->buflen, format, ap);
+}
+}
+
+dest->strlen = dest->bufused = len;
+
+return dest;
+}
+
 /*=for api string_native string_native_vtable
return the vtable for the native string
 */
@@ -92,6 +122,7 @@
string_native_concat,
string_native_chopn,
string_native_substr,
+string_native_nprintf
 };
 return sv;
 }
Index: docs/strings.pod
===
RCS file: /home/perlcvs/parrot/docs/strings.pod,v
retrieving revision 1.3
diff -u -r1.3 strings.pod
--- docs/strings.pod    2001/09/13 08:39:49 1.3
+++ docs/strings.pod    2001/09/17

Re: Patch to fix C++ style comments

2001-09-17 Thread Tom Hughes

In message <[EMAIL PROTECTED]>
        Tom Hughes <[EMAIL PROTECTED]> wrote:

> In message <[EMAIL PROTECTED]>
> Simon Cozens <[EMAIL PROTECTED]> wrote:
> 
> > On Thu, Sep 13, 2001 at 09:35:33AM +0100, Tom Hughes wrote:
> > > BTW I have had to resend this because my first attempt was bounced
> > > apparently for having the patch as a text/plain attachment rather than
> > > inline. Isn't that a bit OTT though?
> > 
> > Hrm, I think other people have managed...
> 
> Weird. Must be something to do with the MIME that Gnus created then.

I think I've worked this out...

The problem seems to be that Gnus doesn't bother adding a Content-Type
header to the sections of the multipart message on the grounds that
text/plain is the default content type, but the filters on the mailing
list obviously don't know that text/plain is the default.

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu

Re: "Feature Freeze"

2001-09-20 Thread Tom Hughes


In message <[EMAIL PROTECTED]>
  Simon Cozens <[EMAIL PROTECTED]> wrote:

> So, if you're running on one of the core platforms, please check out a
> *clean* CVS copy, try and build and post the output of make test.

Tests cleanly on linux/x86:

perl t/harness
t/op/basic..ok, 1/2 skipped:  label constants unimplemented in assembler
t/op/integerok
t/op/number.ok, 2/23 skipped: various reasons
t/op/string.ok, 1/5 skipped:  I'm unable to write it!
t/op/trans..ok
All tests successful, 4 subtests skipped.
Files=5, Tests=74, 45 wallclock secs (38.60 cusr +  6.28 csys = 44.88 CPU)

Builds cleanly with -Wall with the exception of these warnings
in packfile.c:

packfile.c:964:3: warning: "/*" within comment
packfile.c:967:3: warning: "/*" within comment
packfile.c: In function `PackFile_unpack':
packfile.c:323: warning: int format, IV arg (arg 3)
packfile.c:344: warning: int format, IV arg (arg 3)
packfile.c:287: warning: unused variable `byte_code_ptr'
packfile.c:285: warning: unused variable `segment_ptr'
packfile.c: In function `PackFile_dump':
packfile.c:461: warning: unsigned int format, long unsigned int arg (arg 2)
packfile.c:474: warning: unsigned int format, long unsigned int arg (arg 2)
packfile.c:476: warning: unsigned int format, long unsigned int arg (arg 2)
packfile.c: In function `PackFile_ConstTable_dump':
packfile.c:938: warning: int format, IV arg (arg 2)
packfile.c: In function `PackFile_Constant_unpack':
packfile.c:1233: warning: unused variable `i'
packfile.c: In function `PackFile_Constant_dump':
packfile.c:1358: warning: unsigned int format, long unsigned int arg (arg 2)

The attached patch will clean up those warnings.
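The format warnings above all come down to the conversion not matching the argument type. A minimal sketch, assuming IV is a long as the warnings suggest on this platform; format_segment_error is a made-up name, not a packfile.c function:

```c
#include <assert.h>
#include <stdio.h>

typedef long IV;   /* assumption: matches the "IV arg" in the warnings above */

/* Writes the diagnostic into buf and returns the length snprintf reports. */
int format_segment_error(char *buf, size_t buflen, IV segment_size) {
    assert(buf != NULL && buflen > 0);
    /* %ld matches a long; sizeof yields a size_t, so it is cast to
       unsigned long to match %lu. */
    return snprintf(buf, buflen,
                    "Illegal segment size %ld (must be multiple of %lu)!",
                    segment_size, (unsigned long)sizeof(IV));
}
```

The cast on sizeof is the part that is easy to miss: size_t is not necessarily an int, so passing it to %d is a mismatch in the same way that passing an IV is.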

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/


Index: packfile.c
===
RCS file: /home/perlcvs/parrot/packfile.c,v
retrieving revision 1.4
diff -u -r1.4 packfile.c
--- packfile.c  2001/09/20 21:41:40 1.4
+++ packfile.c  2001/09/20 22:41:46
@@ -180,7 +180,7 @@

 ***/

-void
+void
 PackFile_set_magic(PackFile * self, IV magic) {
 self->magic = magic;
 }
@@ -282,9 +282,7 @@

 IV
 PackFile_unpack(PackFile * self, char * packed, IV packed_size) {
-IV *   segment_ptr;
 IV segment_size;
-char * byte_code_ptr;
 char * cursor;
 IV *   iv_ptr;

@@ -317,9 +315,9 @@
 iv_ptr = (IV *)cursor;
 segment_size = *iv_ptr;
 cursor += sizeof(IV);
-
+
 if (segment_size % sizeof(IV)) {
-fprintf(stderr, "PackFile_unpack: Illegal fixup table segment size %d (must 
be multiple of %d!\n",
+fprintf(stderr, "PackFile_unpack: Illegal fixup table segment size %ld (must 
+be multiple of %d!\n",
 segment_size, sizeof(IV));
 return 0;
 }
@@ -338,13 +336,13 @@
 iv_ptr = (IV *)cursor;
 segment_size = *iv_ptr;
 cursor += sizeof(IV);
-
+
 if (segment_size % sizeof(IV)) {
-fprintf(stderr, "PackFile_unpack: Illegal constant table segment size %d 
(must be multiple of %d!\n",
+fprintf(stderr, "PackFile_unpack: Illegal constant table segment size %ld 
+(must be multiple of %d!\n",
 segment_size, sizeof(IV));
 return 0;
 }
-
+
 if (!PackFile_ConstTable_unpack(self->const_table, cursor, segment_size)) {
 fprintf(stderr, "PackFile_unpack: Error reading constant table segment!\n");
 return 0;
@@ -366,7 +364,7 @@
 self->byte_code_size = 0;
 return 0;
 }
-
+
 mem_sys_memcopy(self->byte_code, cursor, self->byte_code_size);
 }

@@ -432,7 +430,7 @@
 iv_ptr = (IV *)cursor;
 *iv_ptr = const_table_size;
 cursor += sizeof(IV);
-
+
 PackFile_ConstTable_pack(self->const_table, cursor);
 cursor += const_table_size;

@@ -458,7 +456,7 @@
 PackFile_dump(PackFile * self) {
 IV i;

-printf("MAGIC => 0x%08x,\n", self->magic);
+printf("MAGIC => 0x%08lx,\n", self->magic);

 printf("FIXUP => {\n");
 PackFile_FixupTable_dump(self->fixup_table);
@@ -471,9 +469,9 @@
 printf("BCODE => [");
 for (i = 0; i < self->byte_code_size / 4; i++) {
 if (i % 8 == 0) {
-printf("\n%08x:  ", i * 4);
+printf("\n%08lx:  ", i * 4);
 }
-printf("%08x ", ((IV *)(self->byte_code))[i]);
+printf("%08lx ", ((IV *)(self->byte_code))[i]);
 }
 printf("\n]\n");

@@ -837,7 +835,7 @@
 iv_ptr = (IV *)cursor;
 self->const_count = *iv_ptr;
 cursor += sizeof(IV);
-
+
 if (self->const_count == 0) {
 return 1;
 }
@@ -857,7 +855,7 @@

 cursor += PackFile_Constant_pack_size(self->constants[i]);
 }

Re: instructions per second benchmark (in parrot ;)

2001-09-20 Thread Tom Hughes

In message <[EMAIL PROTECTED]>
  Dan Sugalski <[EMAIL PROTECTED]> wrote:

> That's actually what test.pasm tests. :) I just checked in a new version
> that prints labels.
>
> FWIW, my 600MHz Alpha clocks in at around 23M ops/sec. Nyah! ;-P

I have test.pasm reporting 7.14M ops/sec on a 200MHz K6 running
linux with the interpreter compiled -O3. That's about twice the
speed that I get without any optimisation.

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/

Re: instructions per second benchmark (in parrot ;)

2001-09-21 Thread Tom Hughes

In message <20010920190703.S28291@blackrider>
Michael G. Schwern <[EMAIL PROTECTED]> wrote:

> I'm getting 2.67 MIPS with -O3.
> 
> Hmmm, why would a K6/200 come out so much faster than a G3/266?  If
> anything it should be the other way around.

No idea I'm afraid. I've just clocked 42.86M on an Athlon/1333 though ;-)

At the other end of the scale a P5/90 manages 2.91M ops/sec.

Taken together (and with the K6/200 time) that is something fairly
close to linear scaling with clock speed on x86 machines although
the K6/200 seems to be beating the odds a little.

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu
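The scaling claim above is easy to check by dividing each figure by the clock speed. A tiny helper, named here only for the example:

```c
#include <assert.h>

/* Parrot ops executed per clock cycle, given millions of ops per second
   and the clock speed in MHz. */
double ops_per_cycle(double mops_per_sec, double clock_mhz) {
    assert(clock_mhz > 0.0);
    return mops_per_sec / clock_mhz;
}
```

Using the numbers quoted in this thread, the P5/90 (2.91M) and the Athlon/1333 (42.86M) both come out at roughly 0.032 ops per cycle, while the K6/200 (7.14M) comes out at about 0.036, which is the "beating the odds a little" mentioned above.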

Re: Have I given the big "The Way Strings Should Work" talk?

2001-10-21 Thread Tom Hughes

In message <[EMAIL PROTECTED]>
  Dan Sugalski <[EMAIL PROTECTED]> wrote:

> I've given it a few places, but I don't know that I've sent it to
> perl6-internals. If not, or if I should do it again, let me know. I want to
> make sure we're all on the same page here.

Not that I recall. I thought that was what strings.pod was...

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/

Re: PMCs and how the opcode functions will work

2001-10-21 Thread Tom Hughes

In message <[EMAIL PROTECTED]>
  Simon Cozens <[EMAIL PROTECTED]> wrote:

> I've now changed the vtable structure to reflect this, but I'd like someone
> to confirm that the "variant" forms of the ops can be addressed the way I
> think they can. (ie. structure->base_element + 1 to get "thing after
> base_element")

Legally speaking they can't as ISO C says that you can't do pointer
calculations and comparisons across object boundaries and separate
members of a structure are different objects. If you replace this:

set_integer_method_t set_integer_1;
set_integer_method_t set_integer_2;
set_integer_method_t set_integer_3;
set_integer_method_t set_integer_4;
set_integer_method_t set_integer_5;

with this:

set_integer_method_t set_integer[5];

then you would be able to, as an array is all one object.

Practically speaking I think it will work on every system that I can
think of at the moment but who knows what weird things are out there...

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/
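A sketch of the array form suggested above. The struct, the method signature and the two sample methods are hypothetical and only show that indexing stays inside a single object:

```c
#include <assert.h>
#include <stddef.h>

typedef long IV;                                  /* assumption */
typedef IV (*set_integer_method_t)(IV);           /* hypothetical signature */

enum { SET_INTEGER_VARIANTS = 5 };

struct demo_vtable {
    /* one array object instead of five separate members */
    set_integer_method_t set_integer[SET_INTEGER_VARIANTS];
};

static IV set_plain(IV v)   { return v; }
static IV set_doubled(IV v) { return v * 2; }

/* Builds a table with the first two variants filled in; the rest are NULL. */
struct demo_vtable make_demo_vtable(void) {
    struct demo_vtable vt = { { set_plain, set_doubled } };
    return vt;
}

/* Calls variant `n` (0-based). Indexing within the array is well defined,
   unlike stepping from one struct member to the next. */
IV call_set_integer(const struct demo_vtable *vt, size_t n, IV value) {
    assert(vt != NULL && n < SET_INTEGER_VARIANTS);
    assert(vt->set_integer[n] != NULL);
    return vt->set_integer[n](value);
}
```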

Re: [PATCH] Bugfix for push_generic_entry

2001-10-21 Thread Tom Hughes

In message <[EMAIL PROTECTED]>
  Jason Gloudon <[EMAIL PROTECTED]> wrote:

> The "stacktest" patch will fail on the current CVS source, due to a bug in
> push_generic_entry.

This looks good to me so I have committed it. Thanks for spotting it!

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/

Re: Resync your CVS...

2001-10-22 Thread Tom Hughes


In message <[EMAIL PROTECTED]>
  Dan Sugalski <[EMAIL PROTECTED]> wrote:

> On Mon, 22 Oct 2001, Sam Tregar wrote:
> 
> > Fresh checkout won't compile on Redhat Linux 7.1:
> 
> Damn. It compiled cleanly before I checked it in. I'll patch up again and
> see what I missed. Probably some odd dependency or timing issue
> somewhere. (It's emacs fault! Yeah, that's the ticket! :)

I'd already patched it up, so I've just committed my fix...

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/

Re: String rationale

2001-10-25 Thread Tom Hughes

In message <[EMAIL PROTECTED]>
  Dan Sugalski <[EMAIL PROTECTED]> wrote:

> =item type
> 
> What the character set or type of data is encoded in the buffer. This
> includes things like ASCII, EBCDIC, Unicode, Chinese Traditional,
> Chinese Simplified, or Shift-JIS. (And yes, I know the latter's a
> combination of type and encoding. I'll update the doc as soon as I can
> reasonablty separate the two)

Isn't this going to need to be a vtable pointer like encoding is? Only
some things (like character classification and at least some transcoding
tasks) will be character set based rather than encoding based.

Other than that it looked quite good and I'll probably start looking at
bending the existing code into the new model over the weekend.

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/
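To illustrate the question above about a second vtable pointer, here is a rough sketch in which a string carries one pointer for its encoding and one for its character set. Every name in it is hypothetical and it does not reflect the layout that was eventually adopted. Decoding bytes is the encoding's job, while classifying the resulting character belongs to the character set:

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    const char *name;
    unsigned long (*decode)(const unsigned char *p);   /* bytes to code point */
} DEMO_ENCODING;

typedef struct {
    const char *name;
    int (*is_digit)(unsigned long codepoint);          /* classification */
} DEMO_CHARTYPE;

typedef struct {
    const unsigned char *bufstart;
    size_t               buflen;
    const DEMO_ENCODING *encoding;   /* how characters are stored */
    const DEMO_CHARTYPE *type;       /* what the characters mean */
} DEMO_STRING;

static unsigned long singlebyte_decode(const unsigned char *p) { return *p; }
static int usascii_is_digit(unsigned long c) { return c >= '0' && c <= '9'; }

const DEMO_ENCODING demo_singlebyte = { "singlebyte", singlebyte_decode };
const DEMO_CHARTYPE demo_usascii    = { "usascii", usascii_is_digit };

/* Returns 1 if the first character of `s` is a digit in its character set. */
int first_char_is_digit(const DEMO_STRING *s) {
    assert(s != NULL && s->encoding != NULL && s->type != NULL);
    if (s->buflen == 0) {
        return 0;
    }
    return s->type->is_digit(s->encoding->decode(s->bufstart));
}
```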

Re: Ooops, sorry for that blank log message.

2001-10-25 Thread Tom Hughes

In message <[EMAIL PROTECTED]>
  Brian Wheeler <[EMAIL PROTECTED]> wrote:

> Darn it, I fat fingered the log message.
> 
> This is a fix which changes the way op variants are handled.  The old
> method "forgot" the last variant, so thing(i,i|ic,i|ic) would
> generate:
> thing(i,i,i)
> thing(i,i,ic)
> thing(i,ic,i)
> 
> but not
> 
> thing(i,ic,ic)

It didn't forget it, it went to some considerable trouble to
ignore it on the grounds that such an opcode is pointless as
all the operands are constant.

I did describe the algorithm used and the logic behind it on the
list when I implemented it.

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/

Re: String rationale

2001-10-27 Thread Tom Hughes

In message <[EMAIL PROTECTED]>
      Tom Hughes <[EMAIL PROTECTED]> wrote:

> Other than that it looked quite good and I'll probably start looking at
> bending the existing code into the new model over the weekend.

Attached is my first pass at this - it's not fully ready yet but
is something for people to cast an eye over before I spend lots of
time going down the wrong path ;-)

The encoding_lookup() and chartype_lookup() routines will obviously
need to load the relevant libraries on the fly when we have support
for that.

The packfile stuff is just a hack to make it work for now. Presumably
we will have to modify the byte code format to record the string types
as names or something so we can look them up properly?

String comparison is not language sensitive here - as before it just
compares based on character values.

Other than that I think it's aiming in the right direction and it does
pass all the tests... Please correct me if I'm wrong.

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/

# This is a patch for parrot to update it to parrot-ns
# 
# To apply this patch:
# STEP 1: Chdir to the source directory.
# STEP 2: Run the 'applypatch' program with this patch file as input.
#
# If you do not have 'applypatch', it is part of the 'makepatch' package
# that you can fetch from the Comprehensive Perl Archive Network:
# http://www.perl.com/CPAN/authors/Johan_Vromans/makepatch-x.y.tar.gz
# In the above URL, 'x' should be 2 or higher.
#
# To apply this patch without the use of 'applypatch':
# STEP 1: Chdir to the source directory.
# If you have a decent Bourne-type shell:
# STEP 2: Run the shell with this file as input.
# If you don't have such a shell, you may need to manually create/delete
# the files/directories as shown below.
# STEP 3: Run the 'patch' program with this file as input.
#
# These are the commands needed to create/delete files/directories:
#
mkdir 'chartypes'
chmod 0755 'chartypes'
mkdir 'encodings'
chmod 0755 'encodings'
rm -f 'transcode.c'
rm -f 'strutf8.c'
rm -f 'strutf32.c'
rm -f 'strutf16.c'
rm -f 'strnative.c'
rm -f 'include/parrot/transcode.h'
rm -f 'include/parrot/strutf8.h'
rm -f 'include/parrot/strutf32.h'
rm -f 'include/parrot/strutf16.h'
rm -f 'include/parrot/strnative.h'
touch 'chartype.c'
chmod 0644 'chartype.c'
touch 'chartypes/unicode.c'
chmod 0644 'chartypes/unicode.c'
touch 'chartypes/usascii.c'
chmod 0644 'chartypes/usascii.c'
touch 'encoding.c'
chmod 0644 'encoding.c'
touch 'encodings/singlebyte.c'
chmod 0644 'encodings/singlebyte.c'
touch 'encodings/utf16.c'
chmod 0644 'encodings/utf16.c'
touch 'encodings/utf32.c'
chmod 0644 'encodings/utf32.c'
touch 'encodings/utf8.c'
chmod 0644 'encodings/utf8.c'
touch 'include/parrot/chartype.h'
chmod 0644 'include/parrot/chartype.h'
touch 'include/parrot/encoding.h'
chmod 0644 'include/parrot/encoding.h'
#
# This command terminates the shell and need not be executed manually.
exit
#
 End of Preamble 

 Patch data follows 
diff -c 'parrot/MANIFEST' 'parrot-ns/MANIFEST'
Index: ./MANIFEST
*** ./MANIFEST  Wed Oct 24 22:16:51 2001
--- ./MANIFEST  Sat Oct 27 14:59:43 2001
***
*** 1,5 
--- 1,8 
  assemble.pl
  ChangeLog
+ chartype.c
+ chartypes/unicode.c
+ chartypes/usascii.c
  classes/genclass.pl
  classes/intclass.c
  config_h.in
***
*** 14,19 
--- 17,27 
  docs/parrotbyte.pod
  docs/strings.pod
  docs/vtables.pod
+ encoding.c
+ encodings/singlebyte.c
+ encodings/utf8.c
+ encodings/utf16.c
+ encodings/utf32.c
  examples/assembly/bsr.pasm
  examples/assembly/call.pasm
  examples/assembly/euclid.pasm
***
*** 29,34 
--- 37,44 
  global_setup.c
  hints/mswin32.pl
  hints/vms.pl
+ include/parrot/chartype.h
+ include/parrot/encoding.h
  include/parrot/events.h
  include/parrot/exceptions.h
  include/parrot/global_setup.h
***
*** 45,55 
  include/parrot/runops_cores.h
  include/parrot/stacks.h
  include/parrot/string.h
- include/parrot/strnative.h
- include/parrot/strutf16.h
- include/parrot/strutf32.h
- include/parrot/strutf8.h
- include/parrot/transcode.h
  include/parrot/trace.h
  include/parrot/unicode.h
  interpreter.c
--- 55,60 
***
*** 107,116 
  runops_cores.c
  stacks.c
  string.c
- strnative.c
- strutf16.c
- strutf32.c
- strutf8.c
  test_c.in
  test_main.c
  Test/More.pm
--- 112,117 
***
*** 128,134 
  t/op/time.t
  t/op/trans.t
  trace.c
- transcode.c
  Types_pm.in
  vtable_h.pl
  vtable.tbl
--- 129,134 
diff -c

Re: String rationale

2001-10-27 Thread Tom Hughes

In message <[EMAIL PROTECTED]>
      Tom Hughes <[EMAIL PROTECTED]> wrote:

> Attached is my first pass at this - it's not fully ready yet but
> is something for people to cast an eye over before I spend lots of
> time going down the wrong path ;-)

Before anybody else spots it, let me just add what I forgot to mention
in my original post, which is that transcoding isn't implemented yet
as I'm still thinking about the best way to do it. There is a hook
in place ready for it though.

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/

Re: Opcode complaints

2001-10-28 Thread Tom Hughes

In message <[EMAIL PROTECTED]>
  "Brent Dax" <[EMAIL PROTECTED]> wrote:

> 4. eq and friends: string variants
> One thing that seems to be missing is string and numeric variants on the
> comparison ops.  While this isn't a problem now, it may be once we get
> PMCs.

Both string and numeric versions of the comparison ops exist...

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/

Re: String rationale

2001-10-29 Thread Tom Hughes

In message <[EMAIL PROTECTED]>
  Dan Sugalski <[EMAIL PROTECTED]> wrote:

> At 04:23 PM 10/27/2001 +0100, Tom Hughes wrote:
>
> >Attached is my first pass at this - it's not fully ready yet but
> >is something for people to cast an eye over before I spend lots of
> >time going down the wrong path ;-)
> 
> It looks pretty good on first glance.

I've done a bit more work now, and the latest version is attached.

This version can do transcoding. The intention is that there will be
some sort of cache in chartype_lookup_transcoder to avoid repeating
the expensive lookups by name too much.

One interesting question is who is responsible for transcoding
from character set A to character set B - is it A or B? and how
about the other way?

My code currently allows either set to provide the transform on the
grounds that otherwise the unicode module would have to either know
how to convert to everything else or from everything else.

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/

# This is a patch for parrot to update it to parrot-ns
# 
# To apply this patch:
# STEP 1: Chdir to the source directory.
# STEP 2: Run the 'applypatch' program with this patch file as input.
#
# If you do not have 'applypatch', it is part of the 'makepatch' package
# that you can fetch from the Comprehensive Perl Archive Network:
# http://www.perl.com/CPAN/authors/Johan_Vromans/makepatch-x.y.tar.gz
# In the above URL, 'x' should be 2 or higher.
#
# To apply this patch without the use of 'applypatch':
# STEP 1: Chdir to the source directory.
# If you have a decent Bourne-type shell:
# STEP 2: Run the shell with this file as input.
# If you don't have such a shell, you may need to manually create/delete
# the files/directories as shown below.
# STEP 3: Run the 'patch' program with this file as input.
#
# These are the commands needed to create/delete files/directories:
#
mkdir 'chartypes'
chmod 0755 'chartypes'
mkdir 'encodings'
chmod 0755 'encodings'
rm -f 'transcode.c'
rm -f 'strutf8.c'
rm -f 'strutf32.c'
rm -f 'strutf16.c'
rm -f 'strnative.c'
rm -f 'include/parrot/transcode.h'
rm -f 'include/parrot/strutf8.h'
rm -f 'include/parrot/strutf32.h'
rm -f 'include/parrot/strutf16.h'
rm -f 'include/parrot/strnative.h'
touch 'chartype.c'
chmod 0644 'chartype.c'
touch 'chartypes/unicode.c'
chmod 0644 'chartypes/unicode.c'
touch 'chartypes/usascii.c'
chmod 0644 'chartypes/usascii.c'
touch 'encoding.c'
chmod 0644 'encoding.c'
touch 'encodings/singlebyte.c'
chmod 0644 'encodings/singlebyte.c'
touch 'encodings/utf16.c'
chmod 0644 'encodings/utf16.c'
touch 'encodings/utf32.c'
chmod 0644 'encodings/utf32.c'
touch 'encodings/utf8.c'
chmod 0644 'encodings/utf8.c'
touch 'include/parrot/chartype.h'
chmod 0644 'include/parrot/chartype.h'
touch 'include/parrot/encoding.h'
chmod 0644 'include/parrot/encoding.h'
#
# This command terminates the shell and need not be executed manually.
exit
#
#### End of Preamble ####

#### Patch data follows ####
diff -c 'parrot/MANIFEST' 'parrot-ns/MANIFEST'
Index: ./MANIFEST
*** ./MANIFEST  Sun Oct 28 17:11:21 2001
--- ./MANIFEST  Sun Oct 28 17:11:07 2001
***
*** 1,5 
--- 1,8 
  assemble.pl
  ChangeLog
+ chartype.c
+ chartypes/unicode.c
+ chartypes/usascii.c
  classes/genclass.pl
  classes/intclass.c
  classes/scalarclass.c
***
*** 15,20 
--- 18,28 
  docs/parrotbyte.pod
  docs/strings.pod
  docs/vtables.pod
+ encoding.c
+ encodings/singlebyte.c
+ encodings/utf8.c
+ encodings/utf16.c
+ encodings/utf32.c
  examples/assembly/bsr.pasm
  examples/assembly/call.pasm
  examples/assembly/euclid.pasm
***
*** 30,35 
--- 38,45 
  global_setup.c
  hints/mswin32.pl
  hints/vms.pl
+ include/parrot/chartype.h
+ include/parrot/encoding.h
  include/parrot/events.h
  include/parrot/exceptions.h
  include/parrot/global_setup.h
***
*** 46,56 
  include/parrot/runops_cores.h
  include/parrot/stacks.h
  include/parrot/string.h
- include/parrot/strnative.h
- include/parrot/strutf16.h
- include/parrot/strutf32.h
- include/parrot/strutf8.h
- include/parrot/transcode.h
  include/parrot/trace.h
  include/parrot/unicode.h
  interpreter.c
--- 56,61 
***
*** 108,117 
  runops_cores.c
  stacks.c
  string.c
- strnative.c
- strutf16.c
- strutf32.c
- strutf8.c
  test_c.in
  test_main.c
  Test/More.pm
--- 113,118 
***
*** 129,135 
  t/op/time.t
  t/op/trans.t
  trace.c
- transcode.c
  Types_pm.in
  vtable_h.pl
  vtable.tbl
--- 130,135 
diff -c &

RE: String rationale

2001-10-29 Thread Tom Hughes

In message <[EMAIL PROTECTED]>
  "Stephen Howard" <[EMAIL PROTECTED]> wrote:

> right.  I had just keyed in on this from Tom's message:
> 
> "My code currently allows either set to provide the transform on the
> grounds that otherwise the unicode module would have to either know
> how to convert to everything else or from everything else."
> 
> ...which seemed to posit that Unicode module could be responsible for
> all the transcodings to and from its own character set, which seemed
> backwards to me.

I was only positing it long enough to acknowledge that such a rule
was untenable.

What it comes down to is that there are three possible rules, namely:

  1. Each character set defines transforms from itself to other
     character sets.

  2. Each character set defines transforms to itself from other
     character sets.

  3. Each character set defines transforms both from itself to
     other character sets and from other character sets to itself.

We have established that the first two will not work because of the
unicode problem.

That leaves the third, which is what I have implemented. When looking to
transcode from A to B it will first ask A if it can transcode to B and
if that fails then it will ask B if it can transcode from A.

That way each character set can manage its own translations both to
and from unicode as we require.
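
As a rough illustration of rule 3, the lookup could be sketched in C along
the following lines. This is not the code from the patch: the chartype
struct, its transcode_to and transcode_from members, the two sample
character types and lookup_transcoder() are all names invented for this
sketch, and substituting '?' for unmappable characters is only a
placeholder for real error handling.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical types for illustration; not taken from the patch. */
typedef unsigned long (*transcoder_fn)(unsigned long c);

typedef struct chartype {
    const char *name;
    /* Return a transform from this set to `to`, or NULL if unknown. */
    transcoder_fn (*transcode_to)(const char *to);
    /* Return a transform from `from` to this set, or NULL if unknown. */
    transcoder_fn (*transcode_from)(const char *from);
} chartype;

/* ASCII knows how to get to and from unicode. */
static unsigned long ascii_to_unicode(unsigned long c) { return c; }

static unsigned long unicode_to_ascii(unsigned long c)
{
    return c < 128 ? c : '?';   /* placeholder for a real error path */
}

static transcoder_fn usascii_to(const char *to)
{
    return strcmp(to, "unicode") == 0 ? ascii_to_unicode : NULL;
}

static transcoder_fn usascii_from(const char *from)
{
    return strcmp(from, "unicode") == 0 ? unicode_to_ascii : NULL;
}

/* Unicode itself provides no transforms at all. */
static const chartype unicode = { "unicode", NULL, NULL };
static const chartype usascii = { "usascii", usascii_to, usascii_from };

/* Rule 3: ask the source set first, then the destination set. */
static transcoder_fn
lookup_transcoder(const chartype *from, const chartype *to)
{
    transcoder_fn fn = NULL;

    if (from->transcode_to != NULL)
        fn = from->transcode_to(to->name);
    if (fn == NULL && to->transcode_from != NULL)
        fn = to->transcode_from(from->name);
    return fn;
}
```

With these definitions, looking up usascii to unicode is answered by the
source set, while unicode to usascii is answered by the destination set,
so the unicode entry never has to know about anything else.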

The problem it raises is: who is responsible for transcoding from ASCII to
Latin-1? and back again? If we're not careful both ends will implement
both translations and we will have effective duplication.

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/

Re: String rationale

2001-10-29 Thread Tom Hughes

In message <[EMAIL PROTECTED]>
  James Mastros <[EMAIL PROTECTED]> wrote:

> > That leaves the third, which is what I have implemented. When looking to
> > transcode from A to B it will first ask A if it can transcode to B and
> > if that fails then it will ask B if it can transcode from A.
> I propose another variant on this:
> If that fails, it asks A to transcode to Unicode, and B to transcode from
> Unicode.  (Not Unicode to transcode to B; Unicode implements no transcodings.)

My code does that, though at a slightly higher level. If you look
at string_transcode() you will see that if it can't find a direct
mapping it will go via unicode. If C had closures then I'd have
buried that down in the chartype_lookup_transcoder() layer, but it
doesn't so I couldn't ;-)
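
A sketch of what going via unicode can look like without closures: since
C cannot return a function value that captures two other functions, the
caller carries both steps around in a struct and applies them per
character. The names here (transcode_plan, make_plan, apply_plan and the
Latin-1/ASCII helpers) are made up for this sketch and are not the
contents of string_transcode().

```c
#include <stddef.h>

typedef unsigned long (*transcoder_fn)(unsigned long c);

/* With closures this pair could hide behind a single function value. */
typedef struct transcode_plan {
    transcoder_fn first;    /* always applied */
    transcoder_fn second;   /* applied only when going via unicode */
} transcode_plan;

/* Prefer a direct mapping; otherwise chain source -> unicode -> target.
 * Returns 1 on success, 0 if no route exists. */
static int
make_plan(transcoder_fn direct, transcoder_fn to_unicode,
          transcoder_fn from_unicode, transcode_plan *plan)
{
    if (direct != NULL) {
        plan->first = direct;
        plan->second = NULL;
        return 1;
    }
    if (to_unicode != NULL && from_unicode != NULL) {
        plan->first = to_unicode;
        plan->second = from_unicode;
        return 1;
    }
    return 0;
}

/* Run one character through the plan. */
static unsigned long
apply_plan(const transcode_plan *plan, unsigned long c)
{
    c = plan->first(c);
    if (plan->second != NULL)
        c = plan->second(c);
    return c;
}

/* Sample transforms, invented for the example. */
static unsigned long latin1_to_unicode(unsigned long c) { return c; }

static unsigned long unicode_to_ascii(unsigned long c)
{
    return c < 128 ? c : '?';   /* placeholder for a real error path */
}
```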

> > The problem it raises is: who is responsible for transcoding from ASCII to
> > Latin-1? and back again? If we're not careful both ends will implement
> > both translations and we will have effective duplication.
> 1) Neither.  Each must support transcoding to and from Unicode.

Absolutely.

> 2) But either can support converting directly if it wants.

The danger is that everybody tries to be clever and support direct
conversion to and from as many other character sets as possible, which
leads to lots of duplication.

> I also think that, for efficiency, we might want a "7-bit chars match ASCII"
> flag, since most character sets do, and that means that we don't have to deal
> with the overhead for strings that fit in 7 bits.  This smells of premature
> optimization, though, so somebody just file this away in their heads for
> future reference.

I have already been thinking about this although it does get more
complicated as you have to consider the encoding as well - if you
have a single byte encoded ASCII string then transcoding to a single
byte encoded Latin-1 string is a no-op, but that may not be true for
other encodings if such a thing makes sense for those character types.

> (BTW, for those paying attention, I'm waiting on this discussion for my
> chr/ord patch, since I want them in terms of charsets, not encodings.)

I suspect that the encode and decode methods in the encoding vtable
are enough for doing chr/ord aren't they?

Surely chr() is just encoding the argument in the chosen encoding (which
can be the default encoding for the char type if you want) and then setting
the type and encoding of the resulting string appropriately.

Equally ord() is decoding the first character of the string to get a
number.

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/

Re: String rationale

2001-10-30 Thread Tom Hughes

In message <[EMAIL PROTECTED]>
James Mastros <[EMAIL PROTECTED]> wrote:

> On Mon, Oct 29, 2001 at 11:20:47PM +0000, Tom Hughes wrote:
> 
> > I suspect that the encode and decode methods in the encoding vtable
> > are enough for doing chr/ord aren't they?
>
> Hmm... come to think of it, yes.  chr will always create a utf32-encoded
> string with the given charset number (or unicode for the two-arg version),
> ord will return the codepoint within the current charset.

I hope it will create a string with the given charset number and
using the default encoding for that charset.

Asking for an ASCII character and getting it UTF-32 encoded would
be more than a little bizarre. If I say chr(65,ASCII) then I would
expect to get a single byte encoded string...

> (This, BTW, means that only encodings that feel like it have to provide
> either, but all encodings must be able to convert to utf32.)

The way I've written it, any encoding can convert to any encoding
at all, because there is no conversion at that level. I just decode
a character from the source, transcode it at the character level, and
then encode it to the destination.

If an encoding cannot handle the full range of character values for
a character set then you will get an exception when it tries to encode
an out of range character.
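
The decode, transcode, encode loop described above might look roughly
like this. It is a sketch under assumptions, not the committed code: the
encoding struct, the two sample encodings, MAX_CHAR_BYTES and convert()
are hypothetical names, input is assumed to be well formed, UTF-32 is
fixed to big-endian to keep the example deterministic, and a -1 return
stands in for throwing an exception.

```c
#include <stddef.h>

#define MAX_CHAR_BYTES 4   /* assumed upper bound for one encoded character */

typedef struct encoding {
    /* Decode one character at src into *c; return bytes consumed. */
    size_t (*decode)(const unsigned char *src, unsigned long *c);
    /* Encode c at dst; return bytes written, or 0 if c is out of range. */
    size_t (*encode)(unsigned char *dst, unsigned long c);
} encoding;

static size_t sb_decode(const unsigned char *src, unsigned long *c)
{
    *c = src[0];
    return 1;
}

static size_t sb_encode(unsigned char *dst, unsigned long c)
{
    if (c > 0xFF)
        return 0;               /* does not fit in a single byte */
    dst[0] = (unsigned char)c;
    return 1;
}

static size_t u32_decode(const unsigned char *src, unsigned long *c)
{
    *c = ((unsigned long)src[0] << 24) | ((unsigned long)src[1] << 16)
       | ((unsigned long)src[2] << 8)  |  (unsigned long)src[3];
    return 4;
}

static size_t u32_encode(unsigned char *dst, unsigned long c)
{
    if (c > 0x10FFFFUL)
        return 0;
    dst[0] = (unsigned char)((c >> 24) & 0xFF);
    dst[1] = (unsigned char)((c >> 16) & 0xFF);
    dst[2] = (unsigned char)((c >> 8) & 0xFF);
    dst[3] = (unsigned char)(c & 0xFF);
    return 4;
}

static const encoding singlebyte = { sb_decode, sb_encode };
static const encoding utf32be    = { u32_decode, u32_encode };

/* Convert src to dst one character at a time. `map` is the character
 * level transcoder and may be NULL when the character type is unchanged.
 * Returns the number of bytes written, or -1 on failure. */
static long
convert(const unsigned char *src, size_t src_len, const encoding *src_enc,
        unsigned long (*map)(unsigned long),
        unsigned char *dst, size_t dst_cap, const encoding *dst_enc)
{
    size_t in = 0, out = 0;

    while (in < src_len) {
        unsigned long c;
        size_t n;

        in += src_enc->decode(src + in, &c);
        if (map != NULL)
            c = map(c);
        if (dst_cap - out < MAX_CHAR_BYTES)
            return -1;          /* not enough room left in dst */
        n = dst_enc->encode(dst + out, c);
        if (n == 0)
            return -1;          /* out of range for the destination */
        out += n;
    }
    return (long)out;
}
```

Because no encoding ever talks to another encoding directly, adding a
new one only means supplying its own decode and encode pair.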

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu

Re: String rationale

2001-10-31 Thread Tom Hughes


In message <[EMAIL PROTECTED]>
      Tom Hughes <[EMAIL PROTECTED]> wrote:

> In message <[EMAIL PROTECTED]>
>   Dan Sugalski <[EMAIL PROTECTED]> wrote:
> 
> > At 04:23 PM 10/27/2001 +0100, Tom Hughes wrote:
> >
> > >Attached is my first pass at this - it's not fully ready yet but
> > >is something for people to cast an eye over before I spend lots of
> > >time going down the wrong path ;-)
> > 
> > It looks pretty good on first glance.
> 
> I've done a bit more work now, and the latest version is attached.

Unless anybody has objections I plan to commit this work shortly...

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/

Re: String rationale

2001-11-01 Thread Tom Hughes

In message <[EMAIL PROTECTED]>
Simon Cozens <[EMAIL PROTECTED]> wrote:

> On Sat, Oct 27, 2001 at 04:23:48PM +0100, Tom Hughes wrote:
> > The encoding_lookup() and chartype_lookup() routines will obviously
> > need to load the relevant libraries on the fly when we have support
> > for that.
> 
> Could you try rewriting them using an enum, like the vtable stuff and
> the original string encoding stuff does?

The intention is that when an encoding or character type is loaded it
will be allocated a unique ID number that can be used internally to
refer to it, but that the number will only be valid for the duration of
that instance of parrot rather than being persistent. That's certainly
the way Dan described it happening in his rationale which is what my
code is based on.

Allocating them globally is not possible if we're going to allow people
to add arbitrary encodings and character sets - as things stand adding
the foo encoding will be as simple as adding foo.so to the encodings
directory.

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu

Re: String rationale

2001-11-01 Thread Tom Hughes

In message <[EMAIL PROTECTED]>
Simon Cozens <[EMAIL PROTECTED]> wrote:

> As things stand, that won't work, because you're doing a string lookup in one
> of the core functions, and you still need some way of registering incoming
> stuff. With an enum, you can keep hold of a fake encoding_max, and hand
> encoding_max++ to the initialisation function for each encoding.

Well there won't be any point in it being an enum rather than an
integer unless some of them are going to be preallocated. I'm not
sure if the encoding and character types will need to know their
own index numbers but if we do then they can be told at initialisation
time, yes.

I absolutely intend that the current hard coded strings in the core
will go away in due course though. When you look up an encoding or
character type by name it will first check a hash table or something
to see if it is already loaded and if not it will look for it on disk
and load it in, allocate it a number, and add it to the hash table
for future reference.

Hence the current strcmp junk in the lookup functions will go away.
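
A minimal sketch of the name-to-number registry described above,
assuming a small chained hash table. Everything here is hypothetical:
encoding_id(), the bucket count and load_from_disk() are invented for
the example, and load_from_disk() is only a stand-in for finding and
loading something like foo.so, which this sketch does not attempt.

```c
#include <stdlib.h>
#include <string.h>

#define BUCKETS 31

typedef struct entry {
    char *name;
    int id;
    struct entry *next;
} entry;

static entry *table[BUCKETS];
static int next_id = 0;         /* IDs are only valid for this process */

static unsigned hash_name(const char *s)
{
    unsigned h = 0;

    while (*s != '\0')
        h = h * 31u + (unsigned char)*s++;
    return h % BUCKETS;
}

/* Stand-in for locating and loading the encoding from disk.
 * Here every non-empty name is treated as loadable. */
static int load_from_disk(const char *name)
{
    return name[0] != '\0';
}

/* Return the ID for `name`, loading and registering it on first use.
 * Returns -1 if it cannot be loaded or memory runs out. */
static int encoding_id(const char *name)
{
    unsigned h = hash_name(name);
    entry *e;

    for (e = table[h]; e != NULL; e = e->next)
        if (strcmp(e->name, name) == 0)
            return e->id;       /* already loaded */

    if (!load_from_disk(name))
        return -1;

    e = malloc(sizeof *e);
    if (e == NULL)
        return -1;
    e->name = malloc(strlen(name) + 1);
    if (e->name == NULL) {
        free(e);
        return -1;
    }
    strcpy(e->name, name);
    e->id = next_id++;
    e->next = table[h];
    table[h] = e;
    return e->id;
}
```

The string comparison only happens at lookup time; once a caller holds
the ID, the core functions can work with the integer alone.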

In much the same way the byte code will have some sort of table of
names which it will look up as it is loaded rather than the current
hard coding of name to number mappings in the byte code.

So all I need now to make all this work is hash tables and dynamic
code loading ;-) Any volunteers...

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu

Re: [PATCH] Computed goto, super-fast dispatching.

2001-11-04 Thread Tom Hughes


In message <[EMAIL PROTECTED]>
  Daniel Grunblatt <[EMAIL PROTECTED]> wrote:

> All:
>   Here's a list of the things I've been doing:
> 
> * Added ops2cgc.pl which generates core_cg_ops.c and core_cg_ops.h from
> core.ops, and modified Makefile.in to use it. In core_cg_ops.c resides
> cg_core which has an array with the addresses of the label of each opcode
> and starts the execution "jumping" to the address in array[*cur_opcode].
> 
> * Modified interpreter.c to include core_cg_ops.h
> 
> * Modified runcore_ops.c to discard the actual dispatching method and call
> cg_core, but left everything else untouched so that -b,-p and -t keep
> working.
> 
> * Modified pbc2c.pl to use computed goto when handling jump or ret. Maybe
> I can modify this once again not to define the array with the addresses
> if it's not going to be used, but I don't think that in real life a program
> won't use jump or ret, am I right?
> 
> Hope someone finds this useful.

I just tried it but I don't seem to be seeing anything like the speedups
you are. All the times which follow are for a K6-200 running RedHat 7.2 and
compiled -O6 with gcc 2.96.

Without patch:

  gosford [~/src/parrot] % ./test_prog examples/assembly/mops.pbc
  Iterations:1
  Estimated ops: 3
  Elapsed time:  37.387179
  M op/s:8.024141

  gosford [~/src/parrot] % ./examples/assembly/mops
  Iterations:1
  Estimated ops: 3
  Elapsed time:  3.503482
  M op/s:85.629098

With patch:

  gosford [~/src/parrot-cg] % ./test_prog examples/assembly/mops.pbc
  Iterations:1
  Estimated ops: 3
  Elapsed time:  29.850361
  M op/s:10.050130

  gosford [~/src/parrot-cg] % ./examples/assembly/mops
  Iterations:1
  Estimated ops: 3
  Elapsed time:  4.515596
  M op/s:66.436413

So there is a small speed up for the interpreted version, but nothing
like the three times speedup you had. The compiled version has actually
managed to get slower...
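
For readers unfamiliar with the technique being benchmarked, here is a
toy computed-goto dispatcher. It is not the generated core_cg_ops.c: the
three opcodes and run() are invented for the example, and the &&label
and goto * syntax is a GCC extension (also accepted by compatible
compilers), not ISO C.

```c
/* Toy opcodes; their values index into the label table below. */
enum { OP_INC, OP_ADD, OP_END };

/* Execute a tiny program and return the accumulator.
 * Each handler jumps straight to the next handler through the table
 * instead of returning to a central switch or function-call loop. */
static long run(const long *pc)
{
    static void *labels[] = { &&op_inc, &&op_add, &&op_end };
    long acc = 0;

    goto *labels[*pc];

op_inc:                         /* OP_INC: acc = acc + 1 */
    acc += 1;
    pc += 1;
    goto *labels[*pc];

op_add:                         /* OP_ADD n: acc = acc + n */
    acc += pc[1];
    pc += 2;
    goto *labels[*pc];

op_end:                         /* OP_END: stop */
    return acc;
}
```

How much this helps depends on how well the compiler and CPU handle the
indirect jumps, which is consistent with results differing from machine
to machine.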

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/

Re: [PATCH] Computed goto, super-fast dispatching.

2001-11-04 Thread Tom Hughes


In message <[EMAIL PROTECTED]>
  Daniel Grunblatt <[EMAIL PROTECTED]> wrote:

> Yeap, I was right, using gcc 3.0.2 you can see the difference:

I've just tried it with 3.0.1 and see much the same results as I did
with 2.96 I'm afraid. I don't have 3.0.2 to hand without building it
from source so I haven't tried that as yet.

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/

Re: [PATCH] Computed goto, super-fast dispatching.

2001-11-05 Thread Tom Hughes

In message <[EMAIL PROTECTED]>
Daniel Grunblatt <[EMAIL PROTECTED]> wrote:

> Do you want me to give you an account in my linux machine where I have
> install gcc 3.0.2 so that you see it?

I'm not sure that will achieve anything - it's not that I don't
believe you, it's just that I'm not seeing the same thing.

I have now tried on a number of other machines, and the results
are summarised in the following table:

        Standard                 Computed Gotos
 Interpreted   Compiled    Interpreted      Compiled
A       3.35      33.56       4.63 (+38%)      29.83 (-11%)
B       5.69      85.24      14.08 (+147%)     78.60 (-8%)
C      15.09     314.91      31.83 (+111%)    259.34 (-18%)
D      45.87     774.73      62.37 (+36%)     795.30 (+3%)

Machine A is a 90MHz Pentium running RedHat 7.1 with gcc 2.96
Machine B is a Dual 200MHz Pentium-Pro running RedHat 6.1 with egcs 1.1.2
Machine C is a 733MHz Pentium III running FreeBSD 4.3-STABLE with gcc 2.95.3
Machine D is a 1333MHz Athlon running RedHat 7.1 with gcc 2.96

Clearly the speedup varies significantly between systems with some
giving much greater improvements than others.

One other thing that I did notice is that there is quite a bit of
fluctuation between runs on some of the machines, possibly because
we are measuring real time and not CPU time.
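
To make the real time versus CPU time distinction concrete, here is a
small sketch using only standard C. The timing struct, time_call() and
spin() are hypothetical names, and time() has only one-second
resolution, so a real benchmark would want a finer wall clock than this.

```c
#include <time.h>

typedef struct timing {
    double cpu_seconds;     /* processor time charged to this process */
    double wall_seconds;    /* elapsed real time, including time spent waiting */
} timing;

/* Run fn once and report both kinds of time. On a busy machine the
 * wall-clock figure grows while the CPU figure stays roughly stable. */
static timing time_call(void (*fn)(void))
{
    timing t;
    clock_t c0 = clock();
    time_t w0 = time(NULL);

    fn();

    t.cpu_seconds = (double)(clock() - c0) / CLOCKS_PER_SEC;
    t.wall_seconds = difftime(time(NULL), w0);
    return t;
}

/* Some work to measure; volatile stops the loop being optimised away. */
static void spin(void)
{
    volatile unsigned long i;

    for (i = 0; i < 1000000UL; i++)
        ;
}
```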

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu

Re: [PATCHES] concat, read, substr, added 'ord' operator, and a SURPRISE

2001-11-13 Thread Tom Hughes

In message <[EMAIL PROTECTED]>
  Dan Sugalski <[EMAIL PROTECTED]> wrote:

> At 03:35 AM 11/11/2001 -0500, James Mastros wrote:
>
> >No, it isn't.  I'm not sure s->strlen is always guaranteed to be correct;
> >string_length(s) is.  (I found a case where it was wrong when coding my
> >version of ord() once, though that ended up being a problem with my
> >version of chr().  The point is that string_length is an API, but the
> >contents of the struct are not.)
> 
> We shouldn't cheat--the string length field should be considered a black
> box until we need the speed, at which point we play Macro Games and change
> string_length into a direct fetch.

As far as I know the strlen member should always be correct. I was
certainly trying to make sure it was because strings.pod explicitly
says that it will be and that it can be used directly instead of
calling string_length().

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/

Re: [PATCHES] ord(i,s|sc(,i|ic)?) operator committed, fixed bug in concat()

2001-11-13 Thread Tom Hughes

In message <[EMAIL PROTECTED]>
  Jeff <[EMAIL PROTECTED]> wrote:

> string.c - Added string_ord() and a _string_index() helper function to
> help make accommodating different encodings easier. Patched concat()
> to deal with null strings.

I have just committed an amendment to this to make string_index use
the encoding routines instead of assuming a single byte encoding.

I have also renamed _string_index to string_index as function names
that start with an underscore are reserved to implementors by the C
standard.

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/

Re: Butt-ugliness reduction

2001-11-15 Thread Tom Hughes

In message <[EMAIL PROTECTED]>
  Michael L Maraist <[EMAIL PROTECTED]> wrote:

> inlined c-functions.. Hmm, gcc has some support for this, but what about
> other architectures.. For function-inlining to work with GCC, you have to
> define the function in the header.. That's definitely not portable.  I guess
> you're saying that the inlined functions would be in the same .c file as it's
> being used.. Well, I thought these classes might span multiple files, making
> that rather difficult.

You only need to define it in the header if it needs to be visible
across more than one file - if it is only needed in the file that is
implementing the scalar class then it can be put there.

In fact many compilers will inline small static functions anyway
even without an explicit hint in the source.
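
A small sketch of the arrangement being described: a helper kept
private to one .c file with static, used by the one function that would
actually be declared in a header. Both function names are invented for
the example and have nothing to do with the real scalar class.

```c
#include <stddef.h>

/* File-local helper. `static` keeps it invisible to other files, and
 * the compiler is free to inline it at the call site below without
 * any inline keyword or header definition. */
static int is_ascii_digit(int ch)
{
    return ch >= '0' && ch <= '9';
}

/* The only function that would need a declaration in a header. */
size_t count_digits(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++)
        if (is_ascii_digit((unsigned char)*s))
            n++;
    return n;
}
```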

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/

Re: [PATCH] Moving NV constants to the constant table

2001-09-29 Thread Tom Hughes


In message <[EMAIL PROTECTED]>
  "Gregor N. Purdy" <[EMAIL PROTECTED]> wrote:

> Let me know how this works for you...

There seems to be a lot of the patch missing:

gosford [~/src/parrot-nvconst] % patch -N < /tmp/nvconst.patch
patching file Makefile.in
patching file Types_pm.in
patching file assemble.pl
patch:  unexpected end of file in patch

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/

Re: [PATCH] (AGAIN) NV constants in constant table

2001-09-29 Thread Tom Hughes

In message <[EMAIL PROTECTED]>
  "Gregor N. Purdy" <[EMAIL PROTECTED]> wrote:

> There was trouble with the attachment on my last post, so here it
> comes again...

That patches and builds OK but the added files are not in the
patch so Parrot/Assembler.pm at least is missing and thus I can't
run any tests.

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/

Re: [PATCH] (AGAIN) NV constants in constant table

2001-09-29 Thread Tom Hughes

In message <[EMAIL PROTECTED]>
  "Gregor N. Purdy" <[EMAIL PROTECTED]> wrote:

> Sorry about that, Tom. I really need to add -N to my .cvsrc...
> I just sent the (hopefully) complete patch to the list. Please try
> it out against a fresh checkout and let me know how it works for
> you...

It builds and tests cleanly for me now (linux/x86).

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/

RE: [PATCH] non-init var possibility

2001-10-06 Thread Tom Hughes


In message <[EMAIL PROTECTED]>
  Gibbs Tanton - tgibbs <[EMAIL PROTECTED]> wrote:

> No, the behavior of malloc(0) is implementation defined.

It is, yes, but there are only two legal results according to
the ISO C standard:

"If the size of the space requested is zero, the behavior is
 implementation-defined: either a null pointer is returned, or
 the behavior is as if the size were some nonzero value, except
 that the returned pointer shall not be used to access an object."

In other words it can't crash or do anything else undesirable, and
the result will always be something that can't be dereferenced, but
can be freed (given that the standard requires free(NULL) to work).

Given that, although we can't say the behaviour is strictly speaking
consistent, it is true that as far as performing normal operations on
the pointer goes you are unlikely to notice which behaviour a given
platform has chosen.
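
A short sketch of how code can stay indifferent to which of the two
results malloc(0) gives. alloc_bytes() and probe_malloc_zero() are
hypothetical helpers written for this example, not functions from
parrot's memory.c.

```c
#include <stdlib.h>

/* Hypothetical wrapper: never asks malloc() for zero bytes, so a NULL
 * result always means failure, whatever the platform does for size 0. */
static void *alloc_bytes(size_t n)
{
    return malloc(n != 0 ? n : 1);
}

/* Both legal outcomes of malloc(0) are safe to hand to free(). */
static void probe_malloc_zero(void)
{
    void *p = malloc(0);    /* NULL or a unique pointer; never dereference */

    free(p);                /* fine either way: free(NULL) does nothing */
}
```

The wrapper is one way to remove the remaining inconsistency, namely
that a NULL from malloc(0) cannot be told apart from an allocation
failure.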

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/

Transcoding patch

2001-10-07 Thread Tom Hughes


The attached patch is a first stab at implementing string transcoding
and the unicode string types.

The transcoder will currently only map one UTF type to another - there
is no attempt to implement mapping to or from native strings as I wasn't
sure what the plan was for that. Presumably we will have to determine
what the native character set is at configure time and then generate
some code to map between that and unicode somehow?

There are currently no proper tests because there is no way to generate
anything other than a native string using the current assembler. There is
a small C test harness (trans-test.c) which I have used to validate the
transcoder to a certain extent.

This patch also fixes a bug in the existing native strings where
string_native_compute_strlen was returning the number of bytes that
had been allocated rather than the number that were in use.

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/


diff -urNw --exclude CVS parrot/Makefile.in parrot-transcode/Makefile.in
--- parrot/Makefile.in  Sun Oct  7 15:58:56 2001
+++ parrot-transcode/Makefile.inSun Oct  7 16:08:49 2001
@@ -4,7 +4,7 @@
 INC=include/parrot
 H_FILES = $(INC)/config.h $(INC)/exceptions.h $(INC)/io.h $(INC)/op.h 
$(INC)/register.h $(INC)/string.h $(INC)/events.h $(INC)/interpreter.h $(INC)/memory.h 
$(INC)/parrot.h $(INC)/stacks.h $(INC)/packfile.h $(INC)/global_setup.h $(INC)/vtable.h
 
-O_FILES = global_setup$(O) interpreter$(O) parrot$(O) register$(O) basic_opcodes$(O) 
memory$(O) packfile$(O) string$(O) strnative$(O)
+O_FILES = global_setup$(O) interpreter$(O) parrot$(O) register$(O) basic_opcodes$(O) 
+memory$(O) packfile$(O) string$(O) strnative$(O) strutf8$(O) strutf16$(O) 
+strutf32$(O) transcode$(O)
 
 #DO NOT ADD C COMPILER FLAGS HERE
 #Add them in Configure.pl--look for the
@@ -32,8 +32,8 @@
 $(TEST_PROG): test_main$(O) $(O_FILES) interp_guts$(O) op_info$(O)
$(CC) $(CFLAGS) -o $(TEST_PROG) $(O_FILES) interp_guts$(O) op_info$(O) 
test_main$(O) $(C_LIBS)

-$(PDUMP): pdump$(O) packfile$(O) memory$(O) global_setup$(O) string$(O) strnative$(O)
-   $(CC) $(CFLAGS) -o $(PDUMP) pdump$(O) packfile$(O) memory$(O) global_setup$(O) 
string$(O) strnative$(O) $(C_LIBS)
+$(PDUMP): pdump$(O) packfile$(O) memory$(O) global_setup$(O) string$(O) strnative$(O) 
+strutf8$(O) strutf16$(O) strutf32$(O) transcode$(O)
+   $(CC) $(CFLAGS) -o $(PDUMP) pdump$(O) packfile$(O) memory$(O) global_setup$(O) 
+string$(O) strnative$(O) strutf8$(O) strutf16$(O) strutf32$(O) transcode$(O) $(C_LIBS)
 
 test_main$(O): $(H_FILES) $(INC)/interp_guts.h
 
@@ -42,6 +42,14 @@
 string$(O): $(H_FILES)
 
 strnative$(O): $(H_FILES)
+
+strutf8$(O): $(H_FILES)
+
+strutf16$(O): $(H_FILES)
+
+strutf32$(O): $(H_FILES)
+
+transcode$(O): $(H_FILES)
 
 $(INC)/interp_guts.h interp_guts.c $(INC)/op_info.h op_info.c: opcode_table 
build_interp_starter.pl
$(PERL) build_interp_starter.pl
diff -urNw --exclude CVS parrot/global_setup.c parrot-transcode/global_setup.c
--- parrot/global_setup.c   Sun Sep 16 12:32:21 2001
+++ parrot-transcode/global_setup.c Sat Oct  6 15:43:20 2001
@@ -17,6 +17,7 @@
 void
 init_world() {
 string_init(); /* Set up the string subsystem */ 
+transcode_init(); /* Set up the transcoding subsystem */
 }
 
 /*
diff -urNw --exclude CVS parrot/include/parrot/exceptions.h 
parrot-transcode/include/parrot/exceptions.h
--- parrot/include/parrot/exceptions.h  Mon Sep 24 22:40:32 2001
+++ parrot-transcode/include/parrot/exceptions.hSun Oct  7 15:36:46 2001
@@ -17,6 +17,9 @@
 
 #define NO_REG_FRAMES 1
 #define SUBSTR_OUT_OF_STRING 1
+#define MALFORMED_UTF8 1
+#define MALFORMED_UTF16 1
+#define MALFORMED_UTF32 1
 
 #endif
 
diff -urNw --exclude CVS parrot/include/parrot/parrot.h 
parrot-transcode/include/parrot/parrot.h
--- parrot/include/parrot/parrot.h  Sat Oct  6 15:10:50 2001
+++ parrot-transcode/include/parrot/parrot.hSun Oct  7 15:21:57 2001
@@ -66,6 +66,7 @@
 
 #include "parrot/global_setup.h"
 #include "parrot/string.h"
+#include "parrot/transcode.h"
 #include "parrot/vtable.h"
 #include "parrot/interpreter.h"
 #include "parrot/register.h"
diff -urNw --exclude CVS parrot/include/parrot/string.h 
parrot-transcode/include/parrot/string.h
--- parrot/include/parrot/string.h  Tue Oct  2 22:02:00 2001
+++ parrot-transcode/include/parrot/string.hSun Oct  7 15:21:46 2001
@@ -85,6 +85,9 @@
 VAR_SCOPE STRING_VTABLE Parrot_string_vtable[enc_max];
 
 #include "parrot/strnative.h"
+#include "parrot/strutf8.h"
+#include "parrot/strutf16.h"
+#include "parrot/strutf32.h"
 #endif
 
 /*
diff -urNw --exclude CVS parrot/include/parrot/strutf16.h 
parrot-transcode/include/parrot/strutf16.h
--- parrot/include/parrot/strutf16.hThu Jan  1 01:00:00 1970
+++ parrot-transcode/include/parrot/strutf16.h  Sun Oct  7 15:21:02 2001
@@ -0,0 +1,29 @@
+/* strutf16.h
+ *  Copyri

RE: Transcoding patch

2001-10-07 Thread Tom Hughes

In message <[EMAIL PROTECTED]>
  Gibbs Tanton - tgibbs <[EMAIL PROTECTED]> wrote:

> This is good, unless someone has objections I'll commit this.  However, we
> also need the ability to do unicode in the assembler (I'll do this later
> today if no one beats me to it), and we need some way to communicate the
> encoding number between the C and the Perl code.

It probably does still need some cleaning up but that can be done
incrementally. One of the main things that I wasn't sure about but
forgot to mention in the original message is what we want to do
about malformed strings.

Are we going to assume strings are well formed and go hell for
leather in handling them or do we want to move to the paranoid
end of the spectrum and check everything we do and throw exceptions
when something odd is spotted?

Currently the code does a bit of both - sometimes it checks things
and sometimes it doesn't.

> I guess the question with native strings is will it always be ASCII or will
> it be Shift-JIS etc...?  And the follow up to that is can, for the short
> term, we assume it will be ASCII and then improve our native string
> transcoding over time?

Well according to strings.pod native will always be a single byte per
character encoding and never a wide character or shifted encoding so
that rules out Shift-JIS and most other far eastern encodings.

BTW the claim in strings.pod that UTF-8 needs a maximum of 3 bytes per
character is wrong, at least if you allow U+0000 to U+10FFFF as your
character space which is what I did - any character over U+FFFF needs
four bytes.
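
The byte counts follow directly from the UTF-8 encoding ranges, as this
sketch shows. utf8_length() is a name chosen for the example rather than
a function from the patch, and it ignores the question of whether
surrogate code points should be rejected.

```c
/* Number of bytes UTF-8 needs for code point c, or 0 if c lies
 * outside the range U+0000 to U+10FFFF. */
static int utf8_length(unsigned long c)
{
    if (c < 0x80UL)
        return 1;           /* U+0000  .. U+007F   */
    if (c < 0x800UL)
        return 2;           /* U+0080  .. U+07FF   */
    if (c < 0x10000UL)
        return 3;           /* U+0800  .. U+FFFF   */
    if (c <= 0x10FFFFUL)
        return 4;           /* U+10000 .. U+10FFFF */
    return 0;
}
```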

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/

Re: Transcoding patch

2001-10-08 Thread Tom Hughes


In message <[EMAIL PROTECTED]>
Gibbs Tanton <[EMAIL PROTECTED]> wrote:

> I've applied this patch.

I just did an update and noticed the new files had appeared about
two seconds before your mail arrived ;-)

> I realize that we have a ways to go before we can fully support unicode, but
> I felt that this patch was a big step in the right direction; with it
> committed we can now start incrementally cleaning it up and making it work
> correctly.  Since it doesn't affect anything we are working on it shouldn't
> get in the way at all.

Absolutely. A few other issues that I remembered last night are:

  - The current code assumes that the string data will be two
byte aligned for UTF-16 and four byte aligned for UTF-32 which
is probably reasonable but maybe not.

  - The utf8_t, utf16_t and utf32_t types will need to be determined
by configure as they will currently break on some machines. Plus
machines without native 8, 16 and 32 bit types will be a problem.

  - There are byte ordering issues for UTF-16 and UTF-32 strings. The
current code assumes host byte ordering but should we be spotting
byte order markers in the strings and adjusting to cope?
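
One possible shape for the byte order question, as a sketch only: look
for a byte order marker at the start of a UTF-16 buffer and then read
code units a byte at a time. The names byte_order, utf16_detect_bom()
and utf16_read_unit() are invented here, and treating an unknown order
as big-endian is an assumption of the sketch, not a decision taken in
the patch.

```c
#include <stddef.h>

typedef enum { ORDER_UNKNOWN, ORDER_BIG, ORDER_LITTLE } byte_order;

/* Inspect the first two bytes for a byte order marker (U+FEFF). */
static byte_order utf16_detect_bom(const unsigned char *buf, size_t len)
{
    if (len >= 2) {
        if (buf[0] == 0xFE && buf[1] == 0xFF)
            return ORDER_BIG;
        if (buf[0] == 0xFF && buf[1] == 0xFE)
            return ORDER_LITTLE;
    }
    return ORDER_UNKNOWN;
}

/* Read one 16-bit code unit from two bytes in the given order.
 * ORDER_UNKNOWN is treated as big-endian in this sketch. */
static unsigned utf16_read_unit(const unsigned char *p, byte_order order)
{
    if (order == ORDER_LITTLE)
        return (unsigned)p[0] | ((unsigned)p[1] << 8);
    return ((unsigned)p[0] << 8) | (unsigned)p[1];
}
```

Reading through unsigned char also sidesteps the alignment assumption
in the first point, at the cost of some speed compared with loading a
whole 16-bit value at once.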

> We do need to figure out how to change from unicode to native.  We also need
> to make sure that we don't hardcode the encoding in the assembler, the
> assembler should be able to get what encoding to use from a file.

A fundamental question (which I think Simon was hinting at with his
cryptic comment) is whether the native encoding is fixed when parrot
is built or can change on the fly as the user changes their locale
settings. If it's the latter then conversion to and from native will
have to work by loading an appropriate conversion table at run time.

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu

Re: Transcoding patch

2001-10-08 Thread Tom Hughes

In message <[EMAIL PROTECTED]>
Gibbs Tanton <[EMAIL PROTECTED]> wrote:

> >  - The utf8_t, utf16_t and utf32_t types will need to be determined
> >by configure as they will currently break on some machines. Plus
> >machines without native 8, 16 and 32 bit types will be a problem.
> 
> Almost all hardware should have char as an 8 bit type so that shouldn't be a
> problem.  However, finding a 16 bit or 32 bit type might be a problem on
> some hardware.  We might want to think about using arrays of 8 bit types or
> using bit fields.

The Cray was the canonical example of a problem machine that I had
in mind - if I recall correctly even char is 8 bytes there isn't it?

Bit fields are no use as you can't have a pointer to a bit field.

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu

RE: Transcoding patch

2001-10-08 Thread Tom Hughes


In message <[EMAIL PROTECTED]>
  Gibbs Tanton - tgibbs <[EMAIL PROTECTED]> wrote:

> This is good, unless someone has objections I'll commit this.  However, we
> also need the ability to do unicode in the assembler (I'll do this later
> today if no one beats me to it), and we need some way to communicate the
> encoding number between the C and the Perl code.

The attached patch solves the assembler issue by allowing quoted
strings to be prefixed with U8, U16 or U32 to indicate a unicode
string of the appropriate type, so:

  set_s_sc S1, U8"Hello World"

creates a UTF-8 string in S1 containing the specified data. I don't
particularly like that syntax so if anybody has any better ideas
then please say... Most of the patch is useful whatever the syntax
though - it will just need tweaking to recognise the appropriate
syntax.

The patch also adds support for \x escapes in strings as it is
difficult to write unicode string constants without that.
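
For readers who prefer C to Perl regexes, the \x handling can be sketched
roughly as below. This is only an illustration of the rule the patch applies
(one or two hex digits after \x become a single byte); expand_hex_escapes and
hex_value are hypothetical names and the assembler itself does this in Perl.

```c
#include <ctype.h>
#include <stddef.h>

/* Value of one hex digit; the caller guarantees isxdigit(c). */
int hex_value(int c)
{
    if (c >= '0' && c <= '9') {
        return c - '0';
    }
    return tolower(c) - 'a' + 10;
}

/* Expand \xH and \xHH escapes from src into dst and return the number
 * of bytes written (not counting the terminator). dst must have room
 * for strlen(src) + 1 bytes. Anything that is not a valid \x escape
 * is copied through unchanged. */
size_t expand_hex_escapes(const char *src, char *dst)
{
    size_t n = 0;

    while (*src != '\0') {
        if (src[0] == '\\' && src[1] == 'x'
            && isxdigit((unsigned char)src[2])) {
            int v = hex_value((unsigned char)src[2]);
            src += 3;
            if (isxdigit((unsigned char)*src)) {
                v = v * 16 + hex_value((unsigned char)*src);
                src++;
            }
            dst[n++] = (char)v;
        } else {
            dst[n++] = *src++;
        }
    }
    dst[n] = '\0';
    return n;
}
```

Returning the length matters because an escape such as \x00 can put a zero
byte in the middle of the constant.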

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/


Index: Assembler.pm
===
RCS file: /home/perlcvs/parrot/Parrot/Assembler.pm,v
retrieving revision 1.7
diff -u -w -r1.7 Assembler.pm
--- Assembler.pm   2001/10/06 05:21:16 1.7
+++ Assembler.pm   2001/10/08 23:46:07
@@ -270,6 +270,17 @@
'__LINE__' => sub { return $line },
'__FILE__' => sub { return "\"$file\"" });
 
+
+###
+
+=head2 %encodings
+
+maps string prefixes to encodings.
+
+=cut
+
+my %encodings=('' => 0, 'U8' => 1, 'U16' => 2, 'U32' => 3);
+
 my %opcodes = Parrot::Opcode::read_ops( -f "../opcode_table" ? "../opcode_table" : 
"opcode_table" );
 
 
@@ -487,7 +498,7 @@
   # now emit each constant
   my $counter = 0;
   for( @constants ) {
-my ($type, $value) = @$_;
+my ($type, $value, $encoding) = @$_;
 
 add_line_to_listing( sprintf( "\t%04x %s [[%s]]\n", $counter, $type, $value ) );
 $counter++;
@@ -497,7 +508,7 @@
 } elsif ($type eq 'n') {
   $const_table->add(Parrot::PackFile::Constant->new_number($value));
 } elsif ($type eq 's') {
-  $const_table->add(Parrot::PackFile::Constant->new_string(0, 0, 0, 
length($value), $value));
+  $const_table->add(Parrot::PackFile::Constant->new_string(0, $encoding, 0, 
+length($value), $value));
 } else { 
   die; # TODO: Be more specific
 }
@@ -651,7 +662,7 @@
 
 sub replace_string_constants {
   my $code = shift;
-  $code =~ s/\"([^\\\"]*(?:\\.[^\\\"]*)*)\"/constantize_string($1)/eg;
+  $code =~ 
+s/(U(?:8|16|32))?\"([^\\\"]*(?:\\.[^\\\"]*)*)\"/constantize_string($2,$1)/eg;
   return $code;
 }
 
@@ -1283,14 +1294,17 @@
 
 sub constantize_string {
 my $s = shift;
+my $p = shift || "";
+my $e = $encodings{$p};
 # handle \ characters in the constant
 my %escape = ('a'=>"\a",'n'=>"\n",'r'=>"\r",'t'=>"\t",'\\'=>'\\',);
 $s=~s/\\([anrt\\])/$escape{$1}/g;
-if(!exists($constants{$s}{s})) {
-   push(@constants, ['s', $s]);
-   $constants{$s}{s}=$#constants;
+$s=~s/\\x([0-9a-fA-F]{1,2})/chr(hex($1))/ge;
+if(!exists($constants{$s}{s}{$e})) {
+   push(@constants, ['s', $s, $e]);
+   $constants{$s}{s}{$e}=$#constants;
 }
-return "[sc:".$constants{$s}{s}."]";
+return "[sc:".$constants{$s}{s}{$e}."]";
 }

RE: Transcoding patch

2001-10-09 Thread Tom Hughes

In message <[EMAIL PROTECTED]>
  Dan Sugalski <[EMAIL PROTECTED]> wrote:

> At 07:03 PM 10/8/2001 -0500, Gibbs Tanton - tgibbs wrote:
> >This looks good.
> >
> >Also, WRT the utf8_t, utf16_t, and utf32_t can we not just use utf32_t and
> >then mask off the lower 8 or 16 bits?  We can still have utf8_t be defined
> >as char to allow sizeof to work right and we can do sizeof(utf8_t)*2 to get
> >the utf16_t's size.
>
> utf8 and utf16 are both variable length encodings for space reasons.
> There's not much reason to space-compact something then expand the heck out
> of it.

I think he was just referring to the internal type used to hold a
character during processing, not to expanding the whole string.

> On the other hand, I'd really, *really* rather not have Unicode
> constants in anything other than UTF-32, so I'd as soon we chopped out the
> utf-8 and utf-16 constant support from this.
>
> A should be the prefix for US-ASCII characters.
> U should be the prefix for Unicode characters
> N should be the prefix for the native character set (and the default)
>
> Beyond that I'm not sure what, if anything, we should accommodate in the
> assembler.

What does US-ASCII correspond to internally - we don't have an
encoding for that, unless you're planning to mark it as UTF-8 and
rely on US-ASCII being a subset of UTF-8 of course ;-)

The only other thing is that writing tests for UTF-8 and UTF-16 strings
and the transcoder is going to be quite tricky if we can't generate
them using the assembler.

Other than that I'll sort out a patch for this later today.

Moving on, my next target is to get string comparison working. That's
not too difficult until you have to compare strings whose encodings
are different - comparing two unicode strings is OK as we can always
transcode the second to the same type as the first, but if we're
comparing a native string with a unicode string we will have to do
a transcode from native to unicode even if the native string is
first, so the transcoding will have to be done at the string layer
rather than the strnative/strutfn layers I think.

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/

RE: Transcoding patch

2001-10-09 Thread Tom Hughes


In message <[EMAIL PROTECTED]>
  Dan Sugalski <[EMAIL PROTECTED]> wrote:

> utf8 and utf16 are both variable length encodings for space reasons.
> There's not much reason to space-compact something then expand the heck out
> of it. On the other hand, I'd really, *really* rather not have Unicode
> constants in anything other than UTF-32, so I'd as soon we chopped out the
> utf-8 and utf-16 constant support from this.
>
> A should be the prefix for US-ASCII characters.
> U should be the prefix for Unicode characters
> N should be the prefix for the native character set (and the default)
>
> Beyond that I'm not sure what, if anything, we should accommodate in the
> assembler.

Attached is a patch to drop the U8, U16 and U32 prefixes and
add U and N prefixes.

I haven't added the A prefix because I'm still not clear what
encoding those are supposed to map to. I can understand the
following mappings:

  N => enc_native
  U => enc_utf32

but what is A supposed to map to exactly? Or is the assembler
supposed to mangle an A string into an N or U string and then
put it in the bytecode in one of those formats?

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/


Index: Assembler.pm
===
RCS file: /home/perlcvs/parrot/Parrot/Assembler.pm,v
retrieving revision 1.8
diff -u -w -r1.8 Assembler.pm
--- Assembler.pm   2001/10/09 02:45:36 1.8
+++ Assembler.pm   2001/10/09 21:25:28
@@ -279,7 +279,7 @@
 
 =cut
 
-my %encodings=('' => 0, 'U8' => 1, 'U16' => 2, 'U32' => 3);
+my %encodings=('' => 0, 'N' => 0, 'U' => 3);
 
 my %opcodes = Parrot::Opcode::read_ops( -f "../opcode_table" ? "../opcode_table" : 
"opcode_table" );
 
@@ -662,7 +662,7 @@
 
 sub replace_string_constants {
   my $code = shift;
-  $code =~ 
s/(U(?:8|16|32))?\"([^\\\"]*(?:\\.[^\\\"]*)*)\"/constantize_string($2,$1)/eg;
+  $code =~ s/([NU])?\"([^\\\"]*(?:\\.[^\\\"]*)*)\"/constantize_string($2,$1)/eg;
   return $code;
 }

String comparison ops

2001-10-09 Thread Tom Hughes


Attached is a patch to add string comparison ops, along with the
necessary infrastructure in the string code.

The current behaviour is that if the two strings do not have the
same encoding then both are promoted to UTF-32 before comparison
as that should generally preserve information.
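
The idea of comparing in a common 32-bit form can be sketched in isolation as
follows. This is not the code from the patch: it widens each character on the
fly instead of making UTF-32 copies, it assumes the native encoding is a
single-byte one whose values coincide with Unicode (true for ASCII and
Latin-1 only), and every toy_* name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

enum toy_encoding { TOY_NATIVE, TOY_UTF32 };

struct toy_string {
    enum toy_encoding encoding;
    const void *data;   /* unsigned char[] or uint32_t[] */
    size_t length;      /* length in characters, not bytes */
};

/* Fetch character i widened to a 32-bit code point. */
uint32_t toy_char_at(const struct toy_string *s, size_t i)
{
    if (s->encoding == TOY_UTF32) {
        return ((const uint32_t *)s->data)[i];
    }
    return ((const unsigned char *)s->data)[i];
}

/* Returns <0, 0 or >0 in the same way as the string_compare above. */
int toy_compare(const struct toy_string *s1, const struct toy_string *s2)
{
    size_t n = s1->length < s2->length ? s1->length : s2->length;
    size_t i;

    for (i = 0; i < n; i++) {
        uint32_t c1 = toy_char_at(s1, i);
        uint32_t c2 = toy_char_at(s2, i);
        if (c1 != c2) {
            return c1 < c2 ? -1 : 1;
        }
    }
    if (s1->length == s2->length) {
        return 0;
    }
    return s1->length < s2->length ? -1 : 1;
}
```

When one string is a prefix of the other the shorter one sorts first, which is
the usual convention for this kind of comparison.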

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/


? t/tom.pasm
Index: basic_opcodes.ops
===
RCS file: /home/perlcvs/parrot/basic_opcodes.ops,v
retrieving revision 1.36
diff -u -w -r1.36 basic_opcodes.ops
--- basic_opcodes.ops   2001/10/08 14:04:20 1.36
+++ basic_opcodes.ops   2001/10/09 23:46:56
@@ -604,6 +604,90 @@
 AUTO_OP concat_s {
 STRING *s = string_concat(STR_REG(P1), STR_REG(P2), 1);
 STR_REG(P1) = s;
+}
+
+/* EQ Sx, Sy, EQ_BRANCH */
+MANUAL_OP eq_s_ic {
+  if (string_compare(STR_REG(P1), STR_REG(P2)) == 0) {
+RETURN(INT_CONST(P3));
+  }
+}
+
+/* EQ Sx, CONSTANT, EQ_BRANCH */
+MANUAL_OP eq_sc_ic {
+  if (string_compare(STR_REG(P1), STR_CONST(P2)) == 0) {
+RETURN(INT_CONST(P3));
+  }
+}
+
+/* NE Sx, Sy, NE_BRANCH */
+MANUAL_OP ne_s_ic {
+  if (string_compare(STR_REG(P1), STR_REG(P2)) != 0) {
+RETURN(INT_CONST(P3));
+  }
+}
+
+/* NE Sx, CONSTANT, NE_BRANCH */
+MANUAL_OP ne_sc_ic {
+  if (string_compare(STR_REG(P1), STR_CONST(P2)) != 0) {
+RETURN(INT_CONST(P3));
+  }
+}
+
+/* LT Sx, Sy, LT_BRANCH */
+MANUAL_OP lt_s_ic {
+  if (string_compare(STR_REG(P1), STR_REG(P2)) < 0) {
+RETURN(INT_CONST(P3));
+  }
+}
+
+/* LT Sx, CONSTANT, LT_BRANCH */
+MANUAL_OP lt_sc_ic {
+  if (string_compare(STR_REG(P1), STR_CONST(P2)) < 0) {
+RETURN(INT_CONST(P3));
+  }
+}
+
+/* LE Sx, Sy, LE_BRANCH */
+MANUAL_OP le_s_ic {
+  if (string_compare(STR_REG(P1), STR_REG(P2)) <= 0) {
+RETURN(INT_CONST(P3));
+  }
+}
+
+/* LE Sx, CONSTANT, LE_BRANCH */
+MANUAL_OP le_sc_ic {
+  if (string_compare(STR_REG(P1), STR_CONST(P2)) <= 0) {
+RETURN(INT_CONST(P3));
+  }
+}
+
+/* GT Sx, Sy, GT_BRANCH */
+MANUAL_OP gt_s_ic {
+  if (string_compare(STR_REG(P1), STR_REG(P2)) > 0) {
+RETURN(INT_CONST(P3));
+  }
+}
+
+/* GT Sx, CONSTANT, GT_BRANCH */
+MANUAL_OP gt_sc_ic {
+  if (string_compare(STR_REG(P1), STR_CONST(P2)) > 0) {
+RETURN(INT_CONST(P3));
+  }
+}
+
+/* GE Sx, Sy, GE_BRANCH */
+MANUAL_OP ge_s_ic {
+  if (string_compare(STR_REG(P1), STR_REG(P2)) >= 0) {
+RETURN(INT_CONST(P3));
+  }
+}
+
+/* GE Sx, CONSTANT, GE_BRANCH */
+MANUAL_OP ge_sc_ic {
+  if (string_compare(STR_REG(P1), STR_CONST(P2)) >= 0) {
+RETURN(INT_CONST(P3));
+  }
 }

 /* NOOP */
Index: opcode_table
===
RCS file: /home/perlcvs/parrot/opcode_table,v
retrieving revision 1.24
diff -u -w -r1.24 opcode_table
--- opcode_table   2001/10/08 13:45:21 1.24
+++ opcode_table   2001/10/09 23:46:57
@@ -67,7 +67,7 @@
 substr_s_s_i   4   S S I I
 concat_s   2   S S

-# Comparators (TODO: String comparators)
+# Comparators

 eq_i_ic    3   I I D
 eq_ic_ic   3   I i D
@@ -94,6 +94,19 @@
 gt_nc_ic   3   N n D
 ge_n_ic    3   N N D
 ge_nc_ic   3   N n D
+
+eq_s_ic    3   S S D
+eq_sc_ic   3   S s D
+ne_s_ic    3   S S D
+ne_sc_ic   3   S s D
+lt_s_ic    3   S S D
+lt_sc_ic   3   S s D
+le_s_ic    3   S S D
+le_sc_ic   3   S s D
+gt_s_ic    3   S S D
+gt_sc_ic   3   S s D
+ge_s_ic    3   S S D
+ge_sc_ic   3   S s D

 # Flow control

Index: string.c
===
RCS file: /home/perlcvs/parrot/string.c,v
retrieving revision 1.12
diff -u -w -r1.12 string.c
--- string.c   2001/10/08 07:49:10 1.12
+++ string.c   2001/10/09 23:46:57
@@ -152,6 +152,23 @@
 return (ENC_VTABLE(s)->chopn)(s, n);
 }

+/*=for api string string_compare
+ * compare two strings.
+ */
+INTVAL
+string_compare(STRING* s1, STRING* s2) {
+if (s1->encoding != s2->encoding) {
+if (s1->encoding->which != enc_utf32) {
+s1 = Parrot_transcode_table[s1->encoding->which][enc_utf32](s1, NULL);
+}
+if (s2->encoding->which != enc_utf32) {
+s2 = Parrot_transcode_table[s2->encoding->which][enc_utf32](s2, NULL);
+}
+}
+
+return (ENC_VTABLE(s1)->compare)(s1, s2);
+}
+
 /*
  * Local variables:
  * c-indentation-style: bsd
Index: strnative.c
===
RCS file: /home/perlcvs/parrot/strnative.c,v
retrieving revision 1.15
diff -u -w -r1.15 strnative.c
--- strnative.c 2001/10/08 07:49:10 1.15
+++ strnative.c 2001/10/09 23:46:58
@@ -82,6 +82,25 @@
 return dest;
 }

+/*=for api string_native string_native_compare
+   compare two strings
+*/
+static INTVAL
+string_native_compare(STRING* s1, STRING* s2) {
+INTVAL cmp;
+
+if (s1->bufused < s

Re: String comparison ops

2001-10-10 Thread Tom Hughes

In message <[EMAIL PROTECTED]>
  Simon Cozens <[EMAIL PROTECTED]> wrote:

> On Wed, Oct 10, 2001 at 12:49:50AM +0100, Tom Hughes wrote:
> > Attached is a patch to add string comparison ops, along with the
> > necessary infrastructure in the string code.
>
> I see no tests *or* documentation. Come on, Tom, you should know
> better than that. :)

Tests are next on my list... One reason for writing the comparison
stuff was to make writing tests for the transcoder etc possible.

I'll sort out a documentation patch in a moment.

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/

RE: String comparison ops

2001-10-10 Thread Tom Hughes

In message <[EMAIL PROTECTED]>
  Gibbs Tanton - tgibbs <[EMAIL PROTECTED]> wrote:

> Does the call to the transcode function create a new string or change the
> string in place.  I don't think we want to pass in a native string only to
> find out it is unicode after we get done comparing it.

It creates a new string if the second argument is null, and overwrites
the second argument otherwise, so in this case it will create a new string.
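
That calling convention looks roughly like this in isolation. It is a sketch
only: toy_buf and toy_transcode are hypothetical, the "transcoding" is a plain
byte copy, and the real routines work on Parrot STRING structures.

```c
#include <stdlib.h>
#include <string.h>

struct toy_buf {
    char *data;
    size_t len;
};

/* If dest is NULL a new buffer is allocated and returned; otherwise
 * dest is overwritten and returned. Returns NULL if allocation fails.
 * The source is never modified. */
struct toy_buf *toy_transcode(const struct toy_buf *src, struct toy_buf *dest)
{
    /* Copy first, so that the call is safe even if dest aliases src. */
    char *copy = malloc(src->len ? src->len : 1);
    if (copy == NULL) {
        return NULL;
    }
    memcpy(copy, src->data, src->len);

    if (dest == NULL) {
        dest = malloc(sizeof *dest);
        if (dest == NULL) {
            free(copy);
            return NULL;
        }
    } else {
        free(dest->data);   /* drop whatever dest held before */
    }
    dest->data = copy;
    dest->len = src->len;
    return dest;
}
```

Passing NULL therefore leaves the caller's original string untouched, at the
price of a temporary that somebody has to reclaim later.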

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/

Re: String comparison ops

2001-10-10 Thread Tom Hughes


In message <[EMAIL PROTECTED]>
Simon Cozens <[EMAIL PROTECTED]> wrote:

> I see no tests *or* documentation. Come on, Tom, you should know
> better than that. :)

Here's the doc patch:

Index: strings.pod
===
RCS file: /home/perlcvs/parrot/docs/strings.pod,v
retrieving revision 1.4
diff -u -w -r1.4 strings.pod
--- strings.pod 2001/10/02 14:01:31 1.4
+++ strings.pod 2001/10/10 07:55:40
@@ -89,6 +89,17 @@
 C<*dest> is a null pointer, a new string structure is created with the
 same encoding as C.)
 
+To compare two strings, use:
+
+INTVAL string_compare(STRING* s1, STRING* s2)
+
+The value returned will be less than, equal to, or greater than zero
+depending on whether C<s1> is less than, equal to, or greater than C<s2>.
+
+Strings whose encodings are not the same can be compared - in this
+case a UTF-32 copy will be made of each string and these copies will
+be compared.
+
 B: 
 To format output into a string, use
 
Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu

Re: String comparison ops

2001-10-10 Thread Tom Hughes

In message <00b001c15166$a3b88ee0$7f03ef12@MLAMBERT>
Michel Lambert <[EMAIL PROTECTED]> wrote:

> Am I missing something here, or does this code not properly free transcoded
> s1's and s2's after it's done comparing them?

You're quite right that it doesn't, but neither does anything else
that creates temporary strings in a different encoding ;-)

As we're using garbage collection we shouldn't need to do an explicit
free though surely - in fact I'm not quite sure why string_destroy
even exists...

It's easy enough to add some frees if they are needed though.

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu

Re: String comparison ops

2001-10-10 Thread Tom Hughes


Index: string.t
===
RCS file: /home/perlcvs/parrot/t/op/string.t,v
retrieving revision 1.8
diff -u -w -r1.8 string.t
--- string.t	2001/10/05 11:46:47	1.8
+++ string.t	2001/10/10 08:42:55
@@ -1,6 +1,6 @@
 #! perl -w
 
-use Parrot::Test tests => 11;
+use Parrot::Test tests => 23;
 
 output_is( <<'CODE', <

Re: String comparison ops

2001-10-10 Thread Tom Hughes

In message <001d01c1516a$98c07ee0$7f03ef12@MLAMBERT>
Michel Lambert <[EMAIL PROTECTED]> wrote:

> > You're quite right that it doesn't, but neither does anything else
> > that creates temporary strings in a different encoding ;-)
> 
> In my day-or-two-old parrot copy, the only other code that uses the
> transcoding table only uses it with the second param != null (ie, save into
> existing string).

That's true, but if you look they've only just allocated the string
on the previous line... Which is actually silly but still.

Thinking about it though, that is my code as well so it doesn't really
prove anything very much ;-)

So the question is, are strings subject to GC or not? If they aren't
then I'll knock up a patch to add the string_destroy calls.

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu

Re: String comparison ops

2001-10-10 Thread Tom Hughes

In message <[EMAIL PROTECTED]>
  Simon Cozens <[EMAIL PROTECTED]> wrote:

> On Wed, Oct 10, 2001 at 12:49:50AM +0100, Tom Hughes wrote:
> > Attached is a patch to add string comparison ops, along with the
> > necessary infrastructure in the string code.
>
> I see no tests *or* documentation. Come on, Tom, you should know
> better than that. :)

I have just committed the string comparison changes, along with the
related doc and test patches that I posted earlier.

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/

Re: [PATCH] strnative.c typo

2001-10-11 Thread Tom Hughes


In message <[EMAIL PROTECTED]>
  Bryan C. Warnock <[EMAIL PROTECTED]> wrote:

> Assignment, not comparison.  (Plus formatted for coding standards)

Committed. The tests should really have caught this, so I'm going to
do some work on them to make them more comprehensive...

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/

Re: [PATCH] strnative.c typo

2001-10-11 Thread Tom Hughes


In message <[EMAIL PROTECTED]>
      Tom Hughes <[EMAIL PROTECTED]> wrote:

> In message <[EMAIL PROTECTED]>
>   Bryan C. Warnock <[EMAIL PROTECTED]> wrote:
>
> > Assignment, not comparison.  (Plus formatted for coding standards)
>
> Committed. The tests should really have caught this, so I'm going to
> do some work on them to make them more comprehensive...

Attached is a patch to string.t to extend the testing of the
comparison ops - there is now a list of pairs of strings and
each of the twelve comparison ops is tried with each pair of
strings from the list.

I'll commit this tomorrow unless somebody spots a problem.

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/


Index: string.t
===
RCS file: /home/perlcvs/parrot/t/op/string.t,v
retrieving revision 1.9
diff -u -w -r1.9 string.t
--- string.t   2001/10/10 18:21:05 1.9
+++ string.t   2001/10/11 23:07:03
@@ -150,320 +150,150 @@
 done
 OUTPUT
 
+my @strings = (
+  "hello", "hello",
+  "hello", "world",
+  "world", "hello",
+  "hello", "hellooo",
+  "hellooo", "hello",
+  "hello", "hella",
+  "hella", "hello",
+  "hella", "hellooo",
+  "hellooo", "hella",
+  "hElLo", "HeLlO",
+  "hElLo", "hElLo"
+);
+
 output_is(<

Re: Simple sub support's now in!

2001-10-12 Thread Tom Hughes

In message <[EMAIL PROTECTED]>
  Dan Sugalski <[EMAIL PROTECTED]> wrote:

> I see we don't have push-with-copy ops for the various register files. I
> think I'll go fix that.

Bryan Warnock posted a patch to add those on Monday but it doesn't
seem to have been committed...

The message is <[EMAIL PROTECTED]>.

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/

Re: Hmmm.

2001-10-13 Thread Tom Hughes


In message <[EMAIL PROTECTED]>
  Simon Cozens <[EMAIL PROTECTED]> wrote:

> opcheck.pl: Found 39 errors.
>
> Is opcheck.pl wrong, or is the optable wrong? Would like a volunteer
> to fix up which it is.

I'll take a look...

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/

Re: Hmmm.

2001-10-13 Thread Tom Hughes


In message <[EMAIL PROTECTED]>
  Simon Cozens <[EMAIL PROTECTED]> wrote:

> opcheck.pl: Found 39 errors.
>
> Is opcheck.pl wrong, or is the optable wrong? Would like a volunteer
> to fix up which it is.

Well as far as I can tell the rules it enforces are essentially
arbitrary and not documented anywhere other than at the top of
the script itself so it is hard to be sure which is wrong.

That said the attached patch fixes up the opcode names to match
the rules enforced by opcheck.pl, and fixes a small number of tests
which were using the old names.

It also fixes a 'use of undefined value' warning from opcheck.pl
when no errors are found, and makes process_opfunc.pl abort with
an error if opcode_table and basic_opcodes.ops don't match.

Finally it tidies up the comments in basic_opcodes.ops so that they
are all in the same form.

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/


Index: basic_opcodes.ops
===
RCS file: /home/perlcvs/parrot/basic_opcodes.ops,v
retrieving revision 1.38
diff -u -w -r1.38 basic_opcodes.ops
--- basic_opcodes.ops   2001/10/12 19:56:35 1.38
+++ basic_opcodes.ops   2001/10/13 10:55:28
@@ -13,7 +13,7 @@
   INT_REG(P1) = INT_CONST(P2);
 }
   
-/* SET Ix, Ix */
+/* SET Ix, Iy */
 AUTO_OP set_i {
   INT_REG(P1) = INT_REG(P2);
 }
@@ -139,7 +139,7 @@
 }
 
 /* EQ Ix, CONSTANT, EQ_BRANCH */
-MANUAL_OP eq_ic_ic {
+MANUAL_OP eq_i_ic_ic {
   if (INT_REG(P1) == INT_CONST(P2)) {
 RETURN(INT_CONST(P3));
   }
@@ -153,7 +153,7 @@
 }
 
 /* NE Ix, CONSTANT, NE_BRANCH */
-MANUAL_OP ne_ic_ic {
+MANUAL_OP ne_i_ic_ic {
   if (INT_REG(P1) != INT_CONST(P2)) {
 RETURN(INT_CONST(P3));
   }
@@ -167,7 +167,7 @@
 }
 
 /* LT Ix, CONSTANT, LT_BRANCH */
-MANUAL_OP lt_ic_ic {
+MANUAL_OP lt_i_ic_ic {
   if (INT_REG(P1) < INT_CONST(P2)) {
 RETURN(INT_CONST(P3));
   }
@@ -181,7 +181,7 @@
 }
 
 /* LE Ix, CONSTANT, LE_BRANCH */
-MANUAL_OP le_ic_ic {
+MANUAL_OP le_i_ic_ic {
   if (INT_REG(P1) <= INT_CONST(P2)) {
 RETURN(INT_CONST(P3));
   }
@@ -195,7 +195,7 @@
 }
 
 /* GT Ix, CONSTANT, GT_BRANCH */
-MANUAL_OP gt_ic_ic {
+MANUAL_OP gt_i_ic_ic {
   if (INT_REG(P1) > INT_CONST(P2)) {
 RETURN(INT_CONST(P3));
   }
@@ -209,13 +209,13 @@
 }
 
 /* GE Ix, CONSTANT, GE_BRANCH */
-MANUAL_OP ge_ic_ic {
+MANUAL_OP ge_i_ic_ic {
   if (INT_REG(P1) >= INT_CONST(P2)) {
 RETURN(INT_CONST(P3));
   }
 }
 
-/* IF IXx, TRUE_BRANCH */
+/* IF Ix, TRUE_BRANCH */
 MANUAL_OP if_i_ic {
   if (INT_REG(P1)) {
 RETURN(INT_CONST(P2));
@@ -232,7 +232,7 @@
   printf("%li", (long) INT_REG(P1));
 }
 
-/* PRINT ic */
+/* PRINT CONSTANT */
 AUTO_OP print_ic {
   printf("%li", (long) INT_CONST(P1));
 }
@@ -253,7 +253,7 @@
   INT_REG(P1)++;
 }
 
-/* INC Ix, nnn */
+/* INC Ix, CONSTANT */
 AUTO_OP inc_i_ic {
   INT_REG(P1) += INT_CONST(P2);
 }
@@ -263,7 +263,7 @@
   INT_REG(P1)--;
 }
 
-/* DEC Ix, nnn */
+/* DEC Ix, CONSTANT */
 AUTO_OP dec_i_ic {
   INT_REG(P1) -= INT_CONST(P2);
 }
@@ -278,7 +278,7 @@
   NUM_REG(P1) = NUM_CONST(P2);
 } 
 
-/* SET Nx, Nx */
+/* SET Nx, Ny */
 AUTO_OP set_n {
   NUM_REG(P1) = NUM_REG(P2);
 }
@@ -289,19 +289,19 @@
NUM_REG(P3);
 }
 
-/* SUB Nx, Ny, Iz   */
+/* SUB Nx, Ny, Nz   */
 AUTO_OP sub_n {
   NUM_REG(P1) = NUM_REG(P2) -
NUM_REG(P3);
 }
 
-/* MUL Nx, Ny, Iz   */
+/* MUL Nx, Ny, Nz   */
 AUTO_OP mul_n {
   NUM_REG(P1) = NUM_REG(P2) *
NUM_REG(P3);
 }
 
-/* DIV Nx, Ny, Iz   */
+/* DIV Nx, Ny, Nz   */
 AUTO_OP div_n {
   NUM_REG(P1) = NUM_REG(P2) /
NUM_REG(P3);
@@ -373,7 +373,7 @@
 }
 
 /* EQ Nx, CONSTANT, EQ_BRANCH */
-MANUAL_OP eq_nc_ic {
+MANUAL_OP eq_n_nc_ic {
   if (NUM_REG(P1) == NUM_CONST(P2)) {
 RETURN(INT_CONST(P3));
   }
@@ -387,7 +387,7 @@
 }
 
 /* NE Nx, CONSTANT, NE_BRANCH */
-MANUAL_OP ne_nc_ic {
+MANUAL_OP ne_n_nc_ic {
   if (NUM_REG(P1) != NUM_CONST(P2)) {
 RETURN(INT_CONST(P3));
   }
@@ -401,7 +401,7 @@
 }
 
 /* LT Nx, CONSTANT, LT_BRANCH */
-MANUAL_OP lt_nc_ic {
+MANUAL_OP lt_n_nc_ic {
   if (NUM_REG(P1) < NUM_CONST(P2)) {
 RETURN(INT_CONST(P3));
   }
@@ -415,7 +415,7 @@
 }
 
 /* LE Nx, CONSTANT, LE_BRANCH */
-MANUAL_OP le_nc_ic {
+MANUAL_OP le_n_nc_ic {
   if (NUM_REG(P1) <= NUM_CONST(P2)) {
 RETURN(INT_CONST(P3));
   }
@@ -429,7 +429,7 @@
 }
 
 /* GT Nx, CONSTANT, GT_BRANCH */
-MANUAL_OP gt_nc_ic {
+MANUAL_OP gt_n_nc_ic {
   if (NUM_REG(P1) > NUM_CONST(P2)) {
 RETURN(INT_CONST(P3));
   }
@@ -443,7 +443,7 @@
 }
 
 /* GE Nx, CONSTANT, GE_BRANCH */
-MANUAL_OP ge_nc_ic {
+MANUAL_OP ge_n_nc_ic {
   if (NUM_REG(P1) >= NUM_CONST(P2)) {
 RETURN(INT_CONST(P3));
   }
@@ -466,7 +466,7 @@
   printf("%f", NUM_REG(P1));
 }
  
-/* PRINT nc */
+/* PRINT CONSTANT */
 AUTO_OP print_nc {
   printf("%f", NUM_CONST(P1));
 }
@@ -476,7 +476,7 @@
   NUM_REG(P1) += 1;
 }
 
-/* INC Nx,

Re: [HELP NEEDED] moby.patch platform reports

2001-10-13 Thread Tom Hughes

In message <[EMAIL PROTECTED]>
  "Gregor N. Purdy" <[EMAIL PROTECTED]> wrote:

> It looks like moby.patch is going to go in, but I *really* need help
> from people on various platforms looking at the floating point
> problems. I'm hoping that someone else's compiler will complain about
> whatever it is I've done that flakes it out. Barring that, I'm hoping
> that among a group of folks checking it out, one of you will send me
> a "what were you thinking here?" message that helps me find and fix
> the problem.

I think I've solved it. You're going to kick yourself...

The answer is that you're not including math.h in core_ops.c which
means that floor() is not prototyped which means the compiler assumes
it returns an int hence the screwed up results.

You must always prototype any function that returns a double.

With an include math.h added floortest.pasm is now OK and trans.t
almost passes - it is certainly better than before. There's still
one failure in each of number.t and trans.t, though that
might just be my rather mangled checkout.

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/

Re: [PATCH] Build system tweaks.

2001-10-15 Thread Tom Hughes

In message <[EMAIL PROTECTED]>
  Andy Dougherty <[EMAIL PROTECTED]> wrote:

> ops2c and ops2pm need to make sure the directory for the output file
> exists before trying to create any files in that directory.

Well actually cvs should have created the directories when you
updated, so long as you gave it the -d switch.

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/

Re: [PATCH] Build system tweaks.

2001-10-15 Thread Tom Hughes

In message <[EMAIL PROTECTED]>
  Andy Dougherty <[EMAIL PROTECTED]> wrote:

> Yes, but they are empty, and there are no relevant entries in MANIFEST.
> Thus, if you try to make a copy of the parrot source tree in another
> directory based on the contents of the MANIFEST file, you'll get a
> copy without those empty directories, and the build will fail.

Actually they contain .cvsignore files, but those aren't in the
manifest. Plus you probably don't want them if you're making a copy
based on the manifest that therefore doesn't include the CVS control
files.

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/

Re: [PATCH] "missing" opcodes

2001-10-16 Thread Tom Hughes

In message <[EMAIL PROTECTED]>
  Simon Cozens <[EMAIL PROTECTED]> wrote:

> Commit it for now, but I'd really, really love it if we could
> automate this sort of thing.

I have a plan to semi-automate it which I nearly implemented the other
day but didn't get around to. Basically the idea is to extend things
so an ops file can contain this:

AUTO_OP add(i, i, i|ic) {
  $1 = $2 + $3;
}

and the opcode reading module would expand the i|ic to create two
separate versions of the op. Obviously if two arguments had variants
you would get four versions and so on.
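
The expansion itself is just a cartesian product over the alternatives of each
argument. A rough sketch in C, for illustration only (the real code would live
in the Perl opcode reading module; expand_op, split_alts and the MAX_* limits
are all hypothetical):

```c
#include <stdio.h>
#include <string.h>

#define MAX_ARGS 8
#define MAX_ALTS 4
#define MAX_LEN  64

/* Split a spec such as "i|ic" into its alternatives; returns the count. */
int split_alts(const char *spec, char alts[MAX_ALTS][8])
{
    int n = 0;

    while (*spec != '\0' && n < MAX_ALTS) {
        size_t full = strcspn(spec, "|");
        size_t keep = full < 7 ? full : 7;
        memcpy(alts[n], spec, keep);
        alts[n][keep] = '\0';
        n++;
        spec += full;
        if (*spec == '|') {
            spec++;
        }
    }
    return n;
}

/* Write every expanded signature, e.g. "add(i, i, ic)", into out[] and
 * return how many there are (or -1 on bad input). At most max_out
 * entries are stored. The rightmost argument varies fastest. */
int expand_op(const char *name, const char *const specs[], int nargs,
              char out[][MAX_LEN], int max_out)
{
    char alts[MAX_ARGS][MAX_ALTS][8];
    int counts[MAX_ARGS];
    int pick[MAX_ARGS] = { 0 };
    int total = 0;
    int i;

    if (nargs > MAX_ARGS) {
        return -1;
    }
    for (i = 0; i < nargs; i++) {
        counts[i] = split_alts(specs[i], alts[i]);
        if (counts[i] == 0) {
            return -1;
        }
    }

    for (;;) {
        if (total < max_out) {
            size_t used = (size_t)snprintf(out[total], MAX_LEN, "%s(", name);
            for (i = 0; i < nargs && used < MAX_LEN; i++) {
                used += (size_t)snprintf(out[total] + used, MAX_LEN - used,
                                         "%s%s", i ? ", " : "",
                                         alts[i][pick[i]]);
            }
            if (used < MAX_LEN) {
                snprintf(out[total] + used, MAX_LEN - used, ")");
            }
        }
        total++;

        /* Advance like an odometer over the chosen alternatives. */
        for (i = nargs - 1; i >= 0; i--) {
            if (++pick[i] < counts[i]) {
                break;
            }
            pick[i] = 0;
        }
        if (i < 0) {
            break;
        }
    }
    return total;
}
```

Given the specs "i", "i", "i|ic" for add this yields add(i, i, i) and
add(i, i, ic); two arguments with two alternatives each yield four
signatures.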

If people think that's a good solution to the problem then I'll have
a go at working up a patch.

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/

Re: [PATCH] "missing" opcodes

2001-10-16 Thread Tom Hughes


In message <[EMAIL PROTECTED]>
  Dan Sugalski <[EMAIL PROTECTED]> wrote:

> At 05:21 PM 10/16/2001 +0100, Tom Hughes wrote:

> >I have a plan to semi-automate it which I nearly implemented the other
> >day but didn't get around to. Basically the idea is to extend things
> >so an ops file can contain this:
>
> It sounds interesting, certainly. Give it a go and we'll see how it looks.
> (As long as it doesn't interfere with generating the switch statement or
> function table the oploop needs...)

As Gregor said, the expansion code is in the OpsFile module so
anything which uses that to read the .ops file will never know
anything about it.

I have knocked up a first pass at a patch, which is attached for
comments.

I did discover one limitation of my scheme when I started using
it to eliminate redundancy, namely cases like this:

  sub(i, i, i)
  sub(i, i, ic)
  sub(i, ic, i)

If I rewrite that using my scheme as:

  sub(i, i|ic, i|ic)

Then we wind up with a fourth variant that subtracts one constant
from another. I am wondering whether I should add an extra rule
that says that any expansion where there are more than two arguments
and all bar the first are constants is ignored, which would allow
the above and a number of other cases to be rewritten.

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/


? t/test.pbc
? t/test1.c
? t/test1
Index: core.ops
===
RCS file: /home/perlcvs/parrot/core.ops,v
retrieving revision 1.10
diff -u -w -r1.10 core.ops
--- core.ops   2001/10/16 18:35:04 1.10
+++ core.ops   2001/10/16 23:35:32
@@ -136,30 +136,18 @@
 =cut
 
   
-AUTO_OP set(i, i) {
+AUTO_OP set(i, i|ic) {
   $1 = $2;
 }
 
-AUTO_OP set(i, ic) {
+AUTO_OP set(n, n|nc) {
   $1 = $2;
 }
 
-AUTO_OP set(n, nc) {
-  $1 = $2;
-}
-
-AUTO_OP set(n, n) {
-  $1 = $2;
-}
-
-AUTO_OP set(s, sc) {
+AUTO_OP set(s, s|sc) {
   $1 = string_copy($2);
 }
 
-AUTO_OP set(s, s) {
-  $1 = string_copy($2);
-}
-
 =back
 
 =cut
@@ -239,38 +227,20 @@
 Branch if $1 is equal to $2.
 
 =cut
-
-AUTO_OP eq(i, i, ic) {
-  if ($1 == $2) {
-RETREL($3);
-  }
-}
-
-AUTO_OP eq(i, ic, ic) {
-  if ($1 == $2) {
-RETREL($3);
-  }
-}
 
-AUTO_OP eq(n, n, ic) {
+AUTO_OP eq(i, i|ic, ic) {
   if ($1 == $2) {
 RETREL($3);
   }
 }
 
-AUTO_OP eq(n, nc, ic) {
+AUTO_OP eq(n, n|nc, ic) {
   if ($1 == $2) {
 RETREL($3);
   }
 }
-
-AUTO_OP eq(s, s, ic) {
-  if (string_compare($1, $2) == 0) {
-RETREL($3);
-  }
-}
 
-AUTO_OP eq(s, sc, ic) {
+AUTO_OP eq(s, s|sc, ic) {
   if (string_compare($1, $2) == 0) {
 RETREL($3);
   }
@@ -295,37 +265,19 @@
 
 =cut
 
-AUTO_OP ne(i, i, ic) {
+AUTO_OP ne(i, i|ic, ic) {
   if ($1 != $2) {
 RETREL($3);
   }
 }
 
-AUTO_OP ne(i, ic, ic) {
+AUTO_OP ne(n, n|nc, ic) {
   if ($1 != $2) {
 RETREL($3);
   }
 }
 
-AUTO_OP ne(n, n, ic) {
-  if ($1 != $2) {
-RETREL($3);
-  }
-}
-
-AUTO_OP ne(n, nc, ic) {
-  if ($1 != $2) {
-RETREL($3);
-  }
-}
-
-AUTO_OP ne(s, s, ic) {
-  if (string_compare($1, $2) != 0) {
-RETREL($3);
-  }
-}
-
-AUTO_OP ne(s, sc, ic) {
+AUTO_OP ne(s, s|sc, ic) {
   if (string_compare($1, $2) != 0) {
 RETREL($3);
   }
@@ -349,38 +301,20 @@
 Branch if $1 is less than $2.
 
 =cut
-
-AUTO_OP lt(i, i, ic) {
-  if ($1 < $2) {
-RETREL($3);
-  }
-}
-
-AUTO_OP lt(i, ic, ic) {
-  if ($1 < $2) {
-RETREL($3);
-  }
-}
 
-AUTO_OP lt(n, n, ic) {
+AUTO_OP lt(i, i|ic, ic) {
   if ($1 < $2) {
 RETREL($3);
   }
 }
 
-AUTO_OP lt(n, nc, ic) {
+AUTO_OP lt(n, n|nc, ic) {
   if ($1 < $2) {
 RETREL($3);
   }
 }
-
-AUTO_OP lt(s, s, ic) {
-  if (string_compare($1, $2) < 0) {
-RETREL($3);
-  }
-}
 
-AUTO_OP lt(s, sc, ic) {
+AUTO_OP lt(s, s|sc, ic) {
   if (string_compare($1, $2) < 0) {
 RETREL($3);
   }
@@ -404,38 +338,20 @@
 Branch if $1 is less than or equal to $2.
 
 =cut
-
-AUTO_OP le(i, i, ic) {
-  if ($1 <= $2) {
-RETREL($3);
-  }
-}
-
-AUTO_OP le(i, ic, ic) {
-  if ($1 <= $2) {
-RETREL($3);
-  }
-}
 
-AUTO_OP le(n, n, ic) {
+AUTO_OP le(i, i|ic, ic) {
   if ($1 <= $2) {
 RETREL($3);
   }
 }
 
-AUTO_OP le(n, nc, ic) {
+AUTO_OP le(n, n|nc, ic) {
   if ($1 <= $2) {
 RETREL($3);
   }
 }
-
-AUTO_OP le(s, s, ic) {
-  if (string_compare($1, $2) <= 0) {
-RETREL($3);
-  }
-}
 
-AUTO_OP le(s, sc, ic) {
+AUTO_OP le(s, s|sc, ic) {
   if (string_compare($1, $2) <= 0) {
 RETREL($3);
   }
@@ -459,38 +375,20 @@
 Branch if $1 is greater than $2.
 
 =cut
-
-AUTO_OP gt(i, i, ic) {
-  if ($1 > $2) {
-RETREL($3);
-  }
-}
-
-AUTO_OP gt(i, ic, ic) {
-  if ($1 > $2) {
-RETREL($3);
-  }
-}
 
-AUTO_OP gt(n, n, ic) {
+AUTO_OP gt(i, i|ic, ic) {
   if ($1 > $2) {
 RETREL($3);
   }
 }
 
-AUTO_OP gt(n, nc, ic) {
+AUTO_OP gt(n, n|nc, ic) {
   if ($1 > $2) {
 RETREL($3);
   }
 }
-
-AUTO_OP gt(s, s, ic) {
-  if (string_compare($1, $2) > 0) {
-RETREL($3);
-  }
-}
 
-AUTO_OP g

Re: Missing transcoding functions?

2001-10-17 Thread Tom Hughes

In message <[EMAIL PROTECTED]>
  James Mastros <[EMAIL PROTECTED]> wrote:

> I'm working on implementing the ord(i,s) and chr(s,i) opcodes I talked about
> earlier, and I noticed what I consider a bug: there exist no transcode
> functions to or from native.

That's because we haven't worked out the necessary logistics for
that yet - it requires some means for determining what the native
character set is based on the locale which can then be used to load
an appropriate transcoding table.

> Also, the diagonals (identity transforms) don't exist.  This means that you
> have to explicitly check that you aren't transcoding from an encoding to the
> same encoding.

That is as per Dan's spec. I have thought about adding a wrapper
routine to do the check you refer to.
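
Such a wrapper might look something like the sketch below. It is purely
illustrative: the table here holds a stand-in converter rather than real
transcoders, and toy_table, toy_convert and toy_transcode_checked are
hypothetical names, not the Parrot_transcode_table interface.

```c
#include <stddef.h>

enum { TOY_ENCODINGS = 3 };

typedef const char *(*toy_transcoder)(const char *src);

int toy_conversions = 0;   /* counts real conversions, for demonstration */

/* Stand-in converter: a real routine would re-encode the data. */
const char *toy_convert(const char *src)
{
    toy_conversions++;
    return src;
}

/* Only the off-diagonal entries exist; the diagonal is left empty. */
const toy_transcoder toy_table[TOY_ENCODINGS][TOY_ENCODINGS] = {
    { NULL,        toy_convert, toy_convert },
    { toy_convert, NULL,        toy_convert },
    { toy_convert, toy_convert, NULL        },
};

/* Wrapper that handles the identity case so callers need not check. */
const char *toy_transcode_checked(const char *src, int from, int to)
{
    if (from < 0 || from >= TOY_ENCODINGS || to < 0 || to >= TOY_ENCODINGS) {
        return NULL;
    }
    if (from == to) {
        return src;   /* nothing to do, and no table lookup */
    }
    return toy_table[from][to](src);
}
```

Whether the identity case should return the same string or a copy is a
separate decision that the sketch does not try to settle.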

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/

Re: [PATCH] "missing" opcodes

2001-10-17 Thread Tom Hughes

In message <[EMAIL PROTECTED]>
Simon Cozens <[EMAIL PROTECTED]> wrote:

> On Wed, Oct 17, 2001 at 12:40:36AM +0100, Tom Hughes wrote:
> > I have knocked up a first pass at a patch, which is attached for
> > comments.
> 
> I've committed this. Thanks, that should *greatly* help maintainability.

I've got an extended version now that handles the other case
that I was talking about. I had to change the rule a bit though
so that it now ignores any expansion which has more than one
expanded argument and has all the expanded arguments as constants.

With that version of the patch core.ops has 25% fewer lines in it.

I'll commit the updated version shortly unless somebody screams...

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu

Re: Moving string -> number conversions to string libs

2001-12-03 Thread Tom Hughes

In message <[EMAIL PROTECTED]>
  Simon Cozens <[EMAIL PROTECTED]> wrote:

> On Mon, Dec 03, 2001 at 05:42:15PM +, Alex Gough wrote:
> > The string to number conversion stuff should really be done by the
> > string encodings... I think this is the right way to get this
> > happening, comments?
> 
> Looks like the right way to me. Could you commit it?

It's completely wrong I would have thought - the encoding layer
cannot know that a given code point is a digit so it can't possibly
do string to number conversion.

You need to use the encoding layer to fetch each character and
then the character set layer to determine what digit it represents.
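
A minimal sketch of that layering, under stated assumptions: the
function pointer types and every name below are hypothetical and are
not taken from the parrot source, the encoding is single byte, and
overflow and sign handling are ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical encoding layer: decode one code point starting at p and
 * report how many bytes it occupied. */
typedef unsigned long (*decode_fn)(const unsigned char *p, size_t *bytes);

/* Hypothetical character set layer: return the digit value of a code
 * point, or -1 if the code point is not a decimal digit. */
typedef int (*digit_value_fn)(unsigned long codepoint);

/* A single-byte encoding: every code point is exactly one byte. */
unsigned long singlebyte_decode(const unsigned char *p, size_t *bytes)
{
    *bytes = 1;
    return *p;
}

/* An ASCII-like character set: only '0'..'9' are digits. */
int ascii_digit_value(unsigned long codepoint)
{
    return (codepoint >= '0' && codepoint <= '9')
        ? (int)(codepoint - '0') : -1;
}

/* String layer: combines the two lower layers.  It knows nothing about
 * byte layouts or about which code points happen to be digits. */
long string_to_int_sketch(const unsigned char *buf, size_t len,
                          decode_fn decode, digit_value_fn digit_value)
{
    long value = 0;
    size_t pos = 0;

    assert(buf != NULL || len == 0);
    while (pos < len) {
        size_t bytes;
        int digit = digit_value(decode(buf + pos, &bytes));

        if (digit < 0)
            break;                  /* stop at the first non-digit */
        value = value * 10 + digit;
        pos += bytes;
    }
    return value;
}
```

Swapping in a different decode routine or a different digit lookup
changes the encoding or the character set without touching the
conversion loop itself.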

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/

Re: Moving string -> number conversions to string libs

2001-12-05 Thread Tom Hughes

In message <[EMAIL PROTECTED]>
  James Mastros <[EMAIL PROTECTED]> wrote:

> On Mon, 3 Dec 2001, Tom Hughes wrote:
> > It's completely wrong I would have thought - the encoding layer
> > cannot know that a given code point is a digit so it can't possibly
> > do string to number conversion.
> >
> > You need to use the encoding layer to fetch each character and
> > then the character set layer to determine what digit it represents.
> Right.  And then you need to apply some unified logic to get from this
> vector of digits (and other such symbols) to a value.

Indeed, and that logic needs to be in the string layer where it can
use both the encoding routines and the character type routines. I have
just rearranged things to reflect that.

> I'm just having nightmares of subtly different definitions of what a
> numeric constant looks like depending on the string encoding, because of
> different bits o' code not being quite in sync.  Code duplication bad,
> code sharing good.

Absolutely. That code is now in one place.

> (The charset layer should still be involved somewhere, because Unicode
> (for ex) has a "digit value" property.  This makes, say, Arabic numerals
> (which don't look at all like what a normal person calls Arabic numerals,
> BTW) work properly.  (OTOH, it might also do strange things with ex
> Hebrew, where the letters are also numbers (Aleph is also 1, Bet is also
> 2, etc.))

So far I have added an is_digit() call to the character type layer
to replace the existing isdigit() calls. To do things completely right
we need to extend that with calls to get the digit value, check for
sign characters etc, rather than assuming ASCIIish like it does now.

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/

Re: Moving string -> number conversions to string libs

2001-12-06 Thread Tom Hughes

In message <[EMAIL PROTECTED]>
James Mastros <[EMAIL PROTECTED]> wrote:

> Right.  Unfortunately, after starting on this, I realized that that's the easy
> part.  Unicode has a fairly well-defined way of figuring out if a character
> is a digit (see if its category is Nd (Number/digit)), and if so what its
> value is (the value of the "decimal" property).

Can it also tell you the base used for digit strings in that 
character set... Actually I don't know if there are any modern
writing systems that don't use base ten but certainly if you
were dealing with some ancient scripts that used sexagesimal
numbers that might be a problem ;-)

> However, there appears to be no good way of determining if something is a
> decimal point, a sign indicator, or an E/e (exponent signifier).

I suspected there wouldn't be.

> The attached patch will let the chartype layer decide if a character is a
> digit, and what its value is.

The patch seems to be missing though...

> Note also that is_digit should now return the value of the digit if it is a
> digit, or 42 if it isn't.  (I had to use something, and ~0 sometimes wanted
> to be (char)~0, and sometimes (INTVAL)~0, so I decided not to use ~0.  0, of
> course, can't be used for not-a-digit, since is_digit('0')==0.)

I was assuming there would be a separate digit_value() routine to avoid
that problem. Apart from anything else there will doubtless be many
other is_xxx() routines in due course which will be simple boolean
tests.
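
A sketch of that split, assuming an ASCII-like character set. The
routine names are hypothetical and only illustrate the shape of the
interface: the test stays a plain boolean and the value lookup is a
separate call, so no magic not-a-digit value is needed.

```c
#include <assert.h>

/* Pure boolean test, in the same style as any later is_xxx() routine. */
int is_digit_sketch(unsigned long c)
{
    return c >= '0' && c <= '9';
}

/* Value lookup; only meaningful when is_digit_sketch(c) is true, so a
 * digit value of 0 for '0' is no longer ambiguous. */
int digit_value_sketch(unsigned long c)
{
    assert(is_digit_sketch(c));
    return (int)(c - '0');
}
```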

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu

Re: Moving string -> number conversions to string libs

2001-12-06 Thread Tom Hughes

In message <[EMAIL PROTECTED]>
  Bart Lateur <[EMAIL PROTECTED]> wrote:

> On Thu, 06 Dec 2001 00:16:34 GMT, Tom Hughes wrote:
> 
> >So far I have added an is_digit() call to the character type layer
> >to replace the existing isdigit() calls.
> 
> There seems to be an overlap with the /\d/ character class in regexes.
> Can't you use the same test? Can't you use the definition of that
> character class, whatever form it may be in?

Well presumably the regex code should use the character type of
the string it is matching against when processing \d. There isn't
any regex code in yet though is there?

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/

Re: Bytecode portablilty

2001-12-10 Thread Tom Hughes


In message <20011210011601$[EMAIL PROTECTED]>
  "Bryan C. Warnock" <[EMAIL PROTECTED]> wrote:

> - Endianness.  The three major types are Big, Little, and Vaxian.
> Supporting these three should handle the majority of cases.

Actually VAXes have perfectly ordinary endianness - it was PDPs that
had the middle endian layout.

> - Floating point representations.  The four major types are IEEE(ish),
> Vaxian, Cray's CRI, and the IBM/370 hexadecimal format.  There are some
> minor variations among these, particularly with how much of the
> IEEE-754 standard floating point operations adhere to.  However,
> adherence falls more into Portability Layer Three, and we will solely
> address representation.

Of course there are also about five variants of floating point
format on the VAX although only two are 64 bits in size. Some of
those exist (or are emulated) on Alpha as well although that also
has IEEE types.

> - I've code that currently converts 32, 64, 96, and 128 bit floating
> point representations among all but the IBM format (for which I have
> the algorithms on paper, but nowhere to test), optimized for both 32
> bit and 64 bit support.  Although 96 and 128 bit handling is currently
> hardcoded specifically for conversions between long doubles on x86
> machines and 64 bit processors, I've got alpha code for casting among
> arbitrary types.  (For casting to and from 32 bit floats on machines
> that have no such type, for instance.)  IEEE semantics are *not*
> supported, and are still a matter for discussion.  The implementation
> of over- and underflow conversion to BigFloat is missing, for obvious
> reasons.  I'm still trying to come up with a better interface and
> implementation, however.

Presumably that's G_Floating that you're converting to/from for
the VAX rather than D_Floating?

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/

Re: Bytecode portablilty

2001-12-10 Thread Tom Hughes

In message <20011210133529.EYKY11472.femail13.sdc1.sfba.home.com@there>
Bryan C. Warnock <[EMAIL PROTECTED]> wrote:

> On Monday 10 December 2001 03:06 am, Tom Hughes wrote:
> > In message <20011210011601$[EMAIL PROTECTED]>
> >
> > Actually VAXes have perfectly ordinary endianness - it was PDPs that
> > had the middle endian layout.
> 
> Who's got the 16 bittish little endian layout ("21436587")?  (Perhaps it's 
> wrong to categorize that as endianness.)

I always believed it to be one or more of the PDP machines - most unix
systems call it PDP endian in their header files. That said the jargon
file lists the PDP 10 as big endian and the PDP 11 as little endian,
and has this to say about the third form:

  middle-endian adj.

  Not big-endian or little-endian. Used of perverse byte orders such
  as 3-4-1-2 or 2-1-4-3, occasionally found in the packed-decimal
  formats of minicomputer manufacturers who shall remain nameless.

Certainly the VAX is a perfectly ordinary little endian system.
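
For reference, the three layouts under discussion can be sketched as
follows. The function names are made up for illustration; the middle
layout shown is the PDP-11 style one for 32 bit values, with the high
16 bit word stored first and each word stored little endian.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Little endian (VAX, x86): least significant byte first. */
void store_little(uint32_t v, unsigned char out[4])
{
    assert(out != NULL);
    out[0] = v & 0xff;
    out[1] = (v >> 8) & 0xff;
    out[2] = (v >> 16) & 0xff;
    out[3] = (v >> 24) & 0xff;
}

/* Big endian: most significant byte first. */
void store_big(uint32_t v, unsigned char out[4])
{
    assert(out != NULL);
    out[0] = (v >> 24) & 0xff;
    out[1] = (v >> 16) & 0xff;
    out[2] = (v >> 8) & 0xff;
    out[3] = v & 0xff;
}

/* Middle endian, PDP-11 style: high word first, bytes swapped in each. */
void store_middle(uint32_t v, unsigned char out[4])
{
    assert(out != NULL);
    out[0] = (v >> 16) & 0xff;
    out[1] = (v >> 24) & 0xff;
    out[2] = v & 0xff;
    out[3] = (v >> 8) & 0xff;
}

/* Classify the host by comparing its layout of a known value against
 * each of the three layouts above. */
const char *host_byte_order(void)
{
    const uint32_t probe = 0x0A0B0C0DUL;
    unsigned char host[4], want[4];

    memcpy(host, &probe, sizeof host);
    store_little(probe, want);
    if (memcmp(host, want, 4) == 0) return "little";
    store_big(probe, want);
    if (memcmp(host, want, 4) == 0) return "big";
    store_middle(probe, want);
    if (memcmp(host, want, 4) == 0) return "middle";
    return "unknown";
}
```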

> > Presumably that's G_Floating that you're converting to/from for
> > the VAX rather than D_Floating?
> 
> Yes.   Is that going to be a problem?  (The sum of programs I've written on 
> a VAX can be represented with 1 digit.  In base 2.)  

Well VAXC defaults to using D_Floating for doubles but can be made
to use G_Floating instead with a switch to the compiler. I'm not sure
whether that makes it a problem or not.

> I've paper code for converting to and from D_Floating (for general data 
> migration), but its range is too restrictive for my liking for floating 
> point constants inside of bytecode.   If this is bumpkis, someone clue me 
> in, por favor.

As you say the exponent is more restricted (it has the same size as
in F_Floating which is the single precision format) but the trade off
is that the mantissa is larger so you get greater precision at the
expense of less range.
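
To put rough numbers on that trade off, here is a small sketch. It
assumes the documented VAX field widths (8 exponent bits and 55
fraction bits for D_Floating, 11 and 52 for G_Floating, each with a
hidden leading bit) and uses log10(2) ~ 0.30103 to turn bit counts
into approximate decimal figures. The struct and function names are
invented for the example.

```c
#include <assert.h>

/* Field widths of a VAX floating point format (illustrative struct). */
typedef struct {
    const char *name;
    int exp_bits;       /* width of the exponent field */
    int frac_bits;      /* stored fraction bits, excluding the hidden bit */
} vax_format;

const vax_format VAX_D = { "D_Floating", 8, 55 };
const vax_format VAX_G = { "G_Floating", 11, 52 };

/* The bias is 2^(exp_bits-1) and the fraction is below 1, so the
 * largest value is just under 2 raised to this power. */
int max_binary_exponent(const vax_format *f)
{
    assert(f->exp_bits > 1 && f->exp_bits < 16);
    return (1 << (f->exp_bits - 1)) - 1;
}

/* Approximate significant decimal digits, counting the hidden bit. */
double approx_decimal_digits(const vax_format *f)
{
    return (f->frac_bits + 1) * 0.30103;
}

/* Approximate largest decimal exponent. */
double approx_max_decimal_exponent(const vax_format *f)
{
    return max_binary_exponent(f) * 0.30103;
}
```

With these widths D_Floating comes out at nearly 17 significant
digits with a range of about 1e38, and G_Floating at nearly 16 digits
with a range of about 1e308.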

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu

Re: JIT me some speed!

2001-12-20 Thread Tom Hughes

In message <[EMAIL PROTECTED]>
  Dan Sugalski <[EMAIL PROTECTED]> wrote:

> To run a program with the JIT, pass test_parrot the -j flag and watch it
> scream. Well, scream if you're on x86 Linux or BSD (I get a speedup on
> mops.pbc of 35x) but it's a darned good place to start.

It does seem to be quite impressively fast. Faster even than the
compiled version of mops on my machine...

It looks like it is going to need some work before it can work for
other instruction sets though, at least for RISC systems where the
operands are typically encoded with the opcode as part of a single
word and the range of immediate constants is often restricted.

I'm thinking it will need some way of indicating field widths and
shifts for the operands and opcode so they can be merged into an
instruction word and also some way of handling a constant pool so
that arbitrary addresses can be loaded using PC relative loads.

I suspect it is also rather questionable to call system calls
directly rather than going via their C library veneers - that is
even more true when you come to things (like socket calls) which
are system calls on some machines and functions on others.

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/

Re: JIT me some speed!

2001-12-21 Thread Tom Hughes


In message <[EMAIL PROTECTED]>
Daniel Grunblatt <[EMAIL PROTECTED]> wrote:

> On Fri, 21 Dec 2001, Tom Hughes wrote:
> 
> > I suspect it is also rather questionable to call system calls
> > directly rather than going via their C library veneers - that is
> > even more true when you come to things (like socket calls) which
> > are system calls on some machines and functions on others.
> 
> We are not always calling system calls directly, we can use the C library
> when ever we need it, check out the .jit syntax.

I did have a brief look last night but I must have missed that. No
problem on that front then.

Incidentally the JIT times are definitely impressive... Times for
a 1.33 GHz Athlon are like this:

dutton [~/src/parrot] % ./test_parrot ./examples/assembly/mops.pbc 
Iterations:1
Estimated ops: 2
Elapsed time:  4.806858
M op/s:41.607220

dutton [~/src/parrot] % ./test_parrot -j ./examples/assembly/mops.pbc
Iterations:1
Estimated ops: 2
Elapsed time:  0.300258
M op/s:666.093736

dutton [~/src/parrot] % ./examples/assembly/mops 
Iterations:1
Estimated ops: 2
Elapsed time:  0.324787
M op/s:615.788117

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu

Re: [PATCH] string_transcode

2002-01-02 Thread Tom Hughes

In message <007f01c1930c$9d326220$[EMAIL PROTECTED]>
  "Peter Gibbs" <[EMAIL PROTECTED]> wrote:

> Another correction to string_transcode; this function now seems to work okay
> (tested using a dummy 'encode' op added to my local copy of core.ops)

Applied, thanks.

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/

Re: TODOs for STRINGs

2002-01-02 Thread Tom Hughes

In message <20020102054642$[EMAIL PROTECTED]>
  "David & Lisa Jacobs" <[EMAIL PROTECTED]> wrote:

> Here is a short list of TODOs that I came up with for STRINGs.  First, do
> these look good to people?  And second, what is the preferred method for
> keeping track of these (patch to the TODO file, entries in bug tracking
> system, mailing list, etc.
> 
> * Add set ops that are encoding aware (e.g., set S0, "something", "unicode",
> "utf-8")?

You can already have Unicode constants by prefixing the string
with a U character. I seem to recall Dan saying that he didn't want
to allow constants in arbitrary encodings but instead would prefer
just to have native and unicode.

> * Add transcoding ops (this might be a specific case of the previous e.g.,
> set S0, S1, "unicode", "utf-16")

I'm not sure whether this is needed. I think the idea is that in
general transcoding will happen at I/O time, presumably by pushing
a transcoding module on the I/O stack.

> * Move like encoded string comparison into encodings (i.e., the STRING
> comparison function gets the strings into the same encoding and then calls
> out to the encodings comparison function - This will allow each encoding to
> optimize its comparison.

The problem with this is that string comparison depends on both the
encoding and the character set so in general you can't do this. If
the character set was the same for both strings then you could do so
though.

What I did think about was having a flag on each encoding that
specified whether or not comparisons for that encoding could be
done using memcmp() when the character sets were the same. That
is true for things like the single byte encoding, but probably
not for the unicode encodings due to canonicalisation issues.
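
Roughly what that flag might look like, as a sketch. The structures
here are cut down stand-ins rather than the real STRING and ENCODING
definitions, and the memcmp_safe field and try_fast_compare routine
are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *name;
    int memcmp_safe;    /* non-zero if byte order equals string order */
} encoding_sketch;

typedef struct {
    const encoding_sketch *encoding;
    const char *chartype;               /* character set name */
    const unsigned char *bufstart;
    size_t bufused;
} string_sketch;

/* Returns 1 and stores the comparison in *result if the fast path
 * applies, or 0 if the caller must fall back to the general character
 * by character comparison. */
int try_fast_compare(const string_sketch *a, const string_sketch *b,
                     int *result)
{
    size_t shortest;
    int cmp;

    assert(a != NULL && b != NULL && result != NULL);
    if (a->encoding != b->encoding || !a->encoding->memcmp_safe ||
        strcmp(a->chartype, b->chartype) != 0)
        return 0;

    shortest = a->bufused < b->bufused ? a->bufused : b->bufused;
    cmp = memcmp(a->bufstart, b->bufstart, shortest);
    if (cmp == 0 && a->bufused != b->bufused)
        cmp = a->bufused < b->bufused ? -1 : 1;   /* prefix sorts first */
    *result = cmp;
    return 1;
}
```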

> * Add size of string termination to encodings (i.e., how many 0 bytes)

Certainly.

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/

Re: Proposal: Naming conventions

2002-01-10 Thread Tom Hughes

In message <20020110201559$[EMAIL PROTECTED]>
  "Melvin Smith" <[EMAIL PROTECTED]> wrote:

> >  Foo foo = (Foo) malloc(sizeof(*foo));
> >? Does ANSI allow using sizeof on a variable declared on the
> > same line?
> 
> Wouldn't sizeof(Foo) be safer here? At the logical time of the
> call *foo points to undefined. Technically it's not a deref but
> still looks scary. In C++ it might be confusing if you were to
> cast it as:

Well sizeof(Foo) and sizeof(*foo) are not actually the same thing
at all there because Foo is presumably a typedef for a pointer type
so sizeof(Foo) will be the size of a pointer and sizeof(*foo) will
be the size of the thing it points to.

You're quite right that it isn't technically a deref, as sizeof() is
only interested in the static type of the object and is evaluated at
compile time (if we ignore VLAs in C99 that is).

In general it is safer to use sizeof() on the variable you are working
with than on its type, as that way the sizeof() will still work if
somebody changes the type of the variable.
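
A small illustration of the difference, using a made up struct and
typedef in place of the Foo from the proposal:

```c
#include <assert.h>
#include <stdlib.h>

struct foo_sketch {
    double values[16];
    int count;
};
typedef struct foo_sketch *Foo;     /* pointer hidden inside a typedef */

Foo foo_new(void)
{
    /* sizeof(*foo) is the size of the struct.  sizeof(Foo) would only
     * be the size of a pointer and would under-allocate.  Nothing is
     * dereferenced here: sizeof only looks at the static type. */
    Foo foo = malloc(sizeof(*foo));

    assert(sizeof(*foo) == sizeof(struct foo_sketch));
    if (foo != NULL)
        foo->count = 0;
    return foo;
}
```

If the typedef is later changed to point at a different struct, the
malloc() call keeps allocating the right amount without being edited.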

> // If it were really C++ we would probably be using new()
> Foo foo = (FooBar) malloc(sizeof(*foo));
> 
> What type is *foo then? Should be Foo, but what if FooBar
> was of different size, it might not be an obvious bug to someone
> that just came along and tweaked your code.

The type of *foo is whatever Foo has been typedef'd as a pointer
to, and FooBar is a red herring.

> >If people have visceral objections to typedef'ing pointers, I'm
> >fine with dropping that part of the proposal. I'd just like to see
> 
> I've always been uncomfortable with that practice, it's one part of
> the whole Win32 world I hate. If you stick with the practice then
> you either end up making a new typedef for every level of indirection
> or you drop to using * (some typedef), etc. Now if it were C++ and we
> were using a smart pointer class I don't mind the practice.

I will agree that hiding pointers inside typedefs is not a very
good idea, if only because it makes it impossible to const qualify
the pointer without creating a second parallel typedef.
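
The const problem in miniature, with made up names. Applying const to
the typedef qualifies the pointer itself, not the thing it points to,
so a read only view needs its own typedef:

```c
#include <assert.h>
#include <stddef.h>

struct bar_sketch {
    int count;
};
typedef struct bar_sketch *Bar;
typedef const struct bar_sketch *ConstBar;  /* the parallel typedef */

/* "const Bar" means "struct bar_sketch * const": the pointer cannot be
 * reseated but the pointee is still writable, so this compiles. */
void bar_bump(const Bar b)
{
    assert(b != NULL);
    b->count++;
}

/* Only the second typedef gives a pointer to const; writing
 * b->count++ in this function would be rejected by the compiler. */
int bar_peek(ConstBar b)
{
    assert(b != NULL);
    return b->count;
}
```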

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/

Re: [COMMIT] Embedding enhancements

2002-02-19 Thread Tom Hughes

In message <[EMAIL PROTECTED]>
  Nicholas Clark <[EMAIL PROTECTED]> wrote:

> On Sat, Feb 16, 2002 at 01:46:56AM -0800, Brent Dax wrote:
> > NEW CONVENTIONS FOR DATA EXPOSED TO EMBEDDERS:
> > 
> > -All structs should have a name of the form parrot_system_t.  This name
> > should never be directly used outside the subsystem in question.
> > 
> > struct parrot_foo_t {
> > ...
> > };
> 
> Am I right in thinking that I could paraphrase that statement as
> "All structs should trample in ANSI's reserved namespace"?

I don't think so... As far as I can find in the standard, only
certain type names ending in _t are reserved, namely:

   [#1] Type names beginning with int or uint and  ending  with
   _t  may  be  added  to the types defined in the <inttypes.h>
   header.  Macro names beginning with INT or UINT  and  ending
   with  _MAX or _MIN, or macro names beginning with PRI or SCN
   followed by any lower case letter or X may be added  to  the
   macros defined in the <inttypes.h> header.

So struct x_t should be fine because that's a structure tag and
not a type name.

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/

Re: I submit for your aproval . . .

2002-04-10 Thread Tom Hughes

In message <a05101503b8da6ead2821@[63.120.19.221]>
  Dan Sugalski <[EMAIL PROTECTED]> wrote:

> At 6:29 PM -0400 4/10/02, Roman Hunt wrote:
> 
> >also  I think
> >encoding_lookup() should accept an argument of "native".
> 
> Good point, they should. OTOH, that makes some of this interesting,
> since which characters you use for various things depend on the
> encoding and charset.

We already have string_native_type which points to the CHARTYPE structure
for the native character type and that structure includes default_encoding
which is the name of the default encoding for the native character type.

I guess string_init could also set up string_native_encoding by looking
up the name of the default encoding for the native character type.

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/

Re: TODO additions

2002-04-13 Thread Tom Hughes

In message <[EMAIL PROTECTED]>
  Steve Fink <[EMAIL PROTECTED]> wrote:

> +Stability
> +-
> +Purify and other memory badness detectors

One thing that may be useful here is valgrind, which can be found
at http://developer.kde.org/~sewardj/ and does Purify-type things
on Linux.

I just hacked the parrot test suite to run parrot under valgrind
and it has only come up with one problem in t/op/hacks1, the details
of which are as follows:

  valgrind-20020329, a memory error detector for x86 GNU/Linux.
  Copyright (C) 2000-2002, and GNU GPL'd, by Julian Seward.
  For more details, rerun with: -v

  Syscall param open(pathname) contains uninitialised or unaddressable byte(s)
 at 0x403F1892: __libc_open (__libc_open:31)
 by 0x403829C3: _IO_fopen@@GLIBC_2.1 (iofopen.c:67)
 by 0x809B287: cg_core (core.ops:138)
 by 0x80955E0: runops_fast_core (runops_cores.c:34)
 Address 0x4104051D is 3201 bytes inside a block of size 32824 alloc'd
 at 0x4003DCC2: malloc (vg_clientmalloc.c:618)
 by 0x8092E11: mem_sys_allocate (memory.c:74)
 by 0x8098DAD: Parrot_alloc_new_block (resources.c:830)
 by 0x8092EC0: mem_setup_allocator (memory.c:108)

  ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
  malloc/free: in use at exit: 249652 bytes in 54 blocks.
  malloc/free: 58 allocs, 4 frees, 381692 bytes allocated.
  For a detailed leak analysis,  rerun with: --leak-check=yes
  For counts of detected errors, rerun with: -v

I haven't attempted to look at this and see what is causing it.

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/

Re: TODO additions

2002-04-13 Thread Tom Hughes


In message <[EMAIL PROTECTED]>
      Tom Hughes <[EMAIL PROTECTED]> wrote:

>   Syscall param open(pathname) contains uninitialised or unaddressable byte(s)
>  at 0x403F1892: __libc_open (__libc_open:31)
>  by 0x403829C3: _IO_fopen@@GLIBC_2.1 (iofopen.c:67)
>  by 0x809B287: cg_core (core.ops:138)
>  by 0x80955E0: runops_fast_core (runops_cores.c:34)
>  Address 0x4104051D is 3201 bytes inside a block of size 32824 alloc'd
>  at 0x4003DCC2: malloc (vg_clientmalloc.c:618)
>  by 0x8092E11: mem_sys_allocate (memory.c:74)
>  by 0x8098DAD: Parrot_alloc_new_block (resources.c:830)
>  by 0x8092EC0: mem_setup_allocator (memory.c:108)
> 
>   ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
>   malloc/free: in use at exit: 249652 bytes in 54 blocks.
>   malloc/free: 58 allocs, 4 frees, 381692 bytes allocated.
>   For a detailed leak analysis,  rerun with: --leak-check=yes
>   For counts of detected errors, rerun with: -v
> 
> I haven't attempted to look at this and see what is causing it.

I've had a look at it now. The problem is that we are passing
s->bufstart to fopen but there is no guarantee that there is a
nul byte at the end of the buffer as parrot strings are not nul
terminated.

I have developed a patch for this in the form of a new routine
which returns a nul terminated C style string given a parrot
string as argument. It does this by making sure buflen is at
least one greater than bufused and then stuffing a nul in that
byte.

This isn't a particularly brilliant fix so I'm attaching it here
for comments before I commit it.

Of course we also need to think about encoding/charset issues
when passing strings to system calls...

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/


Index: core.ops
===
RCS file: /home/perlcvs/parrot/core.ops,v
retrieving revision 1.119
diff -u -w -r1.119 core.ops
--- core.ops3 Apr 2002 23:03:37 -   1.119
+++ core.ops13 Apr 2002 14:11:11 -
@@ -135,7 +135,7 @@
 =cut

 inline op open(out INT, in STR) {
-  $1 = (INTVAL)fopen(($2)->bufstart, "r+");
+  $1 = (INTVAL)fopen(string_to_cstring(interpreter, ($2)), "r+");
   if (!$1) {
 perror("Can't open");
 exit(1);
@@ -145,7 +145,7 @@
 }

 inline op open(out INT, in STR, in STR) {
-  $1 = (INTVAL)fopen(($2)->bufstart, ($3)->bufstart);
+  $1 = (INTVAL)fopen(string_to_cstring(interpreter, ($2)), 
+string_to_cstring(interpreter, ($3)));
   goto NEXT();
 }

@@ -246,7 +246,7 @@
 op print(in STR) {
   STRING *s = $1;
   if (s && string_length(s)) {
-printf("%.*s", (int)string_length(s), (char *) s->bufstart);
+printf("%s", string_to_cstring(interpreter, (s)));
   }
   goto NEXT();
 }
@@ -255,7 +255,7 @@
   PMC *p = $1;
   STRING *s = (p->vtable->get_string(interpreter, p));
   if (s) {
-printf("%.*s",(int)string_length(s),(char *) s->bufstart);
+printf("%s", string_to_cstring(interpreter, (s)));
   }
   goto NEXT();
 }
@@ -304,7 +304,7 @@
default: file = (FILE *)$1;
   }
   if (s && string_length(s)) {
-fprintf(file, "%.*s",(int)string_length(s),(char *) s->bufstart);
+fprintf(file, "%s", string_to_cstring(interpreter, (s)));
   }
   goto NEXT();
 }
@@ -323,7 +323,7 @@
default: file = (FILE *)$1;
   }
   if (s) {
-fprintf(file, "%.*s",(int)string_length(s),(char *) s->bufstart);
+fprintf(file, "%s", string_to_cstring(interpreter, (s)));
   }
   goto NEXT();
 }
Index: string.c
===
RCS file: /home/perlcvs/parrot/string.c,v
retrieving revision 1.68
diff -u -w -r1.68 string.c
--- string.c12 Apr 2002 01:40:28 -  1.68
+++ string.c13 Apr 2002 14:11:12 -
@@ -802,6 +802,21 @@
 NULL, 0, NULL);
 }

+const char *
+string_to_cstring(struct Parrot_Interp * interpreter, STRING * s)
+{
+char *cstring;
+
+if (s->buflen == s->bufused)
+string_grow(interpreter, s, 1);
+
+cstring = s->bufstart;
+
+cstring[s->bufused] = 0;
+
+return cstring;
+}
+

 /*
  * Local variables:
Index: include/parrot/string_funcs.h
===
RCS file: /home/perlcvs/parrot/include/parrot/string_funcs.h,v
retrieving revision 1.6
diff -u -w -r1.6 string_funcs.h
--- include/parrot/string_funcs.h   22 Mar 2002 04:11:57 -  1.6
+++ include/parrot/string_funcs.h   13 Apr 2002 14:11:12 -
@@ -27,6 +27,7 @@
  const STRING *, STRING **);
 INTVAL Parrot_string_compare(Parrot, const STRING *, const STRING *);
 Parrot_Bool Parrot_string_bool(const STRING *);
+const char *Parrot_string_cstring(const S

Re: TODO additions

2002-04-13 Thread Tom Hughes

In message <[EMAIL PROTECTED]>
  Roman Hunt <[EMAIL PROTECTED]> wrote:

> why don't we default to null terminating strings of type native?
> if "native" is what we get when LANG=C it only seems natural to do so.
> else we are forced to use wrapper functions that grow and manipulate
> string data any time we need to pass it to standard C functions that
> won't accept a string_length parameter, this list unfortunately contains
> several syscalls.

Well that is what perl 5 does certainly. I thought it had been
decided not to do that in perl 6 though due to issues about what
it meant to nul terminate in various different character sets.

We can't assume that US-ASCII will be native everywhere though as
some platforms may use some form of unicode as the native character
set (and accept unicode arguments to system calls).

It does need some thought though, to determine how best to handle
this issue.

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/

Re: TODO additions

2002-04-14 Thread Tom Hughes


In message <[EMAIL PROTECTED]>
      Tom Hughes <[EMAIL PROTECTED]> wrote:

> I have developed a patch for this in the form of a new routine
> which returns a nul terminated C style string given a parrot
> string as argument. It does this by making sure buflen is at
> least one greater than bufused and then stuffing a nul in that
> byte.
> 
> This isn't a particularly brilliant fix so I'm attaching it here
> for comments before I commit it.

I haven't seen any major objections to this so I have committed
it. It will at least ensure that file opening is stable for the
upcoming release.

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/

Re: transcode addition

2002-04-17 Thread Tom Hughes

In message <[EMAIL PROTECTED]>
  Roman Hunt <[EMAIL PROTECTED]> wrote:

>   I'm not too sure if this is necessary but it seems logical to get things
> into charsets our compilers can handle.  Hopefully this is the correct
> approach . . . . also this should NULL terminate in the event that the
> entire buffer had not yet been filled.

This is wrong - you need to worry about the character set as well
as the encoding, and at the very least you should compare the encoding
to the default encoding for the native charset and not assume that
it will always be singlebyte.

Your buffer termination code is also wrong - bufused is the end of
the string. You are null terminating the buffer not the string, and
the buffer may have extra space. Plus you have created a buffer
overrun.
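
To make the distinction concrete, here is a sketch using a cut down
stand-in for the string header. The struct and routine are
hypothetical and only the bufstart, buflen and bufused names are
borrowed from the real thing.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    char *bufstart;     /* start of the allocated buffer */
    size_t buflen;      /* bytes allocated */
    size_t bufused;     /* bytes occupied by the string */
} buffer_sketch;

/* Returns 1 after writing a terminator at the end of the string, or 0
 * if there is no spare byte and the caller must grow the buffer first. */
int terminate_in_place(buffer_sketch *s)
{
    assert(s != NULL && s->bufused <= s->buflen);
    if (s->bufused == s->buflen)
        return 0;       /* bufstart[buflen] is outside the buffer */
    s->bufstart[s->bufused] = '\0';     /* end of string, not of buffer */
    return 1;
}
```

Writing the terminator at bufstart[buflen] instead would leave any
unused bytes between bufused and buflen as part of the C string, and
would also write one byte past the allocation.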

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/

Re: x86 linux memory leak checker (and JIT ideas)

2002-04-24 Thread Tom Hughes

In message <[EMAIL PROTECTED]>
  Nicholas Clark <[EMAIL PROTECTED]> wrote:

> Jarkko mailed this URL to p5p:
> 
> http://developer.kde.org/~sewardj/
> 
> It describes a free (GPL) memory leak checker for x86 Linux
> 
> 1: This may be of use for parrot hackers

Which is why I mentioned it a week or two ago ;-)

I also ran it over the test suite and fixed the only bug that it found
at that time...

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/

Re: Dynaloading

2002-06-12 Thread Tom Hughes

In message <a05111b2fb92c9ba1ac83@[63.120.19.221]>
  Dan Sugalski <[EMAIL PROTECTED]> wrote:

> The exported name should be the MD5 checksum of a string that
> represents the actual routine name we're looking for. This, I think,
> should be specified somewhere external to the library, in some sort
> of metadata file, I think. (Not sure, I'm waffling here. But we need
> this to be unique)

Why does it need to be unique if it's not going to be linked
against anything? If you're just finding the name with dlsym() or
equivalent then you can just use the same name in all the libraries
and it won't clash.

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/

Stack performance issue

2002-06-30 Thread Tom Hughes


There is a performance issue in the stack code, which the attached
patch attempts to address.

The problem revolves around what happens when you are close to the
boundary between two chunks. When this happens you can find that you
are in a loop where something is pushed on the stack, causing a new
chunk to be allocated. That item is then popped causing the new chunk
to be discarded only for it to have to be allocated again on the next
iteration of the loop.

This is a well known problem with chunked stacks - it is certainly a
known issue on ARM based machines which use the chunked stack variant
of the ARM procedure call standard. The solution there is to always
keep one chunk in reserve - when you move back out of a chunk you don't
free it. Instead you wait until you move back another chunk and then
free the chunk after the one that has just emptied.

Even this can go wrong if your loop pushes more than one chunk's worth
of data on the stack and then pops it again, but that is far rarer than
the general case of pushing one or two items which happens to take it
over a chunk boundary.
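
The idea can be sketched independently of the parrot stack code. This
is a simplified illustration and not the attached patch: the names are
invented, chunks come straight from malloc() rather than the parrot
allocator, and the list is a plain NULL terminated one rather than
circular.

```c
#include <assert.h>
#include <stdlib.h>

#define CHUNK_DEPTH 4   /* tiny, to make the boundary easy to hit */

typedef struct chunk_sketch {
    struct chunk_sketch *prev, *next;
    int used;
    int items[CHUNK_DEPTH];
} chunk_sketch;

typedef struct {
    chunk_sketch *top;      /* chunk currently being filled */
    long allocations;       /* number of chunks ever allocated */
} stack_sketch;

static chunk_sketch *new_chunk(stack_sketch *s, chunk_sketch *prev)
{
    chunk_sketch *c = malloc(sizeof(*c));

    assert(c != NULL);
    c->prev = prev;
    c->next = NULL;
    c->used = 0;
    s->allocations++;
    return c;
}

void stack_init(stack_sketch *s)
{
    s->allocations = 0;
    s->top = new_chunk(s, NULL);
}

void stack_push(stack_sketch *s, int value)
{
    if (s->top->used == CHUNK_DEPTH) {
        if (s->top->next == NULL)       /* no spare chunk: allocate */
            s->top->next = new_chunk(s, s->top);
        s->top = s->top->next;          /* otherwise reuse the spare */
    }
    s->top->items[s->top->used++] = value;
}

int stack_pop(stack_sketch *s)
{
    if (s->top->used == 0) {
        assert(s->top->prev != NULL);   /* stack must not be empty */
        /* Free the chunk beyond the one that has just emptied, but
         * keep the emptied chunk itself in reserve. */
        if (s->top->next != NULL) {
            free(s->top->next);
            s->top->next = NULL;
        }
        s->top = s->top->prev;
    }
    return s->top->items[--s->top->used];
}
```

With this arrangement a loop that repeatedly pushes and pops across a
chunk boundary allocates the second chunk once and then keeps reusing
it.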

The attached patch implements this one-behind logic, both for the
generic stack and the integer stack. If nobody has any objections
then I'll commit it tomorrow sometime.

Some figures from my test programs, running on a K6-200 linux box. The
test programs push and pop 65536 times with the first column being when
that loop doesn't cross a chunk boundary and the second being when it
does cross a chunk boundary:

                                 No overflow    Overflow
  Integer stack, before patch     0.065505s   16.589480s
  Integer stack, after patch      0.062732s    0.068460s
  Generic stack, before patch     0.161202s    5.475367s
  Generic stack, after patch      0.166938s    0.168390s

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/


Index: rxstacks.c
===
RCS file: /cvs/public/parrot/rxstacks.c,v
retrieving revision 1.5
diff -u -r1.5 rxstacks.c
--- rxstacks.c  17 May 2002 21:38:20 -  1.5
+++ rxstacks.c  30 Jun 2002 17:42:02 -
@@ -46,13 +46,20 @@
 
 /* Register the new entry */
 if (++chunk->used == STACK_CHUNK_DEPTH) {
-/* Need to add a new chunk */
-IntStack_Chunk new_chunk = mem_allocate_aligned(sizeof(*new_chunk));
-new_chunk->used = 0;
-new_chunk->next = stack;
-new_chunk->prev = chunk;
-chunk->next = new_chunk;
-stack->prev = new_chunk;
+if (chunk->next == stack) {
+/* Need to add a new chunk */
+IntStack_Chunk new_chunk = mem_allocate_aligned(sizeof(*new_chunk));
+new_chunk->used = 0;
+new_chunk->next = stack;
+new_chunk->prev = chunk;
+chunk->next = new_chunk;
+stack->prev = new_chunk;
+}
+else {
+/* Reuse the spare chunk we kept */
+chunk = chunk->next;
+stack->prev = chunk;
+}
 }
 }
 
@@ -67,11 +74,17 @@
 /* That chunk != stack check is just to allow the empty stack case
  * to fall through to the following exception throwing code. */
 
-/* Need to pop off the last entry */
-stack->prev = chunk->prev;
-stack->prev->next = stack;
-/* Relying on GC feels dirty... */
-chunk = stack->prev;
+/* If the chunk that has just become empty is not the last chunk
+ * on the stack then we make it the last chunk - the GC will clean
+ * up any chunks that are discarded by this operation. */
+if (chunk->next != stack) {
+chunk->next = stack;
+}
+
+/* Now back to the previous chunk - we'll keep the one we have
+ * just emptied around for now in case we need it again. */
+chunk = chunk->prev;
+stack->prev = chunk;
 }
 
 /* Quick sanity check */
Index: stacks.c
===
RCS file: /cvs/public/parrot/stacks.c,v
retrieving revision 1.34
diff -u -r1.34 stacks.c
--- stacks.c25 Jun 2002 23:50:51 -  1.34
+++ stacks.c30 Jun 2002 17:42:02 -
@@ -208,22 +208,29 @@
 
 /* Do we need a new chunk? */
 if (chunk->used == STACK_CHUNK_DEPTH) {
-/* Need to add a new chunk */
-Stack_Chunk_t *new_chunk = mem_allocate_aligned(sizeof(Stack_Chunk_t));
-
-new_chunk->used = 0;
-new_chunk->next = stack_base;
-new_chunk->prev = chunk;
-chunk->next = new_chunk;
-stack_base->prev = new_chunk;
-chunk = new_chunk;
-
-/* Need to initialize this pointer before the collector sees it */
-chunk->buffer = NULL;
-chunk->buffer = new_buffer_header(interpreter);
-
-Parrot_allocate(interpreter, chunk->

Re: Stack performance issue

2002-07-02 Thread Tom Hughes

In message <[EMAIL PROTECTED]>
Melvin Smith <[EMAIL PROTECTED]> wrote:

> You might want to modify register stacks too. I currently have a
> band-aid on it that just doesn't free stack chunks which works in
> all but the weirdest cases.

I've done that now. I also just realised that the stacks are
allocating their chunks directly from the system, which presumably
means the GC won't pick them up so they need to be freed directly.

I've done that for the register stacks, and I'll do the same for the 
other stacks unless somebody spots a flaw in my logic and points out
that the GC will catch it...

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu
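
The chunk handling discussed in this thread can be sketched roughly as
follows. This is a minimal illustration, not the Parrot code: the names
IntStack, Chunk, stack_push and stack_pop are hypothetical, the chunk
list is linear rather than circular, and chunks come from malloc and
are freed explicitly instead of being left to the GC. The point is that
one emptied chunk is kept as a spare, so pushing and popping across a
chunk boundary does not allocate every time.

```c
#include <assert.h>
#include <stdlib.h>

#define CHUNK_DEPTH 4            /* tiny on purpose so the boundary is easy to hit */

typedef struct Chunk {
    int used;                    /* number of live entries in this chunk */
    int entry[CHUNK_DEPTH];
    struct Chunk *prev, *next;
} Chunk;

typedef struct {
    Chunk *top;                  /* chunk currently receiving pushes */
    long allocations;            /* how many chunks were ever allocated */
} IntStack;

Chunk *chunk_new(IntStack *s, Chunk *prev) {
    Chunk *c = malloc(sizeof *c);
    assert(c != NULL);
    c->used = 0;
    c->prev = prev;
    c->next = NULL;
    s->allocations++;
    return c;
}

void stack_init(IntStack *s) {
    s->allocations = 0;
    s->top = chunk_new(s, NULL);
}

void stack_push(IntStack *s, int value) {
    Chunk *c = s->top;
    if (c->used == CHUNK_DEPTH) {
        if (c->next == NULL)     /* no spare chunk: allocate one */
            c->next = chunk_new(s, c);
        c = c->next;             /* otherwise reuse the spare we kept */
        s->top = c;
    }
    c->entry[c->used++] = value;
}

int stack_pop(IntStack *s) {
    Chunk *c = s->top;
    if (c->used == 0) {
        assert(c->prev != NULL); /* popping an empty stack is a caller error */
        if (c->next != NULL) {   /* keep only one spare: free the older one */
            free(c->next);
            c->next = NULL;
        }
        c = c->prev;             /* the emptied chunk stays linked as the spare */
        s->top = c;
    }
    return c->entry[--c->used];
}

void stack_destroy(IntStack *s) {
    Chunk *c = s->top;
    while (c->prev != NULL)      /* rewind to the first chunk */
        c = c->prev;
    while (c != NULL) {          /* then free forwards, spare included */
        Chunk *next = c->next;
        free(c);
        c = next;
    }
    s->top = NULL;
}
```

With this layout, a loop that repeatedly crosses the same chunk
boundary allocates the second chunk once and then keeps reusing it.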

Re: Adding the system stack to the root set

2002-07-11 Thread Tom Hughes

In message <[EMAIL PROTECTED]>
  Nicholas Clark <[EMAIL PROTECTED]> wrote:

> On Wed, Jul 10, 2002 at 06:49:06PM -0400, Dan Sugalski wrote:
> > Yes, this is an issue for systems with a chunked stack. As far as I
> > know that only applies to the various ARM OSes, and for those we'll
> > have to have some different system specific code to deal with the
> > stack. (Which is fine)
> 
> Sorry, I wasn't clear in my previous reply to your private message.
> ARM Linux doesn't use a chunked stack. It's contiguous, and (for example)
> the Boehm garbage collector does work on it. I would expect NetBSD ARM
> doesn't either. (There is a FreeBSD port to StrongARM, but its mailing
> list is very very quiet). So I don't think those two will pose undue
> problems.

As far as I know all the ARM unixes use a contiguous stack - it's
just RISC OS that uses the chunked stack I believe.

I believe you can always tell by looking at where sl points and seeing
if there is a valid chunk descriptor there, and then following its prev
pointer to get the previous chunk if there is one.

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/
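
The check described above might look something like this sketch. The
descriptor layout, the DemoChunk type and the DEMO_CHUNK_MAGIC value
are placeholders invented for illustration, not the real RISC OS
definitions, and the caller is assumed to have obtained the candidate
descriptor address (where sl points) by some platform-specific means.
A return value of 0 would mean no valid descriptor was found, i.e. the
stack is presumably contiguous.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_CHUNK_MAGIC 0xC0FFEE01u   /* placeholder marker, not the real one */

typedef struct DemoChunk {
    uint32_t magic;                    /* identifies a valid descriptor */
    struct DemoChunk *prev;            /* previous chunk, or NULL */
    size_t size;                       /* bytes covered by this chunk */
} DemoChunk;

typedef void (*chunk_visitor)(const DemoChunk *chunk, void *arg);

/* Example visitor: accumulate the total size of all chunks seen. */
void sum_sizes(const DemoChunk *chunk, void *arg) {
    *(size_t *)arg += chunk->size;
}

/* Follow prev pointers from the candidate descriptor, calling visit on
 * each valid chunk. Returns the number of chunks visited. */
int walk_stack_chunks(const DemoChunk *candidate, chunk_visitor visit, void *arg) {
    int count = 0;
    const DemoChunk *c = candidate;
    while (c != NULL && c->magic == DEMO_CHUNK_MAGIC) {
        if (visit != NULL)
            visit(c, arg);
        count++;
        c = c->prev;
    }
    return count;
}
```

A collector adding the system stack to its root set would pass a
visitor that scans each chunk's memory range instead of summing sizes.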

Re: [netlabs #789] [PATCH] Squish some warnings

2002-07-13 Thread Tom Hughes


In message <20020712010920$[EMAIL PROTECTED]>
  Simon Glover (via RT) <[EMAIL PROTECTED]> wrote:

> # New Ticket Created by  Simon Glover
> # Please include the string:  [netlabs #789]
> # in the subject line of all future correspondence about this issue.
> # http://bugs6.perl.org/rt2/Ticket/Display.html?id=789 >
> 
> 
> 
>  stack_chunk is now Stack_Chunk...

Applied. Somebody update the ticket please...

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/

Re: [netlabs #790] [PATCH] MANIFEST update

2002-07-13 Thread Tom Hughes

In message <20020712005836$[EMAIL PROTECTED]>
  Simon Glover (via RT) <[EMAIL PROTECTED]> wrote:

> # New Ticket Created by  Simon Glover
> # Please include the string:  [netlabs #790]
> # in the subject line of all future correspondence about this issue.
> # http://bugs6.perl.org/rt2/Ticket/Display.html?id=790 >
> 
> 
> 
>  Self-explanatory.

Applied. Somebody please update the ticket...

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/

Re: [netlabs #788] [PATCH] Array fixes (and tests)

2002-07-13 Thread Tom Hughes


In message <20020711221132$[EMAIL PROTECTED]>
  Simon Glover (via RT) <[EMAIL PROTECTED]> wrote:

> # New Ticket Created by  Simon Glover
> # Please include the string:  [netlabs #788]
> # in the subject line of all future correspondence about this issue.
> # http://bugs6.perl.org/rt2/Ticket/Display.html?id=788 >
> 
> 
> 
>  This patch fixes a number of off-by-one errors in array.pmc, and adds a
>  few more tests.

Applied. Somebody please update the ticket...

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/

Re: [netlabs #757] Problem mixing labels, comments and quote-marks

2002-07-13 Thread Tom Hughes


In message <20020703012231$[EMAIL PROTECTED]>
  Simon Glover (via RT) <[EMAIL PROTECTED]> wrote:

>  This code:
> 
>   A:# prints "a"
>   print "a"
>   end
> 
>   doesn't assemble; the assembler dies with the error message:
> 
> Use of uninitialized value in hash element at assemble.pl line 844.
> Couldn't find operator '' on line 1.
> 
>   If you remove the ""s from the comment, it works fine. Likewise, if
>   you put the label, op and comment on the same line, ie:
> 
>A: print "a"   # prints "a"
>  end
> 
>   then it assembles and runs OK.

Here's a patch that will fix this. I haven't committed it because I'm
not sure why the assembler wasn't dropping comments that included quotes
so I'm giving people who know more about the assembler than me a chance
to comment first...

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/


Index: assemble.pl
===
RCS file: /cvs/public/parrot/assemble.pl,v
retrieving revision 1.77
diff -u -r1.77 assemble.pl
--- assemble.pl 4 Jul 2002 18:36:17 -   1.77
+++ assemble.pl 13 Jul 2002 17:30:48 -
@@ -433,7 +433,7 @@
 
   $self->{pc}++;
   return if $line=~/^\s*$/ or $line=~/^\s*#/; # Filter out the comments and blank lines
-  $line=~s/#[^'"]+$//;   # Remove trailing comments
+  $line=~s/#.*$//;   # Remove trailing comments
   $line=~s/(^\s+|\s+$)//g;   # Remove leading and trailing whitespace
   #
   # Accumulate lines that only have labels until an instruction is found..

Re: [netlabs #758] [PATCH] Fixes for example programs

2002-07-13 Thread Tom Hughes


In message <20020703015823$[EMAIL PROTECTED]>
  Simon Glover (via RT) <[EMAIL PROTECTED]> wrote:

> # New Ticket Created by  Simon Glover
> # Please include the string:  [netlabs #758]
> # in the subject line of all future correspondence about this issue.
> # http://bugs6.perl.org/rt2/Ticket/Display.html?id=758 >
> 
> 
> 
>  Fixes to various of the PASM examples in light of recent changes in the
>  assembler.

Applied. Somebody please update the ticket...

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/

Re: [netlabs #757] Problem mixing labels, comments and quote-marks

2002-07-13 Thread Tom Hughes


In message <20020713174114$[EMAIL PROTECTED]>
  brian wheeler <[EMAIL PROTECTED]> wrote:

> On Sat, 2002-07-13 at 12:32, Tom Hughes wrote:
> > In message <20020703012231$[EMAIL PROTECTED]>
> > Here's a patch that will fix this. I haven't committed it because I'm
> > not sure why the assembler wasn't dropping comments that included quotes
> > so I'm giving people who know more about the assembler than me a chance
> > to comment first...
> 
> I believe it wasn't dropping the comments with quotes as a side effect
> of not wanting to break things like:
>   print "#"
> 
> which breaks with the included patch.  I basically had the same patch
> you do, but wasn't able to figure out how to handle the above case *and*
> do the right thing with  # prints "a"

Of course... The attached patch should handle that I think...

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/


Index: assemble.pl
===
RCS file: /cvs/public/parrot/assemble.pl,v
retrieving revision 1.77
diff -u -r1.77 assemble.pl
--- assemble.pl 4 Jul 2002 18:36:17 -   1.77
+++ assemble.pl 13 Jul 2002 17:49:58 -
@@ -430,10 +430,13 @@
 
 sub _annotate_contents {
   my ($self,$line) = @_;
+  my $str_re = qr(\"(?:[^\\\"]*(?:\\.[^\\\"]*)*)\" |
+  \'(?:[^\\\']*(?:\\.[^\\\']*)*)\'
+ )x;
 
   $self->{pc}++;
   return if $line=~/^\s*$/ or $line=~/^\s*#/; # Filter out the comments and blank lines
-  $line=~s/#[^'"]+$//;   # Remove trailing comments
+  $line=~s/^((?:[^'"]+|$str_re)*)#.*$/$1/; # Remove trailing comments
   $line=~s/(^\s+|\s+$)//g;   # Remove leading and trailing whitespace
   #
   # Accumulate lines that only have labels until an instruction is found..
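
For readers who find the regex hard to follow, the same rule can be
written as a small hand-rolled scanner. This is only a sketch in C of
the idea (a '#' starts a comment unless it sits inside a single- or
double-quoted string, and a backslash escapes the next character inside
a string); it is not the code in assemble.pl, and the function name
strip_trailing_comment is made up for the example.

```c
/* Truncate line in place at the first '#' that is outside a quoted string. */
void strip_trailing_comment(char *line) {
    char quote = 0;                      /* 0 outside a string, else the opening quote */
    for (char *p = line; *p != '\0'; p++) {
        if (quote != 0) {
            if (*p == '\\' && p[1] != '\0')
                p++;                     /* skip the escaped character */
            else if (*p == quote)
                quote = 0;               /* closing quote */
        } else if (*p == '"' || *p == '\'') {
            quote = *p;                  /* opening quote */
        } else if (*p == '#') {
            *p = '\0';                   /* comment starts here */
            return;
        }
    }
}
```

Under this rule print "#" is left alone, while the comment in
A:# prints "a" is dropped, which covers both cases raised in the
thread.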

Re: Parrot_open_i_sc_sc

2002-07-13 Thread Tom Hughes

In message <[EMAIL PROTECTED]>
  Bryan Logan <[EMAIL PROTECTED]> wrote:

> Here's the code I have:
> 
> open I0, "test.txt", "<"
> open I1, "testdtxt", "<"
> end
> 
> I assemble and load it into pdb and get this:
> 
> Parrot Debugger 0.0.1
> 
> (pdb) list
> 1  open_i_sc_sc I0,"test.txt<","<"
> 2  open_i_sc_sc I1,"testdtxt","<"
> 3  end

This is a bug in the debugger (and also in the opcode tracing) where
it is assuming that constant strings in the byte code are zero terminated
when they aren't, and it is therefore overrunning and printing bits of
the next string or whatever. I have just committed a fix.

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/
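
To illustrate the class of bug: when strings are stored with an
explicit length and no terminator, printing them with a plain %s runs
on into whatever follows. A minimal sketch of the safe approach,
assuming a hypothetical CountedString type (this is not the actual
debugger code or the fix that was committed), is to pass the length as
the precision:

```c
#include <stdio.h>

typedef struct {
    const char *start;   /* first byte of the string, not NUL terminated */
    size_t length;       /* number of bytes in the string */
} CountedString;

/* Format s in double quotes into out, reading at most s->length bytes.
 * Returns what snprintf returns: the length of the full formatted text. */
int format_counted(char *out, size_t out_size, const CountedString *s) {
    return snprintf(out, out_size, "\"%.*s\"", (int)s->length, s->start);
}
```

With the bytes of "test.txt" and "<" packed back to back, this prints
the first string as "test.txt" rather than "test.txt<". Note that a
string containing an embedded NUL byte would still be cut short by %s,
so fwrite would be needed for fully binary-safe output.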

PARROT QUESTIONS: Keyed access

2002-07-14 Thread Tom Hughes


I've been trying to make sense of the current status of keyed access
at all levels, from the assembler through the ops to the vtables and
it has to be said that the harder I look the more confused I seem to
become...

It all seems to be a bit of a mess at the moment, and I'd like to have
a go at cleaning it up but first of all I need to work out how it is
all supposed to work.

It is clear that the encoding currently used by the assembler does not
match that specified by PDD 8 as the following examples show:

  Instruction          PDD 8 Encoding   Actual Current Encoding

  set P1["hi"], 1234   set_p_kc_ic      set_keyed_p_sc_ic
  set P1[S1], 1234     set_p_r_ic       set_keyed_p_s_ic
  set P1[1], 1234      set_p_kc_ic      set_keyed_integer_p_ic_ic
  set P1[I1], 1234     set_p_r_ic       set_keyed_integer_p_k_ic
  set P1[S1], P2[S2]   set_p_r_p_r      set_keyed_p_s_p_s
  set P1[I1], P2[S2]   set_p_kc_p_r     set_keyed_keyed_integer_p_i_p_s

Obviously this is a complete nonsense. To be honest I suspect that
both encodings have problems.

The PDD 8 encoding uses kc and r (why not kc and k?) to encode the keys
regardless of their type so the op has no way of knowing what sort of
argument it is dealing with.

The currently implemented system distinguishes the operand types OK but
tries to differentiate between ops with an integer key and those with
other types of keys, which all falls apart when you have a combination
of integer and non-integer keys in the same instruction.

Once we get to multi-component keys things just get even worse. If we
believe PDD 8 then the syntax should be:

  set P1[I1;I2], I3

But what is currently implemented is this:

  set P1[k;I1;I2], I3

In addition it appears that the current implementation would turn that
instruction into this encoding:

  set_keyed_integer_p_k_k_i

Where each component of the key becomes a separate argument, thereby
requiring an infinite number of ops to cope with an infinite number of
possible key components.

There is a suggestion in PDD 8 that this should be encoded as this:

  set_p_kc_i

With the key constant actually referring to an entry in the constant
table that encodes the key.
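
As a rough sketch of what that could mean (my own illustration, with
invented names, not anything taken from PDD 8 or the current source):
each key constant is a counted list of typed components, so a single op
taking one key operand can handle any number of components, and integer
and string components can be mixed freely because each carries its own
type tag.

```c
#include <stddef.h>

typedef enum { KEY_INT, KEY_STRING } KeyType;

typedef struct {
    KeyType type;                        /* which member of v is valid */
    union { long i; const char *s; } v;
} KeyComponent;

typedef struct {
    size_t count;                        /* number of components in the key */
    const KeyComponent *parts;
} KeyConstant;

/* Resolve an all-integer key such as [I1;I2] to a flat offset into an
 * aggregate with the given dimensions. Returns 0 on success, or -1 if
 * the key has the wrong shape, a non-integer component, or an index
 * that is out of range. */
int key_to_offset(const KeyConstant *key, const size_t *dims, size_t ndims,
                  size_t *offset) {
    size_t off = 0;
    if (key->count != ndims)
        return -1;
    for (size_t n = 0; n < ndims; n++) {
        const KeyComponent *k = &key->parts[n];
        if (k->type != KEY_INT || k->v.i < 0 || (size_t)k->v.i >= dims[n])
            return -1;
        off = off * dims[n] + (size_t)k->v.i;   /* row-major flattening */
    }
    *offset = off;
    return 0;
}
```

An aggregate that wanted string keys would inspect the same structure
and act on the KEY_STRING components instead, without needing a
separate op or vtable entry per key type.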

Moving on from the assembler, I'm not sure how the recent addition
of the _keyed_int vtable methods interacts with all this - they appear
to be at odds with PDD 8 anyway which appears to want to avoid the
kind of vtable explosion that they promote.

Anyhow, that's probably enough for now... If anybody can enlighten me
about how all this is supposed to work then I'll try and knock it all
into shape, starting with making sure that PDD 8 is accurate.

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/

Re: PARROT QUESTIONS: Keyed access

2002-07-14 Thread Tom Hughes


In message <[EMAIL PROTECTED]>
  Melvin Smith <[EMAIL PROTECTED]> wrote:

> At 03:54 PM 7/14/2002 +0100, Tom Hughes wrote:
> >I've been trying to make sense of the current status of keyed access
> >at all levels, from the assembler through the ops to the vtables and
> >it has to be said that the harder I look the more confused I seem to
> >become...
> 
> FWIW, I have a large patch from Sean O'Rourke in response to my
> request for someone to cleanup the set/set_keyed stuff. I'll commit
> it later today, it does clean it up a bit, and removes some of the
> older versions of set (3 arg). It at least reduces the noise.

I was going to do some work on that request, but I reached the point
where I decided there was no point trying to do anything until it
was clear what the target was that I was trying to reach...

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/

RE: [PATCH] MANIFEST update

2002-07-17 Thread Tom Hughes


In message <[EMAIL PROTECTED]>
  Andy Dougherty <[EMAIL PROTECTED]> wrote:

> On Wed, 17 Jul 2002, Brent Dax wrote:
> 
> > There should be no Makefile.in's left in the source--they've been tossed
> > in favor of config/gen/makefiles.
> 
> Fair enough.  I just took what cvs handed me.  It was a fresh checkout as
> of yesterday, updated this morning.  Whoever removes those files from the
> repository ought to adjust MANIFEST accordingly.

I have removed the files and updated the MANIFEST to reflect that.

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/

Re: [netlabs #757] Problem mixing labels, comments and quote-marks

2002-07-18 Thread Tom Hughes

In message <[EMAIL PROTECTED]>
  "David M. Lloyd" <[EMAIL PROTECTED]> wrote:

> On Sat, 13 Jul 2002, Tom Hughes wrote:
> 
> > Of course... The attached patch should handle that I think...
> 
> This patch is breaking several Solaris 32-bit tests.  The following
> assembly (from t/pmc/perlarray1.pbc):

I've just tried that test on a Solaris 7 machine and it ran fine
and produced the correct bytecode. I can't honestly see how that
patch could cause it to generate completely the wrong op...

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/
