The first draft of the dynamic chartype loading code has been committed.
In the process, several changes have been made to the CHARTYPE struct.
The main purpose is to provide data fields in the structure, so generic
functions can be used for transcoding and digit handling.

Several ops have also been added to simplify testing:
   set_encoding S0, I0 - set encoding to specified index value
   set_chartype S0, I0 - set chartype to specified index value
    - these two can be deleted once these fields can be set via IO etc
   transcode S0, S1, I0, I1 - transcode to specified encoding/chartype

Dynamic loading of a chartype is automatic if a request is made
for a non-existent chartype, e.g.
  find_chartype I0, "8859-1" - this will load parrot/runtime/chartypes/8859-1.TXT
  set_chartype S0, I0
The search path and extension are hard-coded for now.
The mapping file format is that used by the Unicode Consortium, and
8859-1.TXT was downloaded directly from their web site; there are lots
more mapping files there that we can use (Dan - can you confirm that
the license is okay?)
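
For anyone unfamiliar with the mapping files: each data line holds the
byte value in hex, the Unicode code point in hex, and a '#' comment with
the character name, separated by tabs; lines starting with '#' are
comments. The following is only a rough sketch of how one such line
could be parsed, not the code that was committed. The function name
parse_mapping_line is hypothetical, and the single-byte check reflects
the current limitation of the loader.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper for illustration; not part of Parrot.
 * Parses one line of a Unicode Consortium mapping file such as
 *   0x41<TAB>0x0041<TAB># LATIN CAPITAL LETTER A
 * Returns 1 and fills *byte and *codepoint for a mapping line,
 * or 0 for comments, blank lines and anything it cannot parse. */
static int parse_mapping_line(const char *line,
                              unsigned *byte, unsigned *codepoint)
{
    unsigned b, u;

    assert(line != NULL && byte != NULL && codepoint != NULL);

    /* Skip leading blanks so indented comments are still recognised. */
    while (*line == ' ' || *line == '\t')
        line++;

    /* Comment or empty line: nothing to map. */
    if (*line == '#' || *line == '\0' || *line == '\n' || *line == '\r')
        return 0;

    /* Two hex fields separated by whitespace; the name comment is ignored. */
    if (sscanf(line, "0x%x 0x%x", &b, &u) != 2)
        return 0;

    /* Assumption: only single-byte character sets are handled. */
    if (b > 0xFF)
        return 0;

    *byte = b;
    *codepoint = u;
    return 1;
}
```

A loader built this way would call the function once per line read from
the file and store codepoint at index byte of a 256-entry table.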

There are a lot of limitations in the mechanism so far, including:
  only single-byte encodings are supported
  digit mapping is assumed to be standard ASCII '0' to '9'
  mapping from Unicode uses a full table scan
However, it should allow us to get a start on testing support for multiple
character sets in the rest of Parrot, and I wanted to get something in
for comments while I continue with further development.
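
To make the limitations above concrete, here is a minimal sketch of the
kind of lookup they imply. It is an illustration under assumptions, not
the committed code: the struct demo_chartype and all function names are
hypothetical and do not match the real CHARTYPE struct. Mapping to
Unicode is a direct index into a 256-entry table; mapping from Unicode
walks the whole table; digits are taken to be ASCII '0' to '9'.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_TABLE_SIZE 256

/* Hypothetical stand-in for a single-byte chartype; not Parrot's CHARTYPE. */
typedef struct {
    unsigned to_unicode[DEMO_TABLE_SIZE]; /* byte value -> code point */
} demo_chartype;

/* Fill the table with the identity mapping, which is what 8859-1 uses
 * (byte 0xNN maps to U+00NN). */
static void demo_init_identity(demo_chartype *ct)
{
    size_t i;
    assert(ct != NULL);
    for (i = 0; i < DEMO_TABLE_SIZE; i++)
        ct->to_unicode[i] = (unsigned)i;
}

/* Byte to Unicode: a single array index. */
static unsigned demo_to_unicode(const demo_chartype *ct, unsigned char c)
{
    return ct->to_unicode[c];
}

/* Unicode to byte: full table scan, O(256) per character.
 * Returns the byte value, or -1 if the code point has no mapping. */
static int demo_from_unicode(const demo_chartype *ct, unsigned codepoint)
{
    size_t i;
    for (i = 0; i < DEMO_TABLE_SIZE; i++) {
        if (ct->to_unicode[i] == codepoint)
            return (int)i;
    }
    return -1;
}

/* Digit handling under the ASCII assumption: '0'..'9' only. */
static int demo_digit_value(unsigned char c)
{
    return (c >= '0' && c <= '9') ? c - '0' : -1;
}
```

The scan is cheap for one character but adds up when transcoding long
strings; a sorted reverse table with binary search would be one possible
way to replace it later.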

All feedback welcome.
-- 
Peter Gibbs
EmKel Systems

