Strings in Java and JavaScript are basically the same: they are arbitrary sequences of 16-bit code units, not restricted to text with valid UTF-16 encoding. The differences are in the set of access methods, but both are normally immutable, and both allow (but do not enforce) substrings sharing their backing store between distinct instances. The same applies to C/C++ "wide strings" when their code units are larger than 1 byte, but C/C++ do not make them immutable, except through dedicated classes, which transiently allow setting their content through constructors. C/C++ wide strings also exist with several signed and unsigned code unit types, whereas Java only has unsigned 16-bit code units in its "char" type, and JavaScript has no "char" type but only the "Number" type, with valid range restrictions applied when constructing String instances from code units or from code point values.
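
To make the "arbitrary sequence of 16-bit code units" point concrete, here is a minimal Java sketch. The class name CodeUnitDemo and its two helper methods are made up for this illustration; only String, Character and their methods come from the standard library. It builds a String that contains a lone surrogate, which Java accepts without complaint, and checks well-formedness by hand.

```java
// Illustrative sketch only: CodeUnitDemo and its helpers are hypothetical names.
final class CodeUnitDemo {

    /** Builds a String from raw 16-bit code units; no UTF-16 validation happens here. */
    static String fromCodeUnits(char... units) {
        return new String(units);
    }

    /** Returns true if the string contains a surrogate that is not part of a well-formed pair. */
    static boolean hasUnpairedSurrogate(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c)) {
                if (i + 1 < s.length() && Character.isLowSurrogate(s.charAt(i + 1))) {
                    i++; // skip the low half of a well-formed pair
                } else {
                    return true; // high surrogate with nothing valid after it
                }
            } else if (Character.isLowSurrogate(c)) {
                return true; // low surrogate without a preceding high surrogate
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String lone = fromCodeUnits('a', '\uD800', 'b'); // high surrogate with no low half
        String pair = fromCodeUnits('\uD83D', '\uDE00'); // well-formed pair for U+1F600

        System.out.println(lone.length());                            // 3
        System.out.println(hasUnpairedSurrogate(lone));               // true
        System.out.println(hasUnpairedSurrogate(pair));               // false
        System.out.println(Integer.toHexString(pair.codePointAt(0))); // 1f600
        System.out.println(lone.substring(1, 2).length());            // 1
    }
}
```

The last line shows that a substring can consist of nothing but the lone surrogate: the string type itself never objects, it only counts and returns code units.
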
JavaScript should soon have a new numeric type, provisionally named "BigInt": an arbitrary-precision integer whose constants are suffixed by "n", with no implicit promotion from/to Number but only explicit conversions by checked constructors. It should also get new code unit types for mutable buffers (but only for the range checks of their write accessors, using "Number" 64-bit floating-point values or the newer "BigInt" values for 64-bit integer elements).

There are similar designs in Perl, PHP, and most other languages: Unicode support and conformance for using these types as valid text is implemented only by libraries, in their standard text API or in their I/O APIs. These APIs take immutable strings or mutable buffers as parameters, and return sharable but immutable string instances, or a mutable buffer that was referenced on input or allocated internally. But these APIs are not restricted to handling valid Unicode text, and allow using their strings with any other encoding.

With immutable strings implemented as classes, the backing store is normally not directly accessible, even by reference: you can only reference the class instance, which internally references the backing store. That backing store is implemented using mutable buffers, with an internal encoding which may differ from the one exposed by the string class. It may use compression techniques on demand, and implicit atomization of the most frequently used string values, notably the empty string, strings representing a single character whose code point fits in 8 bits, or strings containing any repetition of the same code point value. These values do not need any internally allocated buffer for their backing store, so such instances are allocated very fast and do not stress the garbage collector when they are no longer used.
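
As a rough illustration of conformance living in the library and I/O layer rather than in the string type, the Java sketch below pushes an ill-formed string through the standard UTF-8 encoder. EncodingBoundaryDemo and its method names are hypothetical; CharsetEncoder.canEncode and String.getBytes are the standard library calls being shown.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Illustrative sketch only: EncodingBoundaryDemo and its helpers are hypothetical names.
final class EncodingBoundaryDemo {

    /** Asks the UTF-8 encoder whether it can encode the whole sequence. */
    static boolean isEncodableAsUtf8(String s) {
        CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder();
        return encoder.canEncode(s);
    }

    /** Encodes with String.getBytes, which replaces malformed input instead of failing. */
    static byte[] lossyUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String lone = "a" + '\uD800' + "b"; // ill-formed UTF-16, still a legal String

        System.out.println(isEncodableAsUtf8("abc")); // true
        System.out.println(isEncodableAsUtf8(lone));  // false
        for (byte b : lossyUtf8(lone)) {
            System.out.println(b);                    // 97, 63, 98: the surrogate became '?'
        }
    }
}
```

The string itself is never rejected. Only the encoder notices the problem, and the convenience method String.getBytes silently substitutes a replacement byte, so the caller has to pick the strict path explicitly.
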
When Unicode text handling is supported by the exposed methods, the Unicode validation rules are not necessarily checked everywhere, so it is still possible to have strings or buffers containing an unpaired surrogate. The backing store may also allow storing code units outside the ranges used by valid UTF-16 or valid UTF-32 (backing stores are virtualized, and could be on disk and swapped in on demand with reusable buffers from a pool).

2017-08-25 2:17 GMT+02:00 David Starner via Unicode <unicode@unicode.org>:

> ---------- Forwarded message ---------
> From: David Starner <prosfil...@gmail.com>
> Date: Thu, Aug 24, 2017, 6:16 PM
> Subject: Re: Unicode education in Schools
> To: Richard Wordingham <richard.wording...@ntlworld.com>
>
> On Thu, Aug 24, 2017, 5:26 PM Richard Wordingham via Unicode <unicode@unicode.org> wrote:
>
>> Just steer them away from UTF-16! (And vigorously prohibit the very
>> concept of UCS-2).
>>
>> Richard.
>
> Steer them away from reinventing the wheel. If they use Java, use Java
> strings. If they're using GTK, use strings compatible with GTK. If they're
> writing JavaScript, use JavaScript strings. There's basically no system
> without Unicode strings or that they would be better off rewriting the
> wheel.