utf-8.7

Chris Leick Wed, 23 Feb 2011 09:11:35 -0800

Hello Martin,

Martin Eberhard Schauer:

18K; 60 strings for the rare readers with spare capacity.


And since there aren't any, I'll reply.



#. type: Plain text
#: source/utf-8.7:33
msgid "UTF-8 - an ASCII compatible multibyte Unicode encoding"
msgstr "UTF-8 - eine ASCII-kompatible Unicode-Kodierung"

s/Unicode-Kodierung/Multibyte-Unicode-Kodierung/

There is a difference; see http://de.wikipedia.org/wiki/Multibyte_Character_Set



# FIXME: Such strings can contain as parts
# FIXME: 31-bit code space
#. type: Plain text
#: source/utf-8.7:57
msgid ""
"The B<Unicode 3.0> character set occupies a 16-bit code space.  The most "
"obvious Unicode encoding (known as B<UCS-2>) consists of a sequence of 16-"
"bit words. Such strings can contain as parts of many 16-bit characters bytes "
"like \\(aq\\e0\\(aq or \\(aq/\\(aq which have a special meaning in filenames "
"and other C library function arguments. In addition, the majority of Unix "
"tools expects ASCII files and can't read 16-bit words as characters without "
"major modifications. For these reasons, B<UCS-2> is not a suitable external "
"encoding of B<Unicode> in filenames, text files, environment variables, etc. "
"The B<ISO 10646 Universal Character Set (UCS)>, a superset of Unicode, "
"occupies even a 31-bit code space and the obvious B<UCS-4> encoding for it (a "
"sequence of 32-bit words) has the same problems."
msgstr ""
"Der B<Unicode-3.0>-Zeichensatz ist durch 16-Bit-Wörter definiert. Die "
"einfachste Unicode-Kodierung (B<UCS-2>) besteht aus einer Folge von 16-Bit-"
"Zeichen. Solche Zeichenketten können als Bestandteile viele 16-Bit-Zeichen "
"wie \\(aq\\e0\\(aq oder \\(aq/\\(aq enthalten, die eine besondere Bedeutung z."
"B. in Dateinamen oder Bibliotheksfunktionen besitzen. Außerdem arbeiten die "
"meisten UNIX-Programme mit B<ASCII>-Dateien und können 16-Bit-Wörter nicht "
"ohne größere Änderungen verarbeiten. Darum ist B<UCS-2> keine geeignete "
"externe Kodierung von B<Unicode> in Dateinamen, Text-Dateien, "
"Umgebungsvariablen usw. Der B<ISO 10646 Universal Character Set (UCS)>, eine "
"Erweiterung von B<Unicode>, wird sogar durch 31-Bit-Wörter definiert. Die "
"zugehörige B<UCS-4>-Kodierung (eine Folge von 32-Bit-Wörtern) leidet unter "
"denselben Problemen wie die B<UCS-2>-Kodierung."

s/Bibliotheksfunktionen/anderen Argumenten von C-Bibliotheken/
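
To make the problem described in this string concrete, here is a small Python sketch. The two sample characters are my own choice and do not come from utf-8.7; "utf-16-be" is used as a stand-in for UCS-2, which it matches for characters in the 16-bit range.

```python
# Sample characters chosen for illustration only (not taken from utf-8.7).
letter_a = "A"        # U+0041
i_ogonek = "\u012f"   # U+012F, LATIN SMALL LETTER I WITH OGONEK

# UCS-2, big-endian: every character becomes one 16-bit word.
ucs2_a = letter_a.encode("utf-16-be")   # contains a NUL byte
ucs2_i = i_ogonek.encode("utf-16-be")   # contains the byte for "/"

# UTF-8: a non-ASCII character is built only from bytes >= 0x80.
utf8_i = i_ogonek.encode("utf-8")

print(ucs2_a, ucs2_i, utf8_i)
```

The 16-bit words contain bytes that a filename routine would read as NUL or "/", while the UTF-8 form of the same character does not.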


#. type: Plain text
#: source/utf-8.7:67
msgid ""
"The B<UTF-8> encoding of B<Unicode> and B<UCS> does not have these problems "
"and is the common way in which B<Unicode> is used on Unix-style operating "
"systems."
msgstr ""
"Die B<UTF-8>-Kodierung von B<Unicode> und B<UCS> hat diese Probleme nicht. "
"Sie ist die gebräuchliche Anwendung des B<Unicode>-Zeichensatzes unter "
"unixoiden Betriebssystemen."

Perhaps:

... und die übliche Art B<Unicode> auf unixoiden Betriebssystemen zu verwenden.


# FIXME: rewording?
#. type: Plain text
#: source/utf-8.7:83
msgid ""
"B<UCS> characters 0x00000000 to 0x0000007f (the classic B<US-ASCII> "
"characters) are encoded simply as bytes 0x00 to 0x7f (ASCII compatibility). "
"This means that files and strings which contain only 7-bit ASCII characters "
"have the same encoding under both B<ASCII> and B<UTF-8>."
msgstr ""
"Die B<UCS>-Zeichen 0x00000000 bis 0x0000007f (die klassischen B<US-ASCII>-"
"Zeichen) werden einfach als die Bytes 0x00 bis 0x7f kodiert und auf diese "
"Weise die B<ASCII>-Kompatibilität hergestellt. Dateien und Zeichenketten, die "
"nur aus 7-Bit-Zeichen bestehen, haben darum unter B<ASCII> und B<UTF-8> "
"dieselbe Kodierung."

s/7-Bit-Zeichen/7-Bit-ASCII-Zeichen/
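
The ASCII compatibility claimed in this string can be checked with a short Python sketch. The sample text is a made-up placeholder, not something from the man page.

```python
# Hypothetical sample string containing only 7-bit ASCII characters.
text = "plain ASCII /etc/passwd"

ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")

# Both encodings should produce the very same byte sequence ...
same_encoding = ascii_bytes == utf8_bytes
# ... and every byte should stay within 0x00..0x7f.
all_seven_bit = all(byte <= 0x7F for byte in utf8_bytes)

print(same_encoding, all_seven_bit)
```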

#. type: Plain text
#: source/utf-8.7:100
msgid "All possible 2^31 UCS codes can be encoded using B<UTF-8>."
msgstr "Alle möglichen 2^31 B<UCS>-Zeichen können mit B<UTF-8> kodiert werden."


s/B<UCS>/UCS/


#. type: Plain text
#: source/utf-8.7:115
msgid ""
"The first byte of a multibyte sequence which represents a single non-ASCII "
"B<UCS> character is always in the range 0xc0 to 0xfd and indicates how long "
"this multibyte sequence is. All further bytes in a multibyte sequence are in "
"the range 0x80 to 0xbf.  This allows easy resynchronization and makes the "
"encoding stateless and robust against missing bytes."
msgstr ""
"Das erste Byte einer Folge mehrerer Bytes, die ein einzelnes Nicht-B<ASCII>-"
"Zeichen darstellen, ist grundsätzlich im Bereich 0xc0 bis 0xfd und zeigt an, "
"wie lang die Folge ist. Alle anderen Bytes der Folge sind im Bereich 0x80 bis "
"0xbf. Dadurch wird eine einfache Resynchronisation ermöglichst, da die "
"Kodierung statusunabhängig und daher robust gegenüber fehlenden oder verloren "
"gegangenen Bytes ist."

s/Nicht-B<ASCII>-Zeichen/Nicht-ASCII-B<UCS>-Zeichen/
s/Bytes/Byte/
s/Resynchronisation/Neusynchronisation/
s/ermöglichst/ermöglicht/

s/statusunabhängig/zustandslos/ (http://de.wikipedia.org/wiki/Zustandslosigkeit)
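
As background for the "stateless" and "resynchronization" wording, here is a rough Python sketch of what the byte ranges in this string allow. The function names are hypothetical, and the lead-byte ranges follow the original one-to-six-byte scheme the man page describes.

```python
def sequence_length(first_byte: int) -> int:
    """Length announced by the first byte of a sequence (1 to 6 bytes)."""
    if first_byte < 0x80:
        return 1  # plain ASCII byte
    ranges = [(0xC0, 0xDF, 2), (0xE0, 0xEF, 3), (0xF0, 0xF7, 4),
              (0xF8, 0xFB, 5), (0xFC, 0xFD, 6)]
    for low, high, length in ranges:
        if low <= first_byte <= high:
            return length
    raise ValueError("not a valid first byte: 0x%02x" % first_byte)


def is_continuation(byte: int) -> bool:
    """All further bytes of a sequence lie in 0x80..0xbf."""
    return 0x80 <= byte <= 0xBF


def resync(data: bytes, pos: int) -> int:
    """Skip continuation bytes until the next character start (or the end)."""
    while pos < len(data) and is_continuation(data[pos]):
        pos += 1
    return pos


data = "a\u00e4\u20ac".encode("utf-8")  # 1-byte, 2-byte and 3-byte character
print(resync(data, 2), sequence_length(data[3]))
```

No decoder state is needed: starting in the middle of a character, a reader only has to skip bytes in 0x80..0xbf to find the next character boundary.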



#. type: Plain text
#: source/utf-8.7:125
msgid ""
"B<UTF-8> encoded B<UCS> characters may be up to six bytes long, however the "
"B<Unicode> standard specifies no characters above 0x10ffff, so Unicode "
"characters can only be up to four bytes long in B<UTF-8>."
msgstr ""
"B<UTF-8>-kodierte B<UCS>-Zeichen können bis zu sechs Byte lang sein. Da aber "
"die B<Unicode>-Norm keine Zeichen über 0x10FFFF spezifiziert, so dass Unicode-"
"Zeichen in B<UTF-8> nur nur bis zu vier Bytes lang sind."

können Unicode-Zeichen in B<UTF-8> nur bis zu vier Bytes lang sein.


#. type: Plain text
#: source/utf-8.7:128
msgid ""
"The following byte sequences are used to represent a character. The sequence "
"to be used depends on the UCS code number of the character:"
msgstr ""
"Die folgenden Byte-Folgen werden für die Darstellung von Zeichen verwendet. "
"Die zu verwendende Folge hängt vom B<UCS>-Code des Zeichens ab:"

s/B<UCS>-Code/UCS-Code/


#. type: Plain text
#: source/utf-8.7:168
msgid ""
"The I<xxx> bit positions are filled with the bits of the character code "
"number in binary representation.  Only the shortest possible multibyte "
"sequence which can represent the code number of the character can be used."
msgstr ""
"Die I<xxx>-Bits werden durch den Code des Zeichens in Binärdarstellung "
"ersetzt. Es wird die jeweils kürzeste Folge benutzt, die den Code des "
"Zeichens darstellen kann."

s/Folge/Multibyte-Folge/
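
For readers who want to see how the I<xxx> bits are filled in, here is a minimal Python sketch of an encoder for the one-to-six-byte scheme. The name encode_ucs and the table layout are my own; the sketch always picks the shortest sequence and does not filter out code values the standards disallow.

```python
# (largest code for this length, number of bytes, marker bits of first byte)
_RANGES = [
    (0x7F, 1, 0x00),
    (0x7FF, 2, 0xC0),
    (0xFFFF, 3, 0xE0),
    (0x1FFFFF, 4, 0xF0),
    (0x3FFFFFF, 5, 0xF8),
    (0x7FFFFFFF, 6, 0xFC),
]


def encode_ucs(code: int) -> bytes:
    """Encode one UCS code number using the shortest possible sequence."""
    if code < 0:
        raise ValueError("negative code number")
    for limit, length, marker in _RANGES:
        if code <= limit:
            break
    else:
        raise ValueError("code number outside the 31-bit UCS range")
    if length == 1:
        return bytes([code])
    tail = []
    for _ in range(length - 1):
        tail.append(0x80 | (code & 0x3F))  # 10xxxxxx: six payload bits each
        code >>= 6
    tail.append(marker | code)             # first byte carries the top bits
    return bytes(reversed(tail))


print(encode_ucs(0x20AC), len(encode_ucs(0x10FFFF)), len(encode_ucs(0x7FFFFFFF)))
```

For code numbers that Python itself accepts, the result can be compared with str.encode("utf-8"). The highest Unicode code 0x10ffff comes out four bytes long and the highest UCS code six bytes long.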


#. type: Plain text
#: source/utf-8.7:175
msgid ""
"The B<UCS> code values 0xd800\\(en0xdfff (UTF-16 surrogates) as well as "
"0xfffe and 0xffff (UCS noncharacters) should not appear in conforming B<UTF-"
"8> streams."
msgstr ""
"Die B<UCS>-Codewerte 0xd800\\(en0xdfff (UTF-16-Ersatzzeichen) sowie 0xfffe "
"und 0xffff (UCS noncharacters) sollten nicht in standardkonformen B<UTF-8>-"
"Streams enthalten sein."

s/UCS noncharacters/keine Zeichen in UCS/
s/Streams/Datenströme/
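
A small Python sketch of the rule in this string, in case it helps with the wording. The helper name is hypothetical. The second part only shows how Python's own strict UTF-8 codec treats a surrogate, which is not necessarily what every implementation does.

```python
def is_forbidden_code(code: int) -> bool:
    """True for UTF-16 surrogates and for the noncharacters 0xfffe/0xffff."""
    return 0xD800 <= code <= 0xDFFF or code in (0xFFFE, 0xFFFF)


# Python's strict UTF-8 codec refuses to encode a lone surrogate.
try:
    "\ud800".encode("utf-8")
    surrogate_rejected = False
except UnicodeEncodeError:
    surrogate_rejected = True

print(is_forbidden_code(0xD800), is_forbidden_code(0x20AC), surrogate_rejected)
```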


#. type: SS
#: source/utf-8.7:175
#, no-wrap
msgid "Example"
msgstr "Beispiele"

Beispiel


#. type: Plain text
#: source/utf-8.7:250
msgid ""
"Programmers accustomed to single-byte encodings such as B<US-ASCII> or B<ISO "
"8859> have to be aware that two assumptions made so far are no longer valid "
"in B<UTF-8> locales. Firstly, a single byte does not necessarily correspond "
"any more to a single character. Secondly, since modern terminal emulators in "
"B<UTF-8> mode also support Chinese, Japanese, and Korean B<double-width "
"characters> as well as nonspacing B<combining characters>, outputting a "
"single character does not necessarily advance the cursor by one position as "
"it did in B<ASCII>.  Library functions such as B<mbsrtowcs>(3)  and "
"B<wcswidth>(3) should be used today to count characters and cursor positions."
msgstr ""
"An Einzel-Byte-Kodierungen gewöhnte Programmierer müssen daran denken, dass "
"zwei bislang getroffene Annahmen in B<UTF-8>-Locales nicht mehr gültig sind. "
"Erstens bedeutet ein einziges Byte nicht mehr unbedingt ein einzelnes "
"Zeichen. Zweitens, da moderne Terminal-Emulatoren im B<UTF-8>-Modus auch "
"chinesische, Japanische und koreanische B<Zeichen doppelter Breite> sowie "
"B<Kombinationszeichen> ohne horizontalen Vorschub unterstützen, setzt die "
"Ausgabe eines einzelnen Zeichens nicht unbedingt den Cursor um eine Position "
"weiter, wie es bei B<ASCII> der Fall war. Heutzutage sollten Sie "
"Bibliotheksfunktionen wie B<mbsrtowcs>(3) und B<wcswidth>(3) nutzen, um "
"Zeichen und Cursorpositionen zählen."

s/Japanische/japanische/
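
The two broken assumptions in this string (bytes versus characters, characters versus cursor positions) can be illustrated with a Python sketch. The function display_width is a hypothetical, rough approximation built on the unicodedata module. It is not a replacement for B<wcswidth>(3), and real terminals may differ.

```python
import unicodedata


def display_width(text: str) -> int:
    """Approximate the number of terminal columns a string occupies."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # nonspacing combining character: cursor does not advance
        # Wide (W) and fullwidth (F) characters take two columns.
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width


# "e" + COMBINING ACUTE ACCENT + two CJK ideographs (sample text, my choice).
sample = "e\u0301\u65e5\u672c"

byte_count = len(sample.encode("utf-8"))
char_count = len(sample)
column_count = display_width(sample)

print(byte_count, char_count, column_count)
```

For this sample the three counts all differ, which is exactly why counting bytes is no longer enough.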


#. type: Plain text
#: source/utf-8.7:290
msgid ""
"The B<Unicode> and B<UCS> standards require that producers of B<UTF-8> shall "
"use the shortest form possible, for example, producing a two-byte sequence "
"with first byte 0xc0 is nonconforming.  B<Unicode 3.1> has added the "
"requirement that conforming programs must not accept non-shortest forms in "
"their input.  This is for security reasons: if user input is checked for "
"possible security violations, a program might check only for the B<ASCII> "
"version of \"/../\" or \";\" or NUL and overlook that there are many non-"
"B<ASCII> ways to represent these things in a non-shortest B<UTF-8> encoding."
msgstr ""
"Die Standards B<Unicode> und B<UCS> fordern, dass Erzeuger von B<UTF-8> die "
"kürzeste mögliche Form liefern. Z. B. ist der Erzeugung einer  Zwei-Byte-"
"Sequenz mit dem ersten Byte 0xc0 nicht konform. B<Unicode 3.1> fordert, dass "
"konforme Programme in ihrer Eingabe Formen, die nicht die kürzesten sind, "
"nicht akzeptieren dürfen. Dies geschieht aus Sicherheitsgründen: Wenn "
"Benutzereingaben auf mögliche Sicherheitsverletzungen überprüft werden, "
"könnte ein Programm nur nach den B<ASCII>-Versionen von \"/../\" oder \";\" "
"oder NUL suchen und übersehen, dass es viele Möglichkeiten einer Nicht-"
"B<ASCII>-Darstellung dieser Zeichen gibt."

"in a non-shortest B<UTF-8> encoding" not translated?

The rest looks good.
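
On the shortest-form paragraph in the last string: the security argument may be easier to translate with a concrete case in mind. The following Python sketch uses a deliberately unsafe, hypothetical two-byte decoder that skips the shortest-form check, and compares it with Python's strict codec.

```python
def naive_decode_two_bytes(seq: bytes) -> int:
    """Decode a two-byte sequence WITHOUT the shortest-form check (unsafe)."""
    lead, cont = seq
    return ((lead & 0x1F) << 6) | (cont & 0x3F)


# Non-shortest ("overlong") two-byte form of "/" with first byte 0xc0.
overlong_slash = bytes([0xC0, 0xAF])

# A byte-level filter looking for the ASCII "/" does not see anything ...
filter_sees_slash = b"/" in overlong_slash
# ... but the careless decoder still turns the sequence into "/".
naive_result = chr(naive_decode_two_bytes(overlong_slash))

# Python's strict UTF-8 codec rejects the sequence instead.
try:
    overlong_slash.decode("utf-8")
    strict_rejects = False
except UnicodeDecodeError:
    strict_rejects = True

print(filter_sees_slash, naive_result, strict_rejects)
```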

Regards,
Chris



Re: [RFR] man://manpages-de/utf-8.7
