Using the utf8 pragma:

    use utf8;

If I have some text:

    my $text = "abcdefghiæ"; # note the æ

and I convert it to a fake Unicode string (fake, because you couldn't do this with real Unicode charsets like Chinese) by sticking nulls in between the letters:

    sub AnsiToUnicode($)
    {
        my ($sAnsi) = @_;
        my $lLength = length($sAnsi);
        my $sUnicode = "";

        # Follow each character with a null: two bytes per character, low byte first.
        for (my $i = 0; $i < $lLength; $i++)
        {
            $sUnicode .= substr($sAnsi, $i, 1);
            $sUnicode .= "\0";
        }
        return $sUnicode;
    }

    my $uni_text = AnsiToUnicode($text);

Then I convert it to UTF-8. (I know, why not go straight from ANSI to UTF-8? It's a long story. Basically, we work in Unicode here, so I've never had to go from ANSI to UTF-8, but for the sake of the question, let's do it this way.)

    sub UnicodeToUtf8($$)
    {
        my ($bIsBigEndian, $sText) = @_;
        my $sReturn = "";
        my $lLength = length($sText);

        # Walk the input two bytes at a time.
        for (my $i = 0; $i < $lLength; $i += 2)
        {
            my $sChar = substr($sText, $i, 2);
            my $lByte1;   # high byte
            my $lByte2;   # low byte

            if ($bIsBigEndian == 0)
            {
                $lByte1 = ord(substr($sChar, 1, 1));
                $lByte2 = ord(substr($sChar, 0, 1));
            }
            else
            {
                $lByte1 = ord(substr($sChar, 0, 1));
                $lByte2 = ord(substr($sChar, 1, 1));
            }

            my $lUni = ($lByte1 * 0x100) + $lByte2;

            # Emit one to four bytes, depending on the size of the code point.
            if ($lUni < 0x80)
            {
                $sReturn .= chr($lUni);
            }
            elsif ($lUni < 0x800)
            {
                $sReturn .= chr(0xc0 | ($lUni >> 6));
                $sReturn .= chr(0x80 | ($lUni & 0x3f));
            }
            elsif ($lUni < 0x10000)
            {
                $sReturn .= chr(0xe0 | ($lUni >> 12));
                $sReturn .= chr(0x80 | (($lUni >> 6) & 0x3f));
                $sReturn .= chr(0x80 | ($lUni & 0x3f));
            }
            elsif ($lUni < 0x200000)
            {
                $sReturn .= chr(0xf0 | ($lUni >> 18));
                $sReturn .= chr(0x80 | (($lUni >> 12) & 0x3f));
                $sReturn .= chr(0x80 | (($lUni >> 6) & 0x3f));
                $sReturn .= chr(0x80 | ($lUni & 0x3f));
            }
        }
        return $sReturn;
    }

    my $utf8_text = UnicodeToUtf8(0, $uni_text); # false BigEndian parameter since I'm on Win2000

Now we finally get to the heart of the problem.

    print $utf8_text; # produces abcdefghiæ

That is, two characters for the æ character in the string. This is due, I'm assuming, to weirdness with the utf8 pragma. The problem is this:

    print length($utf8_text) . "\n"; # 11 !!!!!

Does anyone have any experience with this? I've checked the utf8 manpage, but it glosses over length(), including it in a list of functions that should continue to operate on characters, not bytes. In fact, this is the problem: utf8 seems to consider æ two characters.

Thanks in advance,

Aaron Craig
Programming
iSoftitler.com