We've been told previously that Perl cannot safely perform byte-level operations, like substitution or splitting, on lines containing multi-byte Japanese characters. Yet after reading a bit about Ken Lunde's papers, and about Perl 5.8 I/O layers and Encode::JP, it looks like it might be possible. I'm struggling to turn the "documentation" into a usable solution, however, and wonder if someone could advise on a couple of specific tasks.

I have two problems I'd like to address.

1) Split a line containing multi-byte chars

I believe they are Shift-JIS - I'd ascertain the specific encoding before plowing into it. For example, given an input file like this:

12|3|50 <some double-byte chars here>
12|4|50 <some double-byte chars here>
12|6|50 <some double-byte chars here>
12|9|50 <some double-byte chars here>
...

Is there a way to safely split these input lines, looking for a tab char for example? If a different delimiter character (other than tab) would be better suited for splitting on, that is probably an option. Note that the first field (numbers and pipes) does itself contain pipes.

-----------
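To make the splitting question concrete, here is a rough sketch of what I imagine it would look like on 5.8. It assumes the data really is Shift-JIS and that the Encode module is available. The sample record is made up: its text field ends in a character whose second byte is 0x7C, the same byte as an ASCII pipe, which is the kind of case I'm worried about. I have not verified this against real data.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Made-up record: "12|3|50", a tab, then the Shift-JIS bytes for
# U+65E5 (0x93 0xFA) and U+30DD (0x83 0x7C), then a newline.
my $raw = "12|3|50\t\x93\xFA\x83\x7C\n";

# Byte-level split on "|": the trailing byte 0x7C of the last character
# is mistaken for a delimiter, so the character is cut in half.
my @byte_fields = split /\|/, $raw;

# Decode the bytes into Perl characters first, then operate on characters.
my $line = decode('shiftjis', $raw);
$line =~ s/\n//g;                            # strip the line ending
my ($key, $text) = split /\t/, $line, 2;     # tab separates key from text
my @ids          = split /\|/, $key;         # pipes inside the key field
my @char_fields  = split /\|/, $line;        # pipe split on decoded text

printf "byte-level pieces: %d\n", scalar @byte_fields;
printf "char-level pieces: %d\n", scalar @char_fields;
printf "ids: %s\n", join ',', @ids;
printf "text length in characters: %d\n", length $text;

# Encoding back should reproduce the original bytes of the text field.
my $round_trip = encode('shiftjis', $text);
print "round trip ok\n" if $round_trip eq "\x93\xFA\x83\x7C";
```

If my reading of the Shift-JIS layout is right, a tab (0x09) can never occur as the second byte of a double-byte character, so tab looks like a safer delimiter than pipe even at the byte level, but I'd welcome confirmation.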
2) Substitution

Some of the other data I need to read, from a DB using DBI, contains hard line returns (\n) that were entered while the user created the data in certain LOB fields. I would also like to be able to remove those, which I'd ordinarily do using:

    $x =~ s/\n//g;

-----------

For example, would using Encode::JP and specifying:

    open my $in, "<:encoding(shiftjis)", $infile or die;    ## from Encode perldoc
    ...

then allow me to use the split() function? Or would I be able to use something like the CPAN module ShiftJIS::String?

If anyone can offer up example solutions to this, I'd appreciate it. BTW, I am using Perl 5.6 on this box, but might be able to move the process to a 5.8 installation.

Thanks