>
> Firstly, if the handle isn't being read with binmode set then
> perhaps the \r\n are being converted to \n (if this is Windows)?
> How are you creating/initializing the socket?
>
Unfortunately, with or without binmode, there's no difference in the matching (from what I can tell).

Socket creation:

use IO::Socket::INET;

my $TCPSocket = IO::Socket::INET->new(
    PeerHost => "x.x.x.x",
    PeerPort => "5000",
    Proto    => "tcp",
    Blocking => 1,    #### <-- Tried with Blocking (0|1) as well.
) or die "ERROR in Socket Creation : $!\n";

# Ensure we get output right away
$TCPSocket->autoflush(1);
binmode $TCPSocket;    ### Tried with/without binmode

> Similarly, the character encoding of the data on the socket could
> matter. You said there are character codes above 127. Does that
> mean the encoding is 8-bit such as [extended] ASCII or latin1, or
> do you mean the character codes are WAY above 127? Character
> encoding could be another culprit if the \r and \n characters are
> encoded differently in the stream than you (and Perl) expect.
> Using the IO layers or the explicit Encode module you should be
> able to decode the stream into a Perl string that Perl
> understands properly.

From the relevant RFCs:

   The terms "NUL", "TAB", "LF", "CR", and "space" refer to the octets
   %x00, %x09, %x0A, %x0D, and %x20, respectively (that is, the octets
   with those codes in US-ASCII [ANSI1986] and thus in UTF-8 [RFC3629]).
   The term "CRLF" or "CRLF pair" means the sequence CR immediately
   followed by LF (that is, %x0D.0A). A "printable US-ASCII character"
   is an octet in the range %x21-7E. Quoted characters refer to the
   octets with those codes in US-ASCII (so "." and "<" refer to %x2E and
   %x3C) and will always be printable US-ASCII characters; similarly,
   "digit" refers to the octets %x30-39.

However, the data stream does contain yEnc content, which, as far as I know, is an 8-bit encoding.
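Since the RFC text quoted above defines CR and LF as the octets %x0D and %x0A, one option is to leave the handle in raw/binary mode and match those octets directly, never decoding the 8-bit payload at all. The sketch below assumes the data arrives as plain bytes; `split_crlf_lines` and `is_valid_utf8` are hypothetical helper names, and the sample bytes are arbitrary 8-bit values standing in for yEnc data. It also shows why a strict `:encoding(UTF-8)` layer objects: arbitrary 8-bit octets are usually not well-formed UTF-8.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK);

# Split a raw byte buffer into lines on the CRLF octet pair (%x0D.0A).
# Matching the octets explicitly does not depend on what "\n" means on the
# current platform or on any PerlIO layer (:crlf, :encoding) on the handle.
sub split_crlf_lines {
    my ($bytes) = @_;
    # A limit of -1 keeps trailing empty fields, so a buffer ending in CRLF
    # yields a final empty string and a partial last line is not lost.
    return split /\x0D\x0A/, $bytes, -1;
}

# True if the octets are well-formed UTF-8, false otherwise.
sub is_valid_utf8 {
    my ($bytes) = @_;    # a copy, since decode() with a CHECK may modify it
    return eval { decode('UTF-8', $bytes, FB_CROAK); 1 } ? 1 : 0;
}

# Hypothetical sample: one ASCII line, then a line of arbitrary 8-bit
# octets standing in for yEnc-encoded data.
my $sample = "plain text line\x0D\x0A" . "\xD8\x26\x83\xF9\x0D\x0A";

my @lines = split_crlf_lines($sample);
printf "%d fields; line 2 is %d bytes, valid UTF-8: %s\n",
    scalar @lines, length $lines[1], is_valid_utf8($lines[1]) ? 'yes' : 'no';
```

The design choice here is to treat everything as octets up to the point where a specific part of the message is known to be text, and only decode that part (for example with `Encode::decode`) if it is needed as characters.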
So whilst the protocol itself may use UTF-8, the data transmitted within the protocol can be either UTF-8 or 8-bit. Lines *should* be terminated by CRLF (provided the 8-bit encoding doesn't mess up the detection), and the entire data stream is then terminated with CRLF.CRLF (similar to an SMTP message, in terms of protocol).

> You can attach an IO layer to the file handle by passing an
> additional argument to binmode:
>
>     binmode $fh, ':encoding(UTF-8)';
>

Loads, and LOADS, and *piles* of UTF-8 errors...

utf8 "\xD826" does not map to Unicode at test.pl line 40 (#1)
utf8 "\x1583F9" does not map to Unicode at test.pl line 40 (#1)
etc.

From personal experience, using other (nasty) methods and components to do what I -should- be able to do with native Perl, I've learned the hard way that messing with binmode $fh, ":encoding...." generally corrupts the 8-bit (yEnc) data. Again, I am more than likely doing it incorrectly, but I'm really trying to understand how to do it correctly :-)

> Lastly, you're reading from a socket so there's no guarantee that
> the buffer string is going to necessarily end at the termination
> boundary. Perhaps the protocol guarantees that, but the socket
> surely doesn't. You may need to look for that terminating
> sequence in the middle of the buffer.
>

But isn't that exactly why we set things like autoflush(1) or $| = 1? After the data stream has been sent by the server (i.e. CRLF.CRLF), the server stops transmitting data and waits for the next command, so there's no chance that a second data stream will be received by the client socket, at least not until the client issues a new command.

> Does any of that help?
>

I appreciate it, truly. But no, not really :-( I can honestly say I've been there and done that. I realize my problem here is the really wacky way in which the data stream is encoded (and that is completely out of my control). But there must be an adequate and proper way to handle this data.
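On the autoflush question: `autoflush(1)` and `$|` only control when Perl pushes *output* to the socket. They do not affect how incoming bytes are grouped, so a single `sysread` can still return part of a response, or even half of a CRLF pair. A minimal sketch of accumulating a buffer and scanning it for the CRLF "." CRLF terminator follows. It assumes the server dot-stuffs data lines the way SMTP and NNTP do (a line starting with "." is sent with an extra leading "."), and `take_block`, `unstuff_lines` and `read_block` are hypothetical names; the status line "222 0 <id>" in the demo is made up.

```perl
use strict;
use warnings;

# Multi-line data ends with CRLF "." CRLF (octets 0D 0A 2E 0D 0A).
my $TERMINATOR = "\x0D\x0A.\x0D\x0A";

# Remove and return one complete block (without the terminator) from the
# accumulation buffer, or return undef if the terminator has not arrived.
# Any bytes after the terminator stay in the buffer for the next call.
sub take_block {
    my ($bufref) = @_;
    my $pos = index $$bufref, $TERMINATOR;
    return undef if $pos < 0;
    my $block = substr $$bufref, 0, $pos;
    substr($$bufref, 0, $pos + length $TERMINATOR) = '';
    return $block;
}

# Split a block on CRLF and undo dot-stuffing (assumed, as in SMTP/NNTP):
# a line that still starts with "." had an extra "." added by the sender.
sub unstuff_lines {
    my ($block) = @_;
    my @lines = split /\x0D\x0A/, $block, -1;
    s/^\.// for @lines;
    return @lines;
}

# For multi-line responses only: call sysread until take_block finds the
# terminator. Do not mix this with <$sock> or read() on the same handle,
# because those go through a separate PerlIO buffer.
sub read_block {
    my ($sock, $bufref) = @_;
    while (1) {
        my $block = take_block($bufref);
        return $block if defined $block;
        my $n = sysread $sock, $$bufref, 65536, length $$bufref;
        die "read error: $!\n" unless defined $n;
        die "connection closed before terminator\n" if $n == 0;
    }
}

# Demo without a network: feed the buffer in arbitrary chunks, as a socket
# might deliver them (note the CRLF split between the first two chunks).
my $buf = '';
my $got;
for my $chunk ("222 0 <id>\x0D\x0Aline one\x0D",
               "\x0A..dot line\x0D\x0A\xD8\x26\x0D\x0A.",
               "\x0D\x0A") {
    $buf .= $chunk;
    $got = take_block(\$buf);
    last if defined $got;
}
print scalar(unstuff_lines($got)), " lines\n";
```

With a real connection, `read_block($TCPSocket, \$buf)` would be called after sending a command that returns multi-line data; the handle stays in binmode throughout, so the 8-bit payload is never passed through an encoding layer.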
-- 
Regards,
Chris Knipe