Re: reading from socket

Chris Knipe Tue, 11 Aug 2015 15:17:13 -0700

>
> Firstly, if the handle isn't being read with binmode set then
> perhaps the \r\n are being converted to \n (if this is Windows)?
> How are you creating/initializing the socket?
>



Unfortunately, with or without binmode, there's no difference to the
matching (from what I can tell)

Socket creation:
use IO::Socket::INET;

my $TCPSocket = IO::Socket::INET->new(
    PeerHost => "x.x.x.x",
    PeerPort => "5000",
    Proto    => "tcp",
    Blocking => 1,    # Tried with Blocking set to 0 and 1 as well
) or die "ERROR in Socket Creation: $!\n";

# Ensure we get output right away
$TCPSocket->autoflush(1);

binmode $TCPSocket;        ### Tried with/without binmode



> Similarly, the character encoding of the data on the socket could
> matter. You said there are character codes above 127. Does that
> mean the encoding is 8-bit such as [extended] ASCII or latin1, or
> do you mean the character codes are WAY above 127? Character
> encoding could be another culprit if the \r and \n characters are
> encoded differently in the stream than you (and Perl) expects.
> Using the IO layers or the explicit Encode module you should be
> able to decode the stream into a Perl string that Perl
> understands properly.
>

From the relevant RFCs:

   The terms "NUL", "TAB", "LF", "CR, and "space" refer to the octets
   %x00, %x09, %x0A, %x0D, and %x20, respectively (that is, the octets
   with those codes in US-ASCII [ANSI1986] and thus in UTF-8 [RFC3629]).
   The term "CRLF" or "CRLF pair" means the sequence CR immediately
   followed by LF (that is, %x0D.0A).  A "printable US-ASCII character"
   is an octet in the range %x21-7E.  Quoted characters refer to the
   octets with those codes in US-ASCII (so "." and "<" refer to %x2E and
   %x3C) and will always be printable US-ASCII characters; similarly,
   "digit" refers to the octets %x30-39.

However, the data stream does contain yEnc content, which as far as I know
is an 8-bit encoding.  So whilst the protocol itself may use UTF-8, the data
transmitted within the protocol can be either UTF-8 or raw 8-bit.

Lines *should* be terminated by CRLF (provided the 8-bit encoding doesn't
mess up the detection), and the entire data stream is then terminated with
a CRLF.CRLF (similar to an SMTP message, for example, in terms of protocol).
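
To make the terminator check concrete, here is a minimal sketch that looks
for CRLF.CRLF as raw octets instead of relying on "\n" or "$" in a regex.
The helper name extract_body and the sample data are made up for
illustration, and it assumes the buffer was read from a handle in binmode
(no encoding layer), so bytes above 127 are left alone.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the payload that precedes the terminator,
# or nothing (undef in scalar context) if the terminator has not arrived.
# $buf is assumed to hold raw octets, not decoded characters.
sub extract_body {
    my ($buf) = @_;

    # An empty payload consists of just "." followed by CRLF.
    return '' if substr($buf, 0, 3) eq ".\x0D\x0A";

    # Otherwise look for CRLF "." CRLF anywhere in the buffer.
    my $pos = index($buf, "\x0D\x0A.\x0D\x0A");
    return if $pos < 0;

    # Keep the CRLF that ends the last payload line.
    return substr($buf, 0, $pos + 2);
}

# Made-up sample: one text line, one line of 8-bit bytes, then the terminator.
my $data = "=ybegin line=128 size=3\x0D\x0A\xE9\xFF\x80\x0D\x0A.\x0D\x0A";
my $body = extract_body($data);
```

Because index() compares octets, the high bytes in the second line have no
effect on whether the terminator is found.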



> You can attach an IO layer to the file handle by passing an
> additional argument to binmode:
>
>     binmode $fh, ':encoding(UTF-8)';
>
>
Loads, and LOADS and *piles* of UTF-8 errors...

utf8 "\xD826" does not map to Unicode at test.pl line 40 (#1)
utf8 "\x1583F9" does not map to Unicode at test.pl line 40 (#1)
etc.

From personal experience and using other (nasty) methods and components for
doing what I -should- be able to do with native perl, I've learned the hard
way that messing with binmode $fh, ":encoding...." generally corrupts the
8-bit (yEnc) data.  Again, I am more than likely doing it incorrectly, but
I'm really trying to understand how to do it correctly though :-)
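
One possible approach, sketched here under assumptions rather than as a
known-good fix: leave the handle in binmode and decode only the parts that
are meant to be text, using Encode::decode on individual lines. The helper
name split_article, the idea that the text part ends at the first empty
line, and the Latin-1 fallback are all assumptions for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: split a raw response into decoded text lines and an
# untouched byte-string body. Assumes the text part ends at the first
# empty line (CRLF CRLF).
sub split_article {
    my ($raw) = @_;
    my ($head, $body) = split /\x0D\x0A\x0D\x0A/, $raw, 2;
    $body = '' unless defined $body;

    my @headers = map {
        # decode() with a CHECK argument may modify its input, so work on a copy.
        my $octets = $_;
        # Try strict UTF-8 first; fall back to Latin-1, which accepts any byte.
        my $text = eval { decode('UTF-8', $octets, Encode::FB_CROAK()) };
        defined $text ? $text : decode('ISO-8859-1', $_);
    } split /\x0D\x0A/, $head;

    return (\@headers, $body);
}

# Made-up sample: a UTF-8 text line, an empty line, then 8-bit payload bytes.
my $raw = "Subject: caf\xC3\xA9\x0D\x0A\x0D\x0A\xE9\xFF\x0D\x0A";
my ($headers, $payload) = split_article($raw);
```

The payload is never passed through an encoding layer, so the 8-bit data
stays byte-for-byte identical to what arrived on the socket.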



> Lastly, you're reading from a socket so there's no guarantee that
> the buffer string is going to necessarily end at the termination
> boundary. Perhaps the protocol guarantees that, but the socket
> surely doesn't. You may need to look for that terminating
> sequence in the middle of the buffer.
>
>
But isn't that exactly why we set things like autoflush(1) or $|=1?  After
the data stream has been sent from the server (i.e. CRLF.CRLF) the server
stops transmitting data and waits for the next command, so there's no
chance that a second data stream may be received by the client socket, at
least not until the client socket issues a new command.
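
For what it's worth, autoflush and $| only control when data written to a
handle is flushed; they do not decide how incoming bytes are grouped per
read, so the terminator can still arrive split across two reads. A minimal
sketch of accumulating chunks until the terminator shows up follows. The
feed helper, the state hash, and the sample chunks are hypothetical.

```perl
use strict;
use warnings;

my $TERMINATOR = "\x0D\x0A.\x0D\x0A";

# Hypothetical accumulator: pass in whatever each read returned. It gives
# back the complete response once the terminator has been seen, otherwise
# nothing (undef in scalar context).
sub feed {
    my ($state, $chunk) = @_;
    $state->{buf} .= $chunk;

    my $pos = index($state->{buf}, $TERMINATOR);
    return if $pos < 0;

    # Cut the response (including terminator) out of the buffer and keep
    # any surplus bytes for the next response.
    return substr($state->{buf}, 0, $pos + length($TERMINATOR), '');
}

# Made-up chunks: the terminator is deliberately split across three reads.
my $state  = { buf => '' };
my @chunks = ("line one\x0D\x0Aline two\x0D", "\x0A.\x0D", "\x0A");
my $done;
for my $chunk (@chunks) {
    $done = feed($state, $chunk);
    last if defined $done;
}
```

With a real socket the chunks would come from something like
sysread($TCPSocket, my $chunk, 8192) in a loop, stopping when feed()
returns a defined value or sysread returns 0 or undef.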



> Does any of that help?
>
>
I appreciate it, truly.  But no, not really :-(  I can honestly say, been
there, done that.

I realize my problem here is the really wacky way in which the data stream
is encoded (and that is completely out of my control).  But there must be an
adequate and proper way to handle this data.


-- 

Regards,
Chris Knipe
