On Saturday, June 8, 2002, at 08:13 , David T-G wrote:
> drieux, et al --
> ...and then drieux said...
> % On Saturday, June 8, 2002, at 04:47 , David T-G wrote:
[..]
> % >    ...
> % >    chop ; chop ;                          # strip \n\r (no chomp here)
> ...
> %
> % that is way too weird - since that means that what
> % ever dos app you were using was not complying with
> % the standard to begin with.
>
> Tell me about the standard...  Should perl happily chomp either a UNIX or
> a DOS (or even a MAC) line?  Or do I turn around and explain it below,
> answering myself?

the canon is:

        EOL - end of line is denoted as

                mac: <CR>       : chr(13)
                dos: <CR><NL>   : chr(13)chr(10)
                nix: <NL>       : chr(10)
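
a minimal sketch of the same table in perl, spelled with chr() so that
nothing depends on what "\n" means on the host you run it on. the %eol
hash and the dump_bytes helper are names made up here for illustration,
not anything from the thread:

```perl
use strict;
use warnings;

# the three conventions from the table, spelled with chr() so nothing
# depends on what "\n" means on the local platform
my %eol = (
    mac => chr(13),              # <CR>
    dos => chr(13) . chr(10),    # <CR><NL>
    nix => chr(10),              # <NL>
);

# hypothetical helper: show each byte of a string as a decimal number
sub dump_bytes {
    my ($string) = @_;
    return join ' ', map { ord($_) } split //, $string;
}

for my $kind (sort keys %eol) {
    printf "%s: %s\n", $kind, dump_bytes("line" . $eol{$kind});
}
```

the dos entry should end in "13 10", which is the same pair that od -c
shows as \r \n.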

note what happens:

vladimir: 64:] echo line> file
vladimir: 65:] unix2dos file file.dox
could not open /dev/kbd to get keyboard type US keyboard assumed
could not get keyboard type US keyboard assumed
vladimir: 66:] od -c !$
od -c file.dox
0000000   l   i   n   e  \r  \n
0000006
vladimir: 67:]

if you check the stty man pages you will find our friend onlcr
that does the mapping of NL to CR-NL on output - we still have the
old cross over problem here: what unix folks use as \n is the
"new line" token, but by way of stty it goes out to their
'terminal type' as CR followed by NL - return the carriage head to
the beginning of the line and then shift the roller up one.

otherwise if you have merely the new line
                                           you start typing here.

If you have merely the CR - you would start writing over the line.

Hence to have "\n\r" would mean having implemented the standard
for the EOL token to the file 'underappropriately' - although
'technically literally' - and it would 'still work' in the case of
those systems that know how to parse them correctly, since it
really does not matter to a teletype in which order the commands
are generated - it will read them off the wire as commands
and execute them...

{ note you should send three BEL tokens for the start and stop
of any message - but that has fallen out of habit.... and no one
seems to worry about taking them out of the data stream, or remembering
to put them in either... }

[..]
> (you know, it can be a real challenge to write a one-liner!) and found
> that I have either RL or L for all files, and no \n\r as I had thought,
[..]

the problem here is that chomp is defined by the host you are on,
not by the host where the file once was - it strips whatever $/
holds, and that defaults to the local idea of "\n".....

it's a reasonable compromise in that case...
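
a small sketch of that point, assuming a unix-ish host where "\n" is
chr(10): chomp removes a trailing copy of whatever $/ holds, so a dos
line keeps its <CR> unless you tell chomp what the separator of the
file was. the variable names are made up for illustration:

```perl
use strict;
use warnings;

my $dos_line = "line" . chr(13) . chr(10);

# default: chomp strips a trailing $/, which is "\n" unless changed,
# so the <CR> from a dos file is left behind
my $default = $dos_line;
chomp $default;

# telling chomp what the record separator of the *file* was
my $told = $dos_line;
{
    local $/ = chr(13) . chr(10);
    chomp $told;
}

printf "default chomp leaves %d chars, told chomp leaves %d\n",
    length($default), length($told);
```

the local keeps the change to $/ inside the block, so the rest of the
program still sees the default.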

where you have to get your poop in a group on this point is as you
move into 'network layer plays' - such as HTTP - unless you are
using the appropriate modules to do this stuff for you - and you
find that the RFC for http defines the end-of-line marker as
<CR><LF>, with an empty line - a bare <CR><LF> - separating the head
from the body - cf:
http://www.w3.org/Protocols/rfc2068/rfc2068
section 2.2 to be specific - where they call out the decimal
values for them in the ASCII table....

{ may I recommend that you use the CPAN modules - hand cranking this
stuff from the IO::Socket layer, while it is what some of us did, is
not what I would recommend now.... but yes, the original code I ripped
had the sort of 'oh look, we have that <CR><LF> hence we are out of
header and the rest is body....' sort of coding...}
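
for the curious, a hedged sketch of what that sort of hand cranked
coding looks like - the CPAN modules remain the better idea. the
split_head_body helper and the sample message are made up here; the
octal escapes \015\012 are used instead of "\r\n" so the bytes on the
wire do not depend on the host:

```perl
use strict;
use warnings;

my $CRLF = "\015\012";    # octal for chr(13) chr(10)

# hypothetical helper: split a raw message at the first empty line
sub split_head_body {
    my ($raw) = @_;
    my ($head, $body) = split /$CRLF$CRLF/, $raw, 2;
    return ($head, defined $body ? $body : '');
}

# made-up sample message: two head lines, an empty line, then the body
my $raw = join $CRLF,
    'HTTP/1.1 200 OK',
    'Content-Type: text/plain',
    '',
    "hello\n";

my ($head, $body) = split_head_body($raw);
my @head_lines = split /$CRLF/, $head;
print scalar(@head_lines), " head lines, body is ", length($body), " bytes\n";
```

the limit of 2 on the split matters - without it a body that happens
to contain its own empty line would get cut up as well.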

[..]
> this would have really screwed me as I got way down into my lists :-)

yes... not that I would wish to impose some 'puritanical morality'
on how you relate to yourself..... but in the coding space, I would
wish to impose a sense of

        THAT WILL HURT YOU!

> So now I should be able to put
>
>   ...
>   while(<>)
>   {
>     s/($cr|$lf)+//;
>   ...
>
> into my code and basically make my own chomp, right?  Time to go off and
> test...

test that - it should strip the first run of them it finds, but it
is not tied to the end of the line. I think the tradition is

        s/[$cr$lf]+$//

where the [ ] block off a set of characters - no "|" is needed in
there, since inside the [ ] it is a literal pipe and not the expected
'or me' - the "+" denotes one or more of these, and the "$" ties the
match to the end of the line.
{ in this case you want that, it helps the compiler not worry about
looking for the cases of 'not me' - and you only need to nest all of
that in round braces when you want the 'yes, this pattern, do
something with it!' }
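
a sketch of a home-made chomp along those lines, using a character
class tied to the end of the string. my_chomp is a made-up name, and
the class holds only the two characters - a "|" inside [ ] would be
treated as a literal pipe and eaten too:

```perl
use strict;
use warnings;

# hypothetical helper: strip any run of <CR>/<NL> from the end of a string
sub my_chomp {
    my ($line) = @_;
    $line =~ s/[\015\012]+\z//;    # \z ties the match to the very end
    return $line;
}

# the mac, dos and nix endings from the table all come off the same way
for my $eol ("\015", "\015\012", "\012") {
    my $clean = my_chomp("line" . $eol);
    print length($clean), "\n";
}
```

note that, unlike the real chomp, this also eats a whole run of
trailing empty lines - whether that is what you want depends on your
data.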

[..]

ciao
drieux
