Re: tail

2022-05-09 Thread Greg Ewing

On 9/05/22 7:47 am, Marco Sulla wrote:

>> It will fail if the contents is not ASCII.
>
> Why?


For some encodings, if you seek to an arbitrary byte position and
then read, it may *appear* to succeed but give you complete gibberish.

Your method might work for a certain subset of encodings (those that
are self-synchronising) but it won't work for arbitrary encodings.

Given that limitation, I don't think it's reliable enough to include
in the standard library.
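
To make the distinction concrete, here is a minimal sketch (not from the thread; the helper names `decode_from_offset` and `skip_utf8_continuation` are made up for illustration). It assumes UTF-16-LE as the example of an encoding that is not self-synchronising at the byte level, and UTF-8 as the example of one that is: the first silently decodes to wrong characters, the second at least lets a reader detect and skip a partial character.

```python
def decode_from_offset(data: bytes, offset: int, encoding: str) -> str:
    """Decode data starting at an arbitrary byte offset (strict errors)."""
    return data[offset:].decode(encoding)


def skip_utf8_continuation(data: bytes) -> bytes:
    """Drop leading UTF-8 continuation bytes (bit pattern 10xxxxxx)."""
    start = 0
    while start < len(data) and (data[start] & 0xC0) == 0x80:
        start += 1
    return data[start:]


text = "hello world!"

# UTF-16-LE read one byte late: every high byte is paired with the next
# low byte, which still decodes without any error.
utf16 = text.encode("utf-16-le") + b"\x00"  # pad so the shifted length is even
garbled = decode_from_offset(utf16, 1, "utf-16-le")

# UTF-8 read from the middle of a two-byte character: strict decoding fails,
# so the bad offset is at least detectable, and it can be repaired.
utf8 = "na\u00efve caf\u00e9".encode("utf-8")  # offset 3 is the 2nd byte of U+00EF
try:
    decode_from_offset(utf8, 3, "utf-8")
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True
resynced = skip_utf8_continuation(utf8[3:]).decode("utf-8")
```

Here `garbled` has the same length as `text` but shares no characters with it, while `resynced` is the original string minus the characters before the first clean boundary.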

--
Greg

--
https://mail.python.org/mailman/listinfo/python-list


Re: tail

2022-05-09 Thread Dennis Lee Bieber
On Sun, 8 May 2022 22:48:32 +0200, Marco Sulla
 declaimed the following:

>
>Emh. I re-quote
>
>seek(offset, whence=SEEK_SET)
>Change the stream position to the given byte offset.
>
>And so on. No mention of differences between text and binary mode.

You ignore that, underneath, Python is just wrapping the C API... And
the documentation for C explicitly specifies that, other than SEEK_END with
offset 0 and SEEK_SET with offset 0, for a text file one can only rely
upon SEEK_SET using an offset previously obtained with (C) ftell() /
(Python) .tell().

https://docs.python.org/3/library/io.html
"""
class io.IOBase

The abstract base class for all I/O classes.
"""
 seek(offset, whence=SEEK_SET)

Change the stream position to the given byte offset. offset is
interpreted relative to the position indicated by whence. The default value
for whence is SEEK_SET. Values for whence are:
"""

Applicable to BINARY MODE I/O: For UTF-8 and any other multibyte
encoding, this means you could end up positioning into the middle of a
"character" and subsequently read garbage. It is on you to handle
synchronizing on a valid character position, and also to handle different
line ending conventions.

"""
class io.TextIOBase

Base class for text streams. This class provides a character and line
based interface to stream I/O. It inherits IOBase.
"""
 seek(offset, whence=SEEK_SET)

Change the stream position to the given offset. Behaviour depends on
the whence parameter. The default value for whence is SEEK_SET.

SEEK_SET or 0: seek from the start of the stream (the default);
offset must either be a number returned by TextIOBase.tell(), or zero. Any
other offset value produces undefined behaviour.

SEEK_CUR or 1: “seek” to the current position; offset must be zero,
which is a no-operation (all other values are unsupported).

SEEK_END or 2: seek to the end of the stream; offset must be zero
(all other values are unsupported).
"""

EMPHASIS: "offset must either be a number returned by TextIOBase.tell(), or
zero." 

TEXT I/O, with a specified encoding, will return Unicode code points,
and will handle converting line endings to the internal ("\n" represents
new-line) format.

Since your code does not specify BINARY mode in the open statement,
Python should be using TEXT mode.
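
A small sketch of what the quoted documentation permits in text mode (the function names and the sample file are assumptions for illustration only, not code from this thread): a position returned by tell() can be passed back to seek(), while a non-zero end-relative seek is rejected.

```python
import io
import os
import tempfile


def make_sample() -> str:
    """Write a small UTF-8 file containing non-ASCII text; return its path."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w", encoding="utf-8", newline="\n") as f:
        f.write("premi\u00e8re\nseconde\ntroisi\u00e8me\n")
    return path


def reread_second_line(path: str) -> tuple[str, str]:
    """Read line 2, seek back using a tell() value, and read it again."""
    with open(path, "r", encoding="utf-8") as f:
        f.readline()
        cookie = f.tell()      # opaque value, only meaningful to seek()
        first_read = f.readline()
        f.seek(cookie)         # supported: offset came from tell()
        second_read = f.readline()
    return first_read, second_read


def end_relative_seek_allowed(path: str) -> bool:
    """Report whether text mode accepts a non-zero SEEK_END offset."""
    with open(path, "r", encoding="utf-8") as f:
        try:
            f.seek(-5, os.SEEK_END)
        except io.UnsupportedOperation:
            return False
    return True
```

Calling `reread_second_line(make_sample())` returns the same line twice; `end_relative_seek_allowed()` returns False on CPython, which is why a tail that seeks backwards from the end has to open the file in binary mode.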



-- 
Wulfraed Dennis Lee Bieber AF6VN
wlfr...@ix.netcom.com    http://wlfraed.microdiversity.freeddns.org/


usb tv stick and python

2022-05-09 Thread jak

Hello everybody,
I usually use vlc to watch tv and I use the w_scan program on linux to
create a file (.m3u) with the list of available channels. Unfortunately
I can't find an alternative to w_scan for Windows and I was wondering if
you could tell me some python library that allows me, easily, to
interface with the device and get the channel list.

Thanks in advance for your attention.


Re: usb tv stick and python

2022-05-09 Thread Dennis Lee Bieber
On Mon, 9 May 2022 08:47:50 +0200, jak  declaimed the
following:

>Hello everybody,
>I usually use vlc to watch tv and I use the w_scan program on linux to
>create a file (.m3u) with the list of available channels. Unfortunately
>I can't find an alternative to w_scan for Windows and I was wondering if
>you could tell me some python library that allows me, easily, to
>interface with the device and get the channel list.
>

UNTESTED... But if it works, it means no change to your procedures...

Presuming you are using W10 or later... 

Activate the Windows Subsystem for Linux ([old style] Control Panel /
Programs / Programs and Features... Turn Windows Features On or Off...
Scroll down, it's the third from the bottom)

Windows "Microsoft Store"; Search "Debian"; Download/Install (set a
log-in account/password). (Unfortunately, it's not the most recent version
-- still on Buster...)

Open Debian instance console (to my knowledge, no graphical
applications are supported); do normal stuff to bring apt up-to-date.

-=-=-
wulfraed@ElusiveUnicorn:~$ apt search w_scan
Sorting... Done
Full Text Search... Done
w-scan/oldstable 20170107-2 amd64
  Channel scanning tool for DVB and ATSC channels

wulfraed@ElusiveUnicorn:~$
-=-=-

Install and test. Windows partitions are accessible as
/mnt/



-- 
Wulfraed Dennis Lee Bieber AF6VN
wlfr...@ix.netcom.com    http://wlfraed.microdiversity.freeddns.org/


Re: usb tv stick and python

2022-05-09 Thread jak

On 09/05/2022 16:28, Dennis Lee Bieber wrote:
> On Mon, 9 May 2022 08:47:50 +0200, jak  declaimed the
> following:
>
>> Hello everybody,
>> I usually use vlc to watch tv and I use the w_scan program on linux to
>> create a file (.m3u) with the list of available channels. Unfortunately
>> I can't find an alternative to w_scan for Windows and I was wondering if
>> you could tell me some python library that allows me, easily, to
>> interface with the device and get the channel list.
>
> UNTESTED... But if it works means no change to your procedures...
>
> Presuming you are using W10 or later...
>
> Activate the Windows Subsystem for Linux ([old style] Control Panel /
> Programs / Programs and Features... Turn Windows Features On or Off...
> Scroll down, it's the third from the bottom)
>
> Windows "Microsoft Store"; Search "Debian"; Download/Install (set a
> log-in account/password). (Unfortunately, it's not the most recent version
> -- still on Buster...)
>
> Open Debian instance console (to my knowledge, no graphical
> applications are supported); do normal stuff to bring apt up-to-date.
>
> -=-=-
> wulfraed@ElusiveUnicorn:~$ apt search w_scan
> Sorting... Done
> Full Text Search... Done
> w-scan/oldstable 20170107-2 amd64
>    Channel scanning tool for DVB and ATSC channels
>
> wulfraed@ElusiveUnicorn:~$
> -=-=-
>
> Install and test. Windows partitions are accessible as
> /mnt/

First of all, thank you for your reply. Actually I already have a handy
workaround for using w_scan, because I have a VM with Linux (Ubuntu)
installed. I was just looking for a Python package/library that would
allow me to write a wrapper around the device. I would also be satisfied
with finding documentation that describes the protocol for communicating
with the DVB interface, so that I could write an app that produces the list
of available channels. Any advice, suggestion or pointer is welcome.



Re: tail

2022-05-09 Thread Marco Sulla
On Mon, 9 May 2022 at 07:56, Cameron Simpson  wrote:
>
> The point here is that text is a very different thing. Because you
> cannot seek to an absolute number of characters in an encoding with
> variable sized characters. _If_ you did a seek to an arbitrary number
> you can end up in the middle of some character. And there are encodings
> where you cannot inspect the data to find a character boundary in the
> byte stream.

Ooook, now I understand what you and Barry mean. I suppose there's no
reliable way to tail a big file opened in text mode with a decent performance.

Anyway, the previous-previous function I posted worked only for files
opened in binary mode, and I suppose it's reliable, since it searches
only for b"\n", as readline() in binary mode does.


Re: tail

2022-05-09 Thread Chris Angelico
On Tue, 10 May 2022 at 03:47, Marco Sulla  wrote:
>
> On Mon, 9 May 2022 at 07:56, Cameron Simpson  wrote:
> >
> > The point here is that text is a very different thing. Because you
> > cannot seek to an absolute number of characters in an encoding with
> > variable sized characters. _If_ you did a seek to an arbitrary number
> > you can end up in the middle of some character. And there are encodings
> > where you cannot inspect the data to find a character boundary in the
> > byte stream.
>
> Ooook, now I understand what you and Barry mean. I suppose there's no
> reliable way to tail a big file opened in text mode with a decent performance.
>
> Anyway, the previous-previous function I posted worked only for files
> opened in binary mode, and I suppose it's reliable, since it searches
> only for b"\n", as readline() in binary mode do.

It's still fundamentally impossible to solve this in a general way, so
the best way to do things will always be to code for *your* specific
use-case. That means that this doesn't belong in the stdlib or core
language, but in your own toolkit.

ChrisA


Re: tail

2022-05-09 Thread 2QdxY4RzWzUUiLuE
On 2022-05-08 at 18:52:42 +,
Stefan Ram  wrote:

>   Remember how recently people here talked about how you cannot copy
>   text from a video? Then, how did I do it? Turns out, for my
>   operating system, there's a screen OCR program! So I did this OCR
>   and then manually corrected a few wrong characters, and was done!

When you're learning, and the example you tried doesn't work like it
worked on the video, you probably don't know what's wrong, let alone how
to correct it.


Re: usb tv stick and python

2022-05-09 Thread Dennis Lee Bieber
On Mon, 9 May 2022 17:56:32 +0200, jak  declaimed the
following:

>First of all, thank you for your reply. Actually I already have a handy
>work around to use w_scan because I have a VM with linux (ubuntu)
>installed. I was just looking for a python package/library that would
>allow me to write a wrapper around. I would also be satisfied with
>finding documentation that describes the protocol to communicate with
>the dvb interface and be able to write an app that produces the list of
>available channels. Any advice, suggestion or pointing is welcome.

For the protocol... You might need to locate the source code for
w_scan. Perhaps https://github.com/tbsdtv/w_scan



-- 
Wulfraed Dennis Lee Bieber AF6VN
wlfr...@ix.netcom.com    http://wlfraed.microdiversity.freeddns.org/


Re: tail

2022-05-09 Thread Marco Sulla
On Mon, 9 May 2022 at 19:53, Chris Angelico  wrote:
>
> On Tue, 10 May 2022 at 03:47, Marco Sulla  
> wrote:
> >
> > On Mon, 9 May 2022 at 07:56, Cameron Simpson  wrote:
> > >
> > > The point here is that text is a very different thing. Because you
> > > cannot seek to an absolute number of characters in an encoding with
> > > variable sized characters. _If_ you did a seek to an arbitrary number
> > > you can end up in the middle of some character. And there are encodings
> > > where you cannot inspect the data to find a character boundary in the
> > > byte stream.
> >
> > Ooook, now I understand what you and Barry mean. I suppose there's no
> > reliable way to tail a big file opened in text mode with a decent 
> > performance.
> >
> > Anyway, the previous-previous function I posted worked only for files
> > opened in binary mode, and I suppose it's reliable, since it searches
> > only for b"\n", as readline() in binary mode do.
>
> It's still fundamentally impossible to solve this in a general way, so
> the best way to do things will always be to code for *your* specific
> use-case. That means that this doesn't belong in the stdlib or core
> language, but in your own toolkit.

Nevertheless, tail is a fundamental tool in *nix. It's fast and
reliable. So the tail command can't handle different encodings either?


Re: tail

2022-05-09 Thread Chris Angelico
On Tue, 10 May 2022 at 05:12, Marco Sulla  wrote:
>
> On Mon, 9 May 2022 at 19:53, Chris Angelico  wrote:
> >
> > On Tue, 10 May 2022 at 03:47, Marco Sulla  
> > wrote:
> > >
> > > On Mon, 9 May 2022 at 07:56, Cameron Simpson  wrote:
> > > >
> > > > The point here is that text is a very different thing. Because you
> > > > cannot seek to an absolute number of characters in an encoding with
> > > > variable sized characters. _If_ you did a seek to an arbitrary number
> > > > you can end up in the middle of some character. And there are encodings
> > > > where you cannot inspect the data to find a character boundary in the
> > > > byte stream.
> > >
> > > Ooook, now I understand what you and Barry mean. I suppose there's no
> > > reliable way to tail a big file opened in text mode with a decent 
> > > performance.
> > >
> > > Anyway, the previous-previous function I posted worked only for files
> > > opened in binary mode, and I suppose it's reliable, since it searches
> > > only for b"\n", as readline() in binary mode do.
> >
> > It's still fundamentally impossible to solve this in a general way, so
> > the best way to do things will always be to code for *your* specific
> > use-case. That means that this doesn't belong in the stdlib or core
> > language, but in your own toolkit.
>
> Nevertheless, tail is a fundamental tool in *nix. It's fast and
> reliable. Also the tail command can't handle different encodings?

Like most Unix programs, it handles bytes.

ChrisA


Re: tail

2022-05-09 Thread Barry


> On 9 May 2022, at 17:41, r...@zedat.fu-berlin.de wrote:
> 
> Barry Scott  writes:
>> Why use tiny chunks? You can read 4KiB as fast as 100 bytes
> 
>  When optimizing code, it helps to be aware of the orders of
>  magnitude

That is true and well known to me; now show how what I said is wrong.

The OS is going to DMA at least 4 KiB, with read-ahead more like 64 KiB.
So I can get that into Python's memory on the same time scale as
1 byte, because it's the setup of the I/O that is expensive, not the bytes
transferred.
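
A rough way to look at this is to count calls rather than bytes. The sketch below is an illustration only (`read_backwards` and `make_file` are hypothetical helpers, and the timing it returns is machine-dependent, so no figures are claimed): it reads a file from end to start, unbuffered, and reports how many read() calls a given chunk size needs.

```python
import os
import tempfile
import time


def make_file(size: int) -> str:
    """Create a temporary file of the given size; return its path."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"x" * size)
    return path


def read_backwards(path: str, chunk_size: int) -> tuple[int, float]:
    """Read a whole file from end to start; return (read calls, seconds)."""
    calls = 0
    started = time.perf_counter()
    # buffering=0 gives a raw file object, so each read() is a real OS call.
    with open(path, "rb", buffering=0) as f:
        pos = f.seek(0, os.SEEK_END)
        while pos > 0:
            step = min(chunk_size, pos)
            pos -= step
            f.seek(pos)
            f.read(step)
            calls += 1
    return calls, time.perf_counter() - started
```

For a 1,000,000-byte file, 100-byte chunks need 10,000 read calls and 4096-byte chunks need 245; comparing the two elapsed times on your own machine shows how much of the cost is per call.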

Barry

> . Code that is more cache-friendly is faster, that is,
>  code that holds data in single region of memory and that uses
>  regular patterns of access. Chandler Carruth talked about this,
>  and I made some notes when watching the video of his talk:
> 
> CPUS HAVE A HIERARCHICAL CACHE SYSTEM
> (from a 2014 talk by Chandler Carruth)
> 
> One cycle on a 3 GHz processor                1 ns
> L1 cache reference                          0.5 ns
> Branch mispredict                             5 ns
> L2 cache reference                            7 ns           14x L1 cache
> Mutex lock/unlock                            25 ns
> Main memory reference                       100 ns           20x L2, 200x L1
> Compress 1K bytes with Snappy             3,000 ns
> Send 1K bytes over 1 Gbps network        10,000 ns  0.01 ms
> Read 4K randomly from SSD               150,000 ns  0.15 ms
> Read 1 MB sequentially from memory      250,000 ns  0.25 ms
> Round trip within same datacenter       500,000 ns   0.5 ms
> Read 1 MB sequentially from SSD       1,000,000 ns     1 ms  4x memory
> Disk seek                            10,000,000 ns    10 ms  20x datacenter RT
> Read 1 MB sequentially from disk     20,000,000 ns    20 ms  80x memory, 20x SSD
> Send packet CA->Netherlands->CA     150,000,000 ns   150 ms
> 
>  . Remember how recently people here talked about how you cannot
>  copy text from a video? Then, how did I do it? Turns out, for my 
>  operating system, there's a screen OCR program! So I did this OCR 
>  and then manually corrected a few wrong characters, and was done!
> 
> 



Re: tail

2022-05-09 Thread Barry


> On 9 May 2022, at 20:14, Marco Sulla  wrote:
> 
> On Mon, 9 May 2022 at 19:53, Chris Angelico  wrote:
>> 
>>> On Tue, 10 May 2022 at 03:47, Marco Sulla  
>>> wrote:
>>> 
>>> On Mon, 9 May 2022 at 07:56, Cameron Simpson  wrote:
 
>>>> The point here is that text is a very different thing. Because you
>>>> cannot seek to an absolute number of characters in an encoding with
>>>> variable sized characters. _If_ you did a seek to an arbitrary number
>>>> you can end up in the middle of some character. And there are encodings
>>>> where you cannot inspect the data to find a character boundary in the
>>>> byte stream.
>>> 
>>> Ooook, now I understand what you and Barry mean. I suppose there's no
>>> reliable way to tail a big file opened in text mode with a decent 
>>> performance.
>>> 
>>> Anyway, the previous-previous function I posted worked only for files
>>> opened in binary mode, and I suppose it's reliable, since it searches
>>> only for b"\n", as readline() in binary mode do.
>> 
>> It's still fundamentally impossible to solve this in a general way, so
>> the best way to do things will always be to code for *your* specific
>> use-case. That means that this doesn't belong in the stdlib or core
>> language, but in your own toolkit.
> 
> Nevertheless, tail is a fundamental tool in *nix. It's fast and
> reliable. Also the tail command can't handle different encodings?

POSIX tail just prints the bytes to the output that it finds between \n bytes.
At no time does it need to care about encodings as that is a problem solved
by the terminal software. I would not expect utf-16 to work with tail on
linux systems.

You could always get the source of tail and read its implementation.

Barry




Re: tail

2022-05-09 Thread Chris Angelico
On Tue, 10 May 2022 at 07:07, Barry  wrote:
> POSIX tail just prints the bytes to the output that it finds between \n bytes.
> At no time does it need to care about encodings as that is a problem solved
> by the terminal software. I would not expect utf-16 to work with tail on
> linux systems.

ASCII text encoded as UTF-16 seems to work fine on my system, which
probably means the terminal is just ignoring all the NUL bytes. But if
there's a random 0x0A anywhere, it would probably be counted as a line break.

ChrisA


Re: tail

2022-05-09 Thread Dennis Lee Bieber
On Mon, 9 May 2022 21:11:23 +0200, Marco Sulla
 declaimed the following:

>Nevertheless, tail is a fundamental tool in *nix. It's fast and
>reliable. Also the tail command can't handle different encodings?

Based upon
https://github.com/coreutils/coreutils/blob/master/src/tail.c the ONLY
thing tail looks at is the single byte "\n". It does not handle other line
endings, and appears to perform BINARY I/O, not text I/O. It does nothing
for bytes that are not "\n". Split multi-byte encodings are irrelevant
since, if it does not find enough "\n" bytes in the buffer (chunk), it reads
another binary chunk and searches it for additional "\n" bytes. Once it finds
the desired number, it is synchronized on the byte following the "\n" (which,
for multi-byte encodings, might be a NUL, but in any event should be a safe
location for subsequent I/O).

Interpretation of encoding appears to fall to the console driver
configuration when displaying the bytes output by tail.
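
The same idea can be sketched in Python. This is NOT a port of tail.c, just a minimal illustration of the approach described above under the same assumption (lines end with the single byte "\n"); the name `tail_lines` and its parameters are made up for the example.

```python
import os


def tail_lines(path: str, n: int, chunk_size: int = 4096) -> list[bytes]:
    """Return the last n lines of a file as bytes, newline byte stripped."""
    if n <= 0:
        return []
    data = b""
    with open(path, "rb") as f:              # binary mode: arbitrary seeks are fine
        pos = f.seek(0, os.SEEK_END)
        # n + 1 newline bytes guarantee that the last n lines are complete,
        # whether or not the file ends with a newline.
        while pos > 0 and data.count(b"\n") <= n:
            step = min(chunk_size, pos)
            pos -= step
            f.seek(pos)
            data = f.read(step) + data       # prepend the earlier chunk
    lines = data.split(b"\n")
    if lines[-1] == b"":
        lines.pop()                          # the file ended with a newline
    return lines[-n:]
```

The result is bytes; decoding is left to the caller, e.g. `[line.decode("utf-8") for line in tail_lines("app.log", 10)]`, where the file name and the encoding are whatever applies to your data. A chunk boundary that splits a multi-byte character does no harm, because chunks are joined before anything is decoded.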


-- 
Wulfraed Dennis Lee Bieber AF6VN
wlfr...@ix.netcom.com    http://wlfraed.microdiversity.freeddns.org/


Re: tail

2022-05-09 Thread Alan Bawden
Marco Sulla  writes:

   On Mon, 9 May 2022 at 19:53, Chris Angelico  wrote:
   ...
   Nevertheless, tail is a fundamental tool in *nix. It's fast and
   reliable. Also the tail command can't handle different encodings?

It definitely can't.  It works for UTF-8, and all the ASCII-compatible
single-byte encodings, but feed it a file encoded in UTF-16, and it will
sometimes screw up.  (And if you don't redirect the output away from
your terminal, and your terminal encoding isn't also set to UTF-16, you
will likely find yourself looking at gibberish -- but that's another
problem...)
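
A short illustration of the "sometimes" (a sketch; the helper name `split_on_newline_byte` and the sample string are assumptions chosen for the example): in UTF-16-LE the character U+010A is encoded as the bytes 0A 01, so a byte-oriented search for "\n" finds a line break in the middle of it, whereas in UTF-8 the byte 0x0A never occurs inside a multi-byte character.

```python
def split_on_newline_byte(text: str, encoding: str) -> list[bytes]:
    """Encode text, then split it the way a byte-oriented tool would."""
    return text.encode(encoding).split(b"\n")


sample = "a\u010ab\n"   # a single line containing U+010A

utf16_pieces = split_on_newline_byte(sample, "utf-16-le")
utf8_pieces = split_on_newline_byte(sample, "utf-8")
```

`utf16_pieces` has three elements, none of which decodes cleanly on its own, while `utf8_pieces` has the expected two (the line and the empty remainder after the final newline).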