Re: tail
On 9/05/22 7:47 am, Marco Sulla wrote: It will fail if the contents is not ASCII. Why? For some encodings, if you seek to an arbitrary byte position and then read, it may *appear* to succeed but give you complete gibberish. Your method might work for a certain subset of encodings (those that are self-synchronising) but it won't work for arbitrary encodings. Given that limitation, I don't think it's reliable enough to include in the standard library. -- Greg -- https://mail.python.org/mailman/listinfo/python-list
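A small illustration of the point about self-synchronising encodings, as a sketch rather than anything posted in the thread. The sample string and the byte offsets are assumptions chosen for the demonstration. With UTF-8 a misaligned read is at least detectable, while with UTF-16 a read that starts one byte off decodes without complaint and yields unrelated characters.

```python
import io

text = "naïve café\n"                  # sample text, chosen for illustration
utf8 = text.encode("utf-8")            # 'ï' and 'é' take two bytes each
utf16 = text.encode("utf-16-le")       # every character here takes two bytes

# UTF-8: seeking into the middle of 'ï' leaves a stray continuation byte,
# which the decoder rejects, so the misalignment can be noticed.
buf = io.BytesIO(utf8)
buf.seek(3)                            # second byte of 'ï'
try:
    buf.read().decode("utf-8")
    utf8_result = "decoded"
except UnicodeDecodeError:
    utf8_result = "error"

# UTF-16: starting one byte off pairs up the wrong bytes. The decode
# "succeeds" and silently returns characters that were never in the file.
buf = io.BytesIO(utf16)
buf.seek(1)
misaligned = buf.read()[:-1]           # drop the odd trailing byte
garbage = misaligned.decode("utf-16-le")

print(utf8_result)
print(repr(garbage))
```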
Re: tail
On Sun, 8 May 2022 22:48:32 +0200, Marco Sulla declaimed the following:

>
>Emh. I re-quote
>
>seek(offset, whence=SEEK_SET)
>Change the stream position to the given byte offset.
>
>And so on. No mention of differences between text and binary mode.

You ignore that, underneath, Python is just wrapping the C API... And the documentation for C explicitly specifies that, other than SEEK_END with an offset of 0 and SEEK_SET with an offset of 0, for a text file one can only rely upon SEEK_SET using an offset previously obtained with (C) ftell() / (Python) .tell().

https://docs.python.org/3/library/io.html

"""
class io.IOBase
The abstract base class for all I/O classes.
"""
seek(offset, whence=SEEK_SET)
Change the stream position to the given byte offset. offset is interpreted relative to the position indicated by whence. The default value for whence is SEEK_SET. Values for whence are:
"""

Applicable to BINARY MODE I/O: For UTF-8 and any other multibyte encoding, this means you could end up positioning into the middle of a "character" and subsequently read garbage. It is on you to handle synchronizing on a valid character position, and also to handle different line ending conventions.

"""
class io.TextIOBase
Base class for text streams. This class provides a character and line based interface to stream I/O. It inherits IOBase.
"""
seek(offset, whence=SEEK_SET)
Change the stream position to the given offset. Behaviour depends on the whence parameter. The default value for whence is SEEK_SET.
SEEK_SET or 0: seek from the start of the stream (the default); offset must either be a number returned by TextIOBase.tell(), or zero. Any other offset value produces undefined behaviour.
SEEK_CUR or 1: “seek” to the current position; offset must be zero, which is a no-operation (all other values are unsupported).
SEEK_END or 2: seek to the end of the stream; offset must be zero (all other values are unsupported).
"""

EMPHASIS: "offset must either be a number returned by TextIOBase.tell(), or zero."

TEXT I/O, with a specified encoding, will return Unicode code points, and will handle converting line endings to the internal (\n represents new-line) format.

Since your code does not specify BINARY mode in the open statement, Python should be using TEXT mode.

-- 
Wulfraed Dennis Lee Bieber AF6VN
wlfr...@ix.netcom.com http://wlfraed.microdiversity.freeddns.org/
usb tv stick and python
Hello everybody, I usually use vlc to watch tv and I use the w_scan program on linux to create a file (.m3u) with the list of available channels. Unfortunately I can't find an alternative to w_scan for Windows and I was wondering if you could tell me some python library that allows me, easily, to interface with the device and get the channel list. Thanks in advance for your attention.
Re: usb tv stick and python
On Mon, 9 May 2022 08:47:50 +0200, jak declaimed the following:

>Hello everybody,
>I usually use vlc to watch tv and I use the w_scan program on linux to
>create a file (.m3u) with the list of available channels. Unfortunately
>I can't find an alternative to w_scan for Windows and I was wondering if
>you could tell me some python library that allows me, easily, to
>interface with the device and get the channel list.
>

UNTESTED... But if it works, it means no change to your procedures...

Presuming you are using W10 or later...

Activate the Windows Subsystem for Linux ([old style] Control Panel / Programs / Programs and Features... Turn Windows Features On or Off... Scroll down, it's the third from the bottom)

Windows "Microsoft Store"; Search "Debian"; Download/Install (set a log-in account/password). (Unfortunately, it's not the most recent version -- still on Buster...)

Open Debian instance console (to my knowledge, no graphical applications are supported); do normal stuff to bring apt up-to-date.

-=-=-
wulfraed@ElusiveUnicorn:~$ apt search w_scan
Sorting... Done
Full Text Search... Done
w-scan/oldstable 20170107-2 amd64
  Channel scanning tool for DVB and ATSC channels

wulfraed@ElusiveUnicorn:~$
-=-=-

Install and test. Windows partitions are accessible as /mnt/

-- 
Wulfraed Dennis Lee Bieber AF6VN
wlfr...@ix.netcom.com http://wlfraed.microdiversity.freeddns.org/
Re: usb tv stick and python
On 09/05/2022 16:28, Dennis Lee Bieber wrote: On Mon, 9 May 2022 08:47:50 +0200, jak declaimed the following: Hello everybody, I usually use vlc to watch tv and I use the w_scan program on linux to create a file (.m3u) with the list of available channels. Unfortunately I can't find an alternative to w_scan for Windows and I was wondering if you could tell me some python library that allows me, easily, to interface with the device and get the channel list. UNTESTED... But if it works means no change to your procedures... Presuming you are using W10 or later... Activate the Windows Subsystem for Linux ([old style] Control Panel / Programs / Programs and Features... Turn Windows Features On or Off... Scroll down, it's the third from the bottom) Windows "Microsoft Store"; Search "Debian"; Download/Install (set a log-in account/password). (Unfortunately, it's not the most recent version -- still on Buster...) Open Debian instance console (to my knowledge, no graphical applications are supported); do normal stuff to bring apt up-to-date. -=-=- wulfraed@ElusiveUnicorn:~$ apt search w_scan Sorting... Done Full Text Search... Done w-scan/oldstable 20170107-2 amd64 Channel scanning tool for DVB and ATSC channels wulfraed@ElusiveUnicorn:~$ -=-=- Install and test. Windows partitions are accessible as /mnt/ First of all, thank you for your reply. Actually I already have a handy workaround for using w_scan, because I have a VM with Linux (Ubuntu) installed. I was just looking for a Python package/library that would allow me to write a wrapper around it. I would also be satisfied with finding documentation that describes the protocol for communicating with the DVB interface, so that I could write an app that produces the list of available channels. Any advice, suggestion or pointer is welcome.
Re: tail
On Mon, 9 May 2022 at 07:56, Cameron Simpson wrote: > > The point here is that text is a very different thing. Because you > cannot seek to an absolute number of characters in an encoding with > variable sized characters. _If_ you did a seek to an arbitrary number > you can end up in the middle of some character. And there are encodings > where you cannot inspect the data to find a character boundary in the > byte stream. Ooook, now I understand what you and Barry mean. I suppose there's no reliable way to tail a big file opened in text mode with decent performance. Anyway, the previous-previous function I posted worked only for files opened in binary mode, and I suppose it's reliable, since it searches only for b"\n", as readline() in binary mode does.
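The text-mode rule quoted earlier in the thread from the io documentation (an offset must be zero or a value returned by tell()) can be seen in a short sketch. The temporary file and its contents are assumptions made up for the demonstration. The value returned by tell() is treated as an opaque cookie and only ever handed back to seek().

```python
import os
import tempfile

# Write a small UTF-8 text file to experiment on (hypothetical contents).
with tempfile.NamedTemporaryFile("w", encoding="utf-8", newline="\n",
                                 suffix=".txt", delete=False) as tmp:
    tmp.write("première\nseconde\ntroisième\n")
    path = tmp.name

try:
    with open(path, "r", encoding="utf-8") as f:
        f.readline()
        mark = f.tell()            # opaque cookie, not a character count
        rest_first = f.read()

        f.seek(mark)               # supported: offset came from tell()
        rest_second = f.read()

        f.seek(0)                  # supported: offset zero from the start
        first = f.readline()

        f.seek(0, os.SEEK_END)     # supported: offset zero from the end
        at_end = f.read()
finally:
    os.remove(path)

print(repr(rest_second), repr(first), repr(at_end))
```

Anything else, such as seeking to a computed offset relative to the end of a text-mode file, falls outside what the documentation promises, which is why the tail functions discussed here work on files opened in binary mode.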
Re: tail
On Tue, 10 May 2022 at 03:47, Marco Sulla wrote: > > On Mon, 9 May 2022 at 07:56, Cameron Simpson wrote: > > > > The point here is that text is a very different thing. Because you > > cannot seek to an absolute number of characters in an encoding with > > variable sized characters. _If_ you did a seek to an arbitrary number > > you can end up in the middle of some character. And there are encodings > > where you cannot inspect the data to find a character boundary in the > > byte stream. > > Ooook, now I understand what you and Barry mean. I suppose there's no > reliable way to tail a big file opened in text mode with a decent performance. > > Anyway, the previous-previous function I posted worked only for files > opened in binary mode, and I suppose it's reliable, since it searches > only for b"\n", as readline() in binary mode do. It's still fundamentally impossible to solve this in a general way, so the best way to do things will always be to code for *your* specific use-case. That means that this doesn't belong in the stdlib or core language, but in your own toolkit. ChrisA
Re: tail
On 2022-05-08 at 18:52:42 +, Stefan Ram wrote: > Remember how recently people here talked about how you cannot copy > text from a video? Then, how did I do it? Turns out, for my > operating system, there's a screen OCR program! So I did this OCR > and then manually corrected a few wrong characters, and was done! When you're learning, and the example you tried doesn't work like it worked on the video, you probably don't know what's wrong, let alone how to correct it.
Re: usb tv stick and python
On Mon, 9 May 2022 17:56:32 +0200, jak declaimed the following: >First of all, thank you for your reply. Actually I already have a handy >work around to use w_scan because I have a VM with linux (ubuntu) >installed. I was just looking for a python package/library that would >allow me to write a wrapper around. I would also be satisfied with >finding documentation that describes the protocol to communicate with >the dvb interface and be able to write an app that produces the list of >available channels. Any advice, suggestion or pointing is welcome. For the protocol... You might need to locate the source code for w_scan. Perhaps https://github.com/tbsdtv/w_scan -- Wulfraed Dennis Lee Bieber AF6VN wlfr...@ix.netcom.com http://wlfraed.microdiversity.freeddns.org/
Re: tail
On Mon, 9 May 2022 at 19:53, Chris Angelico wrote: > > On Tue, 10 May 2022 at 03:47, Marco Sulla > wrote: > > > > On Mon, 9 May 2022 at 07:56, Cameron Simpson wrote: > > > > > > The point here is that text is a very different thing. Because you > > > cannot seek to an absolute number of characters in an encoding with > > > variable sized characters. _If_ you did a seek to an arbitrary number > > > you can end up in the middle of some character. And there are encodings > > > where you cannot inspect the data to find a character boundary in the > > > byte stream. > > > > Ooook, now I understand what you and Barry mean. I suppose there's no > > reliable way to tail a big file opened in text mode with a decent > > performance. > > > > Anyway, the previous-previous function I posted worked only for files > > opened in binary mode, and I suppose it's reliable, since it searches > > only for b"\n", as readline() in binary mode do. > > It's still fundamentally impossible to solve this in a general way, so > the best way to do things will always be to code for *your* specific > use-case. That means that this doesn't belong in the stdlib or core > language, but in your own toolkit. Nevertheless, tail is a fundamental tool in *nix. It's fast and reliable. Also the tail command can't handle different encodings?
Re: tail
On Tue, 10 May 2022 at 05:12, Marco Sulla wrote: > > On Mon, 9 May 2022 at 19:53, Chris Angelico wrote: > > > > On Tue, 10 May 2022 at 03:47, Marco Sulla > > wrote: > > > > > > On Mon, 9 May 2022 at 07:56, Cameron Simpson wrote: > > > > > > > > The point here is that text is a very different thing. Because you > > > > cannot seek to an absolute number of characters in an encoding with > > > > variable sized characters. _If_ you did a seek to an arbitrary number > > > > you can end up in the middle of some character. And there are encodings > > > > where you cannot inspect the data to find a character boundary in the > > > > byte stream. > > > > > > Ooook, now I understand what you and Barry mean. I suppose there's no > > > reliable way to tail a big file opened in text mode with a decent > > > performance. > > > > > > Anyway, the previous-previous function I posted worked only for files > > > opened in binary mode, and I suppose it's reliable, since it searches > > > only for b"\n", as readline() in binary mode do. > > > > It's still fundamentally impossible to solve this in a general way, so > > the best way to do things will always be to code for *your* specific > > use-case. That means that this doesn't belong in the stdlib or core > > language, but in your own toolkit. > > Nevertheless, tail is a fundamental tool in *nix. It's fast and > reliable. Also the tail command can't handle different encodings? Like most Unix programs, it handles bytes. ChrisA
Re: tail
> On 9 May 2022, at 17:41, r...@zedat.fu-berlin.de wrote:
>
> Barry Scott writes:
>> Why use tiny chunks? You can read 4KiB as fast as 100 bytes
>
> When optimizing code, it helps to be aware of the orders of
> magnitude

That is true and well known to me; now show how what I said is wrong.

The OS is going to DMA at least 4k, with read-ahead more like 64k. So I can get that into Python's memory on the same time scale as 1 byte, because it’s the setup of the I/O that is expensive, not the bytes transferred.

Barry

> . Code that is more cache-friendly is faster, that is,
> code that holds data in single region of memory and that uses
> regular patterns of access. Chandler Carruth talked about this,
> and I made some notes when watching the video of his talk:
>
> CPUS HAVE A HIERARCHICAL CACHE SYSTEM
> (from a 2014 talk by Chandler Carruth)
>
> One cycle on a 3 GHz processor                1 ns
> L1 cache reference                          0.5 ns
> Branch mispredict                             5 ns
> L2 cache reference                            7 ns            14x L1 cache
> Mutex lock/unlock                            25 ns
> Main memory reference                       100 ns            20x L2, 200x L1
> Compress 1K bytes with Snappy             3,000 ns
> Send 1K bytes over 1 Gbps network        10,000 ns   0.01 ms
> Read 4K randomly from SSD               150,000 ns   0.15 ms
> Read 1 MB sequentially from memory      250,000 ns   0.25 ms
> Round trip within same datacenter       500,000 ns   0.5 ms
> Read 1 MB sequentially from SSD       1,000,000 ns   1 ms     4x memory
> Disk seek                            10,000,000 ns   10 ms    20x datacen. RT
> Read 1 MB sequentially from disk     20,000,000 ns   20 ms    80x mem., 20x SSD
> Send packet CA->Netherlands->CA     150,000,000 ns   150 ms
>
> . Remember how recently people here talked about how you cannot
> copy text from a video? Then, how did I do it? Turns out, for my
> operating system, there's a screen OCR program! So I did this OCR
> and then manually corrected a few wrong characters, and was done!
Re: tail
> On 9 May 2022, at 20:14, Marco Sulla wrote: > > On Mon, 9 May 2022 at 19:53, Chris Angelico wrote: >> >>> On Tue, 10 May 2022 at 03:47, Marco Sulla >>> wrote: >>> >>> On Mon, 9 May 2022 at 07:56, Cameron Simpson wrote: The point here is that text is a very different thing. Because you cannot seek to an absolute number of characters in an encoding with variable sized characters. _If_ you did a seek to an arbitrary number you can end up in the middle of some character. And there are encodings where you cannot inspect the data to find a character boundary in the byte stream. >>> >>> Ooook, now I understand what you and Barry mean. I suppose there's no >>> reliable way to tail a big file opened in text mode with a decent >>> performance. >>> >>> Anyway, the previous-previous function I posted worked only for files >>> opened in binary mode, and I suppose it's reliable, since it searches >>> only for b"\n", as readline() in binary mode do. >> >> It's still fundamentally impossible to solve this in a general way, so >> the best way to do things will always be to code for *your* specific >> use-case. That means that this doesn't belong in the stdlib or core >> language, but in your own toolkit. > > Nevertheless, tail is a fundamental tool in *nix. It's fast and > reliable. Also the tail command can't handle different encodings? POSIX tail just prints to the output the bytes that it finds between \n bytes. At no time does it need to care about encodings, as that is a problem solved by the terminal software. I would not expect UTF-16 to work with tail on Linux systems. You could always get the source of tail and read its implementation. Barry
Re: tail
On Tue, 10 May 2022 at 07:07, Barry wrote: > POSIX tail just prints the bytes to the output that it finds between \n bytes. > At no time does it need to care about encodings as that is a problem solved > by the terminal software. I would not expect utf-16 to work with tail on > linux systems. UTF-16 ASCII seems to work fine on my system, which probably means the terminal is just ignoring all the NUL bytes. But if there's a random 0x0A anywhere, it would probably be counted as a line break. ChrisA
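The "random 0x0A" case is easy to construct. The sketch below uses a sample string picked for the purpose (an assumption, not data from the thread): U+010A and U+010B followed by a real newline. In UTF-16-LE, U+010A is stored as the bytes 0A 01, so a tool that counts b"\n" bytes sees a line break that is not in the text.

```python
text = "\u010a\u010b\n"            # 'Ċ', 'ċ', then one real newline
raw = text.encode("utf-16-le")     # bytes: 0a 01 | 0b 01 | 0a 00

# A byte-oriented tool looks only for 0x0A and knows nothing about code units.
pieces = raw.split(b"\n")

print(raw.hex(" "))
print(text.count("\n"), "newline character in the text")
print(raw.count(b"\n"), "0x0A bytes in the encoded data")
print(pieces)
```

Note also that the byte following the real newline's 0x0A is the 0x00 that completes the same code unit, so output cut at that point starts half a code unit out of step.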
Re: tail
On Mon, 9 May 2022 21:11:23 +0200, Marco Sulla declaimed the following: >Nevertheless, tail is a fundamental tool in *nix. It's fast and >reliable. Also the tail command can't handle different encodings? Based upon https://github.com/coreutils/coreutils/blob/master/src/tail.c the ONLY thing tail looks at is the single byte "\n". It does not handle other line endings, and appears to perform BINARY I/O, not text I/O. It does nothing for bytes that are not "\n". Split multi-byte encodings are irrelevant since, if it does not find enough "\n" bytes in the buffer (chunk), it reads another binary chunk and searches for additional "\n" bytes. Once it finds the desired amount, it is synchronized on the byte following the "\n" (which, for multi-byte encodings, might be a NUL, but in any event should be a safe location for subsequent I/O). Interpretation of encoding appears to fall to the console driver configuration when displaying the bytes output by tail. -- Wulfraed Dennis Lee Bieber AF6VN wlfr...@ix.netcom.com http://wlfraed.microdiversity.freeddns.org/
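The approach described above (read binary chunks backwards from the end until enough "\n" bytes have been seen) might look roughly like this in Python. This is a sketch under stated assumptions, not the coreutils code and not the function posted earlier in the thread. The name tail_bytes and the chunk_size parameter are hypothetical, and the stream is assumed to be seekable and opened in binary mode.

```python
import io
import os


def tail_bytes(stream, n, chunk_size=4096):
    """Return the last n lines of a seekable binary stream as bytes.

    Only the single byte b"\\n" is treated as a line ending; decoding
    the result is left to the caller.
    """
    if n <= 0:
        return []
    stream.seek(0, os.SEEK_END)
    pos = stream.tell()
    data = b""
    # More than n newlines guarantees that the last n lines are complete.
    # Prepending chunks like this is quadratic for very long lines; fine
    # for a sketch.
    while pos > 0 and data.count(b"\n") <= n:
        step = min(chunk_size, pos)
        pos -= step
        stream.seek(pos)
        data = stream.read(step) + data
    parts = data.split(b"\n")
    lines = [part + b"\n" for part in parts[:-1]]
    if parts[-1]:
        lines.append(parts[-1])    # final line without a trailing newline
    return lines[-n:]


sample = io.BytesIO(b"one\ntwo\nthree\nfour\n")
print(tail_bytes(sample, 2, chunk_size=3))
```

For UTF-8 and the ASCII-compatible single-byte encodings the returned bytes can be decoded afterwards, because the byte 0x0A never occurs inside another character in those encodings. For UTF-16 the same caveats raised elsewhere in the thread apply.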
Re: tail
Marco Sulla writes: On Mon, 9 May 2022 at 19:53, Chris Angelico wrote: ... Nevertheless, tail is a fundamental tool in *nix. It's fast and reliable. Also the tail command can't handle different encodings? It definitely can't. It works for UTF-8, and all the ASCII compatible single byte encodings, but feed it a file encoded in UTF-16, and it will sometimes screw up. (And if you don't redirect the output away from your terminal, and your terminal encoding isn't also set to UTF-16, you will likely find yourself looking at gibberish -- but that's another problem...)