Ruben Vorderman added the comment:
ping
--
___
Python tracker
<https://bugs.python.org/issue24301>
___
___
Python-bugs-list mailing list
Unsubscribe:
New submission from Ruben Vorderman :
def compress(data, compresslevel=_COMPRESS_LEVEL_BEST, *, mtime=None):
"""Compress data in one shot and return the compressed string.
compresslevel sets the compression level in range of 0-9.
mtime can be used to set the modification time.
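A minimal sketch (not part of the original message) of what that docstring describes; `mtime=0` is an arbitrary choice to make the header reproducible:

```python
import gzip

data = b"example payload"

# Compress in one shot; mtime=0 fixes the header's modification-time field,
# making the output reproducible (by default the current time is stored).
compressed = gzip.compress(data, compresslevel=9, mtime=0)

assert gzip.decompress(compressed) == data
# Bytes 4-8 of a gzip header hold the little-endian MTIME field.
assert compressed[4:8] == b"\x00\x00\x00\x00"
```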
Change by Ruben Vorderman :
--
keywords: +patch
pull_requests: +28622
stage: -> patch review
pull_request: https://github.com/python/cpython/pull/30416
Python tracker
<https://bugs.python.org/issu
Ruben Vorderman added the comment:
ping
Python tracker
<https://bugs.python.org/issue46267>
New submission from Ruben Vorderman :
zlib.compress can currently only be used to output zlib blocks.
Arguably `zlib.compress(my_data, level, wbits=-15)` is even more useful as it
gives you a raw deflate block. That is quite interesting if you are writing
your own file format and want to use
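Until such a wbits parameter exists on `zlib.compress` itself, the same raw DEFLATE output can already be produced with `zlib.compressobj`; a small sketch:

```python
import zlib

data = b"payload for a custom container format"

# wbits=-15 selects a raw DEFLATE stream: no zlib header, no checksum trailer.
co = zlib.compressobj(level=6, wbits=-15)
raw_deflate = co.compress(data) + co.flush()

# Decompression must also use negative wbits.
assert zlib.decompress(raw_deflate, wbits=-15) == data
```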
New submission from Ruben Vorderman :
When working on python-isal which aims to provide faster drop-in replacements
for the zlib and gzip modules, I found that gzip.compress and gzip.decompress
are suboptimally implemented, which hurts performance.
gzip.compress and gzip.decompress both do
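The overhead being described can be sidestepped by calling zlib directly with a gzip wrapper (wbits=31); a hedged sketch of that idea, not the actual patch:

```python
import gzip
import zlib

data = b"x" * 100_000

# wbits=31 (16 + 15) asks zlib for a gzip wrapper around a 32K-window
# DEFLATE stream, avoiding the BytesIO + GzipFile machinery entirely.
co = zlib.compressobj(level=1, wbits=31)
gz = co.compress(data) + co.flush()

# The result is a regular gzip stream, readable by either module.
assert gzip.decompress(gz) == data
assert zlib.decompressobj(wbits=31).decompress(gz) == data
```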
Change by Ruben Vorderman :
--
keywords: +patch
pull_requests: +23768
stage: -> patch review
pull_request: https://github.com/python/cpython/pull/25011
Python tracker
<https://bugs.python.org/issu
Change by Ruben Vorderman :
--
type: -> enhancement
Python tracker
<https://bugs.python.org/issue43612>
Change by Ruben Vorderman :
--
type: -> performance
Python tracker
<https://bugs.python.org/issue43613>
New submission from Ruben Vorderman :
This is properly documented:
https://docs.python.org/3/library/gzip.html#gzip.BadGzipFile .
It now throws EOFError when a stream is truncated. But this means that upstream,
both BadGzipFile and EOFError need to be caught in the exception handling when
Change by Ruben Vorderman :
--
components: +Extension Modules -Library (Lib)
Python tracker
<https://bugs.python.org/issue43612>
Change by Ruben Vorderman :
--
type: -> behavior
Python tracker
<https://bugs.python.org/issue43621>
Ruben Vorderman added the comment:
I created bpo-43621 for the error issue. There should only be BadGzipFile. Once
that is fixed, having only one error type will make it easier to implement some
functions that are shared across the gzip.py codebase
Change by Ruben Vorderman :
--
versions: +Python 3.11
Python tracker
<https://bugs.python.org/issue43612>
Ruben Vorderman added the comment:
A patch was created, but has not been reviewed yet.
Python tracker
<https://bugs.python.org/issue43612>
Change by Ruben Vorderman :
--
keywords: +patch
pull_requests: +26386
stage: -> patch review
pull_request: https://github.com/python/cpython/pull/27941
Python tracker
<https://bugs.python.org/issu
Change by Ruben Vorderman :
--
pull_requests: +26387
pull_request: https://github.com/python/cpython/pull/27941
Python tracker
<https://bugs.python.org/issue43
Ruben Vorderman added the comment:
Thanks for the review, Lukasz! It was fun to create the PR and optimize the
performance for gzip.py as well.
Python tracker
<https://bugs.python.org/issue43
Change by Ruben Vorderman :
--
resolution: -> fixed
stage: patch review -> resolved
status: open -> closed
Python tracker
<https://bugs.python.or
Ruben Vorderman added the comment:
Issue was solved by moving code from _GzipReader to separate functions and
maintaining the same error structure.
This solved the problem with maximum code reuse and full backwards
compatibility.
New submission from Ruben Vorderman :
Please consider the following code snippet:
import gzip
import sys

with gzip.open(sys.argv[1], "rt") as in_file_h:
    with gzip.open(sys.argv[2], "wt", compresslevel=1) as out_file_h:
        for line in in_file_h:
            out_file_h.write(line)
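A hypothetical faster variant of the snippet above (my own sketch, not from the report): skip text-mode decoding/encoding entirely and copy the decompressed bytes in large chunks.

```python
import gzip
import shutil

def recompress(src: str, dst: str, level: int = 1) -> None:
    """Recompress the gzip file at src into dst at the given level,
    copying raw decompressed bytes in large chunks."""
    with gzip.open(src, "rb") as in_file_h:
        with gzip.open(dst, "wb", compresslevel=level) as out_file_h:
            shutil.copyfileobj(in_file_h, out_file_h)
```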
Change by Ruben Vorderman :
--
components: +Library (Lib)
type: -> performance
versions: +Python 3.10, Python 3.11, Python 3.6, Python 3.7, Python 3.8, Python
3.9
Python tracker
<https://bugs.python.org/issu
New submission from Ruben Vorderman :
A 'struct.error: unpack requires a buffer of 8 bytes' is thrown when a gzip
trailer is truncated, instead of an EOFError as in the 3.10 and prior
releases.
--
components: Library (Lib)
messages: 404165
nosy: rhpvorderman
priori
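A small reproduction sketch (my own, not attached to the issue); on releases where this regression is fixed, the truncated trailer raises EOFError again:

```python
import gzip

payload = b"hello world"
blob = gzip.compress(payload)

# The gzip trailer is 8 bytes (CRC32 + ISIZE); dropping half of it
# simulates an interrupted write or download.
truncated = blob[:-4]

try:
    gzip.decompress(truncated)
except EOFError:
    print("trailer truncated")
```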
Change by Ruben Vorderman :
--
keywords: +patch
pull_requests: +27296
stage: -> patch review
pull_request: https://github.com/python/cpython/pull/29023
Python tracker
<https://bugs.python.org/issu
Ruben Vorderman added the comment:
It turns out there is a bug where FNAME and/or FCOMMENT flags are set in the
header, but no error is thrown when NAME and COMMENT fields are missing.
Python tracker
<https://bugs.python.org/issue45
New submission from Ruben Vorderman :
The following headers are currently accepted even though they are invalid:
- Headers with the FCOMMENT flag set, but with incomplete or missing COMMENT bytes.
- Headers with the FNAME flag set, but with incomplete or missing NAME bytes.
- Headers with FHCRC set, where the crc is
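For illustration (my own sketch, not from the issue), a well-formed member whose FNAME flag is set and whose NAME field is actually present can be built by hand:

```python
import gzip
import struct
import zlib

data = b"header flag demo"

# Hand-build a member whose FNAME flag (0x08) is set and whose
# NUL-terminated NAME field is actually present, as the format requires.
FNAME = 0x08
header = struct.pack("<4BIBB", 0x1F, 0x8B, 8, FNAME, 0, 0, 255)
header += b"demo.txt\x00"                      # the NAME announced by FNAME

co = zlib.compressobj(wbits=-15)               # raw DEFLATE body
body = co.compress(data) + co.flush()
trailer = struct.pack("<II", zlib.crc32(data), len(data) & 0xFFFFFFFF)

assert gzip.decompress(header + body + trailer) == data
```

Dropping the `b"demo.txt\x00"` line yields exactly the kind of invalid header the issue describes.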
Change by Ruben Vorderman :
--
keywords: +patch
pull_requests: +27300
stage: -> patch review
pull_request: https://github.com/python/cpython/pull/29028
Python tracker
<https://bugs.python.org/issu
Change by Ruben Vorderman :
--
pull_requests: +27301
pull_request: https://github.com/python/cpython/pull/29029
Python tracker
<https://bugs.python.org/issue45
Ruben Vorderman added the comment:
Bump. This is a regression introduced by
https://github.com/python/cpython/pull/27941
Python tracker
<https://bugs.python.org/issue45
Ruben Vorderman added the comment:
Bump. This is a bug that allows corrupted gzip files to be processed without
error. Therefore I bump this issue in the hopes someone will review the PR.
Python tracker
<https://bugs.python.org/issue45
Ruben Vorderman added the comment:
Ping
Python tracker
<https://bugs.python.org/issue45509>
Ruben Vorderman added the comment:
1. Quite a lot.
I tested it for the two most common use cases.
import timeit
import statistics
WITH_FNAME = """
from gzip import GzipFile, decompress
import io
fileobj = io.BytesIO()
g = GzipFile(fileobj=fileobj, mode='wb', fi
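A rough reconstruction of the benchmark shape described above (the filename and sizes are my guesses, not the attached file):

```python
import gzip
import io
import statistics
import timeit

# Write one tiny member through GzipFile; giving a filename makes it emit
# an FNAME header field, the case the benchmark above exercises.
buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode="wb", filename="test") as g:
    g.write(b"")
member = buf.getvalue()

timings = timeit.repeat(
    "decompress(member)",
    globals={"decompress": gzip.decompress, "member": member},
    repeat=5,
    number=1000,
)
print(f"best: {min(timings):.6f}s  median: {statistics.median(timings):.6f}s")
```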
Ruben Vorderman added the comment:
I increased the performance of the patch. I added the file used for
benchmarking. I also test the FHCRC changes now.
The benchmark tests headers with different flags concatenated to a DEFLATE
block with no data and a gzip trailer. The data is fed to
New submission from Ruben Vorderman :
The current implementation uses a lot of bytestring slicing. While it is much
better than the 3.10 and earlier implementations, it can still be further
improved by using memoryviews instead.
Possibly. I will check this out.
--
components
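The difference between byte slicing and memoryviews can be sketched as follows (an illustration, not from the issue):

```python
data = bytes(range(256)) * 4

# A bytes slice copies its range; a memoryview slice is a zero-copy view.
mv = memoryview(data)
chunk = mv[100:200]                  # no bytes are copied here
assert chunk.tobytes() == data[100:200]

# Each slice still allocates a new memoryview object, though, and that
# per-object overhead is what the follow-up measurement found to dominate.
```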
Ruben Vorderman added the comment:
Tried and failed. It seems that the overhead of creating a new memoryview
object outweighs the performance gained by it.
Python tracker
<https://bugs.python.org/issue45
Ruben Vorderman added the comment:
I have found that using the timeit module provides more precise measurements:
For a simple gzip header. (As returned by gzip.compress or zlib.compress with
wbits=31)
./python -m timeit -s "import io; data =
b'\x1f\x8b\x08\x00\x00\x00\x00\x00\x
New submission from Ruben Vorderman :
Python now uses the excellent timsort for most (all?) of its sorting. But this
is not the fastest sort available for one particular use case.
If the number of possible values in the array is limited, it is possible to
perform a counting sort: https
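A counting sort over bytes can be sketched in a few lines (illustrative; the proposal concerns a C implementation):

```python
def counting_sort_bytes(data: bytes) -> bytes:
    """Sort a bytes object in O(n + 256) time using a frequency table."""
    counts = [0] * 256
    for byte in data:
        counts[byte] += 1
    # Expand the frequency table back into a sorted run of byte values.
    return b"".join(bytes([value]) * count
                    for value, count in enumerate(counts) if count)

assert counting_sort_bytes(b"banana") == b"aaabnn"
assert counting_sort_bytes(b"") == b""
```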
Ruben Vorderman added the comment:
Also I didn't know if this should be in Component C-API or Interpreter Core.
But I guess this will be implemented as C-API calls PyBytes_Sort and
PyByteArray_SortInplace so I figured C-API is the correct component.
Ruben Vorderman added the comment:
I changed the cython script a bit to use a more naive implementation without
memset.
Now it is always significantly faster than bytes(sorted(my_bytes)).
$ python -m timeit -c "from bytes_sort import bytes_sort" "bytes_sort(b'')"
Ruben Vorderman added the comment:
Sorry for the spam. I see I made a typo in the timeit script. Next time I will
be more diligent when making these kinds of reports, triple-checking them
beforehand and sending them once. I used -c instead of -s, and now all the
setup time is also included
Ruben Vorderman added the comment:
I used it for the median calculation of FASTQ quality scores
(https://en.wikipedia.org/wiki/FASTQ_format). But in the end I used the
frequency table to calculate the median more quickly. So as you say, the
frequency table turned out to be more useful
Ruben Vorderman added the comment:
From the spec:
https://datatracker.ietf.org/doc/html/rfc1952
2.2. File format
A gzip file consists of a series of "members" (compressed data
sets). The format of each member is specified in the following
section. The m
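The "series of members" behaviour is observable directly (a small illustration, not from the message):

```python
import gzip

# RFC 1952: a gzip file is a series of members. The gzip module reads all
# members and concatenates their decompressed payloads.
members = gzip.compress(b"first ") + gzip.compress(b"second")
assert gzip.decompress(members) == b"first second"
```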
Change by Ruben Vorderman :
--
stage: -> resolved
status: open -> closed
Python tracker
<https://bugs.python.org/issue45875>
Ruben Vorderman added the comment:
Whoops, sorry, I spoke out of turn. If gzip implements it, it seems only
logical that Python's *gzip* module should too.
I believe it can be fixed quite easily. The code should raise a warning though.
I will make
Change by Ruben Vorderman :
--
keywords: +patch
pull_requests: +28076
stage: -> patch review
pull_request: https://github.com/python/cpython/pull/29847
Python tracker
<https://bugs.python.org/issu
New submission from Ruben Vorderman :
`python -m gzip -d myfile` will throw an error because myfile does not end in
'.gz'. That is fair (even though a bit redundant, GzipFile contains a header
check, so why bother checking the extension?).
The problem is how this error is thrown.
1
New submission from Ruben Vorderman :
python -m gzip reads in chunks of 1024 bytes:
https://github.com/python/cpython/blob/1f433406bd46fbd00b88223ad64daea6bc9eaadc/Lib/gzip.py#L599
This hurts performance somewhat. Using io.DEFAULT_BUFFER_SIZE will improve it.
Also 'io.DEFAULT_BUFFER_SIZ
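The suggested change amounts to something like this sketch (a chunked copy using io.DEFAULT_BUFFER_SIZE; the helper name is mine):

```python
import io

def copy_stream(src, dst, chunk_size=io.DEFAULT_BUFFER_SIZE):
    """Copy src to dst in chunks; io.DEFAULT_BUFFER_SIZE (8 KiB on
    CPython) means far fewer read/write calls than 1024-byte chunks."""
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)

src = io.BytesIO(b"x" * 100_000)
dst = io.BytesIO()
copy_stream(src, dst)
assert dst.getvalue() == src.getvalue()
```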
Change by Ruben Vorderman :
--
type: -> behavior
Python tracker
<https://bugs.python.org/issue43316>
Change by Ruben Vorderman :
--
type: -> performance
Python tracker
<https://bugs.python.org/issue43317>
Change by Ruben Vorderman :
--
keywords: +patch
pull_requests: +23430
stage: -> patch review
pull_request: https://github.com/python/cpython/pull/24645
Python tracker
<https://bugs.python.org/issu
Ruben Vorderman added the comment:
That sounds perfect, I didn't think of that. I will make a PR.
Python tracker
<https://bugs.python.org/is
Change by Ruben Vorderman :
--
keywords: +patch
pull_requests: +23432
stage: -> patch review
pull_request: https://github.com/python/cpython/pull/24647
Python tracker
<https://bugs.python.org/issu
New submission from Ruben Vorderman :
The gzip file format is quite ubiquitous and so is its first (?) free/libre
implementation zlib with the gzip command line tool. This uses the DEFLATE
algorithm.
Lately some faster algorithms (most notably zstd) have popped up which have
better speed
Ruben Vorderman added the comment:
This has to be in a PEP. I am sorry I misplaced it on the bug tracker.
--
resolution: -> not a bug
stage: -> resolved
status: open -> closed
Python tracker
<https://bugs.python.or
Ruben Vorderman added the comment:
nasm or yasm will work. I only have experience building it with nasm.
But yes that is indeed a dependency. Personally I do not see the problem with
adding nasm as a build dependency, as it opens up possibilities for even more
performance optimizations in
Ruben Vorderman added the comment:
> That might be an option then. CPython could use the existing library if it is
> available.
Dynamic linking indeed seems like a great option here! Users who care about
this will probably have the 'isal' and 'libdeflate' pa
Ruben Vorderman added the comment:
I just found out that libdeflate does not support streaming:
https://github.com/ebiggers/libdeflate/issues/73 . I should have read the
manual better.
So that explains the memory usage. Because of that I don't think it is suitable
for usage in CP
New submission from Ruben Vorderman :
Pipes block when reading from an empty pipe or when writing to a full pipe.
When this happens, the program waiting on the pipe can still burn a lot of
CPU cycles polling for the pipe to stop blocking.
I found this while working on xopen, a library that
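One way to wait on a pipe without burning CPU is the selectors module; a POSIX sketch of my own, not from xopen (on Windows, select() only works on sockets):

```python
import os
import selectors

# Register the read end of a pipe with a selector and block in the kernel
# until data is available, instead of sleeping and re-checking in a loop.
r, w = os.pipe()
os.write(w, b"ready")

sel = selectors.DefaultSelector()
sel.register(r, selectors.EVENT_READ)
events = sel.select(timeout=1.0)      # blocks without spinning the CPU
assert events

print(os.read(r, 1024))
sel.close()
os.close(r)
os.close(w)
```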
Change by Ruben Vorderman :
--
keywords: +patch
pull_requests: +21035
stage: -> patch review
pull_request: https://github.com/python/cpython/pull/21921
Python tracker
<https://bugs.python.org/issu
Ruben Vorderman added the comment:
> Within the stdlib, I'd focus only on using things that can be used in a 100%
> api compatible way with the existing modules.
> Otherwise creating a new module and putting it up on PyPI to expose the
> functionality from the libraries you
Ruben Vorderman added the comment:
> If you take this route, please don't write it directly against the CPython
> C-API (as you would for a CPython stdlib module).
Thanks for reminding me of this. I was planning to take the laziest route
possible anyway, reusing as much code from
Ruben Vorderman added the comment:
Hi, thanks all for the comments and the help.
I have created the bindings using Cython. The project is still a work in
progress as of this moment. I leave the link here for future reference.
Special thanks to the Cython developers for enabling these