I'm seeing slow write speeds from both Python and C code on some Windows
workstations. In particular, both Python's "write" and numpy's "tofile" method
suffer from this issue. I'm wondering if anyone knows whether this is a known
issue, what the cause is, or how to resolve it. The details are below.
The slow write speed issue seems to occur when writing data in blocks larger
than 32767 512-byte disk sectors (about 16 MB). Write speed is as expected
until one reaches this 32767-sector limit, and then it falls off as if all data
beyond that point were processed byte by byte. I can't prove that is what is
happening, but speed tests generally support the theory. The resulting write
speeds are in the range of 18 to 25 MB/s for spinning disks and about 50 MB/s
for SSDs; they should be more like 120 MB/s for spinning disks and 300 MB/s
for SSDs.
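Roughly, the kind of timing test I mean is sketched below (the path, total
size, and block sizes are just placeholders, not my exact test code); it writes
the same amount of data with block sizes on either side of the 32767-sector
boundary and reports the rate:

import os
import time

def time_write(path, total_bytes, block_bytes):
    """Write total_bytes to path in block_bytes chunks; return MB/s."""
    block = b"\0" * block_bytes
    n_blocks = total_bytes // block_bytes
    start = time.time()
    with open(path, "wb") as f:
        for _ in range(n_blocks):
            f.write(block)
    elapsed = time.time() - start
    os.remove(path)
    return (n_blocks * block_bytes) / (1024.0 * 1024.0) / elapsed

SECTOR = 512
LIMIT = 32767 * SECTOR  # the ~16 MB threshold described above
for block_bytes in (LIMIT // 2, LIMIT - SECTOR, LIMIT + SECTOR, 4 * LIMIT):
    rate = time_write("testfile.bin", 2 * 1024**3, block_bytes)
    print("block %10d bytes: %6.1f MB/s" % (block_bytes, rate))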
This issue seems to be system specific. I originally saw it on my HP Z640
workstation using Python 2.7 under Windows 7. Numpy writes of large arrays in
the 100 GB range first highlighted the issue, but I've since written test code
using plain Python "write" and get similar results with various block sizes.
I've also verified the behavior with Cygwin/MinGW-w64 C and with Visual Studio
C 2013, and I've tested a variety of other systems. My laptop does not show the
slowdown, and not all Z640 systems show it, though I've found several that do.
IT has tested a clean Windows 7 image and a Windows 10 image on yet another
Z640 and gets similar results. I've not seen any Linux system show the issue,
though I don't have any Z640s running Linux. I have, however, run my tests on
Linux Mint 17 under VirtualBox on the same Z640 that showed the slowdown, using
both Wine and native Python, and both showed good performance with no slowdown.
A workaround for this seems to be to enable full write caching for the drive in
Device Manager, with the attendant risk of data corruption. This suggests, for
example, that the issue is byte-by-byte flushing of data beyond the
32767-sector limit and that full caching somehow mitigates it. The other
workaround is to write all data in blocks smaller than the 32767-sector limit
(about 16 MB), as mentioned above. Of course, reducing the block size only
works if you have the source code and the time and inclination to modify it.
There is an indication that some of the commercial code we use for science and
engineering may also suffer from this issue.
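A rough sketch of what I mean by the block-size workaround (the helper name and
chunk size are just illustrative): split any large write into chunks at or
below the 32767-sector size before handing them to the file object.

SECTOR = 512
MAX_SECTORS = 32767
CHUNK = MAX_SECTORS * SECTOR  # stay at or below the problematic block size

def chunked_write(f, data, chunk=CHUNK):
    """Write a bytes-like object to an open binary file in sub-16 MB chunks."""
    view = memoryview(data)
    for start in range(0, len(view), chunk):
        f.write(view[start:start + chunk])

# usage: replaces a single f.write(big_buffer)
# with open("out.bin", "wb") as f:
#     chunked_write(f, big_buffer)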
The impact of this issue also seems application specific. It only becomes
annoying when you're regularly writing files of significant size (above, say,
10 GB). It also depends on how an application writes its data, so not every
application that creates large files will exhibit the problem. As an example,
numpy's tofile method hits it for large enough arrays, which is what started
my investigation.
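For the numpy case, a rough sketch of the same chunking idea (the function name
and chunk size are illustrative, and I haven't settled on this as a final fix):
instead of one big tofile call, issue many smaller writes.

import numpy as np

CHUNK_BYTES = 32767 * 512  # keep each underlying write at the ~16 MB threshold

def tofile_chunked(arr, path, chunk_bytes=CHUNK_BYTES):
    """Like arr.tofile(path), but issues many small writes instead of one big one."""
    flat = np.ascontiguousarray(arr).reshape(-1)
    items_per_chunk = max(1, chunk_bytes // flat.itemsize)
    with open(path, "wb") as f:
        for start in range(0, flat.size, items_per_chunk):
            flat[start:start + items_per_chunk].tofile(f)

# a = np.zeros(10 * 1024**3 // 8)   # ~10 GB of float64, for example
# tofile_chunked(a, "bigarray.dat")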
I don't really know where to go with this. Is this a Windows issue? A runtime
library (RTL) issue? A hardware, device driver, or BIOS issue? Is there a
stated OS or library limit on buffer sizes for things like C fwrite or Python
write that would make this an application issue? Thoughts?
Thanks,
remmm