I'm seeing slow write speeds from both Python and C code on some Windows 
workstations.  In particular, both Python's file "write" and numpy's "tofile" 
method suffer from this issue.  I'm wondering whether anyone knows if this is 
a known issue, what the cause might be, or how to resolve it.  The details 
are below.

The slow write speed issue seems to occur when writing data in blocks larger 
than 32767 512-byte disk sectors (about 16 MBytes).  Write speed is as 
expected up to this 32767-sector limit, and then falls off as if all data 
beyond it were processed byte by byte.  I can't prove that is what is 
happening, but speed tests generally support the theory.  The slow write 
speeds are in the range of 18 to 25 MBytes/sec for spinning disks and about 
50 MBytes/sec for SSDs.  Keep in mind these numbers should be more like 120 
MBytes/sec for spinning disks and 300 MBytes/sec for SSDs.
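For reference, a minimal sketch of the kind of timing test that exposes the 
falloff (the file name and sizes here are arbitrary choices of mine, not from 
any particular benchmark):

```python
import os
import time

def write_speed(path, block_bytes, total_bytes):
    """Write total_bytes to path in blocks of block_bytes; return MBytes/sec."""
    buf = b"\x00" * block_bytes
    start = time.time()
    with open(path, "wb") as f:
        written = 0
        while written < total_bytes:
            f.write(buf)
            written += block_bytes
    elapsed = max(time.time() - start, 1e-9)  # guard against a zero interval
    os.remove(path)
    return (total_bytes / (1024.0 * 1024.0)) / elapsed

# Compare a block just under the 32767-sector (~16 MByte) limit with one above:
# under = write_speed("speedtest.bin", 32767 * 512, 2 * 1024**3)
# over  = write_speed("speedtest.bin", 32768 * 512, 2 * 1024**3)
```

On an affected system the second call should come out several times slower 
than the first; on an unaffected one the two should be comparable.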

This issue seems to be system specific.  I originally saw it on my HP Z640 
workstation using Python 2.7 under Windows 7.  Originally it was numpy writes 
of large arrays in the 100 GB size range that highlighted the issue, but I've 
since written test code using just Python "write" too and get similar results 
using various block sizes.  I've since verified this using Cygwin MinGW-w64 C 
and with Visual Studio C 2013.  I've also tested this on a variety of other 
systems.  My laptop does not show this speed issue, and not all Z640 systems 
seem to show this issue, though I've found several that do.  IT has tested 
this on a clean Windows 7 image and on a Windows 10 image using yet another 
Z640 system and gets similar results.  I've also not seen any Linux systems 
show this issue, though I don't have any Z640s with Linux on them.  I have, 
however, run my tests on Linux Mint 17 running under VirtualBox on the same 
Z640 that showed the speed issue, using both Wine and native Python, and both 
showed good performance and no slowdown.

One workaround seems to be to enable full write caching for the drive in 
Device Manager, with the subsequent risk of data corruption.  This suggests, 
for example, that the issue is byte-by-byte flushing of data beyond the 
32767-sector limit and that perhaps full caching mitigates this somehow.  The 
other workaround is to write all data in blocks smaller than the 32767-sector 
limit (which is about 16 MBytes), as mentioned above.  Of course, reducing 
the block size only works if you have the source code and the time and 
inclination to modify it.  There is an indication that some of the commercial 
code we use for science and engineering may also suffer from this issue.
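If you do have the source, the chunked-write workaround is easy to wrap.  A 
minimal sketch (the function name and default chunk size are my own choices, 
not anything standard):

```python
def chunked_write(f, data, chunk_bytes=32767 * 512):
    """Write data (bytes or any buffer) to file object f in chunks that
    each stay below the 32767-sector (~16 MByte) limit."""
    view = memoryview(data)  # avoids copying the payload when slicing
    for i in range(0, len(view), chunk_bytes):
        f.write(view[i:i + chunk_bytes])
```

Used in place of a single f.write(data), this keeps every underlying write 
call under the limit at the cost of a few extra calls.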

The impact of this issue also seems application specific.  The issue only 
becomes annoying when you're regularly writing files of significant size 
(above, say, 10 GB).  It also depends on how an application writes data, so 
not all applications that create large files will exhibit this issue.  As an 
example, numpy's "tofile" method has this issue for large enough arrays and 
is the reason I started to investigate.
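For the numpy case, the same chunking idea can be applied without patching 
numpy itself by calling "tofile" on slices of the array.  A sketch, assuming 
a contiguous array (the helper name and chunk size are mine):

```python
import numpy as np

def tofile_chunked(arr, path, chunk_bytes=32767 * 512):
    """Write arr to path in chunks below the 32767-sector (~16 MByte) limit."""
    flat = np.ascontiguousarray(arr).reshape(-1)  # 1-D view; copies only if needed
    items = max(1, chunk_bytes // flat.itemsize)  # elements per chunk
    with open(path, "wb") as f:
        for i in range(0, flat.size, items):
            flat[i:i + items].tofile(f)  # tofile accepts an open file object
```

The on-disk result is byte-identical to arr.tofile(path); only the size of 
the individual write calls changes.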

I don't really know where to go with this.  Is this a Windows issue?  Is it 
a runtime-library issue?  Is it a hardware, device driver, or BIOS issue?  Is 
there a stated OS or library limit on buffer sizes for things like C fwrite 
or Python write, which would make this an application issue?  Thoughts?

Thanks,
remmm
-- 
https://mail.python.org/mailman/listinfo/python-list