Bugs item #1478529, was opened at 2006-04-28 12:46
Message generated for change (Comment added) made by tim_one
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1478529&group_id=5470

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: Windows
Group: 3rd Party
Status: Closed
Resolution: Wont Fix
Priority: 5
Submitted By: Mark Sheppard (markshep)
Assigned to: Nobody/Anonymous (nobody)
Summary: size limit exceeded for read() from network drive

Initial Comment:
If you've got a network share mounted as a local drive then Windows has a limit of 67,076,095 (0x03ff7fff) bytes for a read from an open file on that drive. Running the python read() method on an open file larger than this size throws an "IOError: [Errno 22] Invalid argument" exception.

A fix would be for python to internally use multiple reads so as to not exceed this limit.

----------------------------------------------------------------------

>Comment By: Tim Peters (tim_one)
Date: 2006-05-04 02:18

Message:
Logged In: YES
user_id=31435

markshep: As you discovered, closing the report doesn't stop you from following up. It just reflects the reality that I don't consider this to be a Python bug, and am opposed to trying to worm around it inside Python.

Like many people who have just been burned by a platform quirk, I think you're over-selling the severity of the problem while ignoring the costs of worming around it. Adding piles of Windows-specific code to what's _currently_ a simple and uniform implementation is an ongoing maintenance headache, not least because that code will stick around long after the next version of Windows has removed the cause for it. In the meantime it complicates the code with obscure platform-specific hacks, reducing the reliability of the code because it also reduces the code's clarity. The code can't be sanely tested by Python's standard test suite either (it apparently requires a Windows network to provoke, and the test suite assumes no such thing), and untested hack-code is a horrible idea over time.

While it's true that the docs allow for multiple reads under the covers, it's talking about cases like file objects returned by a popen() call or a socket makefile() call when read() is passed a `size` argument, or when read() is called with no `size` argument (so it's impossible to know in advance how large a buffer may be needed to reach EOF). The entire reading code for an explicitly-sized read on a genuine file is a single

    return fread(buf, 1, n, stream);

call today, and on all platforms.

It doesn't look like this can end with reading either: MS documents a similar problem with writing, and I expect you want to see that hacked around too (or, if not, you're pretty selective ;-)). Pain spreads.

In return, what's the benefit? The fact that it _is_ so hard to find anything via Google about this strongly suggests to me that trying to read more than 64MB in one gulp across a vulnerable Windows combo is mighty rare. If it happens, the failure isn't silent, an explicit exception is raised letting the programmer know it didn't work. While I appreciate that's irritating, it's not a disaster, and a programmer who cares can worm around it on their own ("so don't do that -- read < 64MB per gulp yourself").

Obviously, I'm not going to pursue this. Since I'm one of the few people who "does" Windows code for the core, that does cut the chance that anyone will. If you want to pursue it, the best chance is to supply a patch implementing it, and get someone else to review it. A stronger case could be made if, e.g., there was evidence that Perl or PHP or Ruby or VB or C# or ... intend to worm (or have wormed) around it.

----------------------------------------------------------------------

Comment By: Mark Sheppard (markshep)
Date: 2006-05-03 06:38

Message:
Logged In: YES
user_id=1512331

Thanks for closing this bug without giving me a chance to follow up!

The problem isn't caused by a limitation of my machine - it's got 3 GiB of RAM.

I've done some more testing on this and the problem only appears when connected to a server running certain SMB implementations:

  The local Windows XP machine
  A remote Windows XP machine
  Samba 3.0.22 on Linux

When connected to servers running the following SMB implementations the problem isn't present:

  Windows NT 4.0 Server
  Windows Server 2000
  Windows Server 2003 Standard Edition

As this error is being returned by the underlying fread() call the proper place for it to be fixed is there, but the chances of Microsoft doing so for Windows XP are negligible.

As you're trying to provide a cross-platform language then having to put up with OS's undocumented warts is just part of the job. As it's entirely possible for you to implement a work-around for this problem in Python I think you should. One of the reasons for using a high level language like Python is to be insulated from system quirks like this. If you're refusing to smooth over these quirks where possible then you're undermining that reason.

The documentation for Python's read() method on a file handle already says "Note that this method may call the underlying C function fread() more than once", so this possibility is already catered for in the documentation.

As this problem only affects remotely mounted filesystems the workaround need only be used for such filesystems. You can determine whether or not a drive is a network one by using the GetDriveType() Windows call.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2006-05-02 21:45

Message:
Logged In: YES
user_id=31435

Sorry, I'm closing this as "3rd Party, Won't Fix". It's certainly not Python's doing that Microsoft's stdio implementation has peculiar undocumented warts (Python just calls the platform C's fread() here), so at best this is a request for enhancement rather than a Python bug. If there is a bug here, it's Microsoft's bug, and then the proper source for a fix is also Microsoft.
This is especially true since the two people who have tried this here don't see the same behavior -- we don't even know what "the bug" is.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2006-05-02 15:00

Message:
Logged In: YES
user_id=21627

I could reproduce the write problem on XPSP2; I get the Win32 error ERROR_NO_SYSTEM_RESOURCES after fwrite returns (from GetLastError).

I can't reproduce the fread problem, though: in Python, f.read(90*2**20) just returns with a 90MiB string. So it could be a limitation of your machine (e.g. it might not have enough memory), or of the server machine.

I'm hesitant to add a work-around for that into Python if this isn't a system limitation. Performing multiple reads is also bad: what happens if the first read succeeds, and the second one fails? It might be that the system *really* is out of resources.

----------------------------------------------------------------------

Comment By: Mark Sheppard (markshep)
Date: 2006-05-02 06:48

Message:
Logged In: YES
user_id=1512331

I'm running Windows XP. I've been unable to find any documentation about this exact problem - only that fwrite thing. But my testing shows that it works if I do file.read(67076095), but throws an exception with file.read(67076096).

I'm not suggesting limiting all reads from Python. All I'm suggesting is that under the hood the Windows implementation of Python's read() call actually uses multiple fread() (or whatever) calls if more than 67076095 bytes need to be read. That's all. No interface changes.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2006-04-30 12:23

Message:
Logged In: YES
user_id=31435

Martin, here's an MS article seemingly related to this:

http://support.microsoft.com/default.aspx?scid=kb;en-us;899149

However, it's about writing to a file on a network drive, not reading from it.
It says that opening the file in 'w+b' mode, instead of 'wb' mode, is a workaround. I couldn't find anything documenting the same kind of problem for reading.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2006-04-30 06:10

Message:
Logged In: YES
user_id=21627

What version of Windows are you using? Do you know of any documentation of this limit? (without actually testing, I find it hard to believe that this limit exists in Windows)

----------------------------------------------------------------------

Comment By: Georg Brandl (gbrandl)
Date: 2006-04-29 09:23

Message:
Logged In: YES
user_id=849994

How can it be determined whether exactly this restriction caused the "invalid argument" error? If it can't, there's nothing that can be done -- restricting all reads just because of a Windows limitation doesn't seem right.

----------------------------------------------------------------------

_______________________________________________
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
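
For readers hitting the same error, the caller-side workaround suggested in the thread ("read < 64MB per gulp yourself") might look roughly like the sketch below. It is only an illustration under stated assumptions: it is written for present-day Python 3 rather than the 2006 interpreter discussed above, the helper name read_in_chunks and the 32 MiB default chunk size are hypothetical choices, and the 0x03ff7fff ceiling is simply the figure from the original report, not a documented Windows constant. Whether capping the request size at this level actually avoids the error depends on how the file object forwards reads to the platform.

```python
import io

# Largest single read the reporter saw succeed (67,076,095 bytes).
# This figure comes from the bug report, not from Microsoft documentation.
REPORTED_LIMIT = 0x03FF7FFF

# Hypothetical default, chosen only to stay well below the reported limit.
DEFAULT_CHUNK = 32 * 2**20


def read_in_chunks(f, size=-1, chunk_size=DEFAULT_CHUNK):
    """Read up to `size` bytes from binary file `f` (everything if size < 0),
    never requesting more than `chunk_size` bytes in a single read() call."""
    if not 0 < chunk_size <= REPORTED_LIMIT:
        raise ValueError("chunk_size must be between 1 and REPORTED_LIMIT")
    parts = []
    remaining = size
    while remaining != 0:
        want = chunk_size if remaining < 0 else min(chunk_size, remaining)
        data = f.read(want)
        if not data:  # end of file
            break
        parts.append(data)
        if remaining > 0:
            remaining -= len(data)
    # If any read() raises, the exception propagates and the partial data
    # collected so far is discarded rather than returned.
    return b"".join(parts)


if __name__ == "__main__":
    # Small in-memory demonstration; a real caller would pass open(path, "rb").
    demo = io.BytesIO(b"0123456789")
    print(read_in_chunks(demo, chunk_size=4))
```

Letting a failed later read raise, instead of returning a short result, is one possible answer to the question raised in the thread about a first read succeeding and a second one failing; other callers may prefer to keep the partial data.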