[issue4944] os.fsync() doesn't work as expect in Windows

2009-01-13 Thread Javen Wang

New submission from Javen Wang :

I encountered a very strange issue in file flush operation in Windows.
Here's the scenario of my application:

1. The main thread of my application will create makefiles sequentially.
2. Once a makefile is generated, launch a separate process calling
nmake.exe to run it in parallell. The main thread then create another
makefile until no more makefiles to create.
3. The number of new processes is limited by command line options.

My application has been running for almost a year without any problem.
But, after I made some changes recently to the content of makefile
generated, "nmake.exe" in separate process sometimes complains that
the makefile was not found. But when I went into the directory, the
makefile was there.

Because I didn't change any thing related to file creation and the new
makefiles are a little bit less than before, I guessed that the
makefile just created hasn't been flushed to disk because of size
change so the new process could not see it in a short time.

So I decided add code to force flush file buffer after writing the
file content (I didn't flush the file explicitly before). I did it
like below:

Fd = open(File, "w")
Fd.write(Content)
Fd.flush()
os.fsync(Fd.fileno())
Fd.close()

The strangest thing happened. The "makefile" missing happened more
frequently than no flush operation. I searched the web but no answer
there.

Finally I tried to use Windows file API to create the file via pywin32
extension. The problem's gone.

import win32file
Fd = win32file.CreateFile(
File,
win32file.GENERIC_WRITE,
0,
None,
win32file.CREATE_ALWAYS,
win32file.FILE_ATTRIBUTE_NORMAL,
None
)
win32file.WriteFile(Fd, str(Content), None)
win32file.FlushFileBuffers(Fd)
win32file.CloseHandle(Fd)

I tried writing small python extension in C to make use Windows API to
create file like above. It also works well, even I removed the
FlushFileBuffers() calling.

I think that there's a bug in Python file buffer mechanism.

--
components: Windows
messages: 79829
nosy: javen72
severity: normal
status: open
title: os.fsync() doesn't work as expect in Windows
type: behavior
versions: Python 2.5

___
Python tracker 
<http://bugs.python.org/issue4944>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4944] os.fsync() doesn't work as expect in Windows

2009-01-14 Thread Javen Wang

Javen Wang  added the comment:

gagenellina,

My application is a little bit different from your test code. It won't
wait for the exit of new process and there're file writing operations
during makefile running. I changed the test code to be real
multi-process and tried many file sizes. But I cannot reproduce it.
Maybe my application is more complicated situation.

___
Python tracker 
<http://bugs.python.org/issue4944>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4944] os.fsync() doesn't work as expect in Windows

2009-01-14 Thread Javen Wang

Javen Wang  added the comment:

The created files are on local drive. I saw the problem on the laptop
(XP-SP2), desktop(XP-SP3) and server (Win2003). But there's no such
problem on the Linux and Mac boxes.

I tried to use fopen/fwrite in my extension according to your
suggestion. The problem wasn't be reproduced. It seems the bug is more
alike in Python part.

My application is a build system and is also an open source project. Is
it possible for you to download it and try it in your box?

I created temporary user (py123, password: 123py123) for you (just in
case) and here's steps of how to reproduce it.

  1. Checkout the build system source code in, for example, c:\test
 C:\test> svn co --username py123 --password 123py123
https://buildtools.tianocore.org/svn/buildtools/trunk/BaseTools tools

  2. Checkout the source code to be built against in c:\test
 C:\test> svn co --username py123 --password 123py123
https://edk2.tianocore.org/svn/edk2/trunk/edk2 edk2

  3. Change the source code between line 222 and line 229 of
c:\test\tools\Source\Python\Common\Misc.py (SaveFileOnChange function)
like below:

 Fd = open(File, "wb")
 Fd.write(Content)
 Fd.flush()
 os.fsync(Fd.fileno())
 Fd.close()

  4. In c:\test\edk2, run
 C:\test\edk2> edksetup.bat
 C:\test\edk2> set PYTHONPATH=C:\test\tools\Source\Python
 C:\test\edk2> python.exe C:\test\tools\Source\Python\build\build.py
-n 2 -p MdeModulePkg\MdeModulePkg.dsc -a IA32 -s

  5. If the application stops with message like "makefile not found" or
"AutoGen.h not found" message, that means the problem happened.

Visual Studio 2005 is needed to reproduce it.

___
Python tracker 
<http://bugs.python.org/issue4944>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4944] os.fsync() doesn't work as expect in Windows

2009-01-14 Thread Javen Wang

Javen Wang  added the comment:

There would be more chances to see the problem by doing this:

C:\test\edk2> python.exe C:\test\tools\Source\Python\build\build.py
-n 2 -p IntelFrameworkModulePkg\IntelFrameworkModulePkg.dsc -a IA32 -s

___
Python tracker 
<http://bugs.python.org/issue4944>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4944] os.fsync() doesn't work as expect in Windows

2009-01-14 Thread Javen Wang

Javen Wang  added the comment:

Thank you very much for the trying. You might miss the step 4 in my
previous message. The step is:

  C:\test\edk2> edksetup.bat
  C:\test\edk2> set PYTHONPATH=C:\test\tools\Source\Python
  C:\test\edk2> python.exe C:\test\tools\Source\Python\build\build.py
-n 2 -p IntelFrameworkModulePkg\IntelFrameworkModulePkg.dsc -a IA32 -s

The Visual Studio 2005 must be in the standard installation directory.
Otherwise the C:\test\edk2\Conf\tools_def.txt needs to be changed to
reflect the real path.

And I tried to disabled all antivirus services and the problem is still
there. 

Don't worry about the binary version of build in
edk2\BaseTools\Bin\Win32 (linked against Python 2.5.2). The step I told
you is to execute my application from Python script directly. And I
tried to execute from script source against Python 2.5.4 and the problem
is the same. And no matter running the build from script source or the
freeze-ed binary, the results are the same either.

If it's hard or inconvenient for you to reproduce it, could you please
give me any advice or suggestion on how to debug it (in the interpreter)
and where's most possible place the root cause would be in the Python
interpreter's code? I can try to change something in the Python
interpreter's code, rebuild it and try it on my machine. Although I have
work around for this problem, I'd like to root cause it to avoid further
possible build break of our project.

___
Python tracker 
<http://bugs.python.org/issue4944>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4944] os.fsync() doesn't work as expect in Windows

2009-01-14 Thread Javen Wang

Javen Wang  added the comment:

I agree multithread programming is kind of difficult and tricky. But I
don't think there's a race condition in _MultiThreadBuildPlatform
method, because I do have only one producer. And the consumer consumes
the product only when it's done (i.e. the file is created and closed).
The only race condition, I think, it's in the Python or Windows file
system, in which the file created might not be seen by other process due
to the file buffer mechanism. I think the flush() method of file object
and os.fsync() are to remove the race condition. But it seems that they
are not working as they're supposed to be.

What I know is that os.fsync() calls the _commit() which calls
FlushFileBuffers(). Why no problem if I call the FlushFileBuffers()
directly? That's why I think the most possible race condition is in
Python file buffer operation which is out of the control of my Python code.

I'm sorry that I didn't realize there's 150M code to checkout. Thanks
for your patience. Actually they are not code of my application itself.
They are the code used to test my application because my application is
a build system which needs source code to build. The real code of my
application is in the, for my example, directory of
C:\test\tools\Source\Python with just about 3M source files :-) And I
think I have narrowed down the issue in the file creation in
SaveFileOnChange function in C:\test\tools\Source\Python\Common\Misc.py. 

I know it's very hard to reproduce issue in multi-thread context. And I
cannot say absolutely there's no bug in my code. It's OK for you to
close this tracker. But it would be better to let it open for a few days
so that I can do more investigation then. Anyway, thanks for the trying
again.

___
Python tracker 
<http://bugs.python.org/issue4944>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4944] os.fsync() doesn't work as expect in Windows

2009-01-16 Thread Javen Wang

Javen Wang  added the comment:

I narrowed down the root cause in the GIL of Python. I read the source
code of implementing os.fsync and found it's using
Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS to enclose the calling
of _commit() in order to be thread safe. I tried to add them in my C
extension and then the issue was reproduced. 

It looks like the thread state operation or GIL caused a race condition
and let os.fsync() not flush the makefile in buffer to disk before it
returns.

The multi-thread scenario in my application is 

a. the main thread produces makefile
b. once a makefile is generated, the main thread launches a new
thread in which a separated process is launched to run the makefile
until it exits.

I think there's no issue in this multi-thread scenario. And the Python
2.5.2 Maunal, Python/C API Ch8.1, says GIL must be taken care when doing
some blocking I/O operations, but it caused thread unsafe. So I still
think there's a bug in the Python.

___
Python tracker 
<http://bugs.python.org/issue4944>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4944] os.fsync() doesn't work as expect in Windows

2009-01-19 Thread Javen Wang

Javen Wang  added the comment:

I'm quite certain that the makefile is generated before the "make"
program is launched in separated process. Follow is the original code
where the makefile is created (line 14) and a task is put in queue (line
19). It's executed in the main thread (no parallel execution here).
There's no way that the process is launched before the makefile generation. 
 
01for Module in Pa.Platform.Modules:
02Ma = ModuleAutoGen(Wa, Module, BuildTarget, ToolChain, Arch,
self.PlatformFile)
03if Ma == None:
04continue
05# generate AutoGen files and Makefile here
06if self.Target not in ['clean', 'cleanlib', 'cleanall', 'run',
'fds']:
07# for target which must generate AutoGen code and makefile
08if not self.SkipAutoGen or self.Target == 'genc':
09Ma.CreateCodeFile(True)
10if self.Target == "genc":
11continue
12   
13if not self.SkipAutoGen or self.Target == 'genmake':
14Ma.CreateMakeFile(True)
15if self.Target == "genmake":
16continue
17
18# Generate build task for the module which will be launched
separately
19Bt = BuildTask.New(ModuleMakeUnit(Ma, self.Target))

I think the Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS should not
be used for fsync in the Python core code (in posixmodule.c, right?).
Because fsync is mostly used to solve file operation issue in parallel
execution context, it should not give out CPU time to other threads or
processes when it's running.

___
Python tracker 
<http://bugs.python.org/issue4944>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4944] os.fsync() doesn't work as expect in Windows

2009-01-20 Thread Javen Wang

Javen Wang  added the comment:

I did trace the order of file creation and process launch. It shows the
file is indeed created before the process launch.

I did another investigation. I added a new process, "cmd.exe /c copy
Makefile NewMakefile", to copy the the Makefile created, just before
launching the "nmake.exe" process. The strangest thing happened again:
each makefile was copied successful but there's still "makefile not
found" reported by "nmake.exe" process. I tried standalone copy
application "xcopy.exe" and the result was the same. So I guess that
"cmd.exe", "xcopy.exe" and "nmake.exe" might use different families of
API or different mode (sync vs async) to access file.

I decided to try the code you provided. In checkfile.c, fopen is used to
test the file existence. I changed it to the Windows native API
CreateFile and I also added a file write operation in order to make it
more similar to the real case. Eventually, the problem was reproduced in
your code. Following are the successful number of creating 1000 files 5
times in different file sizes:

Create file in 403 bytes,
  985, 992, 984, 989, 992 (no flush after creation)
  883, 886, 907, 909, 915 (flush after creation)
  
Create file in 4061 bytes
  983, 976, 982, 977, 983 (no flush after creation)
  654, 672, 684, 686, 648 (flush after creation)
  
Create file in 16461 bytes:
  970, 967, 963, 963, 971 (no flush after creation)
  598, 664, 711, 653, 623 (flush after creation)

In summary:
  a. Using fopen to test a file existence in check_file.c will never
report failure, no matter buffer is flushed or not.
  b. Using CreateFile (read mode) to test a file existence in
check_file.c will always report failure. The bigger the file size will
cause more failure reported; the flush operation after file creation in
test_file_flush.py will cause more failure reported; the flush operation
after new file creation in check_file.c will cause more failure
reported; no flush operation in both check_file.c and test_file_flush.py
almost cause no failure.

I don't know what's root cause: GIL, Python thread state switch, Python
file buffer or Windows FileCreate API. I'm just certain there's race
condition between Python and Windows.

Added file: http://bugs.python.org/file12807/checkfile.c

___
Python tracker 
<http://bugs.python.org/issue4944>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4944] os.fsync() doesn't work as expect in Windows

2009-01-20 Thread Javen Wang

Javen Wang  added the comment:

The test code and script which can reproduce the problem has been uploaded.

Added file: http://bugs.python.org/file12808/test_file_flush.py

___
Python tracker 
<http://bugs.python.org/issue4944>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4944] os.fsync() doesn't work as expect in Windows

2009-01-21 Thread Javen Wang

Javen Wang  added the comment:

I'm using Python2.5 in which close_fds is not available in Windows. And
I cannot use Python2.6 because I need to redirect the stdout and stderr
and there's performance concern.

I have questions on the root cause:

  a. why doesn't fopen() has sharing issue?
  b. why don't os.close() and FileObject.close() really close the file?

___
Python tracker 
<http://bugs.python.org/issue4944>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4944] os.fsync() doesn't work as expect in Windows

2009-01-22 Thread Javen Wang

Javen Wang  added the comment:

I don't have Python2.6 installed. I just have ever read a bench article
comparing the performance between different version of Python, including
2.5, 2.6 and 3.0. That article shows, for the same script, Python2.6 has
longer running time than Python2.5. My application is a build system and
the users are very care about the time spent in build. That's why I have
the performance concern.

What concerns me more is the stdout/stderr redirection. Python2.6 manual
says they cannot be redirected if close_fds is set to True. My
application relies on the I/O redirection to control the screen output
from subprocess.

Actually I have had a work-around for this issue. It works very well so
far. I reported a bug here just because I want the Python to be better.
I learned it one year ago but I love this language. I just hope nobody
else encounter such problem again. If you guys think it won't be fixed
in Python 2.5 or has been fixed in Python 2.6, it's OK to close this
tracker. Thanks for your time.

___
Python tracker 
<http://bugs.python.org/issue4944>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com