Re: How to safely maintain a status file

2012-07-14 Thread Christian Heimes
On 13.07.2012 03:52, Steven D'Aprano wrote: > And some storage devices (e.g. hard drives, USB sticks) don't actually > write data permanently even when you sync the device. They just write to > a temporary cache, then report that they are done (liar liar pants on > fire). Only when the cache i

Re: How to safely maintain a status file

2012-07-13 Thread Steven D'Aprano
On Fri, 13 Jul 2012 15:15:13 -0500, Chris Gonnerman wrote: > On 07/13/2012 12:59 PM, Prasad, Ramit wrote: >> I lean slightly towards the POSIX handling with the addition that any >> additional write should throw an error. You are now saving to a file >> that will not exist the moment you close it

Re: [Python] RE: How to safely maintain a status file

2012-07-13 Thread Christian Heimes
On 13.07.2012 21:57, MRAB wrote: > It's possible to create a temporary file even in Windows. Windows has an open() flag named O_TEMPORARY for temporary files. With O_TEMPORARY the file is removed from disk as soon as the file handle is closed. On POSIX OS it's common practice to unlink temporary

RE: How to safely maintain a status file

2012-07-13 Thread Chris Gonnerman
On 07/13/2012 12:59 PM, Prasad, Ramit wrote: I lean slightly towards the POSIX handling with the addition that any additional write should throw an error. You are now saving to a file that will not exist the moment you close it and that is probably not expected. Ramit But if I created, then del

Re: [Python] RE: How to safely maintain a status file

2012-07-13 Thread MRAB
On 13/07/2012 19:28, Hans Mulder wrote: On 13/07/12 19:59:59, Prasad, Ramit wrote: I lean slightly towards the POSIX handling with the addition that any additional write should throw an error. You are now saving to a file that will not exist the moment you close it and that is probably not expe

Re: [Python] RE: How to safely maintain a status file

2012-07-13 Thread Hans Mulder
On 13/07/12 19:59:59, Prasad, Ramit wrote: > I lean slightly towards the POSIX handling with the addition that > any additional write should throw an error. You are now saving to > a file that will not exist the moment you close it and that is > probably not expected. I'd say: it depends. If t

Re: [Python] RE: How to safely maintain a status file

2012-07-13 Thread Chris Angelico
On Sat, Jul 14, 2012 at 3:59 AM, Prasad, Ramit wrote: > I lean slightly towards the POSIX handling with the addition that > any additional write should throw an error. You are now saving to > a file that will not exist the moment you close it and that is probably > not expected. There are several

RE: [Python] RE: How to safely maintain a status file

2012-07-13 Thread Prasad, Ramit
> >> Well "neat tricks" aside, I am of the firm belief that deleting files > should > >> never be possible whilst they are open. > > This is one of the few instances I think Windows does something better > > than OS X. Windows will check before you attempt to delete (i.e. move > > to Recycling Bin)

Re: [Python] RE: How to safely maintain a status file

2012-07-13 Thread Chris Gonnerman
On 07/13/2012 11:00 AM, Prasad, Ramit wrote: Well "neat tricks" aside, I am of the firm belief that deleting files should never be possible whilst they are open. This is one of the few instances I think Windows does something better than OS X. Windows will check before you attempt to delete (i.e

RE: How to safely maintain a status file

2012-07-13 Thread Prasad, Ramit
> Well "neat tricks" aside, I am of the firm belief that deleting files should > never be possible whilst they are open. This is one of the few instances I think Windows does something better than OS X. Windows will check before you attempt to delete (i.e. move to Recycling Bin) while OS X will m

Re: How to safely maintain a status file

2012-07-13 Thread Steven D'Aprano
On Thu, 12 Jul 2012 21:26:20 -0700, rantingrickjohnson wrote: > On Thursday, July 12, 2012 10:13:47 PM UTC-5, Steven D'Aprano wrote: >> Rick has obviously never tried to open a file for reading when somebody >> else has it opened, also for reading, and discovered that despite >> Windows being alle

Re: How to safely maintain a status file

2012-07-12 Thread Chris Angelico
On Fri, Jul 13, 2012 at 2:26 PM, wrote: > On Thursday, July 12, 2012 10:13:47 PM UTC-5, Steven D'Aprano wrote: >> Rick has obviously never tried to open a file for reading when somebody >> else has it opened, also for reading, and discovered that despite Windows >> being allegedly a multi-user op

Re: How to safely maintain a status file

2012-07-12 Thread rantingrickjohnson
On Thursday, July 12, 2012 10:13:47 PM UTC-5, Steven D'Aprano wrote: > Rick has obviously never tried to open a file for reading when somebody > else has it opened, also for reading, and discovered that despite Windows > being allegedly a multi-user operating system, you can't actually have > mu

Re: How to safely maintain a status file

2012-07-12 Thread Steven D'Aprano
On Thu, 12 Jul 2012 23:49:02 -0400, Gene Heskett wrote: > When I wanted to impress the visiting frogs, I often did something I > have never been able to do on any other operating system since, start > assembling a long assembly language file on one of the screens on the > color monitor, hit the cl

Re: How to safely maintain a status file

2012-07-12 Thread Gene Heskett
On Thursday 12 July 2012 23:21:16 Steven D'Aprano did opine: > On Fri, 13 Jul 2012 12:12:01 +1000, Chris Angelico wrote: > > On Fri, Jul 13, 2012 at 11:20 AM, Rick Johnson > > > > wrote: > >> On Jul 12, 2:39 pm, Christian Heimes wrote: > >>> Windows's file system layer is not POSIX compatible.

Re: How to safely maintain a status file

2012-07-12 Thread Steven D'Aprano
On Fri, 13 Jul 2012 12:12:01 +1000, Chris Angelico wrote: > On Fri, Jul 13, 2012 at 11:20 AM, Rick Johnson > wrote: >> On Jul 12, 2:39 pm, Christian Heimes wrote: >>> Windows's file system layer is not POSIX compatible. For example you >>> can't remove or replace a file while it is opened by a p

Re: How to safely maintain a status file

2012-07-12 Thread Chris Angelico
On Fri, Jul 13, 2012 at 11:20 AM, Rick Johnson wrote: > On Jul 12, 2:39 pm, Christian Heimes wrote: >> Windows's file system layer is not POSIX compatible. For example >> you can't remove or replace a file while it is opened by a process. > > Sounds like a reasonable fail-safe to me. POSIX says

Re: How to safely maintain a status file

2012-07-12 Thread Steven D'Aprano
On Thu, 12 Jul 2012 15:05:26 +0200, Christian Heimes wrote: > You need to flush the data to disk as well as the metadata of the file > and its directory in order to survive a system crash. The close() > syscall already makes sure that all data is flushed into the IO layer of > the operating system

Re: How to safely maintain a status file

2012-07-12 Thread Rick Johnson
On Jul 12, 2:39 pm, Christian Heimes wrote: > Windows's file system layer is not POSIX compatible. For example > you can't remove or replace a file while it is opened by a process. Sounds like a reasonable fail-safe to me. Not much unlike a car ignition that will not allow starting the engine if

Re: How to safely maintain a status file

2012-07-12 Thread Christian Heimes
On 12.07.2012 19:43, Laszlo Nagy wrote: > Well, I didn't know that this is going to work. At least it does not > work on Windows 7 (which should be POSIX compatible?) Nope, Windows's file system layer is not POSIX compatible. For example you can't remove or replace a file while it is opened by a

Re: How to safely maintain a status file

2012-07-12 Thread Laszlo Nagy
Windows doesn't support atomic renames if the right side exists. I suggest that you implement two code paths:

    if os.name == "posix":
        rename = os.rename
    else:
        def rename(a, b):
            try:
                os.rename(a, b)
            except OSError, e:
                if e.errno != 183:
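
The quoted snippet is cut off in this listing, so the following is only a sketch of the kind of fallback it seems to describe, written for Python 3 rather than the Python 2 of the thread. The function name rename_with_fallback is hypothetical. It relies on the documented behaviour that os.rename() on Windows raises FileExistsError when the destination exists (Windows error 183 is ERROR_ALREADY_EXISTS), while on POSIX the rename overwrites the destination directly.

```python
import os


def rename_with_fallback(src, dst):
    """Rename src to dst, removing an existing dst first if the OS insists.

    Hypothetical helper, not the code from the original post.
    On POSIX the first os.rename() replaces dst atomically.
    On Windows it raises FileExistsError, and the fallback below runs.
    """
    try:
        os.rename(src, dst)
    except FileExistsError:
        # Not atomic: if the process dies between these two calls,
        # dst is gone and only src remains.
        os.unlink(dst)
        os.rename(src, dst)
```

The gap between unlink and rename in the fallback branch is exactly the window discussed elsewhere in the thread; on Python 3.3 and later, os.replace() is usually the simpler choice because it overwrites the destination on both platforms.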

Re: How to safely maintain a status file

2012-07-12 Thread Laszlo Nagy
This is not a contradiction. Although the rename operation is atomic, the whole "change status" process is not. It is because there are two operations: #1 delete old status file and #2. rename the new status file. And because there are two operations, there is still a race condition. I see no co

Re: How to safely maintain a status file

2012-07-12 Thread Laszlo Nagy
Sorry, but you are wrong. It's just one operation that boils down to "point name to a different inode". After the rename op the file name either points to a different inode or still to the old name in case of an error. The OS guarantees that all processes either see the first or second state (in

Re: How to safely maintain a status file

2012-07-12 Thread Ross Ridge
Laszlo Nagy: > This is not a contradiction. Although the rename operation is atomic, > the whole "change status" process is not. It is because there are two > operations: #1 delete old status file and #2. rename the new status > file. And because there are two operations, there is still a race > co

Re: How to safely maintain a status file

2012-07-12 Thread Hans Mulder
On 12/07/12 14:30:41, Laszlo Nagy wrote: >> You are contradicting yourself. Either the OS is providing a fully >> atomic rename or it doesn't. All POSIX compatible OS provide an atomic >> rename functionality that renames the file atomically or fails without >> losing the target side. On POSIX OS

Re: How to safely maintain a status file

2012-07-12 Thread Christian Heimes
On 12.07.2012 14:30, Laszlo Nagy wrote: > This is not a contradiction. Although the rename operation is atomic, > the whole "change status" process is not. It is because there are two > operations: #1 delete old status file and #2. rename the new status > file. And because there are two operation

Re: How to safely maintain a status file

2012-07-12 Thread Laszlo Nagy
Renaming files is the wrong way to synchronize a crawler. Use a database that has ACID properties, such as SQLite. Far fewer I/O operations are required for small updates. It's not the 1980s any more. I agree with this approach. However, the OP specifically asked about "how to update stat

Re: How to safely maintain a status file

2012-07-12 Thread Laszlo Nagy
You are contradicting yourself. Either the OS is providing a fully atomic rename or it doesn't. All POSIX compatible OS provide an atomic rename functionality that renames the file atomically or fails without losing the target side. On POSIX OS it doesn't matter if the target exists. This is no

Re: How to safely maintain a status file

2012-07-09 Thread alex23
On Jul 10, 6:24 am, John Nagle wrote: > That's because you're using the wrong approach. See how to use > ReplaceFile under Win32: > > http://msdn.microsoft.com/en-us/library/aa365512%28VS.85%29.aspx I'm not convinced ReplaceFile is atomic: "The ReplaceFile function combines several steps within

Re: How to safely maintain a status file

2012-07-09 Thread Christian Heimes
On 09.07.2012 22:24, John Nagle wrote: > Rename on some file system types (particularly NFS) may not be atomic. The actual operation is always atomic but the NFS server may not notify you about success or failure atomically. See http://linux.die.net/man/2/rename, section BUGS. > That's b

Re: How to safely maintain a status file

2012-07-09 Thread Dan Stromberg
On Mon, Jul 9, 2012 at 8:24 PM, John Nagle wrote: > On 7/8/2012 2:52 PM, Christian Heimes wrote: > >> You are contradicting yourself. Either the OS is providing a fully >> atomic rename or it doesn't. All POSIX compatible OS provide an atomic >> rename functionality that renames the file atomical

Re: How to safely maintain a status file

2012-07-09 Thread Michael Hrivnak
Please consider batching this data and doing larger writes. Thrashing the hard drive is not a good plan for performance or hardware longevity. For example, crawl an entire FQDN and then write out the results in one operation. If your job fails in the middle and you have to start that FQDN over,

Re: How to safely maintain a status file

2012-07-09 Thread John Nagle
On 7/8/2012 2:52 PM, Christian Heimes wrote: You are contradicting yourself. Either the OS is providing a fully atomic rename or it doesn't. All POSIX compatible OS provide an atomic rename functionality that renames the file atomically or fails without losing the target side. On POSIX OS it doe

Re: How to safely maintain a status file

2012-07-09 Thread Duncan Booth
Richard Baron Penman wrote: > Is there a better way? Or do I need to use a database? Using a database would seem to meet a lot of your needs. Don't forget that Python comes with a sqlite database engine included, so it shouldn't take you more than a few lines of code to open the database once
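
As a rough illustration of the "few lines of code" suggested here, the sketch below keeps per-URL crawl state in SQLite through the standard library sqlite3 module. The table name crawl_status, its columns, and the helper names are assumptions made for this example, not anything from the original post.

```python
import sqlite3


def open_status_db(path):
    """Open (or create) the status database; schema is a made-up example."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS crawl_status ("
        "url TEXT PRIMARY KEY, state TEXT NOT NULL)"
    )
    conn.commit()
    return conn


def set_state(conn, url, state):
    """Record the state of one URL inside a single transaction."""
    # 'with conn' commits on success and rolls back on an exception,
    # so a crash mid-update leaves the previous row intact.
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO crawl_status (url, state) VALUES (?, ?)",
            (url, state),
        )


def get_state(conn, url):
    """Return the stored state for url, or None if it was never recorded."""
    row = conn.execute(
        "SELECT state FROM crawl_status WHERE url = ?", (url,)
    ).fetchone()
    return row[0] if row else None
```

A crawler would call open_status_db() once at startup and set_state() after each page; note that a connection created this way should only be used from the thread that opened it unless it is created with check_same_thread=False and protected by a lock.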

Re: How to safely maintain a status file

2012-07-09 Thread Nobody
On Sun, 08 Jul 2012 22:57:56 +0200, Laszlo Nagy wrote: > Yes, this is much better. Almost perfect. Don't forget to consult your > system documentation, and check if the rename operation is atomic or not. > (Most probably it will only be atomic if the original and the renamed file > are on the same

Re: How to safely maintain a status file

2012-07-09 Thread Christian Heimes
On 09.07.2012 07:50, Plumo wrote: >> Windows doesn't support atomic renames if the right side exists. I >> suggest that you implement two code paths: > > Problem is if the process is stopped between unlink and rename there > would be no status file. Yeah, you have to suffer all of Windows' design

Re: How to safely maintain a status file

2012-07-08 Thread Plumo
> Windows doesn't support atomic renames if the right side exists. I
> suggest that you implement two code paths:
>
> if os.name == "posix":
>     rename = os.rename
> else:
>     def rename(a, b):
>         try:
>             os.rename(a, b)
>         except OSError, e:
>             if e.errno

Re: How to safely maintain a status file

2012-07-08 Thread Plumo
> > and then on startup read from tmp_file if status_file does not exist. > > But this seems awkward. > >         It also violates your requirement -- since the "crash" could take > place with a partial "temp file". Can you explain why? My thinking was if crash took place when writing the temp fil

Re: How to safely maintain a status file

2012-07-08 Thread Plumo
> What are you keeping in this status file that needs to be saved > several times per second? Depending on what type of state you're > storing and how persistent it needs to be, there may be a better way > to store it. > > Michael This is for a threaded web crawler. I want to cache what URLs are

Re: How to safely maintain a status file

2012-07-08 Thread Christian Heimes
On 08.07.2012 22:57, Laszlo Nagy wrote: > But even if the rename operation is atomic, there is still a race > condition. Your program can be terminated after the original status file > has been deleted, and before the temp file was renamed. In this case, > you will be missing the status file (alt

Re: How to safely maintain a status file

2012-07-08 Thread Laszlo Nagy
On Sun, 8 Jul 2012 21:29:41 +1000, Richard Baron Penman declaimed the following in gmane.comp.python.general: and then on startup read from tmp_file if status_file does not exist. But this seems awkward. It also violates your requirement -- since the "crash" could take place with a par

Re: How to safely maintain a status file

2012-07-08 Thread Michael Hrivnak
What are you keeping in this status file that needs to be saved several times per second? Depending on what type of state you're storing and how persistent it needs to be, there may be a better way to store it. Michael On Sun, Jul 8, 2012 at 7:53 AM, Christian Heimes wrote: > Am 08.07.2012 13:2

Re: How to safely maintain a status file

2012-07-08 Thread Christian Heimes
On 08.07.2012 13:29, Richard Baron Penman wrote: > My initial solution was a thread that writes status to a tmp file > first and then renames: > > open(tmp_file, 'w').write(status) > os.rename(tmp_file, status_file) Your algorithm may not write and flush all data to disk. You need to do addition
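
The reply above is cut off in this listing, so what follows is only one plausible sketch of the "write, flush, sync, then rename" pattern the thread goes on to discuss, not the code from the original message. It assumes Python 3.3 or later (for os.replace), and the function name write_status_atomically and the temp-file prefix are made up for the example. The temporary file is created in the same directory as the target so the rename stays on one file system.

```python
import os
import tempfile


def write_status_atomically(status_path, data):
    """Replace status_path with data (bytes) via a synced temp file."""
    directory = os.path.dirname(os.path.abspath(status_path))
    # Same directory as the target, so the final rename never crosses devices.
    fd, tmp_path = tempfile.mkstemp(
        dir=directory, prefix=".status-", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()             # push Python's buffer to the OS
            os.fsync(tmp.fileno())  # ask the OS to push it to the device
        os.replace(tmp_path, status_path)  # overwrite the old file in one step
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    if os.name == "posix":
        # Also sync the directory entry so the rename itself survives a crash.
        dir_fd = os.open(directory, os.O_RDONLY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
```

Calling write_status_atomically("status.dat", b"...") several times a second means several fsync calls a second, which can be slow on some disks; whether that cost is acceptable depends on how much crash safety the status file really needs.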

How to safely maintain a status file

2012-07-08 Thread Richard Baron Penman
Hello, I want my script to generate a ~1KB status file several times a second. The script may be terminated at any time but the status file must not be corrupted. When the script is started next time the status file will be read to check what needs to be done. My initial solution was a thread tha