Python and file locking - NFS or MySQL?

2005-08-29 Thread Christopher DeMarco
Hi all...

...I've got a Python script running on a bunch of boxen sharing some
common NFS-exported space.  I need (not want :) to lock files for
writing, and I need (not want :) to do it safely (i.e. atomically).
I'm doing this in Linux; NFS4 is available.  As I understand it, my
options are:

1.  Python's fcntl() is an interface to the fcntl(2) system call,
which is claimed to work "mostly" over NFS v >= 3.

2.  open(2) is atomic on a local FS, and I've read discussions implying
that link(2) is atomic over NFS (is it?), so I could link from a local
lockfile to the remote target atomically.  I don't grok this; open(2) +
link(2) + stat(2) is three calls on my fingers.  How the hell is this
supposed to work?

3.  Atomically update a row in a MySQL database indicating that the
file is locked - MySQL has atomic transactions now, right?  And how
good is the Python MySQL API?  IIRC the Perl driver didn't support
transactions last year; does this work in contemporary Python?
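FWIW, option 2 is usually implemented as a three-step dance, which is why it takes more than one call: create a uniquely-named temp file on the NFS export, link(2) it to the lock name, then stat(2) the temp file and check that its link count is 2.  The stat check exists because over NFS the link(2) reply can be lost and the request retried, so the return code alone can't be trusted.  A sketch (the function names are mine, not from any library):

```python
import os, socket

def acquire_nfs_lock(lockfile):
    """Classic link(2) locking dance for NFS.

    The temp file name must be unique per host+process so that two
    clients never collide on the link source.
    """
    tmpname = "%s.%s.%d" % (lockfile, socket.gethostname(), os.getpid())
    fd = os.open(tmpname, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    os.close(fd)
    try:
        try:
            os.link(tmpname, lockfile)   # the (hopefully) atomic step
        except OSError:
            pass   # the reply may simply have been lost; check st_nlink
        # Don't trust link(2)'s return code over NFS: check the link
        # count on the temp file instead.  2 means we hold the lock.
        return os.stat(tmpname).st_nlink == 2
    finally:
        os.unlink(tmpname)

def release_nfs_lock(lockfile):
    os.unlink(lockfile)
```

This is essentially the dotlocking scheme that mail tools like procmail use to stay NFS-safe.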

I've got a while (several weeks) to chew this problem over (my current
implementation is ``assert("Poof!  File locked")'').

What are my options for safely locking files via NFS?  I don't want to
get involved with NLM, as my impression is that it's buggy and
unwieldy.  Thanks in advance!
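As for option 3: the database version of the same idea doesn't actually need transactions if a UNIQUE key does the arbitration, since exactly one INSERT of a given lock name can succeed.  A rough DB-API sketch (sqlite3 stands in here for the MySQL driver, which takes %s placeholders rather than ?; the table and column names are invented):

```python
import sqlite3

def acquire_db_lock(conn, path):
    """Try to mark `path` as locked.  The PRIMARY KEY constraint makes
    the INSERT all-or-nothing, so exactly one client can win."""
    try:
        conn.execute("INSERT INTO locks (path) VALUES (?)", (path,))
        conn.commit()
        return True
    except sqlite3.IntegrityError:   # row exists: someone holds the lock
        conn.rollback()
        return False

def release_db_lock(conn, path):
    conn.execute("DELETE FROM locks WHERE path = ?", (path,))
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE locks (path TEXT PRIMARY KEY)")
```

The obvious caveat is that a crashed client leaves a stale row behind, so you'd want a timestamp column and some expiry policy on top of this.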


I was present at an undersea, unexplained mass sponge migration.
-- 
Christopher DeMarco <[EMAIL PROTECTED]>
Alephant Systems (http://alephant.net)
PGP public key at http://pgp.alephant.net
+1 412 708 9660


-- 
http://mail.python.org/mailman/listinfo/python-list

Re: Python-list Digest, Vol 23, Issue 415

2005-08-29 Thread Christopher DeMarco
On Mon, Aug 29, 2005 at 10:43:46PM +0200, [EMAIL PROTECTED] wrote:

> Date: Mon, 29 Aug 2005 16:32:59 -0400
> To: python-list@python.org
> From: Steve Holden <[EMAIL PROTECTED]>
> Subject: Re: NYC Opening



> >THEY ARE LOCATED IN NEW YORK, THIS IS FULL-TIME ONLY, WILL NOT CONSIDER
> >ANYONE FROM OUTSIDE THE US! THIS TEAM IS AN ELITE TEAM, YOU BETTER BE
> >GOOD

> It seems like this bank expects much more of its programmers than it 
> does of its recruitment consultants ...

They've outsourced most of the work to Nigeria...

PY7HON P.R.O.G.R.A.M.M.E.R.S DELIVERED DISCRETELY TO YOUR DOORSTEP




Re: Python and file locking - NFS or MySQL?

2005-09-13 Thread Christopher DeMarco
Fredrik Lundh wrote:

>os.link(tempfile, lockfile) # atomic!

Fredrik, thanks for replying - I monitored python-list but didn't see
anything.  Gotta get a proper Usenet feed...

Are you sure os.link() will be atomic over NFS?



Re: Python and file locking - NFS or MySQL?

2005-09-13 Thread Christopher DeMarco
Thanks for the reply; I somehow missed this entire thread in
python-list.  I'm going to give it a whirl, after digging a bit into
the status quo of Linux' NFSv3 implementation.



Tuning a select() loop for os.popen3()

2005-12-30 Thread Christopher DeMarco
Hi all... 

I've written a class to provide an interface to popen; I've included
the actual select() loop below.  I'm finding that "sometimes" popen'd
processes take "a really long time" to complete and "other times" I
get incomplete stdout.

E.g.:

  - on boxA ffmpeg returns in ~25s; on boxB (comparable hardware,
  identical OS) ~5m.

  - ``ls'' on a directory with 15 nodes returns full stdout; ``ls -R''
  on that same directory (with ~32K nodes beneath) stops after
  4097KB of output.

The code in question is running on Linux 2.6.x; no cross-platform
portability desired.  popen'd commands will never be interactive; I
just wanna read stdout/stderr and perhaps feed a one-shot string via
stdin.

Here's the relevant code (stripped of comments and various OO
setup/output stuff):


# # ## ### #  # #
# cut here

  def run(self):
    import os, select, syslog
    (_stdin, _stdout, _stderr) = os.popen3(self.command)

    stdoutChunks = []; stderrChunks = []
    readList = [_stdout, _stderr]
    # NB: ``is not ""`` compares identity, not equality; use != for strings.
    if self.stdinString != "": writeList = [_stdin]
    else: writeList = []
    readStderr = False; readStdout = False

    while True:
      (r, w, x) = select.select(readList, writeList, [], 1)

      if w:
        # One-shot write, then close stdin so the child sees EOF.
        os.write(_stdin.fileno(), self.stdinString)
        writeList.remove(_stdin)
        _stdin.close()
        continue

      if r:
        # stderr is drained in preference to stdout: the elif plus the
        # continue mean stdout is never read while stderr is ready.
        if _stderr in r:
          readStderr = True
          read = os.read(_stderr.fileno(), 16384)
          if read: stderrChunks.append(read)
          else: readList.remove(_stderr)    # zero-length read == EOF
          continue
        elif _stdout in r:
          readStdout = True
          read = os.read(_stdout.fileno(), 16384)
          if read:
            stdoutChunks.append(read)
            syslog.syslog("Command instance read %d from stdout" % len(read))
          else: readList.remove(_stdout)    # zero-length read == EOF
          continue
      else:
        # Nothing ready this tick: terminate if we've seen any output.
        # Note that this keys off the select() timeout, not off EOF.
        if (readStderr and self.dieOnStderr) or readStdout:
          syslog.syslog("Command instance finished")
          break
    return

# cut here
# # ## ### #  # #


Tweaking (a) the os.read() buffer size and (b) the select() timeout
and testing with ``ls -R'' on a directory with ~ 32K nodes beneath, I
find the following trends:

1.  With a very small os.read() buffer, I get full stdout, but running
time is rather long.  Running time increases as select() timeout
increases.

2.  With a very large os.read() buffer, I get incomplete stdout (but
running time is *very* fast).  As select() timeout increases, I get
better and better results - with a select() timeout of 0.2 I seem to
get reliably full stdout.


The values used in the code I've pasted above - large buffer, large
select() timeout - seem to perform "well enough"; none of the
previously described problems manifest.  However, ``ls -lR /'' (way
more than 32K nodes) "sometimes" gives incomplete stdout.


My first question, then, is the paranoid one: I ran all these
benchmarks because the application using this code took a HUGE
performance hit when we started using popen'd commands that generate
"lots of" output.

Is there anything wrong with the logic in my code?!

Will I see severe performance degradation (or worse, incomplete
stdout/stderr) as system variables change (e.g. system load increases,
popen'd program changes, popen'd program increases workload, etc.)?


Next question - how do I tune the select() timeout and the os.read()
buffer correctly?  Is it *really* per- command, per- system, per-
phase-of-moon voodoo?  Is there a Recommended Setup for such a
select() loop?
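FWIW, one way to take the voodoo out of it: if the loop only terminates once every pipe has returned EOF (a zero-length os.read()), the select() timeout and buffer size stop affecting correctness and become pure performance knobs.  A sketch of that shape, using the subprocess module rather than os.popen3 (no stdin handling, for brevity):

```python
import os, select, subprocess

def run_command(argv, bufsize=16384):
    """Run argv, draining stdout and stderr until both report EOF.

    Termination is driven by EOF, never by the select() timeout, so
    neither the timeout nor bufsize can cause truncated output.
    """
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE)
    readable = [proc.stdout, proc.stderr]
    chunks = {proc.stdout: [], proc.stderr: []}
    while readable:
        r, _, _ = select.select(readable, [], [], 1)
        for pipe in r:
            data = os.read(pipe.fileno(), bufsize)
            if data:
                chunks[pipe].append(data)
            else:                 # zero-length read: EOF on this pipe
                readable.remove(pipe)
    proc.wait()   # reap the child once both pipes are closed
    return b"".join(chunks[proc.stdout]), b"".join(chunks[proc.stderr])
```

With this shape the timeout only bounds how long a quiet tick lasts, and the buffer size only trades syscall count against memory per read.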


Thanks in advance, for insight as well as for tolerating my
long-windedness...

