How to safely maintain a status file

2012-07-08 Thread Richard Baron Penman
Hello,

I want my script to generate a ~1KB status file several times a second.
The script may be terminated at any time but the status file must not
be corrupted.
When the script is started next time the status file will be read to
check what needs to be done.

My initial solution was a thread that writes status to a tmp file
first and then renames:

import os
with open(tmp_file, 'w') as f:
    f.write(status)  # the file is closed before the rename
os.rename(tmp_file, status_file)

This works well on Linux but Windows raises an error when status_file
already exists.
http://docs.python.org/library/os.html#os.rename
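
One possible way around the Windows error, sketched here under the assumption that Python 3.3 or later is available, is os.replace, which overwrites an existing destination on both POSIX and Windows. The function name write_status and the ".tmp" suffix are illustrative only. The rename is atomic on POSIX; on Windows the documentation makes no such promise, so treat this as a sketch rather than a guarantee.

```python
import os
import tempfile


def write_status(status, status_file):
    """Write status to a temp file, then swap it into place."""
    # Keep the temp file in the same directory so the final rename
    # does not cross filesystems.
    directory = os.path.dirname(os.path.abspath(status_file))
    fd, tmp_file = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(status)
            f.flush()
            os.fsync(f.fileno())  # push the data to disk before renaming
        # Unlike os.rename, os.replace overwrites an existing
        # destination on Windows as well as on POSIX.
        os.replace(tmp_file, status_file)
    except BaseException:
        # Do not leave a stray temp file behind if anything failed.
        if os.path.exists(tmp_file):
            os.remove(tmp_file)
        raise
```

Calling write_status(status, status_file) repeatedly should leave either the previous or the new contents in status_file, never a partial write.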


I guess I could delete the status file:

with open(tmp_file, 'w') as f:
    f.write(status)
# Windows cannot rename over an existing file, so remove it first
if os.path.exists(status_file):
    os.remove(status_file)
os.rename(tmp_file, status_file)

and then on startup read from tmp_file if status_file does not exist.
But this seems awkward.
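
The startup fallback described above could look roughly like the sketch below. The name read_status is hypothetical. It assumes the write order shown earlier (write tmp_file, remove status_file, rename), so that whenever status_file is missing, tmp_file has already been written completely.

```python
def read_status(status_file, tmp_file):
    """Return the last saved status, or None if nothing was saved."""
    # Prefer status_file: if it exists, tmp_file may be half-written.
    # If it is missing, the script died between remove and rename,
    # and tmp_file holds the complete new status.
    for path in (status_file, tmp_file):
        try:
            with open(path) as f:
                return f.read()
        except FileNotFoundError:
            continue
    return None
```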


Is there a better way? Or do I need to use a database?

Richard
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Generate unique ID for URL

2012-11-13 Thread Richard Baron Penman
I found the MD5 and SHA hashes slow to calculate.
The builtin hash is fast but I was concerned about collisions. What
rate of collisions could I expect?

Outside attacks are not an issue, and multiple processes would be used.
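
As a rough guide to the collision question, the birthday approximation can be sketched as follows. It assumes hash values are spread uniformly over the available bits, which the builtin hash does not guarantee, so the real rate may be higher. The function name expected_collisions is illustrative. Note also that in Python 3.3 and later string hashes are salted per process unless PYTHONHASHSEED is set, so hash() values would not agree across multiple processes.

```python
def expected_collisions(num_items, bits):
    """Approximate number of colliding pairs among num_items hashes."""
    # There are n*(n-1)/2 pairs, and under the uniformity assumption
    # each pair collides with probability 2**-bits.
    pairs = num_items * (num_items - 1) / 2
    return pairs / 2 ** bits


for bits in (32, 64, 128):
    print(bits, expected_collisions(10 ** 6, bits))
```

For a million URLs this gives on the order of a hundred colliding pairs with a 32-bit hash, and far less than one with 64 bits or more.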


On Wed, Nov 14, 2012 at 11:26 AM, Chris Kaynor wrote:
> One option would be using a hash. Python's built-in hash, a 32-bit
> CRC, 128-bit MD5, 256-bit SHA or one of the many others that exist,
> depending on the needs. Higher bit counts will reduce the odds of
> accidental collisions; cryptographically secure ones if outside
> attacks matter. In such a case, you'd have to roll your own means of
> converting the hash back into the string if you ever need it for
> debugging, and there is always the possibility of collisions. A
> similar solution would be using a pseudo-random GUID using the url as
> the seed.
>
> You could use a counter if all IDs are generated by a single process
> (and even in other cases with some work).
>
> If you want to be able to go both ways, using base64 encoding is
> probably your best bet, though you might get benefits by using
> compression.
> Chris
>
>
> On Tue, Nov 13, 2012 at 3:56 PM, Richard wrote:
>> Good point - one-way encoding would be fine.
>>
>> Also, this is performed millions of times, so it should ideally be efficient.
>>
>>
>> On Wednesday, November 14, 2012 10:34:03 AM UTC+11, John Gordon wrote:
>>> In <0692e6a2-343c-4eb0-be57-fe5c815ef...@googlegroups.com> Richard writes:
>>>
>>> > I want to create a URL-safe unique ID for URLs.
>>> > Currently I use:
>>> > url_id = base64.urlsafe_b64encode(url)
>>> >
>>> > >>> base64.urlsafe_b64encode('docs.python.org/library/uuid.html')
>>> > 'ZG9jcy5weXRob24ub3JnL2xpYnJhcnkvdXVpZC5odG1s'
>>> >
>>> > I would prefer more concise IDs.
>>> > What do you recommend? - Compression?
>>>
>>> Does the ID need to contain all the information necessary to recreate the
>>> original URL?
>>>
>>> --
>>> John Gordon   A is for Amy, who fell down the stairs
>>> gor...@panix.com  B is for Basil, assaulted by bears
>>> -- Edward Gorey, "The Gashlycrumb Tinies"


Re: asynchronous downloading

2012-02-23 Thread Richard Baron Penman
>> I want to download content asynchronously. This would be
>> straightforward to do threaded or across processes, but difficult
>> asynchronously so people seem to rely on external libraries (twisted
>> / gevent / eventlet).
>
>
> Exactly - the fact it's difficult is why those tools compete.

It is difficult in Python because the async libraries do not offer
much. It is straightforward in some other languages.

Do you know why there is so little support for asynchronous execution
in the standard library?
For large-scale downloading I found that thread pools do not scale well.
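
For comparison, the shape of a bounded asynchronous downloader can be sketched with asyncio, a module that only entered the standard library in later Python 3 releases, well after this thread. The names download_all and fake_fetch are hypothetical, and fake_fetch merely sleeps in place of a real non-blocking HTTP request, so this shows the concurrency pattern only and not an actual downloader.

```python
import asyncio


async def fake_fetch(url):
    # Hypothetical stand-in for a real non-blocking HTTP request.
    await asyncio.sleep(0.01)
    return "content of " + url


async def download_all(urls, fetch=fake_fetch, max_concurrent=10):
    # The semaphore caps how many fetches are in flight at once.
    semaphore = asyncio.Semaphore(max_concurrent)

    async def bounded(url):
        async with semaphore:
            return await fetch(url)

    # gather() returns results in the same order as the input URLs.
    return await asyncio.gather(*(bounded(url) for url in urls))
```

It can be run with asyncio.run(download_all(urls)), which needs Python 3.7 or later. All requests share one thread, so the limit on concurrency is the semaphore rather than a pool of threads.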

Richard


Re: asynchronous downloading

2012-02-25 Thread Richard Baron Penman
>> I read through the python-dev archives and found the fundamental problem is
>> no one maintains asyncore / asynchat.
>
> Well, actually I do/did.

Ah, OK. I had read this comment from a few years back:
"IIRC, there was a threat to remove asyncore because there were no
maintainers, no one was fixing bugs, no one was improving it, and no
one was really using it"


> Point with asyncore/asynchat is that its original design is so flawed
> and simplistic it doesn't allow actual customization without
> breaking compatibility.

Python 3 uses the same API - was there not enough interest to improve it?

Richard