Thread killing - I know I know!

2008-05-16 Thread Roger Heathcote

Hello everyone, this is my first post so hello & please be gentle!

Despite many people's insistence that allowing for the arbitrary killing 
of threads is a cardinal sin, and although I have no particular threading 
problem to crack right now, I remain interested in the taboo that is thread 
killing. The real world and its data are messy and imperfect and I can 
think of several scenarios where it could be useful to be able to bump 
off a naughty thread, especially when using/testing unstable 3rd party 
modules and other code that is an unknown quantity.


With this in mind I am experimenting with a set of threading subclasses 
that would permit timeouts and the manual killing of threads. I'm trying 
to evaluate how robust such a scheme can be made and what the 
limitations would be in practice.


So far I can seemingly murder/timeout pure Python threads that are stuck 
blocking for input or stuck in infinite loops, but I'm guessing there 
are many more awkward cases. What I'm looking for are examples of 
code/modules that can get stuck in dead ends or might otherwise be 
problematic to terminate.
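
For anyone following along, here is a minimal sketch of one well-known way to interrupt a pure Python thread. It is not necessarily the scheme described above. It assumes CPython and uses the C API call PyThreadState_SetAsyncExc through ctypes; the names ThreadKilled, async_raise, spin and demo are made up for illustration, and the code is written for a current Python 3 rather than 2.5.

```python
import ctypes
import threading
import time


class ThreadKilled(Exception):
    """Hypothetical marker exception injected into the target thread."""


def async_raise(thread, exc_type=ThreadKilled):
    """Ask CPython to raise exc_type in `thread` at its next bytecode check."""
    affected = ctypes.pythonapi.PyThreadState_SetAsyncExc(
        ctypes.c_ulong(thread.ident), ctypes.py_object(exc_type))
    if affected > 1:
        # Should never happen; cancel the request as the C API docs advise.
        ctypes.pythonapi.PyThreadState_SetAsyncExc(
            ctypes.c_ulong(thread.ident), None)
        raise SystemError("more than one thread state was modified")
    return affected == 1


def spin(log):
    """A worker that never returns on its own."""
    try:
        while True:
            time.sleep(0.01)
    except ThreadKilled:
        log.append("killed")  # the thread gets a chance to clean up here


def demo():
    log = []
    worker = threading.Thread(target=spin, args=(log,))
    worker.daemon = True      # do not keep the interpreter alive if this fails
    worker.start()
    time.sleep(0.2)           # give the worker time to enter its loop
    async_raise(worker)
    worker.join(2.0)
    return worker.is_alive(), log


if __name__ == "__main__":
    print(demo())
```

The exception is only delivered when the target thread next runs Python bytecode, and the target is free to catch and ignore it, so this is a request rather than a guaranteed kill.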


In particular I'd be interested to see if I could kill a non-returning C 
module. Naturally no one wants to be publishing buggy modules 
and the like, so I am having trouble finding examples of misbehaving C 
code to explore further. I figure I could learn how to write C modules 
for Python but, while it's something I'd like to know someday, I'm 
guessing that will take too long to fit into the immediate future :/ 
Consequently if anyone has detailed knowledge of what is and isn't 
fundamentally possible in the world of thread culling, or if you can 
point me to some gnarly/non-returning thread code to test with, I would 
be forever grateful.
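
As a stand-in for a non-returning C module, a long time.sleep() call already shows the basic limitation. The sketch below assumes CPython and a current Python 3; async_raise, blocked_in_c and demo are hypothetical names. It injects SystemExit into a thread that is blocked inside C code and checks that nothing happens until the C call returns by itself.

```python
import ctypes
import threading
import time


def async_raise(thread, exc_type):
    """Schedule exc_type to be raised in `thread` (CPython only)."""
    return ctypes.pythonapi.PyThreadState_SetAsyncExc(
        ctypes.c_ulong(thread.ident), ctypes.py_object(exc_type))


def blocked_in_c(seconds):
    # time.sleep stands in for an extension function that does not return
    # for a long while; no Python bytecode runs until it comes back.
    time.sleep(seconds)


def demo(block_for=1.0):
    worker = threading.Thread(target=blocked_in_c, args=(block_for,))
    worker.daemon = True
    worker.start()
    time.sleep(0.1)                    # let the worker enter the C call
    async_raise(worker, SystemExit)    # SystemExit ends a thread quietly
    worker.join(0.3)
    still_alive = worker.is_alive()    # the exception is pending, not delivered
    worker.join(block_for + 2.0)       # it only lands once the C call returns
    return still_alive, worker.is_alive()


if __name__ == "__main__":
    print(demo())
```

A C function that truly never returns would leave the thread alive indefinitely with this approach, which is the case that seems fundamentally out of reach from inside the interpreter.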


Oh yes, I'm working primarily with Python 2.5 on XP; however, I have access to 
Linux & OS X boxen and I'd be interested to learn about problematic 
thread code for either of those platforms as well.


Thanks for reading,

Roger Heathcote - www.technicalbloke.com
--
http://mail.python.org/mailman/listinfo/python-list


Re: Thread killing - I know I know!

2008-05-19 Thread Roger Heathcote

[EMAIL PROTECTED] wrote:

On May 16, 11:40 am, Roger Heathcote <[EMAIL PROTECTED]>
wrote:

Despite many people's insistence that allowing for the arbitrary killing
of threads is a cardinal sin and although I have no particular threading
problem to crack right now I remain interested in the taboo that is thread
killing. The real world and its data are messy and imperfect and I can



In general, use processes when you can and threads only when you
must.  OS designers spent a lot of energy implementing protected
memory, no sense throwing out a fair chunk of that hard work unless
you actually need to.


Fair point, but for sub-processes that need to be in close contact with 
the original app, or very small functions that you'd like 100s or 1000s 
of, it seems like a kludge having to spawn whole new processes, build in 
socket communications and kill via explicit OS calls. I can't see that 
approach scaling particularly well but I guess there's no choice.
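
For comparison, the process-based route can be fairly short with the standard library. This is a rough sketch only: run_with_timeout is a hypothetical helper, and it relies on Popen.kill() and the timeout argument of Popen.wait(), which exist in current Python 3 but not in 2.5.

```python
import subprocess
import sys


def run_with_timeout(source, timeout):
    """Run Python `source` in a child interpreter; kill it if it overruns.

    Returns a tuple (exit_code, timed_out).
    """
    proc = subprocess.Popen([sys.executable, "-c", source])
    try:
        return proc.wait(timeout=timeout), False
    except subprocess.TimeoutExpired:
        proc.kill()               # the OS tears the child down, C code and all
        return proc.wait(), True  # reap the child so no zombie is left behind


if __name__ == "__main__":
    print(run_with_timeout("while True: pass", 1.0))
    print(run_with_timeout("print('done')", 30))
```

The cost is exactly the one described above: one interpreter per task and no shared memory, so results have to come back through pipes, sockets or files.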


Does anyone think it likely that the threading module (and threading in 
general) will be improved and augmented with features like timeouts and 
arbitrary threadicide in the next year or two?  There seems to be little 
scope for tapping the power of the current generation of multiple cores 
with the current Pythons. I appreciate breakneck speed has never 
been a design objective, but it looks like multicore is set to be the next 
generation PC architecture.


Roger Heathcote - technicalbloke.com


Re: please critique my thread code

2008-06-19 Thread Roger Heathcote

MRAB wrote:

On Jun 15, 2:29 pm, [EMAIL PROTECTED] wrote:

I wrote a Python program (103 lines, below) to download developer data
from SourceForge for research about social networks.

Please critique the code and let me know how to improve it.

An example use of the program:

prompt> python download.py 1 24

The above command downloads data for the projects with IDs between 1
and 24, inclusive. As it runs, it prints status messages, with a
plus sign meaning that the project ID exists. Else, it prints a minus
sign.

Questions:

--- Are my setup and use of threads, the queue, and "while True" loop
correct or conventional?

--- Should the program sleep sometimes, to be nice to the SourceForge
servers, and so they don't think this is a denial-of-service attack?

--- Someone told me that popen is not thread-safe, and to use
mechanize. I installed it and followed an example on the web site.
There wasn't a good description of it on the web site, or I didn't
find it. Could someone explain what mechanize does?

--- How do I choose the number of threads? I am using a MacBook Pro
2.4GHz Intel Core 2 Duo with 4 GB 667 MHz DDR2 SDRAM, running OS
10.5.3.

Thank you.

Winston


[snip]
String methods are quicker than regular expressions, so don't use
regular expressions if string methods are perfectly adequate. For
example, you can replace:




Erm, shurely the bottleneck will be bandwidth not processor/memory?* If 
it isn't then - yes, you run the risk of actually DOSing their servers!


Your Mac will run thousands of threads comfortably but your router may 
not handle the thousands of TCP/IP connections you throw at it very 
well, especially if it is a domestic model, and sure as hell SourceForge 
aren't going to want more than a handful of concurrent connections from 
you.
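
One way to keep to a handful of connections is a small fixed pool of worker threads pulling IDs from a queue. The following is a sketch under assumptions, not a critique of the posted program: run_pool is a hypothetical helper, fetch is whatever download function you already have, and the defaults for num_workers and delay are arbitrary.

```python
import queue
import threading
import time


def run_pool(items, fetch, num_workers=4, delay=0.0):
    """Call fetch(item) for every item, with at most num_workers in flight."""
    todo = queue.Queue()
    for item in items:
        todo.put(item)
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                item = todo.get_nowait()
            except queue.Empty:
                return                  # queue drained, let the thread end
            outcome = fetch(item)
            with lock:                  # protect the shared results dict
                results[item] = outcome
            time.sleep(delay)           # pause between requests to be polite

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

The number of simultaneous connections never exceeds num_workers, and delay puts a floor under the gap between successive requests from each worker. A fetch that raises will end its worker early, so real code would want to catch and record errors.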


Typical sourceforge page ~ 30K
Project pages to read = 24

= ~6.8 Gigabytes

Maybe send their sysadmin a box of chocolates if you want to grab all 
that in any less than a week and not get your IP blocked! :)
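
The arithmetic behind that estimate can be checked in a few lines. This is only a sketch: it assumes 30K means 30 KiB per page and that a Gigabyte here is 2**30 bytes, and the function names are made up for illustration. Under those assumptions ~6.8 GB corresponds to a little under 240,000 pages, whereas 24 pages would come to well under a megabyte, so the figure appears to assume a much larger ID range than 24.

```python
KIB = 1024
GIB = 1024 ** 3


def total_bytes(pages, page_kib=30):
    """Total transfer for `pages` pages of roughly `page_kib` KiB each."""
    return pages * page_kib * KIB


def pages_for(total_gib, page_kib=30):
    """How many pages of `page_kib` KiB add up to `total_gib` GiB."""
    return round(total_gib * GIB / (page_kib * KIB))


def min_rate_kib_per_s(pages, days=7, page_kib=30):
    """Average rate needed to fetch `pages` pages within `days` days."""
    return total_bytes(pages, page_kib) / KIB / (days * 86400)


if __name__ == "__main__":
    print(total_bytes(24))                # 24 pages, in bytes
    print(total_bytes(240000) / GIB)      # 240,000 pages, in GiB
    print(pages_for(6.8))                 # pages implied by ~6.8 GiB
    print(min_rate_kib_per_s(240000))     # KiB/s to finish in a week
```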



Roger Heathcote

* Of course, stylistically, MRAB is perfectly right about not wasting 
CPU on regexes where string methods will do, unless you are planning on 
making your searches more elaborate in the future.




Re: Combining music or video files?

2008-06-19 Thread Roger Heathcote

John Salerno wrote:
"Dennis Lee Bieber" <[EMAIL PROTECTED]> wrote in message 
news:[EMAIL PROTECTED]

On Sun, 15 Jun 2008 22:53:19 -0400, John Salerno
<[EMAIL PROTECTED]> declaimed the following in comp.lang.python:
Even the simplest format -> WAV, which is normally uncompressed
audio samples, is wrapped in layers of informational packets.

snip other stuff!!!


Yikes! Then what I'm reading from your post (and others) is no, I can't do 
it my way. ;) It *did* seem a little too easy, after all! 





I can't speak for video (and I'd imagine it's a factor more difficult) 
but it's really not that hard to concatenate audio in Python. What you 
need to do...


Import the right modules for the audio formats you want to read.
Decide on a master output format, say CD quality - PCM 16-bit 44.1 kHz stereo.
Open a file for appending.
Read through each of your source files, loading them into memory:
  Get the sample rate and format
  Run the sample data through a function to convert it to the master output format*
  Squirt it to disk

Finally take the resulting file, calculate a new header and munge the 
two together. Voila :)



*I'm sure several modules have functions/methods for this but it 
wouldn't be very hard to roll your own either.
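
For plain WAV input, the steps above can be sketched with the standard library's wave module. This is a simplified illustration, not a full solution: concat_wavs is a hypothetical helper, it assumes every source file is already in the same PCM format (the conversion step is left out and a mismatch raises an error), and it lets the wave module write the header itself when the output is closed instead of munging one on afterwards.

```python
import wave


def concat_wavs(sources, dest, chunk_frames=65536):
    """Append the audio of each WAV file in `sources` to a new file `dest`."""
    out = None
    master = None
    try:
        for path in sources:
            with wave.open(path, "rb") as src:
                fmt = (src.getnchannels(), src.getsampwidth(),
                       src.getframerate())
                if out is None:
                    # The first file decides the master output format.
                    master = fmt
                    out = wave.open(dest, "wb")
                    out.setnchannels(fmt[0])
                    out.setsampwidth(fmt[1])
                    out.setframerate(fmt[2])
                elif fmt != master:
                    raise ValueError("%s is %r, expected %r"
                                     % (path, fmt, master))
                while True:
                    frames = src.readframes(chunk_frames)
                    if not frames:
                        break
                    out.writeframes(frames)  # squirt it to disk
    finally:
        if out is not None:
            out.close()  # wave fixes up the header sizes on close
```

Reading in chunks avoids holding a whole source file in memory. Compressed formats such as MP3 would need a decoder module first, which is outside what this sketch covers.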



Roger Heathcote

http://www.technicalbloke.com