Re: ARM cross compile - one last problem

2007-06-19 Thread Justin T.
On Jun 19, 10:49 am, [EMAIL PROTECTED] wrote:
> Hello all,
>
> I've been trying to get Python to cross compile to linux running on an
> ARM. I've been fiddling with the cross compile patches 
> here:http://sourceforge.net/tracker/index.php?func=detail&aid=1597850&grou...
>
> and I've had some success. Python compiles and now all of the
> extensions do too, but when I try to import some of them (time,
> socket, etc.), they have trouble finding certain symbols.
> PyExc_IOError and _Py_NoneStruct are the two I remember seeing. It
> would appear that they are exported by libpython, which I believe is
> statically linked into the python executable? That's where I start to
> get confused. What part of python is breaking? Where should I be
> looking for problems?
>
> Thanks a lot!
>
> Justin

Alright, I looked into this a little more, and those symbols
definitely exist in my compiled python executable. How are extensions
linked to the python interpreter?

Justin

-- 
http://mail.python.org/mailman/listinfo/python-list


Stackless Integration

2007-08-09 Thread Justin T.
Hi,

I've been looking at stackless python a little bit, and it's awesome.
My question is, why hasn't it been integrated into the upstream python
tree? Does it cause problems with the current C-extensions? It seems
like if something is fully compatible and better, then it would be
adopted. However, it hasn't been in what appears to be 7 years of
existence, so I assume there's a reason.

Justin



Re: Stackless Integration

2007-08-09 Thread Justin T.
On Aug 9, 8:57 am, "Terry Reedy" <[EMAIL PROTECTED]> wrote:
> First, which 'stackless'?  The original continuation-stackless (of about 7
> years ago)?  Or the more current tasklet-stackless (which I think is much
> younger than that)?
>
The current iteration. I can certainly understand Guido's distaste for
continuations.

>
> overcome.  It is just not part of the stdlib.
And I wish it were! It wouldn't be such a pain to get to my developers
then.

> And as far as I know or
> could find in the PEP index, C. Tismer has never submitted a PEP asking
> that it be made so.  Doing so would mean a loss of control, so there is a
> downside as well as the obvious upside of distribution.
That's true. Though, hopefully, the powers that be would allow him to
maintain it while it's in the stdlib. Maybe we should file a PEP for
him... :)

Justin



Re: Stackless Integration

2007-08-09 Thread Justin T.

> It's not Pythonic.
>
> Jean-Paul

Ha! I wish there was a way to indicate sarcasm on the net. You almost
got people all riled up!



Re: Threaded Design Question

2007-08-09 Thread Justin T.
On Aug 9, 11:25 am, [EMAIL PROTECTED] wrote:
>
> Here's how I have it designed so far.  The main thread starts a
> Watch(threading.Thread) class that loops and searches a directory for
> files.  It has been passed a Queue.Queue() object (watch_queue), and
> as it finds new files in the watch folder, it adds the file name to
> the queue.
>
> The main thread then grabs an item off the watch_queue, and kicks off
> processing on that file using another class Worker(threading.thread).
>
Sounds good.

>
> I made definite progress by creating two queues...watch_queue and
> processing_queue, and then used lists within the classes to store the
> state of which files are processing/watched.
>
This sounds ugly; synchronization is one of those evils of
multithreaded programming that should be avoided when possible. I see
a couple of dirt-simple solutions:

1. Have the watch thread move the file into a "Processing" folder that
it doesn't scan
2. Have the watch thread copy the file into a python tempfile object
and push that onto the queue, then delete the real file. This can be
done efficiently (well, more efficiently than new.write(old.read()))
with shutil.copyfileobj(old, new).

Both those take very few lines of code, don't require synchronization,
and don't require extending standard classes.
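Roughly, option 2 could look like this (an untested sketch; `hand_off`
and the queue wiring are my invention, not code from your app):

```python
import os
import shutil
import tempfile

def hand_off(path, work_queue):
    """Copy a watched file into an anonymous temp file, queue it for a
    worker, then delete the original so the scanner never sees it again."""
    tmp = tempfile.TemporaryFile()
    with open(path, "rb") as src:
        # copyfileobj streams in chunks, so the whole file is never
        # held in memory the way new.write(old.read()) would
        shutil.copyfileobj(src, tmp)
    tmp.seek(0)           # rewind so the worker reads from the start
    work_queue.put(tmp)
    os.remove(path)
```

Since the original file is gone by the time the function returns, the
next directory scan can't re-queue it.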



Re: Threaded Design Question

2007-08-09 Thread Justin T.
On Aug 9, 5:39 pm, MRAB <[EMAIL PROTECTED]> wrote:
> On Aug 9, 7:25 pm, [EMAIL PROTECTED] wrote:
>
> > Hi all!  I'm implementing one of my first multithreaded apps, and have
> > gotten to a point where I think I'm going off track from a standard
> > idiom.  Wondering if anyone can point me in the right direction.
>
> > The script will run as a daemon and watch a given directory for new
> > files.  Once it determines that a file has finished moving into the
> > watch folder, it will kick off a process on one of the files.  Several
> > of these could be running at any given time up to a max number of
> > threads.
>
> > Here's how I have it designed so far.  The main thread starts a
> > Watch(threading.Thread) class that loops and searches a directory for
> > files.  It has been passed a Queue.Queue() object (watch_queue), and
> > as it finds new files in the watch folder, it adds the file name to
> > the queue.
>
> > The main thread then grabs an item off the watch_queue, and kicks off
> > processing on that file using another class Worker(threading.thread).
>
> > My problem is with communicating between the threads as to which files
> > are currently processing, or are already present in the watch_queue so
> > that the Watch thread does not continuously add unneeded files to the
> > watch_queue to be processed.  For example...Watch() finds a file to be
> > processed and adds it to the queue.  The main thread sees the file on
> > the queue and pops it off and begins processing.  Now the file has
> > been removed from the watch_queue, and Watch() thread has no way of
> > knowing that the other Worker() thread is processing it, and shouldn't
> > pick it up again.  So it will see the file as new and add it to the
> > queue again.  PS.. The file is deleted from the watch folder after it
> > has finished processing, so that's how i'll know which files to
> > process in the long term.
>
> I would suggest something like the following in the watch thread:
>
> seen_files = {}
>
> while True:
>     # look for new files
>     for name in os.listdir(folder):
>         if name not in seen_files:
>             process_queue.add(name)
>         seen_files[name] = True
>
>     # forget any missing files and mark the others as not seen,
>     # ready for next time
>     seen_files = dict((name, False) for name, seen in seen_files.items()
>                       if seen)
>
>     time.sleep(1)

Hmm, this wouldn't work. It's not thread safe and the last line before
you sleep doesn't make any sense.
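A single-owner version avoids the problem: only the watch thread ever
reads or writes the seen set, so nothing needs locking, and workers only
ever touch the queue. Sketch only; `scan_once` and the names are made up:

```python
import os
import queue

def scan_once(folder, process_queue, seen_files):
    """One pass of the watch loop. Only the watch thread ever touches
    seen_files, so no lock is needed; workers just consume the queue."""
    current = set(os.listdir(folder))
    for name in sorted(current - seen_files):
        process_queue.put(name)   # hand newly-appeared files to workers
    # returning the fresh set forgets files that have disappeared, so a
    # deleted-then-recreated file gets queued again
    return current
```

The watch thread would just loop `seen = scan_once(folder, work_queue,
seen)` followed by a `time.sleep(1)`.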



Re: Threaded Design Question

2007-08-09 Thread Justin T.

> approach.  That sounds the easiest, although I'm still interested in
> any idioms or other proven approaches for this sort of thing.
>
> ~Sean

Idioms certainly have their place, but in the end you want clear,
correct code. In multi-threaded programming, synchronization adds
complexity in both code and concepts, so a clean design based on
message passing tends to be clearer and more robust. Most idioms are
just patterns for which somebody found a simple, robust solution, so
if you aim for a simple, robust solution yourself, you're probably
doing it right, especially in trivial cases like the one above.
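For what it's worth, the message-passing shape I mean is just this (a
toy sketch; `worker`, the upper-casing stand-in, and the sentinel
convention are illustrative choices, not a prescription):

```python
import queue
import threading

def worker(inbox, results):
    # Each worker shares nothing; it only receives and sends messages.
    while True:
        item = inbox.get()
        if item is None:           # sentinel: shut down cleanly
            break
        results.put(item.upper())  # stand-in for real processing

inbox, results = queue.Queue(), queue.Queue()
threads = [threading.Thread(target=worker, args=(inbox, results))
           for _ in range(2)]
for t in threads:
    t.start()
for name in ["a.txt", "b.txt"]:
    inbox.put(name)
for _ in threads:
    inbox.put(None)                # one sentinel per worker
for t in threads:
    t.join()
```

No locks anywhere: the queues do all the synchronizing.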

Justin



Re: The Future of Python Threading

2007-08-10 Thread Justin T.
On Aug 10, 3:57 am, Steve Holden <[EMAIL PROTECTED]> wrote:
> Justin T. wrote:
> > Hello,
>
> > While I don't pretend to be an authority on the subject, a few days of
> > research has lead me to believe that a discussion needs to be started
> > (or continued) on the state and direction of multi-threading python.
> [...]
> > What these seemingly unrelated thoughts come down to is a perfect
> > opportunity to become THE next generation language. It is already far
> > more advanced than almost every other language out there. By
> > integrating stackless into an architecture where tasklets can be
> > divided over several parallelizable threads, it will be able to
> > capitalize on performance gains that will have people using python
> > just for its performance, rather than that being the excuse not to use
> > it.
>
> Aah, the path to world domination. You know you don't *have* to use
> Python for *everything*.
>
True, but Python seems to be the *best* place to tackle this problem,
at least to me. It has a large pool of developers, a large standard
library, it's evolving, and it's a language I like :). Languages that
seamlessly support multi-threaded programming are coming, as are
extensions that make it easier on every existing platform. Python has
the opportunity to lead that change.

>
> Be my guest, if it's so simple.
>
I knew somebody was going to say that! I'm pretty busy, but I'll see
if I can find some time to look into it.

>
> I doubt that a thread on c.l.py is going to change much. It's the
> python-dev and py3k lists where you'll need to take up the cudgels,
> because I can almost guarantee nobody is going to take the GIL out of
> 2.6 or 2.7.
>

I was hoping to get a constructive conversation on what the structure
of a multi-threaded python would look like. It would appear that this
was not the place for that.

> Is it even possible
> to run threads of the same process at different priority levels on all
> platforms?
No, not every platform supports per-thread priorities, and even fewer
allow the scheduler to change a thread's priority dynamically. Linux,
however, is one that does.



Re: The Future of Python Threading

2007-08-10 Thread Justin T.
On Aug 10, 3:52 am, Jean-Paul Calderone <[EMAIL PROTECTED]> wrote:
> On Fri, 10 Aug 2007 10:01:51 -, "Justin T." <[EMAIL PROTECTED]> wrote:
> >Hello,
>
> >While I don't pretend to be an authority on the subject, a few days of
> >research has lead me to believe that a discussion needs to be started
> >(or continued) on the state and direction of multi-threading python.
>
> > [snip - threading in Python doesn't exploit hardware level parallelism
> >  well, we should incorporate stackless and remove the GIL to fix this]
>
> I think you have a misunderstanding of what greenlets are.  Greenlets are
> essentially a non-preemptive user-space threading mechanism.  They do not
> allow hardware level parallelism to be exploited.

I'm not an expert, but I understand that much. What greenlets do is
force the programmer to think about concurrent programming. It doesn't
force them to think about real threads, which is good, because a
computer should take care of that for you. Greenlets are nice because
they can run concurrently, but they don't have to. This means you can
safely divide them up among many threads. You could not safely do this
with just any old python program.

>
> >There has been much discussion on this in the past [2]. Those
> >discussions, I feel, were premature. Now that stackless is mature (and
> >continuation free!), Py3k is in full swing, and parallel programming
> >has been fully realized as THE next big problem for computer science,
> >the time is ripe for discussing how we will approach multi-threading
> >in the future.
>
> Many of the discussions rehash the same issues as previous ones.  Many
> of them are started based on false assumptions or are discussions between
> people who don't have a firm grasp of the relevant issues.

That's true, but there are actually a lot of good ideas in there as
well.

>
> I don't intend to suggest that no improvements can be made in this area of
> Python interpreter development, but it is a complex issue and cheerleading
> will only advance the cause so far.  At some point, someone needs to write
> some code.  Stackless is great, but it's not the code that will solve this
> problem.
Why not? It doesn't solve it on its own, but it's a pretty good start
toward something that could.



The Future of Python Threading

2007-08-10 Thread Justin T.
Hello,

While I don't pretend to be an authority on the subject, a few days of
research has lead me to believe that a discussion needs to be started
(or continued) on the state and direction of multi-threading python.

Python is not multi-threading friendly. Any code that deals with the
python interpreter must hold the global interpreter lock (GIL). This
has the effect of serializing (to a certain extent) all python
specific operations. That is, any thread written purely in python
will not release the GIL except at particular (and possibly non-
optimal) times; currently that's the rather arbitrary quantum of 100
bytecode instructions. Since the OS's ability to schedule python
threads depends on when it's possible to run each thread (according to
the lock), python threads do not benefit from a good scheduler in the
same manner that real OS threads do, even though python threads are
supposed to be a thin wrapper around real OS threads[1].
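For the curious, that quantum is tunable. In the CPython of this era
the knob is `sys.setcheckinterval(100)`; current CPython later replaced
the bytecode count with a time-based slice, so the equivalent knob there
looks like this (sketch):

```python
import sys

# Early CPython considered releasing the GIL every
# sys.setcheckinterval() bytecodes, 100 by default. Modern CPython
# replaced that bytecode quantum with a time slice in seconds:
default = sys.getswitchinterval()   # 0.005 seconds by default
sys.setswitchinterval(0.001)        # request more frequent GIL handoffs
tuned = sys.getswitchinterval()
sys.setswitchinterval(default)      # restore the interpreter default
```

Either way it only changes how often the GIL *may* change hands; it
doesn't buy any real parallelism.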

The detrimental effects of the GIL have been discussed several times
and nobody has ever done anything about it. This is because the GIL
isn't really that bad right now. The GIL isn't held that much, and
pthreads spawned by python-C interactions (i.e., those that reside in
extensions) can do all their processing concurrently as long as they
aren't dealing with python data. What this means is that python
multithreading isn't really broken as long as python is thought of as
a convenient way of manipulating C. After all, 100 bytecode
instructions go by pretty quickly, so the GIL isn't really THAT
invasive.

Python, however, is much better than a convenient method of
manipulating C. Python provides a simple language which can be
implemented in any way, so long as its promised behaviors are
preserved. We should take advantage of that.

The truth is that the future (and present reality) of almost every
form of computing is multi-core, and there currently is no effective
way of dealing with concurrency. We still worry about setting up
threads, synchronization of message queues, synchronization of shared
memory regions, dealing with asynchronous behaviors, and most
importantly, how threaded an application should be. All of this is
possible to do manually in C, but it's hardly optimal. For instance, at
compile time you have no idea if your library is going to be running
on a machine with 1 processor or 100. Knowing that makes a huge
difference in architecture as 200 threads might run fine on the 100
core machine where it might thrash the single processor to death.
Thread pools help, but they need to be set up and initialized. There
are very few good thread pool implementations that are meant for
generic use.

It is my feeling that there is no better way of dealing with dynamic
threading than to use a dynamic language. Stackless python has proven
that clever manipulation of the stack can dramatically improve
concurrent performance in a single thread. Stackless revolves around
tasklets, which are a nearly universal concept.

For those who don't follow experimental python implementations,
stackless essentially provides an integrated scheduler for "green
threads" (tasklets), or extremely lightweight snippets of code that
can be run concurrently. It even provides a nice way of messaging
between the tasklets.
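To make "green threads" concrete: a throwaway round-robin scheduler
over plain generators shows the shape of it (this is not Stackless
code, just my illustration of cooperative tasklets):

```python
from collections import deque

def scheduler(tasklets):
    """Round-robin over cooperative 'tasklets' (plain generators).
    Each yield is a voluntary context switch back to the scheduler,
    a toy version of what Stackless does with real tasklets."""
    ready = deque(tasklets)
    trace = []
    while ready:
        task = ready.popleft()
        try:
            trace.append(next(task))   # run until the next yield
            ready.append(task)         # still alive: reschedule it
        except StopIteration:
            pass                       # tasklet finished; drop it
    return trace

def counter(name, n):
    # A trivial tasklet that yields control n times.
    for i in range(n):
        yield "%s:%d" % (name, i)
```

Running `scheduler([counter("a", 2), counter("b", 2)])` interleaves the
two tasklets in a single OS thread.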

When you think about it, lots of object oriented code can be organized
as tasklets. After all, encapsulation provides an environment where
side effects of running functions can be minimized, and is thus
somewhat easily parallelized (with respect to other objects).
Functional programming is, of course, ideal, but it's hardly the trendy
thing these days. Maybe that will change when people realize how much
easier it is to test and parallelize.

What these seemingly unrelated thoughts come down to is a perfect
opportunity to become THE next generation language. It is already far
more advanced than almost every other language out there. By
integrating stackless into an architecture where tasklets can be
divided over several parallelizable threads, it will be able to
capitalize on performance gains that will have people using python
just for its performance, rather than that being the excuse not to use
it.

The nice thing is that this requires a fairly doable amount of work.
First, stackless should be integrated into the core. Then there should
be an effort to remove the reliance on the GIL for python threading.
After that, advanced features like moving tasklets amongst threads
should be explored. I can imagine a world where a single python web
application is able to redistribute its millions of requests amongst
thousands of threads without the developer ever having to think about
how the application will scale. An efficient and natively multi-
threaded implementation of python will be invaluable as cores continue
to multiply like rabbits.

There has been much discussion on this in the past [2]. Those
discussions, I feel, were premature. Now that stackless is mature (and
continuation free!), Py3k is in full swing, and parallel programming
has been fully realized as THE next big problem for computer science,
the time is ripe for discussing how we will approach multi-threading
in the future.

Re: The Future of Python Threading

2007-08-10 Thread Justin T.
On Aug 10, 2:02 pm, [EMAIL PROTECTED] (Luc Heinrich) wrote:
> Justin T. <[EMAIL PROTECTED]> wrote:
> > What these seemingly unrelated thoughts come down to is a perfect
> > opportunity to become THE next generation language.
>
> Too late: <http://www.erlang.org/>
>
> :)
>
> --
> Luc Heinrich

Uh oh, my ulterior motives have been discovered!

I'm aware of Erlang, but I don't think it's there yet. For one thing,
it's not pretty enough. It also doesn't have the community support
that a mainstream language needs. I'm not saying it'll never be
adequate, but I think that making python into an Erlang competitor
while maintaining backwards compatibility with the huge amount of
already written python software will make python a very formidable
choice as languages adopt more and more multi-core support. Python is
in a unique position as it's actually a flexible enough language to
adapt to a multi-threaded environment without resorting to terrible
hacks.

Justin



Re: The Future of Python Threading

2007-08-10 Thread Justin T.
On Aug 10, 10:34 am, Jean-Paul Calderone <[EMAIL PROTECTED]> wrote:

> >I'm not an expert, but I understand that much. What greenlets do is
> >force the programmer to think about concurrent programming. It doesn't
> >force them to think about real threads, which is good, because a
> >computer should take care of that for you. Greenlets are nice because
> >they can run concurrently, but they don't have to. This means you can
> >safely divide them up among many threads. You could not safely do this
> >with just any old python program.
>
> There may be something to this.  On the other hand, there's no _guarantee_
> that code written with greenlets will work with pre-emptive threading instead
> of cooperative threading.  There might be a tendency on the part of developers
> to try to write code which will work with pre-emptive threading, but it's just
> that - a mild pressure towards a particular behavior.  That's not sufficient
> to successfully write correct software (where "correct" in this context means
> "works when used with pre-emptive threads", of course).
Agreed. Stackless does include a preemptive mode, but if you don't use
it, then you don't need to worry about locking at all. It would be
quite tricky to get around this, but I don't think it's impossible.
For instance, you could just automatically lock anything that was not
a local variable. Or, if you required all tasklets in one object to
run in one thread, then you would only have to auto-lock globals.

>
> One also needs to consider the tasks necessary to really get this integration
> done.  It won't change very much if you just add greenlets to the standard
> library.  For there to be real consequences for real programmers, you'd
> probably want to replace all of the modules which do I/O (and maybe some
> that do computationally intensive things) with versions implemented using
> greenlets.  Otherwise you end up with a pretty hard barrier between greenlets
> and all existing software that will probably prevent most people from changing
> how they program.

If the framework exists to efficiently multi-thread python, I assume
that the module maintainers will slowly migrate over if there is a
performance benefit there.
>
> Then you have to worry about the other issues greenlets introduce, like
> invisible context switches, which can make your code which _doesn't_ use
> pre-emptive threading broken.

Not breaking standard python code would definitely be priority #1 in
an experiment like this. I think that by making the changes at the
core we could achieve it. A standard program, after all, is just one
giant tasklet.
>
> All in all, it seems like a wash to me.  There probably isn't sufficient
> evidence to answer the question definitively either way, though.  And trying
> to make it work is certainly one way to come up with such evidence. :)

::Sigh:: I honestly don't see myself having time to really do anything
more than experiment with this. Perhaps I will try to do that though.
Sometimes I do grow bored of my other projects. :)

Justin
