Re: [Python-Dev] Cloning threading.py using processes
I just got around to reading the messages. When I first saw this, I thought it was there so that processes could share and work on shared objects; that is where the locks are required. However, all shared objects are managed by the object manager, and thus all such operations are in effect sequential, even acquires on different locks. Other shared objects in the object manager will therefore not require any (additional) synchronization. Of course, the argument here is that it is still possible to use that code. Cleanup of shared objects seems to be another thing to look out for; this is a problem that subprocesses seem to avoid, as has already been suggested.

-Chetan

On 10/11/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:

Message: 5
Date: Wed, 11 Oct 2006 10:23:40 +0200
From: "M.-A. Lemburg" <[EMAIL PROTECTED]>
Subject: Re: [Python-Dev] Cloning threading.py using processes
To: Josiah Carlson <[EMAIL PROTECTED]>
Cc: [email protected]
Message-ID: <[EMAIL PROTECTED]>
Content-Type: text/plain; charset=ISO-8859-1

Josiah Carlson wrote:
> Fredrik Lundh <[EMAIL PROTECTED]> wrote:
>> Josiah Carlson wrote:
>>> Presumably with this library you have created, you have also written a
>>> fast object encoder/decoder (like marshal or pickle). If it isn't any
>>> faster than cPickle or marshal, then users may bypass the module and opt
>>> for fork/etc. + XML-RPC.
>>
>> XML-RPC isn't close to marshal and cPickle in performance, though, so
>> that statement is a bit misleading.
>
> You are correct, it is misleading, and relies on a few unstated
> assumptions.
>
> In my own personal delving into process splitting, RPC, etc., I usually
> end up with one of two cases: I need really fast call/return, or I need
> not-slow call/return. The not-slow call/return is (in my opinion)
> satisfactorily solved with XML-RPC. But I've personally not been
> satisfied with the speed of any remote 'fast call/return' packages, as
> they usually rely on cPickle or marshal, which are slow compared to
> even moderately fast 100 Mbit network connections. When we are talking
> about local connections, I have even seen cases where the
> cPickle/marshal calls can make it so that forking the process is faster
> than encoding the input to a called function.

This is hard to believe. I've been in that business for a few
years and so far have not found an OS/hardware/network combination
with the mentioned features.

Usually the worst part in the performance breakdown for RPC is network
latency, i.e. time to connect, waiting for the packets to come through,
etc., and this parameter doesn't really depend on the OS or hardware
you're running the application on, but is more a factor of which
network hardware, architecture and structure is being used.

It also depends a lot on what you send as arguments, of course,
but I assume that you're not pickling a gazillion objects :-)

> I've had an idea for a fast object encoder/decoder (with limited support
> for certain built-in Python objects), but I haven't gotten around to
> actually implementing it as of yet.

Would be interesting to look at.

BTW, did you know about http://sourceforge.net/projects/py-xmlrpc/ ?

--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Oct 11 2006)
>>> Python/Zope Consulting and Support ...        http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/

::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free !
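To make the "effectively sequential" point at the top of this message concrete, here is a minimal toy sketch, assuming a single dispatch queue stands in for the object manager (none of these names come from the library under discussion):

    import Queue, threading

    requests = Queue.Queue()

    def manager():
        # Every shared-object operation arrives on one queue and runs
        # strictly in arrival order, whatever object it touches.
        while True:
            op = requests.get()
            if op is None:
                break
            op()

    worker = threading.Thread(target=manager)
    worker.start()

    lock_a, lock_b = threading.Lock(), threading.Lock()
    requests.put(lock_a.acquire)   # serialized with...
    requests.put(lock_b.acquire)   # ...this acquire on a *different* lock
    requests.put(None)
    worker.join()

Even though lock_a and lock_b are independent, the two acquires cannot overlap, which is why the locks add no real concurrency hazard of their own.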
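On the encoder-speed question quoted above, a rough comparison of the standard-library encoders Josiah mentions can be had with a hypothetical micro-benchmark like this one (absolute numbers vary by machine and payload, and network latency is not modeled at all):

    import timeit

    setup = ("import marshal, cPickle, xmlrpclib\n"
             "payload = {'args': range(100), 'name': 'call'}")

    def best(stmt):
        # best of three runs of 10000 encodings each
        return min(timeit.Timer(stmt, setup).repeat(3, 10000))

    print "marshal  :", best("marshal.dumps(payload)")
    print "cPickle  :", best("cPickle.dumps(payload, 2)")
    print "xmlrpclib:", best("xmlrpclib.dumps((payload,), 'call')")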
Re: [Python-Dev] PATCH submitted: Speed up + for string concatenation, now as fast as "".join(x) idiom
The discussion on this topic seems to have died down. However, I had a look at the patch and here are some comments.

This has the potential to speed up simple string expressions like

    s = '1' + '2' + '3' + '4' + '5' + '6' + '7' + '8'

However, if this is followed by

    s += '9'

this (the 9th string) will cause rendering of the existing value of s and then create another concatenated string. This can be changed, but I have not checked to see whether it is worth it.

The deallocation code needs to be robust for a complex tree - it is currently not recursive, but needs to be, like the concatenation code.

Constructs like s = a + b + c + d + e, where a, b, etc. have been assigned string values earlier, will not benefit from the patch. If the values are generated and concatenated in a single expression, that is another type of construct that will benefit.

There are some other changes needed that I can write up if needed.

-Chetan

On 10/13/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:

Date: Fri, 13 Oct 2006 12:02:06 -0700
From: Josiah Carlson <[EMAIL PROTECTED]>
Subject: Re: [Python-Dev] PATCH submitted: Speed up + for string concatenation, now as fast as "".join(x) idiom
To: Larry Hastings <[EMAIL PROTECTED]>, [email protected]
Message-ID: <[EMAIL PROTECTED]>
Content-Type: text/plain; charset="US-ASCII"

Larry Hastings <[EMAIL PROTECTED]> wrote:
[snip]
> The machine is dual-core, and was quiescent at the time. XP's scheduler
> is hopefully good enough to just leave the process running on one core.

It's not. Go into the task manager (accessible via Ctrl+Alt+Del by
default) and change the process' affinity to the second core. In my
experience, running on the second core (in both 2k and XP) tends to
produce slightly faster results. Linux tends to keep processes on a
single core for a few seconds at a time.

- Josiah
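For reference, the two idioms in the subject line can be timed against each other with something like this (a hypothetical sketch; how the += loop behaves internally depends on whether the patch is applied):

    import timeit

    def concat_plus(n):
        s = ""
        for i in xrange(n):
            s += "spam"
        return s

    def concat_join(n):
        return "".join("spam" for i in xrange(n))

    for name in ("concat_plus", "concat_join"):
        timer = timeit.Timer("%s(10000)" % name,
                             "from __main__ import %s" % name)
        print name, min(timer.repeat(3, 100))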
Re: [Python-Dev] PATCH submitted: Speed up + for string concatenation, now as fast as "".join(x) idiom
My statement wasn't clear enough. Rendering occurs if the string being concatenated is already a concatenation object created by an earlier assignment. In s = a + b + c + d + e + f, there would be rendering of the source string if it is already a concatenation. Here is an example that would make it clear:

    a = "Value a ="
    a += "anything"   # creates a concatenation
    c = a + b         # causes rendering of a; c then becomes a
                      # concatenation of a and b
    c += "Something"  # does not append to the concatenation object, but
                      # causes rendering of c; a concatenation between c and
                      # "Something" is then created and assigned to c

Now if there is a series of assignments:

    (1) s = c + "something"  # causes rendering of c
    (2) s += a               # causes rendering of s and creates a new concatenation
    (3) s += b               # causes rendering of s and creates a new concatenation
    (4) s += c               # causes rendering of s and creates a new concatenation
    (5) print s              # causes rendering of s

If a list of strings is created and then concatenated with +=, I would expect it to be slower because of the additional allocations involved in rendering.

-Chetan

On 10/18/06, Kristján V. Jónsson <[EMAIL PROTECTED]> wrote:

Doesn't it end up in a call to PyString_Concat()? That should return a PyStringConcatenationObject too, right?

K

> Constructs like s = a + b + c + d + e, where a, b, etc. have been assigned
> string values earlier, will not benefit from the patch.
Re: [Python-Dev] Python-Dev Digest, Vol 39, Issue 54
I got up in the middle of the night and wrote the email - and it shows. Apologies for creating confusion. My comments below.

-Chetan

On 10/18/06, [EMAIL PROTECTED] wrote:

Date: Wed, 18 Oct 2006 13:04:14 -0700
From: Larry Hastings <[EMAIL PROTECTED]>
Subject: Re: [Python-Dev] PATCH submitted: Speed up + for string concatenation, now as fast as "".join(x) idiom
To: [email protected]
Message-ID: <[EMAIL PROTECTED]>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Chetan Pandya wrote:
> The deallocation code needs to be robust for a complex tree - it is
> currently not recursive, but needs to be, like the concatenation code.

It is already both those things.

Deallocation is definitely recursive. See Objects/stringobject.c,
function (*ahem*) recursive_dealloc. That Py_DECREF() line is where it
recurses into child string concatenation objects.

You might have been confused because it is *optimized* for the general
case, where the tree only recurses down the left-hand side. For the
left-hand side it iterates, instead of recursing, which is both slightly
faster and much more robust (unlikely to blow the stack).

Actually, I looked at the setting of ob_sstrings to NULL in recursive_dealloc and thought none of the strings would get destroyed as the list is destroyed. However, it only sets the first element to NULL, which is fine.

> Rendering occurs if the string being concatenated is already a
> concatenation object created by an earlier assignment.

Nope. Rendering only occurs when somebody asks for the string's value,
not when merely concatenating. If you add nine strings together, the
ninth one fails the "left side has room" test and creates a second object.

I don't know what I was thinking. In the whole of string_concat() there is no call to render the string, except for the right recursion case.

Try stepping through it. Run Python interactively under the debugger.
Let it get to the prompt. Execute some expression like "print 3", just
so the interpreter creates its concatenated encoding object (I get
"encodings.cp437"). Now, in the debugger, put a breakpoint in the
rendering code in recursiveConcatenate(), and another on the "op =
(PyStringConcatenationObject *)PyObject_MALLOC()" line in
string_concat. Finally, go back to the Python console and concatenate
nine strings with this code:

    x = ""
    for i in xrange(9):
        x += "a"

You won't hit any breakpoints for rendering, and you'll hit the string
concatenation object malloc line twice. (Note that for demonstration
purposes, this code is more illustrative than running x = "a" + "b" ...
+ "i", because the peephole optimizer makes a constant folding pass.
It's mostly harmless, but for my code it does mean I create
concatenation objects more often.)

I don't have a patch build, since I didn't download the revision used by the patch. However, I did look at values in the debugger, and it looked like x in your example above had a reference count of 2 or more within string_concat even when there were no other assignments that would account for it. My idea was to investigate this, and this was the whole reason for saying that the concatenation will create new objects. However, I ran on another machine under the debugger and I get the reference count as 1, which is what I would expect. I need to find out what has happened to my work machine.

In the interests of full disclosure, there is *one* scenario where pure string concatenation will cause it to render.
Rendering or deallocating a recursive object that's too deep would blow the program stack, so I limit recursion depth on the right seven slots of the recursion object. That's what the "right recursion depth" field is used for. If you attempt to concatenate a string concatenation object that's already at the depth limit, it renders the deep object first. The depth limit is 2**14 right now. You can force this to happen by prepending like crazy:

    x = ""
    for i in xrange(2**15):
        x = "a" + x

Since my code is careful to be only iterative when rendering and deallocating down the left-hand side of the tree, there is no depth limit for the left-hand side.

The recursion limit seems to be optimistic, given the default stack limit, but of course, I haven't tried it. There is probably a depth limit on the left-hand side as well, since recursiveConcatenate is recursive even on the left side.

Step before you leap,

/larry/
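The append/prepend asymmetry described above is easy to probe from Python; a hypothetical timing sketch (under the patch, the prepend loop builds a degenerate right-hand tree and eventually forces rendering at the depth limit):

    import timeit

    n = 2**15
    # setup re-binds s before each repeat; the statement runs n times per repeat
    print "append :", min(timeit.Timer("s += 'a'", "s = ''").repeat(3, n))
    print "prepend:", min(timeit.Timer("s = 'a' + s", "s = ''").repeat(3, n))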
Re: [Python-Dev] Python-Dev Digest, Vol 39, Issue 55
Larry Hastings wrote:

Chetan Pandya wrote:
> I don't have a patch build, since I didn't download the revision used
> by the patch. However, I did look at values in the debugger and it
> looked like x in your example above had a reference count of 2 or more
> within string_concat even when there were no other assignments that
> would account for it.

It could be the optimizer. If you concatenate hard-coded strings, the
peephole optimizer does constant folding. It says "hey, look, this
binary operator is performed on two constant objects". So it evaluates
the expression itself and substitutes the result, in this case swapping
(pseudotokens here) [PUSH "a" PUSH "b" PLUS] for [PUSH "ab"].

Oddly, it didn't seem to optimize away the whole expression. If you say
"a" + "b" + "c" + "d" + "e", I would have expected the peephole
optimizer to turn that whole shebang into [PUSH "abcde"]. But when I
gave it a cursory glance it seemed to skip every other: it
constant-folded "a" + "b", then + "c", and optimized ("a" + "b" + "c") +
"d", resulting ultimately, I believe, in [PUSH "ab" PUSH "cd" PLUS PUSH
"e" PLUS]. But I suspect I missed something; it bears further
investigation.

I looked at the optimizer, but couldn't find any place where it does constant folding for strings. However, I am unable to set breakpoints for some mysterious reason, so investigation is somewhat hard. But I am not bothered about it anymore, since it does not behave the way I originally thought it did.
But this is all academic, as real-world performance of my patch is not
contingent on what the peephole optimizer does to short runs of
hard-coded strings in simple test cases.

> The recursion limit seems to be optimistic, given the default stack
> limit, but of course, I haven't tried it.

I've tried it, on exactly one computer (running Windows XP). The depth
limit was arrived at experimentally. But it is probably too optimistic
and should be winched down.

On the other hand, right now when you do x = "a" + x ten zillion times
there are always two references to the concatenation object stored in x:
the interpreter holds one, and x itself holds the other. That means I
have to build a new concatenation object each time, so it becomes a
degenerate tree (one leaf and one subtree) recursing down the right-hand
side.

This is the case I was thinking of (but not what I wrote).

I plan to fix that in my next patch. There's already code that says "if
the next instruction is a store, and the location we're storing to holds
a reference to the left-hand side of the concatenation, make the
location drop its reference". That was an optimization for the
old-style concat code; when the left side only had one reference it
would simply resize it and memcpy() in the right side. I plan to add
support for dropping the reference when it's the *right*-hand side of
the concatenation, as that would help prepending immensely. Once that's
done, I believe it'll prepend ((depth limit) * (number of items in
ob_sstrings - 1)) + 1 strings before needing to render.

I am confused as to whether you are referring to the LHS of the concatenation operation or of the assignment operation. But I haven't looked at how the reference-counting optimizations are done yet. In general, there are caveats about removing references, but I plan to look at that later.
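For what it's worth, reference counts can also be watched from Python itself rather than from the C debugger; note that sys.getrefcount reports one extra reference held by its own argument:

    import sys

    x = "a" * 10
    print sys.getrefcount(x)   # typically 2: x plus the call's own argument
    y = x
    print sys.getrefcount(x)   # one more, now that y refers to the same object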
There is another, possibly complementary way of reducing the recursion depth. While creating a new concatenation object, instead of inserting the two string references, the strings they reference can be inserted into the new object. This can be done when the number of strings they contain is small. In the x = "a" + x case, for example, this will reduce the recursion depth of the string tree (but not reduce the allocations).
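A toy illustration of that flattening idea in pure Python (ConcatNode, concat and MAX_INLINE are made-up names for this sketch, not anything from the patch):

    MAX_INLINE = 4   # made-up threshold for a "small" child

    class ConcatNode(object):
        def __init__(self, parts):
            self.parts = parts   # strings or ConcatNodes, left to right

    def concat(left, right):
        parts = []
        for piece in (left, right):
            if isinstance(piece, ConcatNode) and len(piece.parts) <= MAX_INLINE:
                parts.extend(piece.parts)   # inline the child's own parts
            else:
                parts.append(piece)         # keep a reference to the subtree
        return ConcatNode(parts)

    # In the x = "a" + x pattern the tree now deepens only once every few
    # prepends, instead of on every prepend; allocations are unchanged.
    x = "x"
    for i in xrange(10):
        x = concat("a", x)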
-Chetan
