Re: copy on write
Steven D'Aprano writes:

> Perhaps you are thinking that Python could determine ahead of time
> whether x[1] += y involved a list or a tuple, and not perform the
> final assignment if x was a tuple. Well, maybe, but such an approach
> (if possible!) is fraught with danger and mysterious errors even
> harder to debug than the current situation. And besides, what should
> Python do about non-built-in types? There is no way in general to
> predict whether x[1] = something will succeed except to actually try
> it.

An alternative approach is to simply not perform the final assignment
if the in-place method is available on the contained object. No
prediction is needed; the contained object has to be examined anyway,
so when it supports in-place change, just don't assign.

Currently, lhs[ind] += rhs is implemented like this:

    item = lhs[ind]
    if hasattr(item, '__iadd__'):
        lhs.__setitem__(ind, item.__iadd__(rhs))
    else:
        lhs.__setitem__(ind, item + rhs)
    # (Note item assignment in both "if" branches.)

It could, however, be implemented like this:

    item = lhs[ind]
    if hasattr(item, '__iadd__'):
        item += rhs    # no assignment; item supports in-place change
    else:
        lhs.__setitem__(ind, lhs[ind] + rhs)

This would raise the exact same exception in the tuple case, but
without executing the in-place assignment. On the other hand,
some_list[ind] += 1 would continue working exactly as it does now.

In the same vein, in-place methods should not have a return value
(i.e. they should return None), as per the Python convention that
functions called for side effect don't return values. The alternative
behavior is unfortunately not backward-compatible (it ignores the
return value of augmented methods), so I'm not seriously proposing it,
but I believe it would have been a better implementation of augmented
assignments than the current one.
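To make the mutate-then-raise behavior concrete, here is a minimal
sketch (Python 3 syntax) showing that the current semantics mutate the
contained list first and only then fail on the tuple item assignment:

```python
# A tuple holding a list: += mutates the list in place, and only
# afterwards does the tuple item assignment raise.
t = ([1, 2],)
try:
    t[0] += [3]
except TypeError as e:
    print("raised:", e)
print(t)  # the list inside the tuple was modified anyway: ([1, 2, 3],)
```

This is exactly the surprise discussed above: the exception does not
mean the operation had no effect.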
The present interface doesn't just bite those who try to use augmented
assignment on tuples holding mutable objects, but also those who do
the same with read-only properties, which is an even more reasonable
use case. For example, obj.list_attr being a list, one would expect
that obj.list_attr += [1, 2, 3] does the same thing as
obj.list_attr.extend([1, 2, 3]). And it almost does, except it also
follows up with an assignment after the list has already been changed,
and the assignment to a read-only property raises an exception.

Refusing to modify the list would have been fine, modifying it without
raising an exception (as described above) would have been better, but
modifying it and *then* raising an exception is a surprise that takes
some getting used to.
--
http://mail.python.org/mailman/listinfo/python-list
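The read-only property case is just as easy to reproduce; a small
sketch (the class name is made up for illustration, Python 3 syntax):

```python
class Holder:
    def __init__(self):
        self._items = []

    @property
    def list_attr(self):          # read-only: no setter defined
        return self._items

obj = Holder()
try:
    obj.list_attr += [1, 2, 3]    # the extend succeeds...
except AttributeError as e:       # ...then the assignment raises
    print("raised:", e)
print(obj.list_attr)              # [1, 2, 3] -- already modified
```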
Re: round down to nearest number
Terry Reedy writes:

> On 2/9/2012 8:23 PM, noydb wrote:
>> So how would you round UP always?  Say the number is 3219, so you
>> want 3300.
> (x//100+1)*100

Note that that doesn't work for numbers that are already round:

>>> (3300//100+1)*100
3400                    # 3300 would be correct

I'd go with Chris Rebert's (x + 99) // 100 * 100.
--
http://mail.python.org/mailman/listinfo/python-list
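The add-then-floor-divide idiom generalizes to any multiple; a small
sketch (the helper name is mine):

```python
def round_up(x, multiple=100):
    # Adding (multiple - 1) before floor division rounds up,
    # and leaves exact multiples unchanged.
    return (x + multiple - 1) // multiple * multiple

print(round_up(3219))  # 3300
print(round_up(3300))  # 3300 -- already round, stays put
```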
Re: Porting the 2-3 heap data-structure library from C to Python
Alec Taylor writes:

> The source-code used has been made available:
> http://www.cosc.canterbury.ac.nz/research/RG/alg/ttheap.h
> http://www.cosc.canterbury.ac.nz/research/RG/alg/ttheap.c
>
> I plan on wrapping it in a class.

You should get acquainted with the Python/C API, which is the standard
way of extending Python with high-performance (and/or system-specific)
C code. See the "Extending and Embedding" and "Python/C API" sections
at http://docs.python.org/.

There is also a mailing list for help with the C API, see
http://mail.python.org/mailman/listinfo/capi-sig for details.
--
http://mail.python.org/mailman/listinfo/python-list
Re: Porting the 2-3 heap data-structure library from C to Python
Stefan Behnel writes:

>> which is the standard way of extending Python with high-performance
>> (and/or system-specific) C code.
>
> Well, it's *one* way. Certainly not the easiest way, neither the most
> portable, and you'll have a hard time making it the fastest.

I didn't say it was easy, but standard, in the sense of being
documented in the Python documentation. Python/C is as portable as
Python itself, and as fast as the platform allows.

I understand your desire to promote Cython, but please stop resorting
to FUD in doing so.
--
http://mail.python.org/mailman/listinfo/python-list
Re: Python and Lisp : car and cdr
Ethan Furman writes:

>> def car(L):
>>     return L[0]
>> def cdr(L):
>>     return L[1]
>
> IANAL (I am not a Lisper), but shouldn't that be 'return L[1:]' ?

Not for the linked list implementation he presented.

>> def length(L):
>>     if not L: return 0
>>     return 1 + length(cdr(L))
>
> How is this different from regular ol' 'len' ?

len would just return 2 for every linked list, and would raise an
exception for the empty list (represented by None in Lie's
implementation).

A more Pythonic implementation would represent the linked list as a
first-class object with car and cdr being attributes, allowing for
fairly natural expression of __len__, __iter__, etc. For example:

class List(object):
    __slots__ = 'car', 'cdr'

    def __init__(self, it=()):
        it = iter(it)
        try:
            self.car = it.next()
        except StopIteration:
            pass
        else:
            self.cdr = List(it)

    def __len__(self):
        if not hasattr(self, 'cdr'):
            return 0
        return 1 + len(self.cdr)

    def __iter__(self):
        head = self
        while hasattr(head, 'cdr'):
            yield head.car
            head = head.cdr

    def __repr__(self):
        return "%s(%r)" % (type(self).__name__, list(self))

>>> l = List([1, 2, 3])
>>> l
List([1, 2, 3])
>>> l.car
1
>>> l.cdr
List([2, 3])
>>> l.cdr.cdr.car
3
>>> l.cdr.cdr.cdr
List([])
>>> tuple(l)
(1, 2, 3)
--
http://mail.python.org/mailman/listinfo/python-list
Re: How does CO_FUTURE_DIVISION compiler flag get propagated?
Terry writes:

> Future division ("from __future__ import division") works within
> scripts executed by import or execfile(). However, it does not work
> when entered interactively in the interpreter like this:
>
> from __future__ import division
> a=2/3

Are you referring to the interactive interpreter normally invoked by
just running "python"? That seems to work for me:

Python 2.7.1+ (r271:86832, Apr 11 2011, 18:13:53)
[GCC 4.5.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> 2/3
0
>>> from __future__ import division
>>> 2/3
0.6666666666666666
--
http://mail.python.org/mailman/listinfo/python-list
Re: Possible File iteration bug
Billy Mays writes:

> Is there any way to just create a new generator that clears its
> 'closed' status?

You can define getLines in terms of the readline file method, which
does return new data when it is available.

def getLines(f):
    lines = []
    while True:
        line = f.readline()
        if line == '':
            break
        lines.append(line)
    return lines

or, more succinctly:

def getLines(f):
    return list(iter(f.readline, ''))
--
http://mail.python.org/mailman/listinfo/python-list
Re: Convert '165.0' to int
Frank Millman writes:

> int(float(x)) does the job, and I am happy with that. I was just
> asking if there were any alternatives.

int(float(s)) will corrupt integers larger than 2**53, should you ever
need them. int(decimal.Decimal(s)) works with numbers of arbitrary
size.
--
http://mail.python.org/mailman/listinfo/python-list
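The corruption is easy to show with 2**53 + 1, the first integer that
a C double cannot represent exactly:

```python
from decimal import Decimal

s = "9007199254740993"          # 2**53 + 1
print(int(float(s)))            # 9007199254740992 -- silently off by one
print(int(Decimal(s)))          # 9007199254740993 -- exact
print(int(Decimal("165.0")))    # 165 -- handles the original case too
```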
Re: list comprehension to do os.path.split_all ?
Neil Cerutti writes:

> On 2011-07-29, Dennis Lee Bieber wrote:
>> Fine... So normpath it first...
>>
>> >>> os.path.normpath(r'C:/windows').split(os.sep)
>> ['C:', 'windows']

That apparently doesn't distinguish between r'C:\windows' and
r'C:windows'. On Windows the first is an absolute path, the second a
relative path, and both contain a drive letter.

> while tail != '':
>     retval.append(tail)
>     head, tail = os.path.split(head)
> else:
>     if os.path.isabs(path):
>         retval.append(os.path.sep)
> return list(reversed(retval))

Note that using 'else' after 'while' is superfluous if the loop
doesn't contain a 'break' statement.
--
http://mail.python.org/mailman/listinfo/python-list
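For reference, a split_all built on repeated os.path.split along the
lines discussed in the thread might look like this (a sketch, not the
poster's exact code; behavior on drive letters follows the platform's
os.path):

```python
import os.path

def split_all(path):
    # Repeatedly peel off the last component; os.path.split('/')
    # eventually yields an empty tail, which ends the loop.
    parts = []
    head = path
    while True:
        head, tail = os.path.split(head)
        if tail:
            parts.append(tail)
        else:
            if head:              # an absolute path leaves the root behind
                parts.append(head)
            break
    return list(reversed(parts))

print(split_all('/usr/local/bin'))  # ['/', 'usr', 'local', 'bin']
print(split_all('a/b/c'))           # ['a', 'b', 'c']
```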
Re: Use-cases for alternative iterator
Steven D'Aprano writes:

> I've never seen this second form in actual code. Does anyone use it,
> and if so, what use-cases do you have?

Since APIs that signal end-of-iteration by returning a sentinel have
fallen out of favor in Python (with good reason), this form is rare,
but it's still sometimes useful. I've used it in actual code for
reading a file in fixed-size chunks, like this:

for chunk in iter(lambda: f.read(CHUNK_SIZE), ''):
    ...
--
http://mail.python.org/mailman/listinfo/python-list
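The same two-argument iter works with any file-like object; a
self-contained demo using io.StringIO:

```python
import io

f = io.StringIO("abcdefghij")
CHUNK_SIZE = 4

# iter(callable, sentinel) calls f.read(4) until it returns '',
# the sentinel, which ends the iteration.
chunks = list(iter(lambda: f.read(CHUNK_SIZE), ''))
print(chunks)  # ['abcd', 'efgh', 'ij']
```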
Re: generator / iterator mystery
Dave Abrahams writes:

> list(chain( *(((x,n) for n in range(3)) for x in 'abc') ))
> [('c', 0), ('c', 1), ('c', 2), ('c', 0), ('c', 1), ('c', 2),
>  ('c', 0), ('c', 1), ('c', 2)]
>
> Huh? Can anyone explain why the last result is different?

list(chain(*EXPR)) is constructing a tuple out of EXPR. In your case,
EXPR evaluates to a generator expression that yields generator
expressions, iterated over by chain and then by list. It is equivalent
to the following generator:

def outer():
    for x in 'abc':
        def inner():
            for n in range(3):
                yield x, n
        yield inner()

>>> list(chain(*outer()))
... the same result as above ...

The problem is that all the different instances of the inner()
generator refer to the same "x" variable, whose value has been changed
to 'c' by the time any of them is called.

The same gotcha is often seen in code that creates closures in a loop,
such as:

>>> fns = [(lambda: x+1) for x in range(3)]
>>> map(apply, fns)
[3, 3, 3]                  # most people would expect [1, 2, 3]

In your case the closure is less explicit because it's being created
by a generator expression, but the principle is exactly the same.

The classic fix for this problem is to move the closure creation into
a function, which forces a new cell to be allocated:

def adder(x):
    return lambda: x+1

>>> fns = [adder(x) for x in range(3)]
>>> map(apply, fns)
[1, 2, 3]

This is why your enum3 variant works.
--
http://mail.python.org/mailman/listinfo/python-list
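Another common fix, besides the factory-function approach, is to bind
the loop variable as a default argument; a Python 3 sketch of both
behaviors:

```python
# Each lambda captures x at definition time via the default value,
# instead of closing over the shared loop variable.
fns_fixed = [lambda x=x: x + 1 for x in range(3)]
print([f() for f in fns_fixed])    # [1, 2, 3]

# Without the default, all closures see the final value of x:
fns_broken = [lambda: x + 1 for x in range(3)]
print([f() for f in fns_broken])   # [3, 3, 3]
```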
Re: What other languages use the same data model as Python?
Steven D'Aprano writes:

> "Python's data model is different from other languages"
>
> which is perfectly correct, if you think of C as "other languages".
> But it's equally correct to say that Python's data model is the same
> as other languages. As I understand it, Python and Ruby have the same
> data model. So does Java, so long as you only consider objects[...]
> What other languages use the same, or mostly similar, data model as
> Python?

Count in Common Lisp and Scheme.

I would say that, considering the currently most popular languages and
platforms, Python's data model is in the majority. It is mostly people
coming from a C++ background who tend to be confused by it.
--
http://mail.python.org/mailman/listinfo/python-list
Re: __dict__ attribute for built-in types
candide writes:

> But beside this, how to recognise classes whose objects don't have a
> __dict__ attribute?

str, list and others aren't classes, they are types. While all
(new-style) classes are types, not all types are classes. It's
instances of classes (types created by executing the "class" statement
or its equivalent) that automatically get a __dict__, unless __slots__
was used at class definition time to suppress it. Built-in and
extension types can choose whether to implement __dict__.

(The mechanics of defining built-in and extension types are of course
implementation-specific. CPython allows adding __dict__ to any
extension type by setting the tp_dictoffset member of the type
definition struct to the appropriate offset into the instance struct.)
--
http://mail.python.org/mailman/listinfo/python-list
Re: __dict__ attribute for built-in types
candide writes:

> Le 28/10/2011 00:57, Hrvoje Niksic a écrit :
>
>> was used at class definition time to suppress it. Built-in and
>> extension types can choose whether to implement __dict__.
>
> Is it possible in the CPython implementation to write something like
> this:
>
> "foo".bar = 42
>
> without raising an attribute error?

No, and for good reason. Strings are immutable, so you needn't care
which particular instance of "foo" you're looking at; they're all
equivalent. The interpreter uses that fact to cache instances of short
strings such as Python identifiers, so that most places that look at a
string like "foo" are in fact dealing with the same instance. If one
could change an attribute of a particular instance of "foo", the
interpreter would no longer be allowed to transparently cache them.
The same goes for integers and other immutable built-in objects.

If you really need to attach state to strings, subclass them as Steven
explained. All code that accepts strings (including all built-ins)
will work just fine, transparent caching will not happen, and
attributes will be writable.
--
http://mail.python.org/mailman/listinfo/python-list
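The subclassing approach is only a few lines; a sketch (the subclass
name is made up):

```python
class AnnotatedStr(str):
    pass   # a plain subclass gets a __dict__, so attributes work

s = AnnotatedStr("foo")
s.bar = 42                 # fine on the subclass
print(s.bar, s == "foo")   # 42 True

try:
    "foo".bar = 42         # plain str instances reject attributes
except AttributeError as e:
    print("raised:", e)
```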
Re: Dictionary sorting
Ben Finney writes:

> Tim Chase writes:
>
>> On 11/03/11 16:36, Terry Reedy wrote:
>>> CPython iterates (and prints) dict items in their arbitrary
>>> internal hash table order, which depends on the number and entry
>>> order of the items. It is a bug to depend on that arbitrary order
>>> in any way.
>>
>> Does this "never trust it" hold even for two consecutive iterations
>> over an unchanged dict? I didn't see anything in the docs[1] to make
>> such a claim,
>
> Exactly.

This is false. The docs say:

    If items(), keys(), values(), iteritems(), iterkeys(), and
    itervalues() are called with no intervening modifications to the
    dictionary, the lists will directly correspond. This allows the
    creation of (value, key) pairs using zip(): pairs =
    zip(d.values(), d.keys()).

(http://docs.python.org/library/stdtypes.html#mapping-types-dict)

> The order of retrieval is entirely up to the implementation.

This part is still true, but the order won't change behind your back
if you're not touching the dict.
--
http://mail.python.org/mailman/listinfo/python-list
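The documented guarantee is what makes the zip trick from the docs
work; a quick check (Python 3 spelling, where keys()/values()/items()
return views):

```python
d = {'one': 1, 'two': 2, 'three': 3}

# keys() and values() iterate in corresponding order as long as the
# dict is not modified in between.
pairs = list(zip(d.values(), d.keys()))
print(pairs == [(v, k) for k, v in d.items()])  # True
```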
Re: Server Questions (2 of them)
Andrew writes:

> How do you create a server that accepts a set of user code? [...]

Look up the "exec" statement; the server can use it to execute any
code received from the client as a string. Note "any code", though:
exec runs in no sandbox, and if a malicious client defines
addition(1, 2) to execute os.system('sudo rm -rf /'), the server will
happily do just that.
--
http://mail.python.org/mailman/listinfo/python-list
Re: unpack('>f', b'\x00\x01\x00\x00')
Chris Rebert writes:

> C does not have a built-in fixed-point datatype, so the `struct`
> module doesn't handle fixed-point numbers directly.

The built-in decimal module supports fixed-point arithmetic, but the
struct module doesn't know about it. A bug report (or patch) by
someone who works with binary representations of fixed-point would be
a good start to improve it.
--
http://mail.python.org/mailman/listinfo/python-list
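The decimal module's fixed-point side mentioned above looks like this
in practice:

```python
from decimal import Decimal

# Decimal keeps the number of fractional digits explicit, so it can
# model fixed-point values exactly (no binary rounding surprises).
price = Decimal('1.00') + Decimal('2.375')
print(price)                              # 3.375
print(price.quantize(Decimal('0.01')))    # 3.38 (default half-even rounding)
```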
Re: order independent hash?
Chris Angelico writes:

>> The hash can grow with (k,v) pairs accumulated at run time. An
>> automatic memory management mechanism is required for a hash with a
>> non-fixed number of (k,v) pairs.
>
> That's a hash table.

In many contexts "hash table" is shortened to "hash" when there is no
ambiguity. This is especially popular among Perl programmers, where
the equivalent of dict is called a hash.

> Although strictly speaking, isn't that "Python dicts are implemented
> as hash tables in CPython"? Or is the hashtable implementation
> mandated?

It's pretty much mandated because of the __hash__ protocol.
--
http://mail.python.org/mailman/listinfo/python-list
Re: order independent hash?
Terry Reedy writes:

>> [Hashing is] pretty much mandated because of the __hash__ protocol.
>
> Lib Ref 4.8. Mapping Types — dict
> "A mapping object maps hashable values to arbitrary objects."
>
> This does not say that the mapping has to *use* the hash value ;-).
> Even if it does, it could use a tree structure instead of a hash
> table.

An arbitrary mapping doesn't, but the reference to the hash protocol
was in the context of implementation constraints for dicts themselves
(my response quotes the relevant part of Chris's message). If a Python
implementation tried to implement dict as a tree, instances of classes
that define only __eq__ and __hash__ would not be correctly inserted
into such a dict. This would be a major source of incompatibility with
Python code, both in the standard library and at large.
--
http://mail.python.org/mailman/listinfo/python-list
Re: order independent hash?
Chris Angelico writes:

> 2011/12/5 Hrvoje Niksic:
>> If a Python implementation tried to implement dict as a tree,
>> instances of classes that define only __eq__ and __hash__ would not
>> be correctly inserted in such a dict.
>
> Couldn't you just make a tree of hash values? Okay, that's probably
> not the most useful way to do things, but technically it'd comply
> with the spec.

That's a neat idea. The leaves of the tree would contain a list of
items with the same hash, but that's what you effectively get with a
linear-probe hash table anyway. As you said, not immediately useful,
but one could imagine the technique being of practical use when
implementing Python or a Python-compatible language in a foreign
environment that supports only tree-based collections.
--
http://mail.python.org/mailman/listinfo/python-list
Re: order independent hash?
Tim Chase writes:

> From an interface perspective, I suppose it would work. However, one
> of the main computer-science reasons for addressing by a hash is to
> get O(1) access to items (modulo pessimal hash structures/algorithms
> which can approach O(N) if everything hashes to the same
> value/bucket), rather than the O(logN) time you'd get from a tree.
> So folks reaching for a hash/map might be surprised if performance
> degraded with the size of the contents.

In a language like Python, the difference between O(1) and O(log n) is
not the primary reason why programmers use dict; they use it because
it's built-in, efficient compared to the alternatives, and convenient
to use. If the Python dict had been originally implemented as a tree,
I'm sure it would be just as popular. Omitting a factor of O(log n) as
functionally equivalent to O(1) is applicable in many situations and
is sometimes called "soft-O" notation.

One example from practice is pre-2011 C++, where the standardization
committee failed to standardize hash tables in time for the 1998
standard. Although this was widely recognized as an oversight, a large
number of programs simply used tree-based std::maps and never noticed
a practical difference between average-constant-time and logarithmic
complexity lookups.
--
http://mail.python.org/mailman/listinfo/python-list
Re: order independent hash?
Steven D'Aprano writes:

> Except for people who needed dicts with tens of millions of items.

Huge tree-based dicts would be somewhat slower than today's hash-based
dicts, but they would be far from unusable. Trees are often used to
organize large datasets for quick access.

The case of dicts which require frequent access, such as those used to
implement namespaces, is different, and more interesting. Those dicts
are typically quite small, and for them the difference between
O(log n) and O(1) is negligible in both theory (since n is "small",
i.e. bounded) and practice. In fact, depending on the details of the
implementation, lookup in a small tree could even be marginally
faster.
--
http://mail.python.org/mailman/listinfo/python-list
Re: unzip function?
Neal Becker writes:

> python has builtin zip, but not unzip
>
> A bit of googling found my answer for my decorate/sort/undecorate
> problem:
>
> a, b = zip (*sorted ((c,d) for c,d in zip (x,y)))
>
> That zip (*sorted...
>
> does the unzipping.
>
> But it's less than intuitively obvious.
>
> I'm thinking unzip should be a builtin function, to match zip.

"zip" and "unzip" are one and the same, since zip is inverse to
itself:

>>> [(1, 2, 3), (4, 5, 6)]
[(1, 2, 3), (4, 5, 6)]
>>> zip(*_)
[(1, 4), (2, 5), (3, 6)]
>>> zip(*_)
[(1, 2, 3), (4, 5, 6)]
>>> zip(*_)
[(1, 4), (2, 5), (3, 6)]

What you seem to call unzip is simply zip with a different signature,
taking a single argument:

>>> def unzip(x):
...     return zip(*x)
...
>>> [(1, 2, 3), (4, 5, 6)]
[(1, 2, 3), (4, 5, 6)]
>>> unzip(_)
[(1, 4), (2, 5), (3, 6)]
>>> unzip(_)
[(1, 2, 3), (4, 5, 6)]
>>> unzip(_)
[(1, 4), (2, 5), (3, 6)]
--
http://mail.python.org/mailman/listinfo/python-list
Re: while True or while 1
Dave Angel writes:

> I do something similar when there's a portion of code that should
> never be reached:
>
> assert("reason why I cannot get here")

Shouldn't that be

assert False, "reason why I cannot get here"

? A non-empty string is always true, so asserting one can never fail.
--
http://mail.python.org/mailman/listinfo/python-list
Re: Python's "only one way to do it" philosophy isn't good?
Douglas Alan <[EMAIL PROTECTED]> writes:

> I think you overstate your case. Lispers understand iteration
> interfaces perfectly well, but tend to prefer mapping functions to
> iteration because mapping functions are both easier to code (they
> are basically equivalent to coding generators) and efficient (like
> non-generator-implemented iterators). The downside is that they are
> not quite as flexible as iterators (which can be hard to code) and
> generators, which are slow.

Why do you think generators are any slower than hand-coded iterators?
Consider a trivial sequence iterator:

$ python -m timeit -s '
l = [1] * 100
class foo(object):
    def __init__(self, l):
        self.l = l
        self.i = 0
    def __iter__(self):
        return self
    def next(self):
        self.i += 1
        try:
            return self.l[self.i - 1]
        except IndexError:
            raise StopIteration
' 'tuple(foo(l))'
10000 loops, best of 3: 173 usec per loop

The equivalent generator is not only easier to write, but also
considerably faster:

$ python -m timeit -s '
l = [1] * 100
def foo(l):
    i = 0
    while 1:
        try:
            yield l[i]
        except IndexError:
            break
        i += 1
' 'tuple(foo(l))'
10000 loops, best of 3: 46 usec per loop
--
http://mail.python.org/mailman/listinfo/python-list
Re: Looking for an interpreter that does not request internet access
James Alan Farrell <[EMAIL PROTECTED]> writes:

> Hello,
> I recently installed new anti-virus software and was surprised the
> next time I brought up IDLE, that it was accessing the internet.
>
> I dislike software accessing the internet without telling me about
> it, especially because of my slow dial up connection (there is no
> option where I live), but also because I feel it unsafe.

When I start up IDLE, I get this message:

    Personal firewall software may warn about the connection IDLE
    makes to its subprocess using this computer's internal loopback
    interface. This connection is not visible on any external
    interface and no data is sent to or received from the Internet.

It would seem to explain the alarm you're seeing.
--
http://mail.python.org/mailman/listinfo/python-list
Re: Python's "only one way to do it" philosophy isn't good?
Douglas Alan <[EMAIL PROTECTED]> writes:

>>> The downside is that they are not quite as flexible as iterators
>>> (which can be hard to code) and generators, which are slow.
>>
>> Why do you think generators are any slower than hand-coded
>> iterators?
>
> Generators aren't slower than hand-coded iterators in *Python*, but
> that's because Python is a slow language.

But then it should be slow for both generators and iterators.

> *Perhaps* there would be some opportunities for more optimization if
> they had used a less general mechanism.)

Or if the generators were built into the language and directly
supported by the compiler. In some cases implementing a feature is
*not* a simple case of writing a macro, even in Lisp. Generators may
well be one such case.
--
http://mail.python.org/mailman/listinfo/python-list
Re: Memory leak issue with complex data structure
Alan Franzoni <[EMAIL PROTECTED]> writes:

> I have a serious "leak" issue; even though I clear all those sets
> and I delete all the references I can have to the current namespace,
> memory is not freed.

Maybe the memory is freed (marked as available for further use by
Python), just not released to the operating system.[1] To test for
that, try to allocate more Python structures and see if they reuse the
freed memory or if they allocate even more memory. Even better, run
code like this:

while 1:
    ... populate your data structures ...
    clear()

If this causes Python to allocate more and more memory, it means you
have a real leak. If not, it means that the GC is working fine, but
it's not possible to release the memory to the OS.

[1] Not giving freed memory back to the system is not (necessarily) a
Python bug; the same thing happens in C, and is a consequence of
managed memory being assigned to the process as a contiguous block.
--
http://mail.python.org/mailman/listinfo/python-list
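The test loop above can be made concrete with the gc module; a rough
sketch (the populate step is a stand-in for the poster's actual code):

```python
import gc

def populate():
    # Stand-in for the poster's data structures.
    return [set(range(100)) for _ in range(100)]

counts = []
for _ in range(5):
    data = populate()
    del data            # drop all references, like the poster's clear()
    gc.collect()
    counts.append(len(gc.get_objects()))

# A real leak would make the tracked-object count grow on every
# iteration; a stable count means Python is reclaiming everything,
# even if the OS-level process size never shrinks.
print(max(counts) - min(counts))
```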
Re: What is the most efficient way to test for False in a list?
"Diez B. Roggisch" <[EMAIL PROTECTED]> writes: but what is your best way to test for for False in a list? [...] >>> status = all(list) >> Am I mistaken, or is this no identity test for False at all? > > You are mistaken. > all take an iterable and returns if each value of it is true. Testing for truth is not the same as an identity test for False. OP's message doesn't make it clear which one he's looking for. This illustrates the difference: >>> False in [3, 2, 1, 0, -1] True# no False here >>> all([3, 2, 1, 0, -1]) False # false value present, not necessarily False -- http://mail.python.org/mailman/listinfo/python-list
Re: Per thread data
Will McGugan <[EMAIL PROTECTED]> writes:

> Is there a canonical way of storing per-thread data in Python?

mydata = threading.local()
mydata.x = 1
...

http://docs.python.org/lib/module-threading.html
--
http://mail.python.org/mailman/listinfo/python-list
Re: What is the most efficient way to test for False in a list?
Paul McGuire <[EMAIL PROTECTED]> writes:

>> >>> False in [3, 2, 1, 0, -1]
>> True                  # no False here
>> >>> all([3, 2, 1, 0, -1])
>> False                 # false value present, not necessarily False
>
> I think if you want identity testing, you'll need to code your own;

I'm aware of that; I simply pointed out that "False in list" and
all(list) are not equivalent, and where the difference lies.

> any(map(lambda _ : _ is False, [3,2,1,0,-1]))

Note that you can use itertools.imap to avoid the unnecessary
intermediate list creation. Even better is to use a generator
expression:

>>> any(x is False for x in [3, 2, 1, 0, -1])
False
>>> any(x is False for x in [3, 2, 1, 0, -1, False])
True
--
http://mail.python.org/mailman/listinfo/python-list
Re: os.wait() losing child?
Nick Craig-Wood <[EMAIL PROTECTED]> writes:

>> I think your polling way works; it seems there is no other way
>> around this problem other than polling or extending the Popen
>> class.
>
> I think polling is probably the right way of doing it...

It requires the program to wake up every 0.1s to poll for freshly
exited subprocesses. That doesn't consume excess CPU cycles, but it
does prevent the kernel from swapping the program out when there is
nothing to do. Sleeping in os.wait allows the operating system to know
exactly what the process is waiting for, and to move it out of the way
until those conditions are met. (Pedants will also notice that polling
introduces an average delay of 0.1/2 seconds between the subprocess
dying and the parent reaping it.)

In general, a program that waits for something should do so in a
single call to the OS. The OP's usage of os.wait was exactly correct.
Fortunately the problem can be worked around by hanging on to Popen
instances until they are reaped. If all of them are kept referenced
when os.wait is called, they will never end up in the _active list,
because the list is only populated in Popen.__del__.

> Internally subprocess uses os.waitpid(pid), just waiting for its own
> specific pids. IMHO this is the right way of doing it, rather than
> os.wait() which waits for any pids. os.wait() can reap children
> that you weren't expecting (say some library uses os.system())...

os.system calls waitpid immediately after the fork. This can still be
a problem for applications that call wait in a dedicated thread, but
the program can always ignore the processes it doesn't know anything
about.
--
http://mail.python.org/mailman/listinfo/python-list
Re: os.wait() losing child?
Jason Zheng <[EMAIL PROTECTED]> writes:

> greg wrote:
>> Jason Zheng wrote:
>>> Hate to reply to my own thread, but this is the working program
>>> that can demonstrate what I posted earlier:
>>
>> I've figured out what's going on. The Popen class has a __del__
>> method which does a non-blocking wait of its own. So you need to
>> keep the Popen instance for each subprocess alive until your wait
>> call has cleaned it up.
>> The following version seems to work okay.
>
> It still doesn't work on my machine. I took a closer look at the
> Popen class, and I think the problem is that the __init__ method
> always calls a method _cleanup, which polls every existing Popen
> instance.

Actually, it's not that bad. _cleanup only polls the instances that
are no longer referenced by user code, but still running. If you hang
on to Popen instances, they won't be added to _active, and __init__
won't reap them (_active is only populated from Popen.__del__).

This version is a trivial modification of your code to that effect.
Does it work for you?

#!/usr/bin/python
import os
from subprocess import Popen

pids = {}
counts = [0, 0, 0]

for i in xrange(3):
    p = Popen('sleep 1', shell=True, cwd='/home',
              stdout=file(os.devnull, 'w'))
    pids[p.pid] = p, i
    print "Starting child process %d (%d)" % (i, p.pid)

while True:
    pid, ignored = os.wait()
    try:
        p, i = pids[pid]
    except KeyError:
        # not one of ours
        continue
    del pids[pid]
    counts[i] += 1
    # terminate if count > 10
    if counts[i] == 10:
        print "Child Process %d terminated." % i
        if reduce(lambda x, y: x and (y >= 10), counts):
            break
        continue
    print "Child Process %d terminated, restarting" % i
    p = Popen('sleep 1', shell=True, cwd='/home',
              stdout=file(os.devnull, 'w'))
    pids[p.pid] = p, i
--
http://mail.python.org/mailman/listinfo/python-list
Re: os.wait() losing child?
Nick Craig-Wood <[EMAIL PROTECTED]> writes:

>> This can still be a problem for applications that call wait in a
>> dedicated thread, but the program can always ignore the processes
>> it doesn't know anything about.
>
> Ignoring them isn't good enough, because it means that the bit of
> code which was waiting for that process to die with os.waitpid()
> will never get called, causing a deadlock in that bit of code.

It won't deadlock; it will get an ECHILD or equivalent error, because
it's waiting for a PID that doesn't correspond to a running child
process. I agree that this can be a problem if and when you use
libraries that can call system. (In that case sleeping for SIGCHLD is
probably a good solution.)

> What is really required is a select() like interface to wait which
> takes more than one pid. I don't think there is such a thing
> though, so polling is your next best option.

Except for the problems outlined in my previous message. And the fact
that polling becomes very expensive (O(n) per check) once the number
of processes becomes large. Unless one knows that a library can and
does call system, wait is the preferred solution.
--
http://mail.python.org/mailman/listinfo/python-list
Re: os.wait() losing child?
Jason Zheng <[EMAIL PROTECTED]> writes:

> Hrvoje Niksic wrote:
>> Actually, it's not that bad. _cleanup only polls the instances that
>> are no longer referenced by user code, but still running. If you
>> hang on to Popen instances, they won't be added to _active, and
>> __init__ won't reap them (_active is only populated from
>> Popen.__del__).
>
> Perhaps that's the difference between Python 2.4 and 2.5. [...]
> Nope it still doesn't work. I'm running python 2.4.4, tho.

That explains it, then, and also why greg's code didn't work. You
still have the option to try to run 2.5's subprocess.py under 2.4.
--
http://mail.python.org/mailman/listinfo/python-list
Re: How to create new files?
Robert Dailey <[EMAIL PROTECTED]> writes:

> class filestream:
>     def __init__( self, filename ):
>         self.m_file = open( filename, "rwb" )
> [...]
> So far, I've found that unlike with the C++ version of fopen(), the
> Python 'open()' call does not create the file for you when opened
> using the mode 'w'.

According to your code, you're not using 'w', you're using 'rwb'. In
that respect Python's open behaves the same as C's fopen.

> Also, you might notice that my "self.m_file.read()" function is
> wrong, according to the python docs at least. read() takes the
> number of bytes to read, however I was not able to find a C++
> equivalent of "sizeof()" in Python. If I wanted to read in a 1 byte,
> 2 byte, or 4 byte value from data into python I have no idea how I
> would do this.

Simply read as much data as you need. If you need to unpack external
data into Python objects and vice versa, look at the struct module
(http://docs.python.org/lib/module-struct.html).
--
http://mail.python.org/mailman/listinfo/python-list
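The struct module handles exactly the 1-, 2-, and 4-byte reads asked
about; a small sketch with a made-up byte layout:

```python
import struct

# '<' = little-endian with no padding; B = 1 byte, H = 2 bytes,
# I = 4 bytes, so '<BHI' describes a 7-byte record.
raw = bytes([0x01, 0x02, 0x00, 0x03, 0x00, 0x00, 0x00])
one, two, four = struct.unpack('<BHI', raw)
print(one, two, four)   # 1 2 3

# pack goes the other way, producing the same bytes
assert struct.pack('<BHI', one, two, four) == raw
```

In a real program the raw bytes would come from f.read(struct.calcsize('<BHI')).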
Re: Question about PyDict_SetItemString
lgx <[EMAIL PROTECTED]> writes: > From Google results, I find some source code write like that. But >some code write like below: > > obj = PyString_FromString("value"); > PyDict_SetItemString(pDict,"key",obj); > Py_DECREF(obj); > > So, which one is correct? The latter is correct. While PyDict_GetItemString returns a borrowed reference, PyDict_SetItemString doesn't steal the reference. This makes sense because adding to the dictionary can fail for various reasons (insufficient memory, invalid key, hash or comparison functions failing), and that allows you to write code like this: obj = PyString_FromString("value"); int err = PyDict_SetItemString(dict, "key", obj); Py_DECREF(obj); if (err) return NULL; /* or whatever is appropriate in your case */ That won't leak regardless of whether PyDict_SetItemString succeeded, and will correctly propagate an error if it occurs. Please note that there is a new mailing list for Python/C API questions, see http://mail.python.org/mailman/listinfo/capi-sig . -- http://mail.python.org/mailman/listinfo/python-list
Re: os.wait() losing child?
Jason Zheng <[EMAIL PROTECTED]> writes: >>> Nope it still doesn't work. I'm running python 2.4.4, tho. >> That explains it, then, and also why greg's code didn't work. You >> still have the option to try to run 2.5's subprocess.py under 2.4. > Is it more convenient to just inherit the Popen class? You'd still need to change its behavior to not call _cleanup. For example, by removing "your" instances from subprocess._active before chaining up to Popen.__init__. > I'm concerned about portability of my code. It will be run on > multiple machines with mixed Python 2.4 and 2.5 environments. I don't think there is a really clean way to handle this. -- http://mail.python.org/mailman/listinfo/python-list
Re: Implementaion of random.shuffle
Steve Holden <[EMAIL PROTECTED]> writes: > So it would appear that the developers chose the Knuth algorithm > (with a slight variation) for *their* implementation. Now you have > to ask yourself whether your surmise is genuinely correct (in which > case the documentation may contain a bug) or whether the > documentation is indeed correct and you are in error. That is a good question. The random module uses the Mersenne twister, which has a repetition period of 2**19937. The number of n-sized permutations of a list with n elements is n!, while each shuffle requires n calls to the PRNG. This means that to be able to generate all permutations, the PRNG must have a period of at least n! * n. In the case of MT, it means that, regarding the period, you are safe for lists with around 2079 elements. shuffle's documentation may have been written before the random module was converted to use the MT. 2**19937 being a really huge number, it's impossible to exhaust the Mersenne twister by running it in sequence. However, there is also the question of the spread of the first shuffle. Ideally we'd want any shuffle, including the first one, to be able to produce any of the n! permutations. To achieve that, the initial state of the PRNG must be able to support at least n! different outcomes, which means that the PRNG must be seeded by at least log2(n!) bits of randomness from an outside source. For reference, Linux's /dev/random stops blocking when 64 bits of randomness are available from the entropy pool, which means that, in the worst case, shuffling more than 20 elements cannot represent all permutations in the first shuffle! -- http://mail.python.org/mailman/listinfo/python-list
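The arithmetic above can be checked numerically; this sketch uses math.lgamma to compute log2(n!) without materializing the huge factorials (the constants 19937 and 64 come from the discussion above):

```python
import math

def log2_factorial(n):
    # log2(n!) via the log-gamma function: lgamma(n + 1) == ln(n!).
    return math.lgamma(n + 1) / math.log(2)

# Largest n for which n! * n still fits within the Mersenne twister's
# period of 2**19937 (each shuffle consumes n PRNG calls).
n = 1
while log2_factorial(n + 1) + math.log2(n + 1) <= 19937:
    n += 1

# Smallest list size whose permutation count exceeds 64 bits of
# seed entropy -- the /dev/random worst case mentioned above.
m = 1
while log2_factorial(m) <= 64:
    m += 1
```

This reproduces both figures from the post: n comes out at 2079 elements for the period bound, and m at 21, i.e. shuffles of more than 20 elements exceed 64 bits of seed entropy.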
Re: Accessing Python variables in an extension module
MD <[EMAIL PROTECTED]> writes: > 2) Is there anyway to find the type of the object in C using something > like a switch statement? I was looking for something like this >switch type(object) { > STRING: "This is a string object"; > break; > INTEGER: "This is an integer object"; > break; > BOOLEAN: "This is a boolean object"; > . > . > } Not switch, but the closest you'll get is: if (object->ob_type == &PyString_Type) { ... string } else if (object->ob_type == &PyInt_Type) { ... int } else if (object->ob_type == &PyBool_Type) { ... bool } > I don't want to run all the C Py***_Check functions on the object. Py*_Check are not expensive if the object really is of the target type. They are necessary to support subtyping correctly. -- http://mail.python.org/mailman/listinfo/python-list
Re: Implementaion of random.shuffle
Steven D'Aprano <[EMAIL PROTECTED]> writes: > In the case of CPython, the current implementation uses the Mersenne > Twister, which has a huge period of 2**19937. However, 2081! is > larger than that number, which means that at best a list of 2081 > items or longer can't be perfectly shuffled (not every permutation > can be selected by the algorithm). Note that each shuffle requires n calls to the PRNG, not just one, which reduces the theoretically safe list size by 1. -- http://mail.python.org/mailman/listinfo/python-list
Re: Semantics of file.close()
"Evan Klitzke" <[EMAIL PROTECTED]> writes: > You should take a look at the man pages for close(2) and write(2) (not > fclose). Generally you will only get an error in C if you try to close > a file that isn't open. In Python you don't even have to worry about > that -- if you close a regular file object more than once no exception > will be thrown, _unless_ you are using os.close(), which mimics the C > behavior. If you are out of space, in C you will get an error returned > by the call to write (even if the data isn't actually flushed to disk > yet by the kernel). I'm pretty sure Python mimics this behavior, so an > exception would be called on the write, not on the close operation. But the writes are buffered, and close causes the buffer to be flushed. file.close can throw an exception just like fclose, but it will still ensure that the file is closed. > > How do I ensure that the close() methods in my finally clause do > > not throw an exception? In the general case, you can't. Preferably you'd want to make sure that both files are closed: try: f1 = file(...) try: f2 = file(...) ... do something with f1 and f2 ... finally: f2.close() finally: f1.close() Now file.close would be called on both files regardless of where an exception occurs. If you use Python 2.5, this would be a good use case for the "nested" function from the contextlib module, which allow you to write the above more elegantly: from __future__ import with_statement from contextlib import nested with nested(file(...), file(...)) as (f1, f2): ... do something with f1 and f2 ... Finally, most of this applies to files open for writing, where Python is forced to flush the cache on close. If the file is opened for reading, you can assume that the exception will not be raised (and you can safely ignore it with `try: f.close() except IOError: pass' if you want to be sure; after all, you can't lose data when closing a file open for reading). -- http://mail.python.org/mailman/listinfo/python-list
Re: Semantics of file.close()
"Evan Klitzke" <[EMAIL PROTECTED]> writes: >> But the writes are buffered, and close causes the buffer to be >> flushed. file.close can throw an exception just like fclose, but >> it will still ensure that the file is closed. > > Is this buffering being done by Python or the kernel? It is done in the user space, by the C stdio library which Python currently uses for IO. -- http://mail.python.org/mailman/listinfo/python-list
Re: Interpreting os.lstat()
Adrian Petrescu <[EMAIL PROTECTED]> writes: > I checked the online Python documentation at > http://python.org/doc/1.5.2/lib/module-stat.html > but it just says to "consult the documentation for your system.". The page you're looking for is at http://www.python.org/doc/current/lib/os-file-dir.html . For lstat it says "Like stat(), but do not follow symbolic links." For stat it says: Perform a stat() system call on the given path. The return value is an object whose attributes correspond to the members of the stat structure, namely: st_mode (protection bits), st_ino (inode number), st_dev (device), st_nlink (number of hard links), st_uid (user ID of owner), st_gid (group ID of owner), st_size (size of file, in bytes), st_atime (time of most recent access), st_mtime (time of most recent content modification), st_ctime (platform dependent; time of most recent metadata change on Unix, or the time of creation on Windows) [...] For backward compatibility, the return value of stat() is also accessible as a tuple of at least 10 integers giving the most important (and portable) members of the stat structure, in the order st_mode, st_ino, st_dev, st_nlink, st_uid, st_gid, st_size, st_atime, st_mtime, st_ctime. More items may be added at the end by some implementations. The standard module stat defines functions and constants that are useful for extracting information from a stat structure. (On Windows, some items are filled with dummy values.) -- http://mail.python.org/mailman/listinfo/python-list
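A short illustration of the attribute-style result and of the stat module's helpers mentioned in the quoted documentation ('.' is just a convenient path to stat):

```python
import os
import stat

info = os.stat('.')                      # stat the current directory
is_dir = stat.S_ISDIR(info.st_mode)      # True for a directory
is_reg = stat.S_ISREG(info.st_mode)      # True for a regular file

# The same fields remain reachable by index for backward compatibility;
# the stat module defines the index constants.
mode_by_index = info[stat.ST_MODE]
```

The S_IS* predicates are the portable way to interpret st_mode, rather than masking the bits by hand.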
Re: class C: vs class C(object):
"[EMAIL PROTECTED]" <[EMAIL PROTECTED]> writes: > In particular, old-style classes are noticeably faster than > new-style classes for some things (I think it was attribute lookup > that surprised me recently, possibly related to the property > stuff...) Can you post an example that we can benchmark? I ask because the opposite is usually claimed, that (as of Python 2.4 or 2.5) new-style classes are measurably faster. -- http://mail.python.org/mailman/listinfo/python-list
Re: subprocess (spawned by os.system) inherits open TCP/UDP/IP port
alf <[EMAIL PROTECTED]> writes: > still would like to find out why it is happening (now FD_CLOEXEC > narrowed may yahooing/googling searches). While realize that file > descriptors are shared by forked processes it is still weird why the > port moves to the child process once parent gets killed. what it the > parent got multiple subprocesses. Netstat probably shows only one of the processes that hold to the port, possibly the one with the lowest PID (the parent). > Plus it is kind of unintuitive os.system does not protect from such > behavoir which is for me more an equivalent of like issuing a ne > wcommand/ starting a process from the shell. It is considered a feature that fork/exec'ed programs inherit file descriptors -- that's how stdin and stdout get inherited all the time. It doesn't occur often with network connections because shells rarely have reason to open them. -- http://mail.python.org/mailman/listinfo/python-list
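The FD_CLOEXEC flag mentioned above is what prevents this inheritance; a minimal sketch of setting it through fcntl (Unix only; /dev/null stands in for the listening socket):

```python
import fcntl
import os

# Mark a descriptor close-on-exec so exec'ed children do not inherit it.
fd = os.open('/dev/null', os.O_RDONLY)
flags = fcntl.fcntl(fd, fcntl.F_GETFD)
fcntl.fcntl(fd, fcntl.F_SETFD, flags | fcntl.FD_CLOEXEC)

# Read the flags back to confirm the bit is set.
cloexec_set = bool(fcntl.fcntl(fd, fcntl.F_GETFD) & fcntl.FD_CLOEXEC)
os.close(fd)
```

With the flag set, a subprocess spawned via os.system or fork/exec no longer holds the port open after the parent dies.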
Multiple regex match idiom
I often have the need to match multiple regexes against a single string, typically a line of input, like this: if (matchobj = re1.match(line)): ... re1 matched; do something with matchobj ... elif (matchobj = re2.match(line)): ... re2 matched; do something with matchobj ... elif (matchobj = re3.match(line)): Of course, that doesn't work as written because Python's assignments are statements rather than expressions. The obvious rewrite results in deeply nested if's: matchobj = re1.match(line) if matchobj: ... re1 matched; do something with matchobj ... else: matchobj = re2.match(line) if matchobj: ... re2 matched; do something with matchobj ... else: matchobj = re3.match(line) if matchobj: ... Normally I have nothing against nested ifs, but in this case the deep nesting unnecessarily complicates the code without providing additional value -- the logic is still exactly equivalent to the if/elif/elif/... shown above. There are ways to work around the problem, for example by writing a utility predicate that passes the match object as a side effect, but that feels somewhat non-standard. I'd like to know if there is a Python idiom that I'm missing. What would be the Pythonic way to write the above code? -- http://mail.python.org/mailman/listinfo/python-list
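The "utility predicate" workaround mentioned above might look like the following sketch; the Matcher class and the regexes are illustrative, not a standard idiom:

```python
import re

class Matcher(object):
    """Hold the last match object so tests can chain in elif arms."""
    def match(self, regex, line):
        self.m = regex.match(line)
        return self.m is not None

re1 = re.compile(r'#\s*(.*)')            # a comment line
re2 = re.compile(r'(\w+)\s*=\s*(\w+)')   # a simple assignment line

def classify(line):
    matcher = Matcher()
    if matcher.match(re1, line):
        return ('comment', matcher.m.group(1))
    elif matcher.match(re2, line):
        return ('assignment', matcher.m.group(1), matcher.m.group(2))
    return ('other',)
```

The flat if/elif shape is preserved, at the cost of routing the match object through a helper's attribute.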
Re: Reading a file and resuming reading.
"Karim Ali" <[EMAIL PROTECTED]> writes: > - > while not eof <- really want the EOF and not just an empty line! > readline by line > end while; > - for line in open_file: ... It will stop on EOF, not on empty line. > But also, in case for one reason or another the program crashes, I > want to be able to rexecute it and for it to resume reading from the > same position as it left. If a while loop like the one above can be > implemented I can do this simply by counting the lines! If you open the file in binary mode, you can easily keep track of the position in file: bytepos = 0 with file(filename) as f: for line in f: ... process line ... bytepos += len(line) If you need to restart the operation, simply seek to the previously known position: # restart with old bytyepos with file(filename) as f: f.seek(bytepos) for line in f: ... process line ... bytepos += len(line) -- http://mail.python.org/mailman/listinfo/python-list
Re: Puzzled by "is"
Grzegorz Słodkowicz <[EMAIL PROTECTED]> writes: >> Seriously, it's just an optimization by the implementers. There is >> no need for more than one empty tuple, since tuples can never be >> modified once created. >> >> But they decided not to create (1, ) in advance. They probably knew >> that hardly anybody would want to create that tuple ;-) [Seriously: >> if you started trying to predict which tuples would be used you >> would go insane, but the empty tuple is the most likely candidate]. >> > That's just theorisation but I'd rather expect the interpreter simply > not to create a second tuple while there already is an identical > one. But then tuple creation would be slowed down by searching for whether an "identical one" already exists. In the general case, that is quite unlikely, so it's not done. (I suspect that only the requirement to store the list of all tuples somewhere would outweigh any potential gains of this strategy; and if the search were implemented as a hash table lookup, even more space would be wasted.) It's done for the empty tuple because no search is necessary, only a size test. > Admittedly the empty tuple is a special case but then 'Special cases > aren't special enough to break the rules'. Except no rule is being broken. As others have pointed out, since tuples are immutable, caching them is quite safe. -- http://mail.python.org/mailman/listinfo/python-list
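A quick CPython demonstration of the shared empty tuple (an implementation detail, not a language guarantee; tuple() is built from lists here so that no constant folding gets in the way):

```python
# CPython returns one shared empty tuple, while non-empty tuples built
# at run time are distinct objects even when equal -- no search for an
# existing identical tuple is performed.
a = tuple([])
b = tuple([])
c = tuple([1])
d = tuple([1])

empty_shared = a is b       # True: the empty-tuple singleton
nonempty_shared = c is d    # False: two separate equal tuples
```

Identity sharing for the empty case needs only a size test, exactly as described above.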
Re: Fatest standard way to sum bytes (and their squares)?
Erik Max Francis <[EMAIL PROTECTED]> writes: > So far the fastest way I've found is using the `sum` builtin and > generators:: > > ordinalSum = sum(ord(x) for x in data) > ordinalSumSquared = sum(ord(x)**2 for x in data) For ordinalSum, using imap is almost twice as fast: $ python -m timeit -s 'data=[chr(x) for x in xrange(256)]' 'sum(ord(x) for x in data)' 10000 loops, best of 3: 92.4 usec per loop $ python -m timeit -s 'data=[chr(x) for x in xrange(256)]; from itertools import imap' 'sum(imap(ord, data))' 10000 loops, best of 3: 55.4 usec per loop Of course, that optimization doesn't work for the squared sum; using a lambda only pessimizes it. -- http://mail.python.org/mailman/listinfo/python-list
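For the record, the lazy map and the generator expression compute identical sums; a sketch in modern Python, where the built-in map plays the role of Python 2's itertools.imap:

```python
# The generator expression and the lazy map produce the same results;
# the speed difference is only in how each item is delivered to sum().
data = "abcdefghijklmnopqrstuvwxyz"

gen_sum = sum(ord(x) for x in data)
map_sum = sum(map(ord, data))   # itertools.imap(ord, data) in Python 2

# Squaring has no ready-made callable, so map() would need a lambda,
# whose per-element Python call erases the gain.
gen_sq = sum(ord(x) ** 2 for x in data)
map_sq = sum(map(lambda x: ord(x) ** 2, data))
```

The imap win comes from dispatching straight to the C-implemented ord, with no bytecode executed per element.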
Re: Fatest standard way to sum bytes (and their squares)?
Erik Max Francis <[EMAIL PROTECTED]> writes: > Hrvoje Niksic wrote: > >> For ordinalSum, using imap is almost twice as fast: >> $ python -m timeit -s 'data=[chr(x) for x in xrange(256)]' >> 'sum(ord(x) for x in data)' >> 10000 loops, best of 3: 92.4 usec per loop >> $ python -m timeit -s 'data=[chr(x) for x in xrange(256)]; from itertools >> import imap' 'sum(imap(ord, data))' >> 10000 loops, best of 3: 55.4 usec per loop > > You're using data which is a list of chars (strings), rather than a > string itself, which is what the format is in. The imap > optimization doesn't appear to work quite as dramatically well for > me with strings instead of lists, but it certainly is an > improvement. I wouldn't expect to see any difference in strings and lists. In this simple test I get approximately the same ~1.7x speedup: $ python -m timeit 'sum(ord(x) for x in "abcdefghijklmnopqrstuvwxyz")' 100000 loops, best of 3: 12.7 usec per loop $ python -m timeit -s 'from itertools import imap' 'sum(imap(ord, "abcdefghijklmnopqrstuvwxyz"))' 100000 loops, best of 3: 7.42 usec per loop -- http://mail.python.org/mailman/listinfo/python-list
Re: File Read Cache - How to purge?
Signal <[EMAIL PROTECTED]> writes: > 2. Is there anyway to somehow to take advantage of this "caching" by > initializing it without reading through the entire file first? > > 3. If the answer to #2 is No, then is there a way to purge this > "cache" in order to get a more accurate result in my routine? That > is without having to read another large file first? On a Unix system the standard way to purge the cache is to unmount the file system and remount it. If you can't do that on Windows, you can get the same effect by placing the test files on an external (USB) hard drive; unplugging the drive and plugging it back again will almost certainly force the OS to flush any associated caches. Having to do that is annoying, even as a last resort, but still better than nothing. -- http://mail.python.org/mailman/listinfo/python-list
Re: python 2.5.1 segfault, multithreading & dual core issue?
Paul Sijben <[EMAIL PROTECTED]> writes: > I am running a multi-threaded python application in a dual core > intel running Ubuntu. [...] Judging from the stack trace, this patch has a good chance of fixing your problem: http://mail.python.org/pipermail/python-dev/2007-August/074232.html -- http://mail.python.org/mailman/listinfo/python-list
Re: How to optimise this code?
Christof Winter <[EMAIL PROTECTED]> writes: > To get rid of the if statements, replace __init__ function with: > > def __init__(self, tc): > functionToCall = eval("self.testCase%s" % tc) Or functionToCall = getattr(self, "testCase" + tc) eval can introduce unwanted side effects. -- http://mail.python.org/mailman/listinfo/python-list
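A hedged sketch of the getattr-based dispatch (the class and method names are illustrative, not from the original code):

```python
class Tests(object):
    def testCase1(self):
        return 'one'

    def testCase2(self):
        return 'two'

    def run(self, tc):
        # getattr with a default lets us fail cleanly on unknown names,
        # without eval's risk of executing an arbitrary expression.
        func = getattr(self, 'testCase%s' % tc, None)
        if func is None:
            raise ValueError('unknown test case: %r' % tc)
        return func()
```

Unlike eval("self.testCase%s" % tc), a malicious or malformed tc value can at worst raise ValueError here.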
Re: File Read Cache - How to purge?
Nick Craig-Wood <[EMAIL PROTECTED]> writes: > If you are running linux > 2.6.18 then you can use > /proc/sys/vm/drop_caches for exactly that purpose. > > http://www.linuxinsight.com/proc_sys_vm_drop_caches.html That URL claims that you need to run "sync" before dropping the cache, and so do other resources. I wonder if that means that dropping the cache is unsafe on a running system. -- http://mail.python.org/mailman/listinfo/python-list
Re: File Read Cache - How to purge?
Steve Holden <[EMAIL PROTECTED]> writes: >> That URL claims that you need to run "sync" before dropping the >> cache, and so do other resources. I wonder if that means that >> dropping the cache is unsafe on a running system. > > Good grief. Just let the operating system do its job, for Pete's > sake, and go find something else to obsess about. Purging the page cache for the purposes of benchmarking (such as measuring cold start time of large applications) is an FAQ, not an "obsession". No one is arguing that the OS shouldn't do its job in the general case. -- http://mail.python.org/mailman/listinfo/python-list
Re: File Read Cache - How to purge?
Nick Craig-Wood <[EMAIL PROTECTED]> writes: >> > http://www.linuxinsight.com/proc_sys_vm_drop_caches.html >> >> That URL claims that you need to run "sync" before dropping the cache, >> and so do other resources. I wonder if that means that dropping the >> cache is unsafe on a running system. > > It isn't unsafe, the OS just can't drop pages which haven't been > synced to disk so you won't get all the pages dropped unless you > sync first. Thanks for the clarification. -- http://mail.python.org/mailman/listinfo/python-list
Re: Does shuffle() produce uniform result ?
tooru honda <[EMAIL PROTECTED]> writes: > I have read the source code of the built-in random module, > random.py. After also reading Wiki article on Knuth Shuffle > algorithm, I wonder if the shuffle method implemented in random.py > produces results with modulo bias. It doesn't have modulo bias because it doesn't use modulo to produce a random index; it multiplies the floating point value with the desired range. I'm not sure if that method produces any measurable bias. -- http://mail.python.org/mailman/listinfo/python-list
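A sketch of the scaling approach described above, written as a standalone Knuth/Fisher-Yates loop rather than copied from random.py:

```python
import random

def knuth_shuffle(seq, rand=random.random):
    # Index chosen by scaling a float in [0, 1) to [0, i] --
    # no modulo of a raw integer is involved, hence no modulo bias.
    for i in reversed(range(1, len(seq))):
        j = int(rand() * (i + 1))
        seq[i], seq[j] = seq[j], seq[i]

random.seed(12345)
items = list(range(10))
knuth_shuffle(items)      # shuffled in place, still a permutation
```

Each position is swapped with a uniformly chosen earlier position (or itself), which is what makes the shuffle unbiased given a uniform rand().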
Re: Registering a python function in C
fernando <[EMAIL PROTECTED]> writes: > Could someone post an example on how to register a python function as > a callback in a C function? If I understand correctly, your C function receives a Python function (as a function object of type PyObject *), which you need to call from C. To do that, call PyObject_CallFunction(obj, format, args...) where format and args are documented in http://docs.python.org/api/arg-parsing.html. Does that help? Also note that there is a dedicated mailing list for the Python/C API; see http://mail.python.org/mailman/listinfo/capi-sig . -- http://mail.python.org/mailman/listinfo/python-list
Re: fcntl problems
"mhearne808[insert-at-sign-here]gmail[insert-dot-here]com" <[EMAIL PROTECTED]> writes: > I think I'm still confused. What Miles tried to tell you is that you should call fcnt.flock from both PA and PB. In the example you posted, you failed to call it from PB. No lock call, so no locking happened. > I have a script that will be run from a cron job once a minute. One > of the things this script will do is open a file to stash some > temporary results. I expect that this script will always finish its > work in less than 15 seconds, but I didn't want to depend on that. > > Thus I started to look into file locking, which I had hoped I could > use in the following fashion: > > Process A opens file foo > Process A locks file foo > Process A takes more than a minute to do its work > Process B wakes up > Process B determines that file foo is locked > Process B quits in disgust > Process A finishes its work File locking supports that scenario, as you suspected. You need to use flock with LOCK_EX|LOCK_NB. If the call succeeds, you got the lock. If you get an exception whose errno is EWOULDBLOCK, you quit in disgust. -- http://mail.python.org/mailman/listinfo/python-list
Re: Printing lists in columns
[EMAIL PROTECTED] writes: >> for row in izip_longest(*d, fillvalue='*'): >> print ', '.join(row) >> >> HTH > > I thought that but when I tried it I recieved a > "Syntax Error: Invalid Syntax" > with a ^ pointing to fillvalue :S Python isn't too happy about adding individual keyword arguments after an explicit argument tuple. Try this instead: for row in izip_longest(*d, **dict(fillvalue='*')): print ', '.join(row) -- http://mail.python.org/mailman/listinfo/python-list
Re: creating really big lists
Dr Mephesto <[EMAIL PROTECTED]> writes: > I would like to create a pretty big list of lists; a list 3,000,000 > long, each entry containing 5 empty lists. My application will > append data each of the 5 sublists, so they will be of varying > lengths (so no arrays!). > > Does anyone know the most efficient way to do this? I have tried: > > list = [[[],[],[],[],[]] for _ in xrange(3000000)] You might want to use a tuple as the container for the lower-level lists -- it's more compact and costs less allocation-wise. But the real problem is not list allocation vs tuple allocation, nor is it looping in Python; surprisingly, it's the GC. Notice this: $ python Python 2.5.1 (r251:54863, May 2 2007, 16:56:35) [GCC 4.1.2 (Ubuntu 4.1.2-0ubuntu4)] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> import time >>> t0=time.time(); l=[([],[],[],[],[]) for _ in xrange(3000000)]; >>> t1=time.time() >>> t1-t0 143.89971613883972 Now, with the GC disabled: $ python Python 2.5.1 (r251:54863, May 2 2007, 16:56:35) [GCC 4.1.2 (Ubuntu 4.1.2-0ubuntu4)] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> import gc >>> gc.disable() >>> import time >>> t0=time.time(); l=[([],[],[],[],[]) for _ in xrange(3000000)]; >>> t1=time.time() >>> t1-t0 2.9048631191253662 The speed difference is staggering, almost 50-fold. I suspect GC degrades the (amortized) linear-time list building into quadratic time. Since you allocate all the small lists, the GC gets invoked every 700 or so allocations, and has to visit more and more objects in each pass. I'm not sure if this can be fixed (shouldn't the generational GC only have to visit the freshly created objects rather than all of them?), but it has been noticed on this group before. If you're building large data structures and don't need to reclaim cyclical references, I suggest turning GC off, at least during construction. -- http://mail.python.org/mailman/listinfo/python-list
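If you do disable the collector during construction, it is worth restoring its previous state afterwards; a sketch (the element count is scaled down here for illustration):

```python
import gc
from contextlib import contextmanager

@contextmanager
def gc_paused():
    # Turn the cyclic collector off during bulk allocation and restore
    # its previous state afterwards; plain refcounting still frees
    # non-cyclic garbage in the meantime.
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        if was_enabled:
            gc.enable()

with gc_paused():
    table = [([], [], [], [], []) for _ in range(100000)]
```

This keeps the GC suppression scoped to the construction phase, so exceptions cannot leave the collector permanently disabled.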
Re: creating really big lists
Dr Mephesto <[EMAIL PROTECTED]> writes: > I need some real speed! Is the speed with the GC turned off sufficient for your usage? -- http://mail.python.org/mailman/listinfo/python-list
Re: Autogenerate functions (array of lambdas)
Chris Johnson <[EMAIL PROTECTED]> writes: > What I want to do is build an array of lambda functions, like so: > > a = [lambda: i for i in range(10)] Use a factory function for creating the lambdas. The explicit function call will force a new variable binding to be created each time, and the lambda will refer to that binding rather than to the loop variable binding, which is reused for all loop iterations. For example: def makefn(i): return lambda: i >>> a = [makefn(i) for i in xrange(10)] >>> [f() for f in a] [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] The alternative is to explicitly import the value into the lambda's parameter list, as explained by others. -- http://mail.python.org/mailman/listinfo/python-list
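The default-argument alternative mentioned at the end can be sketched like this:

```python
# Early binding via a default argument: each lambda captures the value
# of i at creation time instead of the shared loop variable.
a = [lambda i=i: i for i in range(10)]
values = [f() for f in a]

# Without the default argument, every lambda sees the final value of i.
b = [lambda: i for i in range(10)]
late_values = [f() for f in b]
```

Both the factory function and the default-argument trick work because each creates a fresh binding per iteration; the default-argument form is terser but exposes i as an (overridable) parameter.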
Re: Generating a unique identifier
Steven D'Aprano <[EMAIL PROTECTED]> writes: > Should garbage-collecting 16 million strings really take 20+ > minutes? It shouldn't. For testing purposes I've created a set of 16 milion strings like this: s = set() for n in xrange(1600): s.add('somerandomprefix' + str(n)) # prefix makes the strings a bit larger It takes maybe about 20 seconds to create the set. Quitting Python takes 4-5 seconds. This is stock Python 2.5.1. -- http://mail.python.org/mailman/listinfo/python-list
Re: MemoryError on reading mbox file
Christoph Krammer <[EMAIL PROTECTED]> writes: > I have to convert a huge mbox file (~1.5G) to MySQL. Have you tried commenting out the MySQL portion of the code? Does the code then manage to finish processing the mailbox? -- http://mail.python.org/mailman/listinfo/python-list
Re: Dynamically removing methods in new-style classes
[EMAIL PROTECTED] writes: > I am trying unsuccessfully to remove some methods from an instance, You can't remove the method from an instance because the method is stored in its class. > With the older python classes I could have done: > self.__class__.__dict__["test1"] to achieve the desired result. self.__class__.test1 still works, doesn't it? Removing methods can be achieved the same way: >>> class X(object): ... def blah(self): pass ... >>> x=X() >>> x.blah <bound method X.blah of <__main__.X object at 0x...>> >>> del type(x).blah >>> x.blah Traceback (most recent call last): File "<stdin>", line 1, in <module> AttributeError: 'X' object has no attribute 'blah' -- http://mail.python.org/mailman/listinfo/python-list
Re: class that keeps track of instances
<[EMAIL PROTECTED]> writes: > 1) New instance has to have a property called 'name' > 2) When instance is attemped to created, e.g., x=kls(name='myname'), and > there already exists an instance with obj.name =='myname', that > pre-existing instance is returned, instead of making new one. > 3) A class property 'all' for class gives me a list of all the > instances. So kls.all lets me iterates through all instances. > 4) When all the hard-link to an instance is deleted, the instance should > be deleted, just like an instance from any regular class does. import weakref class Meta(type): all = property(lambda type: type.cache.values()) class kls(object): __metaclass__ = Meta cache = weakref.WeakValueDictionary() def __new__(cls, name): if name in kls.cache: return kls.cache[name] self = object.__new__(cls) self.name = name kls.cache[name] = self return self >>> x = kls(name='foo') >>> x <__main__.kls object at 0xb7d5dc8c> >>> x is kls(name='foo') True >>> x is kls(name='bar') False >>> print kls.all# only one instance, 'bar' was short-lived [<__main__.kls object at 0xb7d5dc8c>] >>> x = 'somethingelse' >>> print kls.all [] > Assuming that I have to write it on my own, what should I do? I > tried to implement it using weakref.WeakValueDictionary and > metaclass, but instance doesn't disappear when I think it should > disappear. I am also wondering if it is easier to keeping > {name:id(obj)} would be a better solution. The problem is that, given just an ID, you have no way to get a hold of the actual object. -- http://mail.python.org/mailman/listinfo/python-list
Re: super() doesn't get superclass
Bruno Desthuilliers <[EMAIL PROTECTED]> writes: > If a class X is in the MRO of call Y, then X is a superclass of Y. I > agree that the documentation for super is somewhat misleading (and > obviously wrong), but it still *give access to* (at least one of) > the superclass(es). I believe the confusion comes from different assumptions about what "superclasses" refers to. super() iterates over superclasses of the *instance* in use, but an individual call to super does not necessarily invoke the superclass of the *implementation* of the method. For example, given a random class: class X(Y): def foo(self): super(X, self).foo() ...there is in fact no guarantee that super() calls a superclass of X. However, it is certainly guaranteed that it will call a superclass of type(self). Pre-2.2 Python used a simpler scheme where the superclass was always called, but it caused problems with diamond inheritance where some methods would be called either twice or not at all. (This is explained in http://www.python.org/download/releases/2.2.3/descrintro/ in some detail.) -- http://mail.python.org/mailman/listinfo/python-list
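The diamond case alluded to above, as a runnable sketch (class names are illustrative):

```python
calls = []

class A(object):
    def foo(self):
        calls.append('A')

class B(A):
    def foo(self):
        calls.append('B')
        super(B, self).foo()   # not necessarily A.foo!

class C(A):
    def foo(self):
        calls.append('C')
        super(C, self).foo()

class D(B, C):
    def foo(self):
        calls.append('D')
        super(D, self).foo()

D().foo()
# The MRO of D is [D, B, C, A, object]: B's super call reaches C, which
# is not a superclass of B but is a superclass of type(self). Each foo
# in the diamond runs exactly once -- the point of cooperative super.
```

With the pre-2.2 "always call the base class" scheme, A.foo would have run twice here (once via B, once via C).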
Re: super() doesn't get superclass
Ben Finney <[EMAIL PROTECTED]> writes: > Hrvoje Niksic <[EMAIL PROTECTED]> writes: > >> class X(Y): >> def foo(self): >> super(X, self).foo() >> >> ...there is in fact no guarantee that super() calls a superclass of >> X. However, it is certainly guaranteed that it will call a superclass >> of type(self). > > Not even that. It could call *any class in the inheritance > hierarchy*, The inheritance hierarchiy is populated by the various (direct and indirect) superclasses of type(self). > depending on how the MRO has resolved "next class". Even one that is > neither an ancestor nor a descendant of X. My point exactly. superclass of X is not the same as superclass of type(self). Super iterates over the latter, where you expect the former. -- http://mail.python.org/mailman/listinfo/python-list
Re: super() doesn't get superclass
Ben Finney <[EMAIL PROTECTED]> writes: > Evan is claiming that "the next class in the MRO _is_ a superclass", > apparently by his definition or some other that I've not seen. The definition of superclass is not the issue, the issue is "superclass *of which class*"? You expect super(A, self) to iterate only over superclasses of A, even when self is an instance of a subtype of A. What really happens is that super(A, self) yields the next method in type(self)'s MRO, which can and does include classes that are not by any definition superclasses of A. All of those classes are, however, superclasses of the instance's type. I think it is not possible to have super(A, self) only call superclasses of A and at the same time have multiple inheritance work without calling some methods in the hierarchy twice or not at all. Guido's paper at http://tinyurl.com/qkjgp explains the reasoning behind super in some detail. >> I agree that the documentation for super is somewhat misleading (and >> obviously wrong), > Well, that's the first time someone has acknowledged that in this > thread, so I guess this is something. For the record, I also agree with that. The documentation should document in some detail that super(type, obj) yields superclasses of type(obj), not of type, and that the "type" argument is only used for super to be able to locate the next type in the list. >> I wouldn't use such an extreme word as 'madness', but I totally agree >> that this should be corrected. Care to submit a doc patch ? > I don't understand what practical uses 'super' is intended for It's intended for cooperative multiple inheritance, a la CLOS's call-next-method. -- http://mail.python.org/mailman/listinfo/python-list
Re: super() doesn't get superclass
Michele Simionato <[EMAIL PROTECTED]> writes: > On Sep 19, 12:36 pm, Bruno Desthuilliers [EMAIL PROTECTED]> wrote: > >> The next class in the MRO *is* a superclass of the *instance*. Else it >> wouldn't be in the MRO !-) > > Bruno, there is no such a thing as a superclass in a multiple > inheritance world, and it is a very bad idea to continue to use that > terminology. Your arguments against the superclass term seem to assume that there is only a single superclass to a particular class. In the example you give in your essay, I would say that all of A, B, and T are superclasses of C, and Python's super correctly iterates over all of them. Wikipedia defines superclass as a "class from which other classes are derived", which seems perfectly valid for MI. -- http://mail.python.org/mailman/listinfo/python-list
Re: super() doesn't get superclass
Michele Simionato <[EMAIL PROTECTED]> writes: > On Sep 19, 1:16 pm, Hrvoje Niksic <[EMAIL PROTECTED]> wrote: >> Your arguments against the superclass term seem to assume that there >> is only a single superclass to a particular class. > > If you say "the" superclass, then you also assume it is unique. FWIW, Bruno said "a", at least in the section you quoted. > But the big issue is that the order of the methods depends on the > second argument to super, the instance, so there is no useful > concept of the superclass of the first argument of super. No argument here. -- http://mail.python.org/mailman/listinfo/python-list
Re: super() doesn't get superclass
Ben Finney <[EMAIL PROTECTED]> writes: >> The definition of superclass is not the issue, the issue is >> "superclass *of which class*"? You expect super(A, self) to iterate >> only over superclasses of A, even when self is an instance of a >> subtype of A. > > Yes. Those are the specific parameters to the function call, so that > *is* what I expect. The specific parameters are a type and an instance. Those same parameters can and do allow for an implementation that accesses supertypes of type(self). That is in fact more logical; otherwise one could simply iterate over A.__bases__ and we wouldn't need an elaborate 'super' construct. Not iterating only over A's superclasses is the entire *point* of super. The only deficiency of super I see in this thread is incomplete documentation. -- http://mail.python.org/mailman/listinfo/python-list
Re: __contains__() : Bug or Feature ???
"[EMAIL PROTECTED]" <[EMAIL PROTECTED]> writes: > I need to overload the operator in and let him return an object > ... It seems it is not a behavior Python expect : Python expects it all right, but it intentionally converts the value to a boolean. The 'in' operator calls PySequence_Contains, which returns a boolean value at the C level. User-supplied __contains__ is implemented as an adaptor in typeobject.c (slot_sq_contains). It takes the value returned by your __contains__ implementation and converts it to 0 or 1. I don't think you can overload 'in' as you want without pervasive changes to CPython source code. -- http://mail.python.org/mailman/listinfo/python-list
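A quick demonstration of the coercion described above, using a made-up class: whatever truthy object __contains__ returns, the 'in' operator hands back a plain boolean:

```python
class Bag(object):
    def __contains__(self, item):
        # deliberately return a non-boolean object
        return "some object"

bag = Bag()
result = "x" in bag
# The truthy return value was collapsed to a bool by the 'in' machinery.
print(result)   # True
```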
Re: __contains__() and overload of in : Bug or Feature ???
"[EMAIL PROTECTED]" <[EMAIL PROTECTED]> writes: >> The string "yop" evaluates to the boolean value True, as it is not >> empty. > Does it means that when overloading an operator, python just > wrap the call to the method and keep control of the returned > values ??? In the case of the 'in' operator, it does. -- http://mail.python.org/mailman/listinfo/python-list
Re: Calling constructor but not initializer
Steven D'Aprano <[EMAIL PROTECTED]> writes: > I can construct an empty instance in the __new__ constructor, and I > can initialize a non-empty instance in the __init__ initializer, > but I can't think of any good way to stop __init__ from being called > if the instance is empty. In pseudo-code, I want to do something > like this: >
> class Parrot(object):
>     def __new__(cls, data):
>         construct a new empty instance
>         if data is None:
>             return that empty instance
>         else:
>             call __init__ on the instance to populate it
>             return the non-empty instance

Suggestion 1: since you "construct a new empty instance" in both cases, simply move the entire logic to __init__. Suggestion 2: name your initialization method something other than __init__, and the type-object-automatically-calls-__init__-after-__new__ behavior simply disappears. Can you specify the way you'd like to instantiate the class? -- http://mail.python.org/mailman/listinfo/python-list
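A minimal sketch of Suggestion 2 (the method and attribute names _populate and items are invented): because the automatic call after __new__ only targets __init__, a differently named initializer is never invoked behind your back:

```python
class Parrot(object):
    def __new__(cls, data=None):
        inst = super(Parrot, cls).__new__(cls)
        inst.items = []               # the "empty instance"
        if data is not None:
            inst._populate(data)      # only called in the non-empty case
        return inst

    def _populate(self, data):
        # not named __init__, so the type object never calls it for us
        self.items.extend(data)

print(Parrot().items)           # []
print(Parrot([1, 2, 3]).items)  # [1, 2, 3]
```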
Re: sorting a list numbers stored as strings
"Delaney, Timothy (Tim)" <[EMAIL PROTECTED]> writes: > Yep - appears I must have been misremembering from another language > (dunno which) Tcl -- http://mail.python.org/mailman/listinfo/python-list
Re: sorteddict PEP proposal [started off as orderedict]
Steven Bethard <[EMAIL PROTECTED]> writes: > With this is the implementation, I'm definitely -1. Not because it's a > bad implementation, but because if the iteration is always doing a > sort, then there's no reason for a separate data structure. Agreed. A true sorted dict would keep its keys sorted in the first place, a la C++ std::map. -- http://mail.python.org/mailman/listinfo/python-list
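The stdlib has no balanced-tree mapping, but the "keys kept sorted in the first place" idea can be sketched with the bisect module (a toy for illustration, not a proposal): lookups are O(log n) binary searches, though inserts remain O(n) because of list shifting, which is exactly the part a tree-based implementation would fix:

```python
import bisect

class SortedDict(object):
    """Toy sorted mapping: keys kept sorted at all times; not a balanced tree."""
    def __init__(self):
        self._keys = []
        self._values = []

    def __setitem__(self, key, value):
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._values[i] = value       # overwrite an existing key
        else:
            self._keys.insert(i, key)     # O(n) shift; a tree avoids this
            self._values.insert(i, value)

    def __getitem__(self, key):
        i = bisect.bisect_left(self._keys, key)   # O(log n) search
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        raise KeyError(key)

    def keys(self):
        return list(self._keys)           # already in sorted order

d = SortedDict()
for k in ["pear", "apple", "fig"]:
    d[k] = len(k)
print(d.keys())   # ['apple', 'fig', 'pear'] -- iteration needs no sort
```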
Re: sorteddict PEP proposal [started off as orderedict]
Duncan Booth <[EMAIL PROTECTED]> writes: > I that's the point though: you can't write one implementation that has good > performance for all patterns of use An implementation of sorted dict using a balanced tree as the underlying data structure would give decent performance in all the mentioned use cases. For example, red-black trees support search, insertion, and deletion in O(log n) time. -- http://mail.python.org/mailman/listinfo/python-list
Re: the address of list.append and list.append.__doc__
HYRY <[EMAIL PROTECTED]> writes: > This works, but I think the key of DOC is too long, so I want to use > the id of list.append.__doc__ as the key; or use the id of > list.append: Using the id is not a good idea because ids are not permanent. Using list.append as the hash key will work and will internally use the pointer to produce the hash key, which is probably what you want anyway. > So, I asked how to get list.append from a.append

>>> def unbound(meth):
...     return getattr(type(meth.__self__), meth.__name__)
...
>>> unbound(a.append)

> and why id(list.append.__doc__) changes. Because the doc for builtins is internally kept in a read-only C string for efficiency. The Python string is built only when actually used. -- http://mail.python.org/mailman/listinfo/python-list
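The difference is easy to check: the unbound method fetched from the type is the same object on every access, so it hashes consistently, which is what makes it usable as a dict key (DOC here is just an illustrative mapping):

```python
def unbound(meth):
    # recover the type's own method from a bound method
    return getattr(type(meth.__self__), meth.__name__)

a = []
# The method retrieved from the type is a single, stable object.
print(unbound(a.append) is list.append)   # True

DOC = {list.append: "docs for list.append"}   # the method itself as the key
print(DOC[unbound(a.append)])                 # docs for list.append
```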
Re: sorteddict PEP proposal [started off as orderedict]
Mark Summerfield <[EMAIL PROTECTED]> writes: > On 26 Sep, 09:51, Hrvoje Niksic <[EMAIL PROTECTED]> wrote: >> Duncan Booth <[EMAIL PROTECTED]> writes: >> > I that's the point though: you can't write one implementation >> > that has good performance for all patterns of use >> >> An implementation of sorted dict using a balanced tree as the >> underlying data structure would give decent performance in all the >> mentioned use cases. For example, red-black trees search, insert, and >> delete in O(log n) time. > > Basically, as implemented, I have to invalidate if there is any > change [...] No argument here, as long as the limitation is understood to be a consequence of the current implementation model. Seriously proposing a sorteddict that is a mere syntactic sugar over dict dooms the PEP to rejection. Major programming language libraries have included sorted mapping and set types for a while now, making the performance and complexity constraints generally well understood. We should make use of that knowledge when designing sorteddict. -- http://mail.python.org/mailman/listinfo/python-list
Re: sorteddict PEP proposal [started off as orderedict]
Paul Hankin <[EMAIL PROTECTED]> writes: >> An implementation of sorted dict using a balanced tree as the >> underlying data structure would give decent performance in all the >> mentioned use cases. For example, red-black trees search, insert, >> and delete in O(log n) time. > > But dicts do search, insert and delete in O(1) time, so using some > variety of balanced tree will give you much worse performance when > you're doing regular dict operations. I wouldn't call it "much worse"; while O(log(n)) is worse than O(1), it's still very fast, which is why popular programming language libraries have an ordered mapping type based on balanced trees. Also note that dict performance can degrade with hash collisions, while trees can maintain complexity guarantees on all operations. In the end, it's a tradeoff. Hash tables offer O(1) access, but lack ordering. Balanced trees offer ordering at the price of O(log n) access. Both have their uses, but neither is syntactic sugar for the other. -- http://mail.python.org/mailman/listinfo/python-list
Re: ~ bit-wise unary operator
Ladislav Andel <[EMAIL PROTECTED]> writes: > Hello, why ~ bit-wise unary operator returns -(x+1) and not bit > inversion of the given integer? On 2s-complement architectures, -(x+1) *is* bit inversion of the given integer. > example: > a = 7978 > a = ~a > python returns -7979 > > but I need to get back 57557 as in C language. Python does exactly what C does in this case:

$ cat a.c
#include <stdio.h>

int main(void)
{
    int a = 7978;
    a = ~a;
    printf("%d\n", a);
    return 0;
}
$ gcc a.c
$ ./a.out
-7979

If you want 16-bit unsigned arithmetic, use 2**16 + ~a, which yields 57557. -- http://mail.python.org/mailman/listinfo/python-list
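An equivalent spelling, for what it's worth, is to mask the inverted value down to the width you want, which mirrors what C's fixed-width unsigned types do implicitly:

```python
a = 7978
print(~a)               # -7979, same as C's signed int
print(~a & 0xFFFF)      # 57557, like a 16-bit unsigned result
print(2**16 + ~a)       # 57557, the formula from the text
print(~a & 0xFFFFFFFF)  # the 32-bit unsigned equivalent
```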
Re: PyObject_CallObject: difference between functions and class methods
[ Note that there is now a mailing list dedicated to the C API: http://mail.python.org/mailman/listinfo/capi-sig ] mauro <[EMAIL PROTECTED]> writes: > I am trying to call within a C extension a Python function provided as > an argument by the user with: PyObject_Call(). The C extension should > work also if the user supplies a class method, but in this case I am > getting an error. Do I need to explicitly pass 'self' as an argument > to PyObject_Call()? You don't. The reference to self is added automatically when the user obtains the function as object.method. >
> if ((tmp_args = PyTuple_New(1)) == NULL)
>     PyErr_SetString( PyExc_ReferenceError, "attempt to access a null-pointer" );
> PyTuple_SetItem(tmp_args, 0, paramlist);

Maybe you are mismanaging the reference count -- PyTuple_SetItem steals the reference to its argument. Anyway, why not use PyObject_CallFunction or PyObject_CallFunctionObjArgs? For example:

PyObject *
mymodule_main(PyObject *ignored, PyObject *func)
{
    PyObject *result, *my_param;
    /* ... do something, e.g. create my_param ... */

    /* call func with my_param as its only argument */
    result = PyObject_CallFunction(func, "O", my_param);
    Py_DECREF(my_param);        /* assuming you no longer need it */
    if (!result)
        return NULL;
    /* ... do something with result ... */
    Py_DECREF(result);
    Py_INCREF(Py_None);
    return Py_None;             /* or whatever */
}

-- http://mail.python.org/mailman/listinfo/python-list
Re: Cross-platform time out decorator
Joel <[EMAIL PROTECTED]> writes: > I found the solution : > http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/440569 > describes a solution based on threads. I tested it and it works > perfectly. Note that, unlike the original alarm code, it doesn't really interrupt the timed-out method, it just returns the control back to the caller, using an exception to mark that a timeout occurred. The "timed out" code is still merrily running in the background. I don't know if it's a problem in your case, but it's an important drawback. -- http://mail.python.org/mailman/listinfo/python-list
Re: Cross-platform time out decorator
Joel <[EMAIL PROTECTED]> writes: >> Note that, unlike the original alarm code, it doesn't really interrupt >> the timed-out method, it just returns the control back to the caller, >> using an exception to mark that a timeout occurred. The "timed out" >> code is still merrily running in the background. I don't know if it's >> a problem in your case, but it's an important drawback. > > There should be a method to stop the thread though? Not in Python. Thread killing primitives differ between systems and are unsafe in general, so they're not exposed to the interpreter. On Windows you can attempt to use ctypes to get to TerminateThread, but you'll need to hack at an uncomfortably low level and be prepared to deal with the consequences, such as memory leaks. If the timeouts happen rarely and the code isn't under your control (so you have no recourse but to terminate the thread), it might be worth it though. -- http://mail.python.org/mailman/listinfo/python-list
Re: Cross-platform time out decorator
"Chris Mellon" <[EMAIL PROTECTED]> writes: > You can use ctypes and the Python API to raise a Python exception in > the thread. How, by changing the thread's exception state? -- http://mail.python.org/mailman/listinfo/python-list
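Presumably the reference is to PyThreadState_SetAsyncExc, which ctypes can reach. A hedged sketch (CPython-specific, and the exception is only delivered once the target thread re-enters the bytecode interpreter, so a thread blocked inside a C call still won't be interrupted):

```python
import ctypes
import threading
import time

class Interrupted(Exception):
    pass

def async_raise(tid, exctype):
    # CPython-only: schedule exctype to be raised in the thread with id tid.
    res = ctypes.pythonapi.PyThreadState_SetAsyncExc(
        ctypes.c_ulong(tid), ctypes.py_object(exctype))
    if res == 0:
        raise ValueError("invalid thread id")
    elif res > 1:
        # We affected more than one thread state; undo the damage.
        ctypes.pythonapi.PyThreadState_SetAsyncExc(ctypes.c_ulong(tid), None)
        raise SystemError("PyThreadState_SetAsyncExc failed")

stopped = []

def worker():
    try:
        while True:
            time.sleep(0.01)   # the exception arrives between bytecodes
    except Interrupted:
        stopped.append(True)

t = threading.Thread(target=worker)
t.start()
time.sleep(0.1)
async_raise(t.ident, Interrupted)
t.join(5)
print(stopped)   # [True] once the worker has seen the exception
```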
Re: Can I overload the compare (cmp()) function for a Lists ([]) index function?
xkenneth <[EMAIL PROTECTED]> writes: > Looking to do something similair. I'm working with alot of timestamps > and if they're within a couple seconds I need them to be indexed and > removed from a list. > Is there any possible way to index with a custom cmp() function? > > I assume it would be something like... > > list.index(something,mycmp) The obvious option is to reimplement the functionality of index as an explicit loop, such as:

def myindex(lst, something, mycmp):
    for i, el in enumerate(lst):
        if mycmp(el, something) == 0:
            return i
    raise ValueError("element not in list")

Looping in Python is slower than looping in C, but since you're calling a Python function per element anyway, the loop overhead might be negligible. A more imaginative way is to take advantage of the fact that index uses the '==' operator to look for the item. You can create an object whose == operator calls your comparison function and use that object as the argument to list.index:

class Cmp(object):
    def __init__(self, item, cmpfun):
        self.item = item
        self.cmpfun = cmpfun
    def __eq__(self, other):
        return self.cmpfun(self.item, other) == 0

# list.index(Cmp(something, mycmp))

For example:

>>> def mycmp(s1, s2):
...     return cmp(s1.lower(), s2.lower())
>>> ['foo', 'bar', 'baz'].index(Cmp('bar', mycmp))
1
>>> ['foo', 'bar', 'baz'].index(Cmp('Bar', mycmp))
1
>>> ['foo', 'bar', 'baz'].index(Cmp('nosuchelement', mycmp))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: list.index(x): x not in list

The timeit module shows, somewhat surprisingly, that the first method is ~1.5 times faster, even for larger lists. -- http://mail.python.org/mailman/listinfo/python-list
Re: Limits on search length
Daryl Lee <[EMAIL PROTECTED]> writes: > I am trying to locate all lines in a suite of files with quoted > strings of particular lengths. A search pattern like r'".{15}"' > finds 15-character strings very nicely. But I have some very long > ones, and a pattern like r'".{272}"' fails miserably, even though I > know I have at least one 272-character string. It seems to work for me. Which version of Python are you using? Here is how I tested it. First, I modified your program so that it actually runs (the sys and re imports were missing) and removed unnecessary globbing and file opening:

import sys, re

searchPattern = sys.argv[1]
cpat = re.compile(searchPattern)
lineNumber = 0
for line in sys.stdin:
    lineNumber += 1
    m = cpat.search(line)
    if m is not None:
        print "(", lineNumber, ")", line

Now, create a file with three lines, each holding a string of a different length:

$ printf '"%*s"\n' 271 > fl
$ printf '"%*s"\n' 272 >> fl
$ printf '"%*s"\n' 273 >> fl

And run the script:

$ python scriptfile '".{272}"' < fl
( 2 ) "[... 272 blanks]"

That looks correct to me. > In the short term, I can resort to locating the character positions > of the quotes, You can also catch all strings and only filter those of the length you care about. -- http://mail.python.org/mailman/listinfo/python-list
Re: Reentrancy of Python interpreter
Brad Johnson <[EMAIL PROTECTED]> writes: > I have a place where I execute a Python command that calls into C++ > code which then in turn calls back into Python using the same > interpreter. I get a fatal error which is "PyThreadStage_Get: no > current thread." Does the C++ code call into the interpreter from a different thread? -- http://mail.python.org/mailman/listinfo/python-list
Re: s.split() on multiple separators
Antoon Pardon <[EMAIL PROTECTED]> writes: > It may be convincing if you only consider natural numbers in > ascending order. Suppose you have the sequence a .. b and you want > the reverse. If you work with included bounds the reverse is just b > .. a. If you use the python convention, things become more > complicated. It's a tradeoff. The convention used by Python (and Lisp, Java and others) is more convenient for other things. Length of the sequence x[a:b] is simply b-a. Empty sequence is denoted simply with x[a:a], where you would need to use the weird x[a:a-1] with inclusive bounds. Subsequences such as x[a:b] and x[b:c] merge smoothly into x[a:c], making it natural to iterate over subsequences without visiting an element twice. > Another problem is if you are working with floats. Suppose you have > a set of floats. Now you want the subset of numbers that are between > a and b included. If you want to follow the convention that means > you have to find the smallest float that is bigger than b, not a > trivial task. The exact same argument can be used against the other convention: if you are working with inclusive bounds, and you need to represent the subset [a, b), you need to find the largest float that is smaller than b. -- http://mail.python.org/mailman/listinfo/python-list
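The arithmetic claims above are easy to check directly:

```python
x = list(range(10))
a, b, c = 2, 5, 8

assert len(x[a:b]) == b - a        # length falls out directly
assert x[a:a] == []                # empty slice without any a-1 trick
assert x[a:b] + x[b:c] == x[a:c]   # adjacent slices merge seamlessly
print("all checks pass")
```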
Re: enumerate overflow
Raymond Hettinger <[EMAIL PROTECTED]> writes: > [Paul Rubin] >> I hope in 3.0 there's a real fix, i.e. the count should promote to >> long. > > In Py2.6, I will mostly likely put in an automatic promotion to long > for both enumerate() and count(). It took a while to figure-out how > to do this without killing the performance for normal cases (ones > used in real programs, not examples contrived to say, "omg, see what > *could* happen"). Using PY_LONG_LONG for the counter, and PyLong_FromLongLong to create the Python number should work well for huge sequences without (visibly) slowing down the normal case. -- http://mail.python.org/mailman/listinfo/python-list
Re: migrating to packages
[EMAIL PROTECTED] writes: > I will expose my case quickly. > The MYCLASES.py file contains the A class, so i can use > from MYCLASES import A > a = A() > > Using the "package mode" (which looks fine BTW), having the simple > MYCLASES/ > __init__.py > A.py > > forces me (i guess) to use the > from MYCLASES.A import A Exactly. Using mypackage.mymodule instead of just mymodule is the entire *point* of a package. That way, if someone creates another module with the same name (mymodule), it won't conflict with yours. If you don't want to change mymodule to mypackage.mymodule, why use a package in the first place? -- http://mail.python.org/mailman/listinfo/python-list
Re: migrating to packages
Bruno Desthuilliers <[EMAIL PROTECTED]> writes: > it's quite common to use the __init__.py of the package (as > explained by Ben) as a facade to the internal organization of the > package, so you can change this internal organization without > breaking client code. We agree on that. It is the OP who *wants* to access his modules directly without ever naming the package. That is why I think he is missing the point of having a package in the first place. >> That way, if someone creates another module with using the same >> name (mymodule), it won't conflict with yours. If you don't want >> to change mymodule to mypackage.mymodule, why use a package in the >> first place? > > Because you have too much code to keep it in a single file. There is no "single file", the OP already has modules A and B. -- http://mail.python.org/mailman/listinfo/python-list
Re: migrating to packages
Bruno Desthuilliers <[EMAIL PROTECTED]> writes: >> We agree on that. It is the OP who *wants* to access his modules >> directly without ever naming the package. > > To be exact, he wants to reorganize it's source code (splitting a > file that's getting too big AFAICT) You're right, I misread his original problem statement (as you also correctly pointed out later in the post). So yes, a package will do what he wants, simply by arranging the necessary imports in __init__.py. Sorry about the misunderstanding. >> That is why I think he is missing the point of having a package in >> the first place. > > MHO opinion is that *you* are missing *one* of the point*s* of having > packages. :-) -- http://mail.python.org/mailman/listinfo/python-list
Re: migrating to packages
Gerardo Herzig <[EMAIL PROTECTED]> writes: > If the original MYCLASSES.py has 5 different classes, say A,B,C,D,E, > each one has to be imported (as A and B) in order to be used for > the client code. The thing is, there are more than 5 classes, and > looks like a lot of unnecesary work to me, since a particular > program can use 1, 2, or 3 classes at the time. Thats why im > watching the way to override the `import statement'... > > Damn client code!!! You can create both a package and a compatibility module. The package would be broken into modules for modularity, while the compatibility module would import what old code needs from the package, like this:

# old.py:
from new.submodule1 import A, B
from new.submodule2 import C, D
...

Now, old code can keep using "from old import A" and such, while new code would import new.submodule1, new.submodule2, etc., as necessary. Old code is no worse off because, although it uses the compatibility module that just imports everything, that is in essence what the previous module did as well. On the other hand, new code can make use of the modularity and reduce load times by only importing what it really needs. -- http://mail.python.org/mailman/listinfo/python-list
Re: remove list elements..
Abandoned <[EMAIL PROTECTED]> writes: > I do this use FOR easly but the speed very imported for me. I want > to the fastest method please help me. Can you post the code snippet that was too slow for you? Are the lists sorted? -- http://mail.python.org/mailman/listinfo/python-list
Re: Don't use __slots__
Steven D'Aprano <[EMAIL PROTECTED]> writes: > Well, I've read the thread, and I've read the thread it links to, > and for the life of me I'm still no clearer as to why __slots__ > shouldn't be used except that: [...] > But is there actually anything *harmful* that can happen if I use > __slots__? Here is one harmful consequence: __slots__ breaks multiple inheritance:

class A(object):
    __slots__ = ['a', 'b']

class B(object):
    __slots__ = ['c']

class AB(A, B):
    pass

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Error when calling the metaclass bases
    multiple bases have instance lay-out conflict

Even if A and B had the exact same slots, for example ['a', 'b'], it wouldn't make a difference. AB explicitly setting __slots__ to something like ['a', 'b', 'c'] doesn't help either. But that is only a technical answer to your technical question which misses the real problem people like Aahz and Guido have with __slots__. (I don't claim to represent them, of course, the following is my interpretation.) The backlash against __slots__ is a consequence of it being so easy to misunderstand what __slots__ does and why it exists. Seeing __slots__ has led some people to recommend __slots__ to beginners as a way to "catch spelling mistakes", or as a way to turn Python's classes into member-declared structures, a la Java. For people coming from a Java background, catching mistakes as early as possible is almost a dogma, and they are prone to accept the use of __slots__ (and living with the shortcomings) as a rule. Python power users scoff at that because it goes against everything that makes Python Python. Use of __slots__ greatly reduces class flexibility, by both disabling __dict__ and __weakref__ by default, and by forcing a tight instance layout that cripples inheritance. With people using __slots__ for the majority of their classes, it becomes much harder for 3rd-party code to attach an unforeseen attribute to an existing object.
Even with single inheritance, __slots__ has unintuitive semantics because subclasses automatically get __dict__ and __weakref__, thereby easily undoing the "benefits" of using __slots__ in the first place. __slots__ is a low-level tool that allows creation of dict-less objects without resorting to Python/C. As long as one understands it as such, there is no problem with using it. -- http://mail.python.org/mailman/listinfo/python-list
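That single-inheritance surprise is easy to demonstrate (class names are invented for illustration): a subclass that doesn't declare __slots__ of its own silently regains a __dict__:

```python
class Point(object):
    __slots__ = ('x', 'y')

class NamedPoint(Point):     # no __slots__ here
    pass

p = Point()
try:
    p.label = "origin"       # blocked: Point instances have no __dict__
except AttributeError:
    print("Point rejects new attributes")

np_ = NamedPoint()
np_.label = "origin"         # works: the subclass grew a __dict__ again
print(np_.__dict__)          # {'label': 'origin'}
```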
Re: Singleton
[EMAIL PROTECTED] writes: > Now when I run the 'run.py', it will print two different numbers. > sys.modules tells me that 'mod1' is imported as both 'one.mod1' and > 'mod1', which explains the result. If I were you, I'd make sure that the module duplicate problem is resolved first, for example by putting run.py somewhere outside one/. Then the singleton problem disappears as well. > It is possible to solve this by always importing with the complete > path like 'one.mod1', even when inside the 'one' directory, but > that's an error waiting to happen. Is it, really? As far as I can tell, Python handles that case rather robustly. For example:

$ mkdir one
$ touch one/__init__.py
$ touch one/mod1.py one/mod2.py
$ echo 'import mod2' > one/mod1.py
$ python
Python 2.5.1 (r251:54863, May 2 2007, 16:56:35)
[GCC 4.1.2 (Ubuntu 4.1.2-0ubuntu4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import one.mod1
>>> import sys
>>> sorted(sys.modules)
['UserDict', '__builtin__', '__main__', '_codecs', '_sre', '_types', 'codecs', 'copy_reg', 'encodings', 'encodings.aliases', 'encodings.codecs', 'encodings.encodings', 'encodings.types', 'encodings.utf_8', 'exceptions', 'linecache', 'one', 'one.mod1', 'one.mod2', 'os', 'os.path', 'posix', 'posixpath', 're', 'readline', 'rlcompleter', 'signal', 'site', 'sre_compile', 'sre_constants', 'sre_parse', 'stat', 'sys', 'types', 'warnings', 'zipimport']

Although mod1 imports mod2 simply with "import mod2", the fact that mod1 itself is imported as part of "one" is respected. As a result, mod2 is imported as "one.mod2", exactly as if it were imported from outside the "one" package. run.py is an exception because it is started directly using "python run.py", so it never gets the information that it's supposed to be part of a package.
To fix the problem, all you need to do is make sure that executable scripts such as run.py are either placed safely outside the package, or that they take care to always use absolute imports, such as "import one.mod1" instead of "import mod1". Placing them outside the package is a good example of preventing an error waiting to happen, like the one you hinted at. -- http://mail.python.org/mailman/listinfo/python-list