On May 3, 1:59 pm, [EMAIL PROTECTED] wrote: > Sometimes different languages suggests me ways to cross-pollinate > them. > > (Note: probably someone has already suggested (and even implemented) > the following ideas, but I like to know why they aren't fit for > Python). > > Python generators now allow me to package some code patterns common in > my code, that other (older) languages didn't allow me to extract; for > example given a sequence (array, list, etc) of items now and then I > need to scan all the upper triangle of their cross matrix: > > def xpairs(seq): > > len_seq = len(seq) > > for i, e1 in enumerate(seq): > > for j in xrange(i+1, len_seq): > > yield e1, seq[j] > > Or adjacent ones (there are ways to generalize the following function, > but this the most common situation for me): > > def xpairwise(iterable): > > return izip(iterable, islice(iterable, 1, None)) > > That xpairs() generator is nice, but it's not the best possible code > (but you may disagree with me, and you may think that code better than > the successive D code that uses two slices). Inside it I can't use a > list slicing because it copies subparts of the list, probably becoming > too much slow (for example if len(seq) == 1000). > > Compared to Python, the D language is at lower level, so you may > expect it to have a worse syntax (but the ShedSkin compiler has shown > me once for all that you can have a language that is both high-level, > with a short & sexy syntax, and very fast), but you can define a > xpairs() iterable class in it too, and the core of that iterable class > there's a method that may look like this: > > if (this.seq.length > 1) > > foreach (i, e1; this.seq[0 .. $-1]) > > foreach (e2; this.seq[i+1 .. $]) { > > result = dg(e1, e2); if (result) break; // yield > > } > > Where the strange last line is the yield, and the $ inside [] is the > length of the current array (a very clever idea that avoids the need > for negative indexes). > > That D code is as fast or faster than the code you can back-translate > from Python, this is possible because in D arrays slices are very > light, they are a struct of <length, pointer> (in the future they may > change to a couple of pointers, to speed up the array slice scanning). > So if you slice an array you don't copy its contents, just the start- > end points of the slice. If you read/scan the slice that's all you > have, while if you write on it, D uses a Copy-On-Write strategy, that > is a just-in-time copy of the slice contents. > > In practice this speeds up lot of code that manages strings and arrays > (like a XML parser, making it among the faster ones). > > Being the slice a struct, it's stack allocated, so the inner foreach > doesn't even create a slice object in the heap each time the outer > foreach loops. > > One problem with this strategy is that if you have a huge string, and > you keep only a little slice of it, the D garbage collector will keep > it all in memory. To avoid that problem you have to duplicate the > slice manually: > > somestring[inf...sup].dup; > > I think Psyco in some situations is able to manage string slices > avoiding the copy. > > I think Python 3.0 too may enjoy a similar strategy of light slices + > COW for strings, lists and arrays (tuples and strings don't need the > COW). > > Bye, > > bearophile
In my understanding, the operating system links files across multiple sections on a disk, while keeping those details from client software. Files: AAAA BBBB AA CCCCCCCCC While File A still reads as: AAAAAA, correctly. Modifications to B as follow: Files: AAAA BBBB AA CCCCCCCCC BBBB In the case of a large mutable string, modifications can cause linear- time operations, even if only making a constant-time change. String: abcdefg Modification: aAbcdefg causes the entire string to be recopied. Expensive at scale. A string-on-disk structure could provide linking, such as a String File Allocation Table. abcdefg A correctly reads as: aAbcdefg -- http://mail.python.org/mailman/listinfo/python-list