Re: Comparing strings from the back?
On 04/09/2012 03:54, Roy Smith wrote:
> Let's assume you're testing two strings for equality. You've already
> done the obvious quick tests (i.e. they're the same length), and you're
> down to the O(n) part of comparing every character. I'm wondering if it
> might be faster to start at the ends of the strings instead of at the
> beginning? If the strings are indeed equal, it's the same amount of
> work starting from either end. But, if it turns out that for real-life
> situations, the ends of strings have more entropy than the beginnings,
> the odds are you'll discover that they're unequal quicker by starting
> at the end.

From the rest of the thread, it looks like in most situations it won't make much difference, as typically very few characters need to be compared if the strings are unequal.

However, if you were in a situation with many strings which were almost equal, the most general way to improve the situation might be to store a hash of the string along with the string, i.e. store (hash(x), x), and then compare equality of this tuple. Almost all of the time, if the strings are unequal the hashes will be unequal.

Or, as someone else suggested, use interned versions of the strings. This is basically the same solution but even better. In this case, your startup costs will be higher (creating the strings) but your comparisons will always be instant.

Dan
--
http://mail.python.org/mailman/listinfo/python-list
Re: Comparing strings from the back?
On 10/09/2012 18:33, Oscar Benjamin wrote:
> Computing the hash always requires iterating over all characters in
> the string so is best case O(N) where string comparison is best case
> (and often average case) O(1).

Yep, but you already pay the O(N) cost creating the strings in the first place, so it's absorbed into that. It's only worth doing if you do many comparisons.

> Also, so far as I know the hash value once computed is stored on the
> string object itself [1] and used for subsequent string comparisons so
> there's no need for you to do that in your code.

Cool, Python is very clever as always! :)

Dan
--
http://mail.python.org/mailman/listinfo/python-list
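For reference, the interning idea from earlier in the thread can be sketched in a few lines. This uses Python 3's sys.intern (in Python 2 it was the intern() builtin); the long strings here are made up purely for illustration:

```python
import sys

# Two near-identical long strings: interning pays the O(N) hashing cost
# once, after which equal strings share one object and a constant-time
# identity check decides equality.
a = sys.intern("x" * 1000 + "a")
b = sys.intern("x" * 1000 + "a")
c = sys.intern("x" * 1000 + "b")

assert a is b        # same interned object: O(1) identity check
assert a is not c    # differ only in the final character
assert (a is b) == (a == b)   # for interned strings, identity == equality
```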
Re: Comparing strings from the back?
On 10/09/2012 18:07, Dan Goodman wrote:
> On 04/09/2012 03:54, Roy Smith wrote:
>> Let's assume you're testing two strings for equality. You've already
>> done the obvious quick tests (i.e. they're the same length), and
>> you're down to the O(n) part of comparing every character. [...]
>
> From the rest of the thread, it looks like in most situations it won't
> make much difference, as typically very few characters need to be
> compared if they are unequal. However, if you were in a situation with
> many strings which were almost equal, the most general way to improve
> the situation might be to store a hash of the string along with the
> string, i.e. store (hash(x), x), and then compare equality of this
> tuple. [...] Or, as someone else suggested, use interned versions of
> the strings.

Just had another thought about this. Although it's unlikely to be necessary in practice, since (a) it's rarely necessary at all, and (b) when it is, hashing and optionally interning seems like the better approach, I had another idea that would be more general. Rather than starting from the beginning or the end, why not do something like: check the first and last character, then the len/2 character, then len/4 and 3*len/4, then len/8 and 3*len/8, etc.? You'd need to be a bit clever about making sure you hit every character, but I'm sure someone's already got an efficient algorithm for this.
You could probably even make this cache efficient by working on cache-line-length blocks. Almost certainly entirely unnecessary, but I like the original question and it's a nice theoretical problem.

Dan
--
http://mail.python.org/mailman/listinfo/python-list
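For the curious, here is one way the scattered comparison order could be realised. The interval-splitting scheme below is just a guess at "a bit clever about making sure you hit every character", not a known named algorithm:

```python
from collections import deque

def probe_order(n):
    """Yield the indices 0 .. n-1 in the order suggested above: first,
    last, then midpoints of ever-finer halves, hitting every index
    exactly once."""
    if n <= 0:
        return
    yield 0
    if n == 1:
        return
    yield n - 1
    seen = {0, n - 1}
    # Breadth-first split of intervals, yielding each midpoint.
    queue = deque([(0, n - 1)])
    while queue:
        lo, hi = queue.popleft()
        mid = (lo + hi) // 2
        if mid not in seen:
            seen.add(mid)
            yield mid
        if mid - lo > 1:
            queue.append((lo, mid))
        if hi - mid > 1:
            queue.append((mid, hi))

def eq_scattered(s1, s2):
    """Equality test that compares characters in probe order."""
    return len(s1) == len(s2) and all(
        s1[i] == s2[i] for i in probe_order(len(s1)))
```

For n=8 this probes in the order 0, 7, 3, 1, 5, 2, 4, 6, so a difference anywhere near a power-of-two fraction of the length is found early.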
simple system for building packages for multiple platforms?
Hi all,

Until recently, our package has been pure Python, so distributing it has been straightforward. Now, however, we want to add some extension modules in C++. We're happy to provide source-only distributions on Linux, because almost all Linux users will have the required compilers and so forth already installed. But we also want to support Windows users who may not have C++ compilers available, which means providing built distributions.

So we're faced with this problem: there are three versions of Python we're supporting (2.5-2.7) and two architectures (32 and 64 bit), which means 6 possible platforms to build for (and maybe more in future if we upgrade to Python 3). Is there a straightforward way to set up a semi-automated build system for this?

At the moment, I'm thinking about either having a bunch of virtual machines or Amazon EC2 instances and submitting build jobs to them, but setting that up looks to be a lot of work, and I guess many people have had this problem before. So, what do other people do?

Also, once we have a build system up, I guess it could also be used for more extensive testing on these multiple platforms?

Dan
--
http://mail.python.org/mailman/listinfo/python-list
Re: Terrible FPU performance
Hi,

On 26/04/2011 15:40, Mihai Badoiu wrote:
> I have terrible performance for multiplication when one number gets
> very close to zero. I'm using cython by writing the following code:

This might be an issue with denormal numbers:

http://en.wikipedia.org/wiki/Denormal_number

I don't know much about them though, so I can't advise any further than that...

Dan
--
http://mail.python.org/mailman/listinfo/python-list
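A quick way to see denormals from Python (assuming IEEE 754 doubles, which CPython uses on all common platforms): anything strictly between zero and the smallest normal double is stored as a denormal, and arithmetic on such values is often handled in microcode or software, which is the usual cause of the slowdown:

```python
import sys

# Smallest *normal* positive double (DBL_MIN), about 2.2e-308.
tiny = sys.float_info.min

# Halving it gives a denormal (subnormal): still nonzero, but in the
# range where hardware often falls back to slow paths.
denormal = tiny / 2
assert 0.0 < denormal < tiny

# The smallest representable subnormal is 2**-1074; halving that
# underflows all the way to zero.
assert 2.0 ** -1074 > 0.0
assert 2.0 ** -1075 == 0.0
```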
Re: Turning an AST node / subnodes into something human-readable
Chris Angelico writes:
> I'm working with the ast module to do some analysis on Python
> codebases, and once I've found what I'm looking for, I want to print
> something out. The file name I'm hanging onto externally, so that
> works; and the nodes all have a lineno. So far so good. But how do I
> "reconstitute" a subtree into something fit for human consumption?

I did something like this; feel free to use my code:

https://github.com/brian-team/brian2/blob/master/brian2/parsing/rendering.py

At the moment, it only works for series of mathematical statements (no control conditions, loops, array notation, etc.), but it would be pretty easy to add those. It also puts too many parentheses in output expressions, e.g. 1+2+3 would come back as (1+2)+3, etc.

Dan
--
https://mail.python.org/mailman/listinfo/python-list
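Worth noting for later readers: since Python 3.9 the standard library can reconstitute a node directly via ast.unparse, which didn't exist when the above was written:

```python
import ast

# Round-trip a small program through the ast module and back to source.
tree = ast.parse("x = 1 + 2 + 3\nprint(x)")
print(ast.unparse(tree))

# Sub-nodes work too, e.g. a single statement found during analysis:
first_stmt = tree.body[0]
print(ast.unparse(first_stmt))   # just the assignment
```

Unlike a hand-rolled renderer, ast.unparse only adds parentheses where precedence requires them, so 1+2+3 comes back as "1 + 2 + 3" rather than "(1+2)+3".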
Re: nth root
Takes less than 1 sec here to do (10**100)**(1./13) a million times, and only about half as long to do (1e100)**(1./13), or about 14 times as long as to do .2**2. It doesn't look like one could hope for it to be much quicker, as you need 9 significant figures of accuracy to get the integer part of (10**100)**(1./13) (floats have about 7 and doubles about 16).

Dan

Tim wrote:
> In PythonWin I'm running a program to find the 13th root (say) of
> millions of hundred-digit numbers. I'm using
>
>     n = 13
>     root = base**(1.0/n)
>
> which correctly computes the root to a large number of decimal places,
> but therefore takes a long time. All I need is the integer component.
> Is there a quicker way?

--
http://mail.python.org/mailman/listinfo/python-list
Re: nth root
Mark Dickinson wrote:
> I'd also be a bit worried about accuracy. Is it important to you that
> the integer part of the result is *exactly* right, or is it okay if
> (n**13)**(1./13) sometimes comes out as slightly less than n, or if
> (n**13-1)**(1./13) sometimes comes out as n?

I don't think accuracy is too big a problem here, actually (at least for 13th roots). I just tested it with several hundred thousand random 100-digit numbers and it never made a mistake. The precision of a double ought to easily guarantee a correct result: if you let x = int(1e100**(1./13)), then ((x+1)**13 - x**13)/x**13 = 2.6e-7, so you only need around the first 8 or 9 digits of the 100-digit number to compute the 13th root exactly (well within the accuracy of a double).

OTOH, suppose you were doing cube roots instead. Then you would need the first 35 digits of the 100-digit number, which is more accurate than a double. So, for example, int(1e100**(1./3)) is a long way from being the integer part of the true cube root (it's between 10**18 and 10**19 away).

Dan
--
http://mail.python.org/mailman/listinfo/python-list
Re: nth root
Mark Dickinson wrote:
> Well, random numbers is one thing. But how about the following:
>
> >>> n = 12345**13
> >>> n
> 154662214940914131102165197707101295849230845947265625L
> >>> int(n ** (1./13))       # should be 12345; okay
> 12345
> >>> int((n-1) ** (1./13))   # should be 12344; oops!
> 12345

Good point! Oops indeed. :-)

Dan
--
http://mail.python.org/mailman/listinfo/python-list
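One way to sidestep float rounding entirely (not from the thread, but a natural fix for the oops above) is an integer Newton iteration: the same idea as math.isqrt, generalised to k-th roots, using only exact integer arithmetic:

```python
def inth_root(n, k):
    """Floor of the k-th root of a non-negative integer n, computed
    exactly with integer arithmetic.  Newton's method started from a
    power-of-two upper bound, so it works for 100-digit inputs where a
    float seed could be wildly wrong (e.g. cube roots of 1e100)."""
    if n < 0 or k < 1:
        raise ValueError("need n >= 0 and k >= 1")
    if n < 2:
        return n
    # 2**ceil(bits/k) is guaranteed >= the true root.
    x = 1 << (-(-n.bit_length() // k))
    while True:
        # Integer Newton step for x**k = n.
        y = ((k - 1) * x + n // x ** (k - 1)) // k
        if y >= x:
            return x
        x = y
```

With Mark's example, this gets both cases right where the float version stumbled on n-1:

```python
n = 12345**13
inth_root(n, 13)      # 12345
inth_root(n - 1, 13)  # 12344
```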
Re: code challenge: generate minimal expressions using only digits 1,2,3
This sounds kind of similar to a problem I posted to this list last year; you might find that thread gives you some ideas:

http://mail.python.org/pipermail/python-list/2008-January/474071.html

Dan
--
http://mail.python.org/mailman/listinfo/python-list
Re: code challenge: generate minimal expressions using only digits 1,2,3
MRAB wrote:
> Here's my solution (I haven't bothered with making it efficient, BTW):
>
> import operator
>
> def solve(n):
>     best_len = n
>     best_expr = ""
>     for x in range(1, n - 2):
>         for y in range(x, n):
>             for op, name in operator_list:
>                 # Does this pair with this op give the right answer?
>                 if op(x, y) == n:
>                     lx, ex = numbers[x - 1]
>                     ly, ey = numbers[y - 1]
>                     # How many digits in total?
>                     l = lx + ly
>                     # Fewer digits than any previous attempts?
>                     if l < best_len:
>                         # Build the expression.
>                         e = "(%s %s %s)" % (ex, name, ey)
>                         best_len, best_expr = l, e
>     return best_len, best_expr
>
> operator_list = [(operator.add, "+"), (operator.sub, "-"),
>                  (operator.mul, "*"), (operator.div, "/"),
>                  (operator.pow, "^")]
>
> # Tuples contain number of digits used and expression.
> numbers = [(1, str(n)) for n in range(1, 4)]
> for n in range(4, 34):
>     numbers.append(solve(n))
>
> for i, (l, e) in enumerate(numbers):
>     print "%2d = %s" % (i + 1, e)

Sadly this doesn't work, because you're assuming that the shortest expression always involves increasing the size of the numbers at every step. For example, your algorithm for 63 gives:

(3 * (3 * (1 + (2 * 3))))

which involves 4 operations, whereas the simplest is:

2**(2*3)-1

which only involves 3 (but has an intermediate value, 2**6=64, above the target value).

Fixing this is actually pretty hard, because the search space becomes very large if you allow yourself arbitrarily large numbers. You can't do any obvious pruning that I can think of, because operations between very large numbers can end up very small (for example, 13**3-3**7=10, even though 13**3=2197 and 3**7=2187 are both very large), and the shortest path to a small number might go through very large numbers (I bet with a bit more thought than I have put into it, you could come up with examples where the intermediate calculations are arbitrarily larger than the end result).
As well as the growth of the search space, the computations get hairy too; try computing 3**3**3**3, for example (or better, don't, because it has 3638334640025 digits). I haven't got a solution that can be guaranteed correct and is feasible to compute for anything but very small examples - has anyone done better?

Dan
--
http://mail.python.org/mailman/listinfo/python-list
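For what it's worth, here is a sketch of the search organised by digit count rather than by target value, which does find 63 in 4 digits. The magnitude cap is a pragmatic pruning choice and, as argued above, makes the search heuristic rather than exact: a shortest expression whose intermediates exceed the cap would be missed. All the parameter values are illustrative:

```python
from itertools import product

def shortest_exprs(max_digits=5, value_cap=10**5):
    """best[v] = (digit count, expression) for every value reachable
    from the digits 1, 2, 3 with at most max_digits digits.
    by_cost[k] holds only the values first reached with exactly k
    digits, so each pair of levels is combined once."""
    ops = [
        ('+', lambda a, b: a + b),
        ('-', lambda a, b: a - b),
        ('*', lambda a, b: a * b),
        # Only exact division counts; anything else is discarded.
        ('/', lambda a, b: a // b if b != 0 and a % b == 0 else None),
        # Guard exponentiation so intermediate ints stay manageable.
        ('^', lambda a, b: a ** b if 0 <= b <= 20 and abs(a) <= 1000 else None),
    ]
    by_cost = {1: {1: '1', 2: '2', 3: '3'}}
    best = {v: (1, e) for v, e in by_cost[1].items()}
    for k in range(2, max_digits + 1):
        level = {}
        for i in range(1, k):
            for (a, ea), (b, eb) in product(by_cost[i].items(),
                                            by_cost[k - i].items()):
                for name, fn in ops:
                    v = fn(a, b)
                    if v is None or abs(v) > value_cap:
                        continue
                    if v not in best or best[v][0] > k:
                        expr = '(%s %s %s)' % (ea, name, eb)
                        best[v] = (k, expr)
                        level[v] = expr
        by_cost[k] = level
    return best
```

Because intermediate values up to the cap are allowed, 63 is found as ((2 ^ (2 * 3)) - 1), i.e. 4 digits, beating the 5-digit monotone solution.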
Re: Modifying the value of a float-like object
eric.le.bi...@spectro.jussieu.fr wrote:
> Hello,
>
> Is there a way to easily build an object that behaves exactly like a
> float, but whose value can be changed? The goal is to maintain a list
> [x, y, ...] of these float-like objects, and to modify their value on
> the fly (with something like x.value = 3.14) so that any expression
> like "x + y" uses the new value.

Hi Eric,

Numpy's array object can do something like what you want:

In [27]: x = array(0.0)

In [28]: print x, sin(x)
0.0 0.0

In [29]: x.itemset(pi/2)

In [30]: print x, sin(x)
1.57079632679 1.0

Not sure if this is a recommended way of using array or not, but it seems to work. The problem is that any calculation you do with such an object will result in a float, not another numpy array (although in-place operations work):

In [40]: print (x*2).__class__

In [41]: x *= 2

In [42]: print x.__class__

So it's not a perfect solution, but it might be OK for what you need.

Dan
--
http://mail.python.org/mailman/listinfo/python-list
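A numpy-free alternative along the same lines: a small wrapper class (the name MutableFloat and its interface are made up for illustration) whose arithmetic always reads the current value, so expressions see updates made on the fly. As with the numpy approach, results of arithmetic are plain floats:

```python
class MutableFloat:
    """Float-like box whose value can be reassigned in place;
    arithmetic delegates to the value held at call time."""
    def __init__(self, value):
        self.value = float(value)
    def __float__(self):
        return self.value
    def __add__(self, other):
        return self.value + float(other)
    __radd__ = __add__
    def __mul__(self, other):
        return self.value * float(other)
    __rmul__ = __mul__
    def __repr__(self):
        return repr(self.value)

x = MutableFloat(0.0)
y = MutableFloat(2.0)
x.value = 1.5            # modify on the fly
assert x + y == 3.5      # the expression sees the new value
```

The remaining __sub__, __truediv__, comparison methods, etc. follow the same one-line pattern.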
Re: Modifying the value of a float-like object
eric.le.bi...@spectro.jussieu.fr wrote:
> I initially tried to create a Float_ref class that inherits from
> numpy.array, so that objects of the new class behave like numpy.array
> in calculations, but also contain an "uncertainty" attribute, but this
> is apparently not allowed ("cannot create 'builtin_function_or_method'
> instances"). So I thought I'd directly add an attribute to a
> numpy.array: "x = numpy.array(3.14); x.uncertainty = 0.01", but this
> is not allowed either. Thus, it is not obvious for me to extend the
> 1x1 numpy.array object so that it also contains an "uncertainty"
> attribute.

Subclassing numpy arrays is possible, but it can be a bit fiddly, and you can't add attributes to a numpy array unless it's a subclass. The first place to look is:

http://www.scipy.org/Subclasses

Actually, I haven't read this (new) page in the numpy docs, but it looks better:

http://docs.scipy.org/doc/numpy/user/basics.subclassing.html

It also includes a simple example of how to write a subclass that adds a single attribute.

Dan
--
http://mail.python.org/mailman/listinfo/python-list
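The pattern from that docs page can be condensed to the following sketch. The class and attribute names are illustrative; the key pieces are creating the instance via view() in __new__ and propagating the attribute in __array_finalize__, which numpy calls on views, slices, and the results of ufuncs:

```python
import numpy as np

class UncertainArray(np.ndarray):
    """ndarray subclass carrying an extra `uncertainty` attribute."""
    def __new__(cls, value, uncertainty=0.0):
        # View the data as our subclass, then attach the attribute.
        obj = np.asarray(value, dtype=float).view(cls)
        obj.uncertainty = uncertainty
        return obj

    def __array_finalize__(self, obj):
        # Called for views, slices and ufunc results: copy the
        # attribute over (defaulting when constructed from scratch).
        if obj is None:
            return
        self.uncertainty = getattr(obj, 'uncertainty', 0.0)

x = UncertainArray(3.14, uncertainty=0.01)
assert float(x) == 3.14
assert x.uncertainty == 0.01
assert (2 * x).uncertainty == 0.01   # attribute survives arithmetic
```

Note this only propagates the attribute; doing proper uncertainty *arithmetic* (combining uncertainties under +, *, etc.) would need __array_ufunc__ or explicit operator overrides.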
C++ code generation
Hi all,

I'm doing some C++ code generation using Python, and would be interested in any comments on the approach I'm taking. Basically, the problem involves doing some nested loops and executing relatively simple arithmetic code snippets, like:

for i in xrange(len(X)):
    X[i] += 5

Actually they're considerably more complicated than this, but this gives the basic idea. One way to get C++ code from this would be to use Cython, but there are two problems with doing that. The first problem is that the arithmetic code snippets are user-specified. What I want to do is generate code, and then compile and run it using Scipy's weave package. The second problem is that I have various different data structures, and the C++ code generated needs to be different for the different structures (e.g. sparse or dense matrices).

So far what I've been doing is writing Python code that writes the C++ code, but in a very non-transparent way. I like the idea of specifying the C++ code using Python syntax, like in Cython. So the idea I came up with was basically to abuse generators and iterators so that when you write something like:

for x in X:
    ...

it actually outputs some C++ code that looks like:

for(int i=0; i<X_len; i++){
    ...
}

The ... in the Python code is only executed once, because when X is iterated over it only returns one value. Here's the example I've written so far (complete code given below):

# initialisation code
code = OutputCode()
evaluate = Evaluator(code)
X = Array(code, 'values')

# specification of the C++ loop
for x in X:
    evaluate('x += 5; x *= 2')

# and show the output
print code.code

It generates the following C++ code:

for(int values_index=0; values_index<values_len; values_index++){
double &values = values_array[values_index];
values += 5;
values *= 2;
}

OK, so that's an overview of the idea that I have of how to do it. Any comments or suggestions on either the approach or the implementation? Below is the complete code I've written for the example above (line wraps aren't perfect, but there are only a couple of lines to correct).
Thanks for any feedback,
Dan

import re, inspect

# We just use this class to identify certain variables
class Symbol(str):
    pass

# This class is basically just a mutable string
class OutputCode(object):
    def __init__(self):
        self.code = ''
    def __iadd__(self, code):
        self.code = self.code + code
        return self

# Iterating over instances of this class generates code
# for iterating over a C++ array; it yields a single
# Symbol object, the variable name of the value in the
# array
class Array(object):
    def __init__(self, code, name, dtype='double'):
        self.name = name
        self.dtype = dtype
        self.code = code
    def __iter__(self):
        def f():
            self.code += 'for(int {name}_index=0; {name}_index<{name}_len; {name}_index++){{\n'.format(name=self.name)
            self.code += '{dtype} &{name} = {name}_array[{name}_index];\n'.format(dtype=self.dtype, name=self.name)
            yield Symbol(self.name)
            self.code += '}\n'
        return f()

# Instances of this class generate C++ code from Python syntax
# code snippets, replacing variable names that are a Symbol in the
# namespace with the value of that Symbol.
class Evaluator(object):
    def __init__(self, code):
        self.code = code
    def __call__(self, code):
        # The set of variables in the code snippet
        vars = re.findall(r'\b(\w+)\b', code)
        # Extract any names from the namespace of the calling frame
        frame = inspect.stack()[1][0]
        globals, locals = frame.f_globals, frame.f_locals
        values = {}
        for var in vars:
            if var in locals:
                values[var] = locals[var]
            elif var in globals:
                values[var] = globals[var]
        # Replace any variables whose values are Symbols with their values
        for var, value in values.iteritems():
            if isinstance(value, Symbol):
                code = re.sub(r'\b{var}\b'.format(var=var), str(value), code)
        # Turn Python snippets into C++ (just a simplified version for now)
        code = code.replace(';', '\n')
        lines = [line.strip() for line in code.split('\n')]
        code = ''.join(line + ';\n' for line in lines)
        self.code += code

if __name__ == '__main__':
    code = OutputCode()
    evaluate = Evaluator(code)
    X = Array(code, 'values')
    for x in X:
        evaluate('x += 5; x *= 2')
    print code.code

--
http://mail.python.org/mailman/listinfo/python-list
Execute bit in .tar.gz file produced by distutils
Hi all,

I have a problem with using distutils and was hoping someone might be able to help. The .exe and .zip files work fine - and easy_install uses the .zip file by default, it seems - but in the .tar.gz file the execute bit is not set for any directories, meaning a user on Linux has to go through doing chmod a+x on them all by hand.

I produce the files on my computer, which runs Windows 7, and presumably this is the problem, but I'm a bit surprised that it doesn't produce a usable .tar.gz file, so maybe I'm missing something obvious? The command I run to produce the distribution files is:

setup.py sdist --formats=gztar,zip

The versions I'm running are Python 2.6.2 and distutils 2.6.6.

I did notice that if I run a cygwin bash shell, go to the project directory (i.e. /cygdrive/d/...) and run "ls -l", the permissions are set the same way as they appear in the .tar.gz file (i.e. execute bit not set), but I'm not sure these values are meaningful, because they probably don't correspond to any underlying values in the Win 7 file system.

Any help much appreciated!

Thanks,
Dan Goodman
--
http://mail.python.org/mailman/listinfo/python-list
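One possible workaround, sketched here rather than tested against distutils itself: post-process the generated tarball with the tarfile module, forcing sane Unix modes (0o755 on directories so they are traversable after extraction, 0o644 on regular files). The function name and mode choices are my own:

```python
import tarfile

def fix_tarball_modes(src, dst):
    """Copy a .tar.gz member by member, rewriting the recorded
    permission bits, which may be wrong when the archive was built
    on Windows."""
    with tarfile.open(src, 'r:gz') as tin, \
         tarfile.open(dst, 'w:gz') as tout:
        for member in tin.getmembers():
            if member.isdir():
                member.mode = 0o755   # directories need the execute bit
            elif member.isfile():
                member.mode = 0o644
            fobj = tin.extractfile(member) if member.isfile() else None
            tout.addfile(member, fobj)
```

Usage would be e.g. fix_tarball_modes('dist/pkg-1.0.tar.gz', 'dist/pkg-1.0-fixed.tar.gz'), after which the fixed tarball extracts with traversable directories on Linux.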
Re: Generating nested code with context managers
I tried a very similar thing, but not using with statements:

http://mail.python.org/pipermail/python-list/2010-March/1239577.html

Dan

On 04/05/2010 22:36, Terry Reedy wrote:
> In a current thread, people have claimed that generating properly
> indented nested blocks is a pain because of the need to keep track of
> indent levels. Someone countered with the now rather ancient
> http://effbot.org/zone/python-code-generator.htm
>
> The usage example
>
>     c = CodeGeneratorBackend()
>     c.begin(tab="    ")
>     c.write("for i in range(1000):\n")
>     c.indent()
>     c.write("print 'code generation is trivial'")
>     c.dedent()
>
> illustrates three problems with the CodeGeneratorBackend class.
> 1) it requires explicit \n on all lines (which the second omits,
> though it is non-fatal since it is also the last)
> 2) the user still has to manually match indents and dedents, and
> 3) the user *cannot* indent lines that produce indented code.
>
> The relatively new with statement and associated context managers are
> designed, among other things, for this situation, where one needs to
> alter and restore a global context. So here is my updated (3.1)
> proof-of-concept version.
>
>     class PyCodeGen:
>         def __init__(self, tab="    "):
>             self.code = []
>             self.tab = tab
>             self.level = 0
>             # all attributes should be treated as read-only
>
>         def end(self):
>             return '\n'.join(self.code)
>
>         def line(self, string):  # new line
>             self.code.append(self.tab * self.level + string)
>
>     class For:
>         def __init__(self, target, in_expression):
>             target.line('for ' + in_expression + ':')
>             self.target = target
>
>         def __enter__(self):
>             self.target.level += 1
>
>         def __exit__(self, t, v, tb):
>             self.target.level -= 1
>
>     c = PyCodeGen()
>     with For(c, 'i in range(1000)'):
>         c.line('print("Code gen is easy")')
>     c.line('# done')
>     print(c.end())
>
> which prints
>
>     for i in range(1000):
>         print("Code gen is easy")
>     # done
>
> Note that the absence of .indent and .dedent is intentional. In a
> fleshed out system, there would be a context manager for each compound
> statement and these would handle all indents and dedents.
> If one really preferred to write, for instance, 'c.For(s)' instead of
> 'For(c, s)' in the with statement, one could add a wrapper method like
>
>     def For(self, s): return For(self, s)
>
> for each context manager. I left that out. Similar methods can be used
> to auto-match C braces and other open/close pairs.
>
> Terry Jan Reedy

--
http://mail.python.org/mailman/listinfo/python-list
Re: Sumatra m/l?
On 10/07/2010 22:08, Neal Becker wrote:
> Sumatra looks like an interesting project:
> http://pypi.python.org/pypi/Sumatra/0.2
> But I have some questions. Is there any mail list or forum? I can't
> find anything on the website.

It's part of Neural Ensemble, which has a Google group (not specifically devoted to Sumatra, but covering all their projects):

http://groups.google.com/group/neuralensemble

Dan
--
http://mail.python.org/mailman/listinfo/python-list