Re: Comparing strings from the back?

2012-09-10 Thread Dan Goodman

On 04/09/2012 03:54, Roy Smith wrote:

Let's assume you're testing two strings for equality.  You've already
done the obvious quick tests (i.e. they're the same length), and you're
down to the O(n) part of comparing every character.

I'm wondering if it might be faster to start at the ends of the strings
instead of at the beginning?  If the strings are indeed equal, it's the
same amount of work starting from either end.  But, if it turns out that
for real-life situations, the ends of strings have more entropy than the
beginnings, the odds are you'll discover that they're unequal quicker by
starting at the end.


From the rest of the thread, it looks like in most situations it won't 
make much difference as typically very few characters need to be 
compared if they are unequal.


However, if you were in a situation with many strings which were almost 
equal, the most general way to improve the situation might be to store a 
hash of the string along with the string, i.e. store (hash(x), x) and 
then compare equality of this tuple. Almost all of the time, if the 
strings are unequal the hash will be unequal. Or, as someone else 
suggested, use interned versions of the strings. This is basically the 
same solution but even better. In this case, your startup costs will be 
higher (creating the strings) but your comparisons will always be instant.
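The tuple trick can be sketched like this (`tagged` is just a name I've made up for illustration):

```python
def tagged(s):
    # Pair each string with its hash once, up front.
    return (hash(s), s)

a = tagged("x" * 1000 + "a")
b = tagged("x" * 1000 + "b")
c = tagged("x" * 1000 + "a")

# Tuples compare element-wise, so the cheap integer comparison of the
# hashes runs first; the O(n) character scan only happens on a hash match.
print(a == b)  # False: rejected cheaply, almost always by the hashes
print(a == c)  # True: hashes equal, then the full comparison confirms it
```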


Dan

--
http://mail.python.org/mailman/listinfo/python-list


Re: Comparing strings from the back?

2012-09-10 Thread Dan Goodman

On 10/09/2012 18:33, Oscar Benjamin wrote:

Computing the hash always requires iterating over all characters in the string
so is best case O(N) where string comparison is best case (and often average
case) O(1).


Yep, but you already have O(N) costs just creating the strings in the 
first place, so it's absorbed into that. It's only worth doing if you do 
many comparisons.



Also, so far as I know the hash value once computed is stored on the string
object itself [1] and used for subsequent string comparisons so there's no
need for you to do that in your code.


Cool, Python is very clever as always! :)

Dan



Re: Comparing strings from the back?

2012-09-10 Thread Dan Goodman

On 10/09/2012 18:07, Dan Goodman wrote:

On 04/09/2012 03:54, Roy Smith wrote:

Let's assume you're testing two strings for equality.  You've already
done the obvious quick tests (i.e. they're the same length), and you're
down to the O(n) part of comparing every character.

I'm wondering if it might be faster to start at the ends of the strings
instead of at the beginning?  If the strings are indeed equal, it's the
same amount of work starting from either end.  But, if it turns out that
for real-life situations, the ends of strings have more entropy than the
beginnings, the odds are you'll discover that they're unequal quicker by
starting at the end.


From the rest of the thread, it looks like in most situations it won't
make much difference as typically very few characters need to be
compared if they are unequal.

However, if you were in a situation with many strings which were almost
equal, the most general way to improve the situation might be to store a
hash of the string along with the string, i.e. store (hash(x), x) and
then compare equality of this tuple. Almost all of the time, if the
strings are unequal the hash will be unequal. Or, as someone else
suggested, use interned versions of the strings. This is basically the
same solution but even better. In this case, your startup costs will be
higher (creating the strings) but your comparisons will always be instant.


Just had another thought about this. Although it's unlikely to be 
necessary in practice since (a) it's rarely necessary at all, and (b) 
when it is, hashing and optionally interning seems like the better 
approach, I had another idea that would be more general. Rather than 
starting from the beginning or the end, why not do something like: check 
the first and last character, then the len/2 character, then the len/4, 
then 3*len/4, then len/8, 3*len/8, etc. You'd need to be a bit clever 
about making sure you hit every character but I'm sure someone's already 
got an efficient algorithm for this. You could probably even make this 
cache efficient by working on cache line length blocks. Almost certainly 
entirely unnecessary, but I like the original question and it's a nice 
theoretical problem.
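A rough sketch of that visiting order (my own guess at the details, and certainly not a tuned implementation):

```python
def spread_order(n):
    # Visit 0 and n-1 first, then midpoints at progressively finer
    # subdivisions: n/2, then n/4 and 3n/4, then n/8, 3n/8, ... until
    # every index has been seen exactly once.
    seen = set()
    order = []
    for i in (0, n - 1):
        if 0 <= i < n and i not in seen:
            seen.add(i)
            order.append(i)
    denom = 2
    while len(order) < n:
        for num in range(1, denom, 2):  # odd numerators give new midpoints
            i = num * n // denom
            if i not in seen:
                seen.add(i)
                order.append(i)
        denom *= 2
    return order

def first_mismatch(a, b):
    # Compare equal-length strings in the spread-out order; return the
    # index of the first difference found, or None if they are equal.
    for i in spread_order(len(a)):
        if a[i] != b[i]:
            return i
    return None

print(spread_order(8))  # [0, 7, 4, 2, 6, 1, 3, 5]
```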


Dan


simple system for building packages for multiple platforms?

2012-02-01 Thread Dan Goodman

Hi all,

Until recently, our package has been pure Python, so distributing it has 
been straightforward. Now, however, we want to add some extension 
modules in C++. We're happy to provide source only distributions on 
Linux because almost all Linux users will have all the required 
compilers and so forth already installed. But we also want to support 
Windows users who may not have C++ compilers available, which means 
providing built distributions. But we're faced with this problem: there 
are three versions of Python we're supporting (2.5-2.7) and two 
architectures (32 and 64 bit), which means 6 possible platforms to build 
for (and maybe more in future if we upgrade to Python 3). Is there a 
straightforward way to set up a semi-automated build system for this?


At the moment, I'm thinking about either having a bunch of virtual 
machines or Amazon EC2 instances and submitting build jobs to these, but 
setting that up looks to be a lot of work and I guess many people have 
had this problem before. So, what do other people do?


Also, once we have a build system up, I guess it can also be used for a 
more extensive testing system on these multiple platforms?


Dan



Re: Terrible FPU performance

2011-04-26 Thread Dan Goodman

Hi,

On 26/04/2011 15:40, Mihai Badoiu wrote:
> I have terrible performance for multiplication when one number gets very
> close to zero.  I'm using cython by writing the following code:

This might be an issue with denormal numbers:

http://en.wikipedia.org/wiki/Denormal_number

I don't know much about them though, so I can't advise any further than 
that...
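That said, you can at least see subnormals directly from Python (a quick sketch, not from the thread):

```python
import sys

tiny = sys.float_info.min      # smallest positive *normal* double
denormal = tiny / 2 ** 10      # below tiny: a subnormal (denormal) value

# Subnormals are still representable (Python doesn't flush them to zero),
# but arithmetic on them is much slower on many CPUs, which could explain
# the kind of slowdown described above.
print(denormal > 0.0)   # True
print(denormal < tiny)  # True
```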


Dan



Re: Turning an AST node / subnodes into something human-readable

2014-02-19 Thread Dan Goodman
Chris Angelico <...@gmail.com> writes:
> I'm working with the ast module to do some analysis on Python
> codebases, and once I've found what I'm looking for, I want to print
> something out. The file name I'm hanging onto externally, so that
> works; and the nodes all have a lineno. So far so good. But how do I
> "reconstitute" a subtree into something fit for human consumption?

I did something like this, feel free to use my code:

https://github.com/brian-team/brian2/blob/master/brian2/parsing/rendering.py

At the moment, it only works for series of mathematical statements (no
control conditions, loops, array notation, etc.), but it would be pretty
easy to add those. It also puts too many parentheses in the expressions
it outputs, e.g. 1+2+3 would come back as (1+2)+3, etc.
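The fully-parenthesised behaviour can be seen in a stripped-down renderer along these lines (a sketch, not the actual brian2 code):

```python
import ast

OPS = {ast.Add: '+', ast.Sub: '-', ast.Mult: '*', ast.Div: '/'}

def render(node):
    # Render an expression AST back to a string, wrapping every BinOp in
    # parentheses -- which is exactly why 1+2+3 comes back over-bracketed.
    if isinstance(node, ast.Expression):
        return render(node.body)
    if isinstance(node, ast.BinOp):
        return '(%s%s%s)' % (render(node.left), OPS[type(node.op)],
                             render(node.right))
    if isinstance(node, ast.Constant):
        return str(node.value)
    raise NotImplementedError(type(node).__name__)

print(render(ast.parse('1+2+3', mode='eval')))  # ((1+2)+3)
```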

Dan



Re: nth root

2009-01-30 Thread Dan Goodman
Takes less than 1 sec here to do (10**100)**(1./13) a million times, and 
only about half as long to do (1e100)**(1./13), or about 14 times as 
long as to do .2**2. Doesn't look like one could hope for it to be that 
much quicker as you need 9 sig figs of accuracy to get the integer part 
of (10**100)**(1./13) (floats have about 7 and doubles about 16).


Dan

Tim wrote:
In PythonWin I'm running a program to find the 13th root (say) of 
millions of hundred-digit numbers.  I'm using

n = 13
root = base**(1.0/n)
which correctly computes the root to a large number of decimal 
places, but therefore takes a long time.  All I need is the integer 
component.  Is there a quicker way?
 







Re: nth root

2009-01-31 Thread Dan Goodman

Mark Dickinson wrote:

I'd also be a bit worried about accuracy.   Is it important to you
that the
integer part of the result is *exactly* right, or is it okay if
(n**13)**(1./13) sometimes comes out as slightly less than n, or if
(n**13-1)**(1./13) sometimes comes out as n?


I don't think accuracy is too big a problem here actually (at least for 
13th roots). I just tested it with several hundred thousand random 100 
digit numbers and it never made a mistake. The precision of double ought 
to easily guarantee a correct result. If you let x=int(1e100**(1./13)) 
then ((x+1)**13-x**13)/x**13=2.6e-7 so you only need around the first 8 
or 9 digits of the 100 digit number to compute the 13th root exactly 
(well within the accuracy of a double).


OTOH, suppose you were doing cube roots instead then you would need the 
first 35 digits of the 100 digit number and this is more accurate than a 
double. So for example int(1e100**(1./3)) is a long way from being the 
integer part of the true cube root (it's between 10**18 and 10**19 away).
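For completeness, the integer part can be computed exactly for any size of number with an all-integer Newton iteration (my sketch, not something from the thread):

```python
def iroot(n, k):
    # Largest integer x with x**k <= n, via Newton's method on ints,
    # so the answer is exact regardless of how big n is.
    if n < 0 or k < 1:
        raise ValueError("need n >= 0 and k >= 1")
    if n == 0:
        return 0
    x = 1 << ((n.bit_length() + k - 1) // k)  # guaranteed overestimate
    while True:
        y = ((k - 1) * x + n // x ** (k - 1)) // k
        if y >= x:
            return x
        x = y

print(iroot(12345 ** 13 - 1, 13))  # 12344 -- where the float method slips
```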


Dan



Re: nth root

2009-01-31 Thread Dan Goodman

Mark Dickinson wrote:

Well, random numbers is one thing.  But how about the following:


>>> n = 12345**13
>>> n
154662214940914131102165197707101295849230845947265625L
>>> int(n ** (1./13))  # should be 12345; okay
12345
>>> int((n-1) ** (1./13))  # should be 12344; oops!
12345


Good point! Oops indeed. :-)

Dan



Re: code challenge: generate minimal expressions using only digits 1,2,3

2009-02-20 Thread Dan Goodman
This sounds kind of similar to a problem I posted to this list last 
year, you might find that thread gives you some ideas:


http://mail.python.org/pipermail/python-list/2008-January/474071.html

Dan



Re: code challenge: generate minimal expressions using only digits 1,2,3

2009-02-26 Thread Dan Goodman

MRAB wrote:

Here's my solution (I haven't bothered with making it efficient, BTW):

import operator

def solve(n):
    best_len = n
    best_expr = ""
    for x in range(1, n - 2):
        for y in range(x, n):
            for op, name in operator_list:
                # Does this pair with this op give the right answer?
                if op(x, y) == n:
                    lx, ex = numbers[x - 1]
                    ly, ey = numbers[y - 1]
                    # How many digits in total?
                    l = lx + ly
                    # Fewer digits than any previous attempts?
                    if l < best_len:
                        # Build the expression.
                        e = "(%s %s %s)" % (ex, name, ey)
                        best_len, best_expr = l, e
    return best_len, best_expr

operator_list = [(operator.add, "+"), (operator.sub, "-"),
                 (operator.mul, "*"), (operator.div, "/"), (operator.pow, "^")]

# Tuples contain number of digits used and expression.
numbers = [(1, str(n)) for n in range(1, 4)]

for n in range(4, 34):
    numbers.append(solve(n))

for i, (l, e) in enumerate(numbers):
    print "%2d = %s" % (i + 1, e)


Sadly this doesn't work because you're assuming that the shortest 
expression always involves increasing the size of the numbers at every 
step. For example, your algorithm for 63 gives:


(3 * (3 * (1 + (2 * 3))))

which involves 4 operations, whereas the simplest is:

2**(2*3)-1

which only involves 3 (but has an intermediate value 2**6=64 above the 
target value).


Fixing this is actually pretty hard, because the search space becomes 
very large if you allow yourself arbitrarily large numbers. You can't do 
any obvious pruning that I can think of, because operations between very 
large numbers can end up very small (for example, 13**3-3**7=10 even 
though 13**3=2197 and 3**7=2187 are both very large), and the shortest 
path to a small number might go through very large numbers (I bet with a 
bit more thought than I have put into it, you could come up with some 
examples where the intermediate calculations are arbitrarily larger than 
the end result). As well as the growth of the search space, the 
computations get hairy too, try computing 3**3**3**3 for example (or 
better, don't, because it has 3638334640025 digits).
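The digit count itself is cheap to check without ever forming the number (a sketch):

```python
import math

# Number of decimal digits of 3**3**3**3 = 3**(3**27), computed from the
# exponent alone: digits = floor(3**27 * log10(3)) + 1.
exponent = 3 ** 3 ** 3                 # 3**27 = 7625597484987
digits = math.floor(exponent * math.log10(3)) + 1
print(digits)  # 3638334640025
```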


I haven't got a solution that can be guaranteed correct and is feasible 
to compute for anything but very small examples - anyone done better?


Dan



Re: Modifying the value of a float-like object

2009-04-15 Thread Dan Goodman

eric.le.bi...@spectro.jussieu.fr wrote:

Hello,

Is there a way to easily build an object that behaves exactly like a
float, but whose value can be changed?  The goal is to maintain a list
[x, y,…] of these float-like objects, and to modify their value on the
fly (with something like x.value = 3.14) so that any expression like "x
+y" uses the new value.


Hi Eric,

Numpy's array object can do something like what you want:

In [27]: x=array(0.0)

In [28]: print x, sin(x)
0.0 0.0

In [29]: x.itemset(pi/2)

In [30]: print x, sin(x)
1.57079632679 1.0

Not sure if this a recommended way of using array or not, but it seems 
to work. The problem is that any calculation you do with such an object 
will result in a float, and not another numpy array (although inplace 
operations work):


In [40]: print (x*2).__class__
<type 'numpy.float64'>

In [41]: x *= 2

In [42]: print x.__class__
<type 'numpy.ndarray'>

So it's not a perfect solution, but it might be OK for what you need.
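Another option along the same lines, sketched here with a made-up MutableFloat class (only a couple of operations shown; a real version would need the full set of numeric dunders):

```python
class MutableFloat(object):
    # A tiny proxy whose arithmetic always reads the current value.
    def __init__(self, value):
        self.value = float(value)
    def __float__(self):
        return self.value
    def __add__(self, other):
        return self.value + float(other)
    __radd__ = __add__
    def __mul__(self, other):
        return self.value * float(other)
    __rmul__ = __mul__

x = MutableFloat(1.0)
print(x + 2.0)   # 3.0
x.value = 8.0    # change the value in place...
print(x * 2.0)   # 16.0 -- ...and expressions pick up the new value
```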

Dan



Re: Modifying the value of a float-like object

2009-04-15 Thread Dan Goodman

eric.le.bi...@spectro.jussieu.fr wrote:

I initially tried to create a Float_ref class that inherits from
numpy.array, so that objects of the new class behave like numpy.array
in calculations, but also contain an "uncertainty" atribute, but this
is apparently not allowed ("cannot create 'builtin_function_or_method'
instances").

So I thought I'd directly add an attribute to a numpy.array: "x =
numpy.array(3.14); x.uncertainty = 0.01", but this is not allowed
either.

Thus, it is not obvious for me to extend the 1x1 numpy.array object so
that it also contains an "uncertainty" attribute.


Subclassing numpy arrays is possible but it can be a bit fiddly. You 
can't add attributes to a numpy array unless it's a subclass.


The first place to look is:

http://www.scipy.org/Subclasses

Actually, I haven't read this (new) page in the numpy docs, but it looks 
better:


http://docs.scipy.org/doc/numpy/user/basics.subclassing.html

It also seems to include a simple example of how to write a subclass 
that just adds a single attribute.
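The pattern from that page boils down to something like this (a sketch based on the docs; the `uncertainty` attribute is the part you'd add):

```python
import numpy as np

class UArray(np.ndarray):
    # ndarray subclass carrying one extra attribute, following the
    # __new__/__array_finalize__ pattern from the numpy docs.
    def __new__(cls, input_array, uncertainty=None):
        obj = np.asarray(input_array, dtype=float).view(cls)
        obj.uncertainty = uncertainty
        return obj
    def __array_finalize__(self, obj):
        if obj is None:
            return
        # Propagate the attribute through views and ufunc results.
        self.uncertainty = getattr(obj, 'uncertainty', None)

x = UArray([3.14], uncertainty=0.01)
print(x.uncertainty)        # 0.01
print((x + 1).uncertainty)  # 0.01 -- survives arithmetic
```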


Dan



C++ code generation

2010-03-16 Thread Dan Goodman

Hi all,

I'm doing some C++ code generation using Python, and would be interested 
in any comments on the approach I'm taking.


Basically, the problem involves doing some nested loops and executing 
relatively simple arithmetic code snippets, like:


for i in xrange(len(X)):
  X[i] += 5

Actually they're considerably more complicated than this, but this gives 
the basic idea. One way to get C++ code from this would be to use 
Cython, but there are two problems with doing that. The first problem is 
that the arithmetic code snippets are user-specified. What I want to do 
is generate code, and then compile and run it using Scipy's weave 
package. The second problem is that I have various different data 
structures and the C++ code generated needs to be different for the 
different structures (e.g. sparse or dense matrices).


So far what I've been doing is writing Python code that writes the C++ 
code, but in a very non-transparent way. I like the idea of specifying 
the C++ code using Python syntax, like in Cython. So the idea I came up 
with was basically to abuse generators and iterators so that when you 
write something like:


for x in X:
   ...

it actually outputs some C++ code that looks like:

for(int i=0; i<X_len; i++){
    ...
}

The ... in the Python code is only executed once because when X is 
iterated over it only returns one value.


Here's the example I've written so far (complete code given below):

# initialisation code
code = OutputCode()
evaluate = Evaluator(code)
X = Array(code, 'values')
# specification of the C++ loop
for x in X:
    evaluate('x += 5; x *= 2')
# and show the output
print code.code

It generates the following C++ code:

for(int values_index=0; values_index<values_len; values_index++){
double &values = values_array[values_index];
values += 5;
values *= 2;
}

OK, so that's an overview of the idea that I have of how to do it. Any 
comments or suggestions on either the approach or the implementation?


Below is the complete code I've written for the example above (linewraps 
aren't perfect but there's only a couple of lines to correct).


Thanks for any feedback,

Dan

import re, inspect

# We just use this class to identify certain variables
class Symbol(str): pass

# This class is basically just a mutable string
class OutputCode(object):
    def __init__(self):
        self.code = ''
    def __iadd__(self, code):
        self.code = self.code+code
        return self

# Iterating over instances of this class generates code
# for iterating over a C++ array, it yields a single
# Symbol object, the variable name of the value in the
# array
class Array(object):
    def __init__(self, code, name, dtype='double'):
        self.name = name
        self.dtype = dtype
        self.code = code
    def __iter__(self):
        def f():
            self.code += 'for(int {name}_index=0; {name}_index<{name}_len; {name}_index++){{\n'.format(name=self.name)
            self.code += '{dtype} &{name} = {name}_array[{name}_index];\n'.format(dtype=self.dtype, name=self.name)
            yield Symbol(self.name)
            self.code += '}\n'
        return f()

# Instances of this class generate C++ code from Python syntax
# code snippets, replacing variable names that are a Symbol in the
# namespace with the value of that Symbol.
class Evaluator(object):
    def __init__(self, code):
        self.code = code
    def __call__(self, code):
        # The set of variables in the code snippet
        vars = re.findall(r'\b(\w+)\b', code)
        # Extract any names from the namespace of the calling frame
        frame = inspect.stack()[1][0]
        globals, locals = frame.f_globals, frame.f_locals
        values = {}
        for var in vars:
            if var in locals:
                values[var] = locals[var]
            elif var in globals:
                values[var] = globals[var]
        # Replace any variables whose values are Symbols with their values
        for var, value in values.iteritems():
            if isinstance(value, Symbol):
                code = re.sub(r'\b{var}\b'.format(var=var), str(value), code)
        # Turn Python snippets into C++ (just a simplified version for now)
        code = code.replace(';', '\n')
        lines = [line.strip() for line in code.split('\n')]
        code = ''.join(line+';\n' for line in lines)
        self.code += code

if __name__=='__main__':
    code = OutputCode()
    evaluate = Evaluator(code)
    X = Array(code, 'values')
    for x in X:
        evaluate('x += 5; x *= 2')
    print code.code



Execute bit in .tar.gz file produced by distutils

2011-02-22 Thread Dan Goodman

Hi all,

I have a problem with using distutils and was hoping someone might be 
able to help. The .exe and .zip files work fine - and easy_install uses 
the .zip file by default it seems - but in the .tar.gz file the execute 
bit is not set for any directories meaning a user on linux has to go 
through doing chmod a+x on them all by hand. I produce the files on my 
computer, which is Windows 7, and presumably this is the problem but I'm 
a bit surprised that it doesn't produce a useable .tar.gz file so maybe 
I'm missing something obvious?


The command I run to produce the distribution files is:

setup.py sdist --formats=gztar,zip

The versions I'm running are Python 2.6.2 and distutils 2.6.6.

I did notice that if I run a cygwin bash shell and go to the project 
directory (i.e. /cygdrive/d/...) and run "ls -l" the permissions are set 
the same way as they appear in the .tar.gz file (i.e. execute bit not 
set), but I'm not sure if these values are meaningful because they 
probably don't correspond to any underlying values in the Win 7 file system.
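In case it helps anyone suggest a cleaner route: one workaround I could script myself (a sketch; the filenames are made up) would be to repack the archive with sensible modes using the tarfile module:

```python
import tarfile

def fix_modes(member):
    # Give directories the execute (search) bit; plain files stay 644.
    member.mode = 0o755 if member.isdir() else 0o644
    return member

# Repacking would then look roughly like (paths are hypothetical):
# with tarfile.open('pkg-1.0.tar.gz') as src, \
#      tarfile.open('pkg-1.0-fixed.tar.gz', 'w:gz') as dst:
#     for m in src.getmembers():
#         dst.addfile(fix_modes(m),
#                     src.extractfile(m) if m.isfile() else None)
```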


Any help much appreciated!

Thanks,

Dan Goodman



Re: Generating nested code with context managers

2010-05-05 Thread Dan Goodman

I tried a very similar thing, but not using with statements:

http://mail.python.org/pipermail/python-list/2010-March/1239577.html

Dan

On 04/05/2010 22:36, Terry Reedy wrote:

In a current thread, people have claimed that generating properly
indented nested blocks is a pain because of the need to keep track of
indent levels. Someone countered with the now rather ancient

http://effbot.org/zone/python-code-generator.htm

The usage example

c = CodeGeneratorBackend()
c.begin(tab=" ")
c.write("for i in range(1000):\n")
c.indent()
c.write("print 'code generation is trivial'")
c.dedent()

illustrates three problems with the CodeGeneratorBackend class. 1) it
requires explicit \n on all lines (which the second omits, though it is
non-fatal since it is also the last) 2) the user still has to manually
match indents and dedents, and 3) the user *cannot* indent lines that
produce indented code.

The relatively new with statement and associated context managers are
designed, among other things, for this situation, where one needs to
alter and restore a global context. So here is my updated (3.1)
proof-of-concept version.

class PyCodeGen:
    def __init__(self, tab=" "):
        self.code = []
        self.tab = tab
        self.level = 0
        # all attributes should be treated as read-only
    def end(self):
        return '\n'.join(self.code)
    def line(self, string): # new line
        self.code.append(self.tab * self.level + string)

class For:
    def __init__(self, target, in_expression):
        target.line('for ' + in_expression + ':')
        self.target = target
    def __enter__(self):
        self.target.level += 1
    def __exit__(self, t, v, tb):
        self.target.level -= 1

c = PyCodeGen()

with For(c, 'i in range(1000)'):
    c.line('print("Code gen is easy")')
c.line('# done')

print(c.end())

# prints

for i in range(1000):
    print("Code gen is easy")
# done

Note that the absence of .indent and .dedent is intentional. In a
fleshed out system, there would be a context manager for each compound
statement and these would handle all indents and dedents.

If one really preferred to write, for instance, 'c.For(s); instead of
'For(c,s)' in the with statement, one could add a wrapper method like
def For(self, s): return For(self, s)
for each context manager. I left that out.

Similar methods can be used to auto-match C braces and other open/close
pairs.

Terry Jan Reedy







Re: Sumatra m/l?

2010-07-10 Thread Dan Goodman

On 10/07/2010 22:08, Neal Becker wrote:

Sumatra looks like an interesting project
http://pypi.python.org/pypi/Sumatra/0.2

But I have some questions.  Is there any mail list or forum?  I can't find
anything on the website.


It's part of Neural Ensemble, which has a Google group (not specifically 
devoted to Sumatra, but covering all their projects):


http://groups.google.com/group/neuralensemble

Dan
