REAL SUBJECT: Analysis of alternate algorithms.
Peter & Jach and anyone interested,
As Peter said in his altered subject line, Jack changed directions from
tweaking an algorithm to trying something quite different.
Reminder of the problem.
Horizontal View:
SEND + MORE = MONEY
Vertical View:
SEND
+MORE
...
MONEY
Hard to be precise as I am sticking to plain text in my message. The three
words above are meant to be right adjusted.
When solving it horizontally, Jach and I used variants of a brute force
approach. Substitute all possible combinations. He did it in-line and used
eval. I did it by making lists of items on both sides and summing the int() of
each component and comparing. We tried both our algorithms and his was slower
and he profiled that the cause was that eval() was much more expensive as were
his use of regular expression functions. For comparison, mine used int() and
string manipulation functions and sets and dictionaries.
But the real puzzle is meant to be solved in a more vertical way by humans
using logic. I won't do that here but note we have 4 equations going down but 8
unknowns. And the equations are not unique.
The rightmost column (I will call it the first as our arithmetic proceeds from
right to left) is meant to represent ONES and provides the equation:
(D+E== Y) or (D+E == Y + 10)
Worse, for the next column, you either get a "1" carried from the previous
addition or not and either pass a "1" along to the next column or not. 4
Possibilities.
(N+R==E) or (N+R+1==E) or (N+R==E+10) or (N+R+1==E+10)
Getting a headache yet?
For a human, they need a way to come up with additional info in terms of
constraints.
There is a guiding inequality that looks like this:
S is not the same as any of the others. Anytime you solve for another, the list
of possible values for S shrinks.
Ditto for each other variable.
Or, since anything plus 0 is itself, then D and E adding up to Y (with no
possible carry) cannot be 0.
But getting a computer to do this might be a bit harder than blunt-force
searches. So let's see why Jach's new algorithm was faster.
The code I am analyzing can be viewed in the archives and will not be entered
again:
https://mail.python.org/pipermail/python-list/2018-December/738454.html
So what did Jach try in his newer version? It is what I call a vertical
approach but one a computer can do easier than a human can or would. I see it
is a very specific algorithm that hard codes in these variables as global
entities that are altered by a set of nested functions. S, E, N, D, M, O, R, Y.
There are approaches that might be better such as passing a dictionary
partially filled out from one function to the next as the only one that prints
the solution is the final function call.
So his is not a general solution.
What interests me as a probable reason this is faster is the number of
situations it tries. The earlier versions asked itertools.permutations() to
provide all unique combinations of ten tokens in eight positions. So there were
10 choices for the first and 9 for the second and so on adding up to 10!/2! or
1,814,400 different combinations tried. That approaches 2 million.
Jack broke the problem down into evaluating the units column with a loop like
this:
itertools.permutations(range(10), 3)
That is 720 possibilities. He then doubled that to 1,440 to consider a carry.
Only if the selected values for the three variables in contention (plus a
carry) does he go on to call to evaluate the tens column.
It then shrinks a bit more as he no longer gets the permutations of all 10
digits. He subtracts the three values that might work for the units, so he is
asking for permutations of 7 digits, two at a time. That is 42, doubled again
to 84 for carry purposes. And this function is not called 1,440 times, but
quite a bit fewer.
So, similarly, of those 84 loops for tens, he only sometimes calls to evaluate
hundreds. As mentioned, the set of 10 digits shrinks some more and this
continues upward to functions that evaluate hundreds and thousands and finally
the one evaluating ten thousands pretty much prints out an answer it finds.
So overall iterations can be shown to drop. We could add code to measure how
many times each function is called and come up with an exact value for this
built-in problem. I did and the functions were called this many times:
>>> counting
{'unit': 1, 'ten': 72, 'hundred': 290, 'thou': 183, '10thou': 196}
>>> sum(counting.values())
742
But I also see the permutation function was called 542 times. Most of those
runs though were fairly trivial as the combined number of items issued back
were only 7,390 as compared to nearly two million in the earlier version.
Overall, this is much faster.
Range() is probably a highly efficient built-in written in C, but it can
totally be eliminated in this code as it is a constant. You can write
{1,2,2,3,4,5,6,7,8,9} or [0,1] instead of calculating ranges.
Natu