READERS DIGEST CONDENSED QUESTION: How expensive is eval("123 + 456 == 975") 
versus other ways?

The discussion started by Jach has spread away from discussing how python deals 
with numbers starting with leading zeroes such as "03". I note there are many 
ID numbers like social security that have a leading zero so if converted to an 
int for storage reasons, ...

The TOPIC I want to discuss is the concerns about Jach wanting to use "eval()". 
Besides security concerns, I want to know if it is particularly expensive.

It was suggested that you might solve a range of problems like puzzles asking 
you to substitute unique digits for letters. Some of  these puzzles are  
normally solved with logic a step at a time but the techniques used may require 
a specific order and may not be easy to program for a more general case so can 
it be solved using what might be termed brute force. I mean try EVERYTHING that 
might work, including some that might not.

A particular example was: SEND + MORE = MONEY.

Jach's suggestion was to take every possible combination of digits and make the 
substitutions for the letters in that puzzle: then do something like this:

>>> eval ("9567+1085==10652")
True
>>> eval("123+456==975")
False

So, although better algorithms exist, no doubt, I considered what it would take 
to do this without an eval. I came up with the following as an outline.

Accept the puzzle as a string with any number of items to the right or left of 
an "=" as long as they only have whitespace and plus signs in addition to 
alphabetic characters. So "AE + DF = CE + AB + AA" would be fine. This model 
does not include doing subtraction or other things, just a simple model like 
that, for now.

You can easily remove whitespace, force everything to one case, split it into a 
left/right on "=" and then split those into lists on "+.

You now have two lists of alphabetic strings that ultimately must become lists 
of integers and then the sums can be compared.

To get the number of unique letters in the puzzle, N, you throw all the list 
items as individual characters into a set. Clearly, for this kind of puzzle, 
there cannot be more than ten unique letters or there can be no unique mapping 
for 0-9.

I would then "generate" all possible combinations of digits 0-9 of length N. 
There are an amazing number of ways to do that ranging from taking a 
range(10**N) and converting it to a string then a list of numeral characters 
then tossing out any that have any duplicates, to creating a recursive function 
that passes along what has been used so far to the next call. Since many of 
these problems only need ONE solution, the generator would only iterate as many 
times as needed and no giant lists of numbers or strings need be in memory at 
any time.

For each tuple returned  by the generator, you can iterate on the set of unique 
letters and use string functions to substitute the digits, or perhaps do this 
all at once. You would do this to all the items in what I call the left list as 
well as all the items on the right list. These would not be "numeric" so using 
int() on each item  would work EVEN with leading zeroes. Seems safe enough.

Finally, with no eval needed, you take the sum of the numbers in a list of the 
left and compare to the sum on the right and if equal, you present the solution 
in whatever way works. If no more solutions are needed, you quit. I might write 
this part as a generator too, that can be called once or to completion.

Yes, this is brute force. Using range(1,000,000) (no commas in real use) would 
be a million iterations when there are six unique characters in the puzzle and 
as many as 10 billion if all ten are in use. If you use nested loops like I 
showed a while ago (albeit to make arbitrary ones for many sizes could require 
multiple functions or use of an eval on one built by hand) you can cut down the 
number of iterations as the nested loops count down with each one doing one 
less than the one before it. Same goes for the recursive function call method 
as it passes along what numerals  have already been used. There may already be 
a permutation function that does this efficiently in C.

The above description includes parts that may also be used in the eval 
situation. The main difference may be that my method uses int() and perhaps 
some string functions. So, does eval just divert the attention of the 
interpreter as may happen with importing a module, or does it invoke a second 
copy of the interpreter as may happen with some multitasking methods? I would 
guess the former. But it may also have to first compile the text into byte 
code. However, a full eval is like using a sledgehammer when a thumbtack might 
be enough. Then again, it is already sitting there so a little extra use might 
be cheap.

So a main reason for my variant might still be to avoid taking any chances on 
rogue code. Mind you, in the above I would remove everything except [a-zA-Z=+] 
including parentheses and periods so damage from a carefully crafted text 
should be limited to over-riding a global variable name and even that can be 
handled by supplying a minimal environment to the eval call.

The real cost that dominates here is not memory, albeit garbage collection may 
be busy as it generates lots of temporary small bits of data. It is how the 
number of iterations grows.

I have looked at a somewhat related issue of how you generate all possible 
SCRABBLE words (or BOGGLE or ...) given specific starting letters. That problem 
allows duplicate letters but it has to handle quite large words (even weird 
medical terms like Pneumonoultramicroscopicsilicovolcanoconiosis but SCRABBLE 
words cannot possible be larger than the seven letters you have added to 
whatever is on the board that it connects to and in any case, the board can 
only fit 19 letters.) One way to make all possible combinations is along the 
lines above with many needed changes as there can (in theory) be as many as 26 
unique letters (in English) and you can have multiple copies of some letters. 
If you allow other languages, the problem grows to the point where brute force 
is not realistic. And, ideally, you winnow down your choices by checking each 
word against a large dictionary. Anyone know of one that has a decent selection 
and can be freely imported? I mean one word per line.

Of course, in this case, there is room for improvement such as by making a tree 
out of the dictionary and pruning any attempts if the word you are building 
wanders off a leaf. Another story.

I apologize for the length. The main question was whether eval is particularly 
expensive.

-----Original Message-----
From: Python-list <python-list-bounces+avigross=verizon....@python.org> On 
Behalf Of Antoon Pardon
Sent: Monday, December 10, 2018 5:00 AM
To: python-list@python.org
Subject: Re: Why Python don't accept 03 as a number?

On 8/12/18 06:00, Cameron Simpson wrote:
> On 07Dec2018 20:24, Jach Fong <jf...@ms4.hinet.net> wrote:
>> Ian at 2018/12/8 UTC+8 AM11:28:34 wrote:
>>> What is it exactly that you're trying to accomplish with this? 
>>> Perhaps there's a better way than using eval.
>>
>> This problem comes from solving a word puzzle,
>>    ab + aa + cd == ce
>> Each character will be translate to a digit and evaluate the 
>> correctness,
>>    03 + 00 + 15 == 18
>
> Then you should be evaluating the digits and assembling values from 
> them. Not trying to shoehorn a string through something that _might_ 
> accept this string and do what you want. In Python 2 it will accept 
> your string and not do what you want; at least in Python 3 it doesn't 
> accept your string.
>
> My point here is that the structure of your puzzle doesn't map 
> directly into a naive python statement, and you shouldn't be 
> pretending it might.

How do you figure? As far as I understand he is trying to solve this kind of 
puzzle:

    SEND
    MORE
 +  ————
   MONEY

Where every letter is to be replaced by a digit in such a way that the sum 
checks out. Sure trying to solve this by starting with the string: "SEND + MORE 
== MONEY" and doing replaces until eval comes up with true is not very 
sofisticated but I wouldn't call it shoehorning either.

--
Antoon.

--
https://mail.python.org/mailman/listinfo/python-list

-- 
https://mail.python.org/mailman/listinfo/python-list

Reply via email to