On Tue, Dec 11, 2018 at 1:12 PM Avi Gross <avigr...@verizon.net> wrote:
> I would then "generate" all possible combinations of digits 0-9 of length N. 
> There are an amazing number of ways to do that ranging from taking a 
> range(10**N) and converting it to a string then a list of numeral characters 
> then tossing out any that have any duplicates, to creating a recursive 
> function that passes along what has been used so far to the next call. Since 
> many of these problems only need ONE solution, the generator would only 
> iterate as many times as needed and no giant lists of numbers or strings need 
> be in memory at any time.
>

First off, you can slash that number significantly: since the assigned
digits must all be distinct, there are only 10! (not 10**10)
possibilities. Secondly, you can eliminate zero from consideration for
any letter that starts a multi-digit number, as demonstrated
previously. So there are 9! * x possibilities, where x is the number
of letters that could legitimately be zero. That cuts your total from
10,000,000,000 down to, at worst, 3,628,800.
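If you want to sanity-check those figures, the standard library will
confirm them (a quick sketch, nothing puzzle-specific):

```python
import math
from itertools import permutations

# Brute force over every 10-character digit string:
print(10 ** 10)            # 10000000000

# Distinct-digit assignments only:
print(math.factorial(10))  # 3628800

# permutations() agrees with 10!, and it's a lazy iterator,
# so no giant list ever sits in memory:
print(sum(1 for _ in permutations('0123456789')))  # 3628800
```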

> For each tuple returned  by the generator, you can iterate on the set of 
> unique letters and use string functions to substitute the digits, or perhaps 
> do this all at once. You would do this to all the items in what I call the 
> left list as well as all the items on the right list. These would not be 
> "numeric" so using int() on each item  would work EVEN with leading zeroes. 
> Seems safe enough.
>

Use str.replace() and int(), job done.
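A minimal sketch of that step. The to_number helper and the use of
SEND + MORE = MONEY are just illustrative, not anything from the
thread:

```python
def to_number(word, assignment):
    """Substitute digits for letters, then convert. A leading zero
    survives the replace, and int() copes with it fine."""
    for letter, digit in assignment.items():
        word = word.replace(letter, str(digit))
    return int(word)

# The known solution to the classic SEND + MORE = MONEY:
solution = {'S': 9, 'E': 5, 'N': 6, 'D': 7,
            'M': 1, 'O': 0, 'R': 8, 'Y': 2}

print(to_number('SEND', solution))   # 9567
print(to_number('MORE', solution))   # 1085
print(to_number('MONEY', solution))  # 10652
```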

> Yes, this is brute force. Using range(1,000,000) (no commas in real use)

Use underscores instead:

>>> range(1_000_000)
range(0, 1000000)

> would be a million iterations when there are six unique characters in the 
> puzzle and as many as 10 billion if all ten are in use. If you use nested 
> loops like I showed a while ago (albeit to make arbitrary ones for many sizes 
> could require multiple functions or use of an eval on one built by hand) you 
> can cut down the number of iterations as the nested loops count down with 
> each one doing one less than the one before it. Same goes for the recursive 
> function call method as it passes along what numerals  have already been 
> used. There may already be a permutation function that does this efficiently 
> in C.
>

Yeah, check itertools :)
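Specifically, itertools.permutations is implemented in C and hands
back distinct-digit tuples lazily, so it does the "count down with
each nested loop" trick for free. A rough brute-force sketch built on
it (the solve() helper and the TWO + TWO = FOUR example are mine, not
from the thread):

```python
from itertools import permutations

def solve(left_words, right_words):
    """Try each assignment of distinct digits to the unique letters,
    skipping any that give a multi-digit word a leading zero.
    Returns the first solution found, as discussed above."""
    letters = sorted(set(''.join(left_words + right_words)))
    leading = {w[0] for w in left_words + right_words if len(w) > 1}
    for digits in permutations('0123456789', len(letters)):
        assignment = dict(zip(letters, digits))
        if any(assignment[l] == '0' for l in leading):
            continue
        trans = str.maketrans(assignment)
        if (sum(int(w.translate(trans)) for w in left_words)
                == sum(int(w.translate(trans)) for w in right_words)):
            return assignment
    return None

print(solve(['TWO', 'TWO'], ['FOUR']))
```

With six unique letters, that's at most 10!/4! = 151,200 assignments
to try, which is entirely manageable.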

> The real cost that dominates here is not memory, albeit garbage collection 
> may be busy as it generates lots of temporary small bits of data. It is how 
> the number of iterations grows.

Correct. And that's why a pure brute-force solution needs some
refinement. Algorithmic improvements almost always trump mechanical
improvements.

> I have looked at a somewhat related issue of how you generate all possible 
> SCRABBLE words (or BOGGLE or ...) given specific starting letters.
> One way to make all possible combinations is along the lines above with many 
> needed changes as there can (in theory) be as many as 26 unique letters (in 
> English) and you can have multiple copies of some letters. If you allow other 
> languages, the problem grows to the point where brute force is not realistic.
> And, ideally, you winnow down your choices by checking each word against a 
> large dictionary.

Hmm, IMO that's the wrong way around. Instead, *start* with the
dictionary, and winnow down the possibilities to the words you have. A
decent word list will probably have 100K-1M words in it, which is a
small enough corpus to go through them all.
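A sketch of that winnowing, using collections.Counter to check
whether each word can be spelled from the available letters (the tiny
word list here is a stand-in for a real dictionary):

```python
from collections import Counter

def playable(rack, words):
    """Yield the dictionary words spellable from the given letters."""
    rack_counts = Counter(rack.lower())
    for word in words:
        # Counter subtraction keeps only the letters the rack lacks;
        # an empty result means the word fits.
        if not (Counter(word.lower()) - rack_counts):
            yield word

demo_words = ['ear', 'tear', 'rate', 'banana', 'eat']
print(list(playable('earts', demo_words)))  # ['ear', 'tear', 'rate', 'eat']
```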

> Anyone know of one that has a decent selection and can be freely imported? I 
> mean one word per line.
>

What OS are you on? For my usage, I just read from
/usr/share/dict/words and it's usually there. On Debian Linux, that's
provided by the dictionaries-common package, or the wbritish or
wamerican Ditch-side-specific packages. In fact, using that file makes
your code independent of language (although you may need to concern
yourself with different alphabets if you want to support
Russian/Greek, and anything involving "letters" doesn't really work
with Chinese), so I would strongly recommend it.

On Windows, where that path doesn't exist, and there probably aren't
standard dictionaries, you could download one of them from
wordlist.aspell.net or wordlist.sourceforge.net - they're freely
available, but you would have to go fetch them.
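Putting that together, a hedged loader might look like this; the path
only exists on Unix-like systems, so the fallback message is just
illustrative:

```python
from pathlib import Path

dict_path = Path('/usr/share/dict/words')
if dict_path.exists():
    # One word per line, as the original question asked for.
    words = dict_path.read_text().split()
    print(len(words), 'words loaded')
else:
    print('No system word list; fetch one from wordlist.aspell.net')
```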

> I apologize for the length. The main question was whether eval is 
> particularly expensive.

Well, yes it IS expensive... but the cost of it is less significant
than the algorithm and consequent number of iterations/attempts. Using
eval() on three million possibilities is going to be WAY cheaper than
a more efficient calculation technique used on ten billion.
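A rough timeit illustration of that per-call overhead; the absolute
numbers will vary by machine, but the ratio makes the point that
eval() costs a constant factor while the iteration count is what
explodes:

```python
import timeit

# Evaluating the same expression directly vs. through eval():
direct = timeit.timeit('123 + 456', number=100_000)
evald = timeit.timeit('eval("123 + 456")', number=100_000)
print(f'direct: {direct:.4f}s, eval: {evald:.4f}s')
```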

Write your code with two main priorities:

1) Get your algorithm right.
2) Express your algorithm cleanly in code.

Don't worry about performance until you've done the above two steps,
*measured*, and found that it's taking too long.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list