On Fri, 22 Nov 2013 22:33:29 -0800 Dan Stromberg <drsali...@gmail.com> wrote:
> On Fri, Nov 22, 2013 at 4:58 PM, John O'Hagan > <resea...@johnohagan.com>wrote: > > > On Thu, 21 Nov 2013 12:59:26 -0800 > > Dan Stromberg <drsali...@gmail.com> wrote: > > > > > On Wed, Nov 20, 2013 at 10:46 PM, John O'Hagan > > > <resea...@johnohagan.com>wrote: > > > > > > > > > > > Short story: the subject says it all, so if you have an answer > > > > already, fire away. Below is the long story of what I'm using it > > > > for, and why I think it needs to be recursive. It may even be of > > > > more general interest in terms of filtering the results of > > > > generators. > > > > > > > > > > I think you probably need permutations rather than combinations. > > > > > > Also, I think you'll need to form a word (partitioned off by > > > spaces), and then check it against a set > > > containing /usr/share/dict/words before recursing for the > > > remainder of the sentence - this should speed things up a LOT. > > > > Thanks for the reply. If I understand you correctly, you are > > suggesting permuting the input _characters_ to form words and then > > seeing if they exist, as opposed to my approach of combining known > > words and seeing if they are anagrams. (Permutations of words would > > not help find anagrams as they merely change the word order). Here > > is an attempt at that: > > > You've interpreted me correctly. > > However, I was thinking about this in the back of my mind, and > decided it would probably be best to inhale /usr/share/dict/words (if > on Linux), and pull out words of the corrects lengths (as separated > by the blanks) over the correct (possible) alphabet, and permute > Those, afterward checking if they form good anagrams of the original > sentence. This would probably be much faster, since English isn't > that dense of a space. If you look back at my original question, you'll see that's pretty much what I've done. I didn't spell out all the details, but I made a dictionary of wordlength keys containing lists of all dictionary words of that length made of the correct sub-alphabet. But to to recap the problem: to produce non-redundant anagram phrases, I need the cartesian product (not permutations) of these lists if each is unique, but with a subroutine producing multiset combinations if a list is repeated, i.e., if a particular length is called for more than once. The version I have so far is correct but substantially slower than the product-only one, which just goes ahead and produces all the redundant results. This seems counter-intuitive, and my theory is that this is because I am unable to "prune" the non-recursive combination algorithm I currently have. Regards, -- John -- https://mail.python.org/mailman/listinfo/python-list