On 2018-07-25 20:23, Mats Wichmann wrote: > On 07/25/2018 05:50 PM, Jim wrote: >> Linux mint 18 and python 3.6 >> >> I have a list of strings that contains slightly more than a million >> items. Each item is a string of 8 capital letters like so: >> >> ['MIBMMCCO', 'YOWHHOY', ...] >> >> I need to check and see if the letters 'OFHCMLIP' are one of the items >> in the list but there is no way to tell in what order the letters will >> appear. So I can't just search for the string 'OFHCMLIP'. I just need to >> locate any strings that are made up of those letters no matter their order. >> >> I suppose I could loop over the list and loop over each item using a >> bunch of if statements exiting the inner loop as soon as I find a letter >> is not in the string, but there must be a better way. >> >> I'd appreciate hearing about a better way to attack this. > It's possible that the size of the biglist and the length of the key has > enough performance impacts that a quicky (untested because I don't have > your data) solution is unworkable for performance reasons. But a quicky > might be to take these two steps: > > 1. generate a list of the permutations of the target > 2. check if any member of the target-permutation-list is in the biglist. > > Python sets are a nice way to check membership. > > from itertools import permutations > permlist = [''.join(p) for p in permutations('MIBMMCCO', 8)] > > if not set(permlist).isdisjoint(biglist): > print("Found a permutation of MIBMMCCO") >
I would *strongly* recommend against keeping a list of all permutations of the query string; though there are only 8! = 40320 permutations of 8 characters, suggesting anything with factorial runtime should be done only as a last resort. This could pretty effectively be solved by considering each string in the list as a set of characters for query purposes, and keeping a set of those, making membership testing constant-time. Note that the inner sets will have to be frozensets because normal sets aren't hashable. For example: """ In [1]: strings = ['MIBMMCCO', 'YOWHHOY'] In [2]: query = 'OFHCMLIP' In [3]: search_db = {frozenset(s) for s in strings} In [4]: frozenset(query) in search_db Out[4]: False In [5]: frozenset('MMCOCBIM') in search_db # permutation of first string Out[5]: True """ MMR... _______________________________________________ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: https://mail.python.org/mailman/listinfo/tutor