On Saturday, 30 May 2015 06:39:44 UTC+10, Nick Mellor wrote:
> Hi all,
>
> My own solution works but I'm sure it could be simpler or read better. How
> would you do it?
>
> Say you've got a list of companies:
>
> Aerosonde Ltd
> Amcor
> ANCA
> Austal Ships
> Australia Post
> Australian Air Express
> Australian Defence Industries
> Australian Railroad Group
> Australian Submarine Corporation
>
> and you need to extract phrases from the company names that uniquely identify
> that company. The results for the above list of companies should be:
>
> Company: 'Aerosonde Ltd'
>  Aliases: Aerosonde,Ltd,Aerosonde Ltd
>
> Company: 'Amcor'
>  Aliases: Amcor
>
> Company: 'ANCA'
>  Aliases: ANCA
>
> Company: 'Austal Ships'
>  Aliases: Austal,Ships,Austal Ships
>
> Company: 'Australia Post'
>  Aliases: Post,Australia Post
>
> Company: 'Australian Air Express'
>  Aliases: Air,Express,Australian Air,Air Express,Australian Air Express
>
> Company: 'Australian Defence Industries'
>  Aliases: Defence,Industries,Australian Defence,Defence Industries,Australian Defence Industries
>
> Company: 'Australian Railroad Group'
>  Aliases: Railroad,Group,Australian Railroad,Railroad Group,Australian Railroad Group
>
> Company: 'Australian Submarine Corporation'
>  Aliases: Submarine,Corporation,Australian Submarine,Submarine Corporation,Australian Submarine Corporation
>
> Here's my solution:
>
> from itertools import combinations, chain
>
> companies = [
>     "Aerosonde Ltd",
>     "Amcor",
>     "ANCA",
>     "Austal Ships",
>     "Australia Post",
>     "Australian Air Express",
>     "Australian Defence Industries",
>     "Australian Railroad Group",
>     "Australian Submarine Corporation",
> ]
>
> def flatten(i):
>     return list(chain.from_iterable(i))
>
> companies_as_text_stream = ' '.join(companies)
> for company in companies:
>     word_combinations = [list(combinations(company.split(), r)) for r in range(1, len(company))]
>     phrases = [' '.join(phrase) for phrase in flatten(word_combinations)]
>     unique_phrases = [phrase for phrase in phrases if companies_as_text_stream.count(phrase) == 1]
>     aliases = ','.join(unique_phrases)
>     print("Company: '{0}'\n Aliases: {1}\n".format(company, aliases))
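
For reference, here is one possible simplification of the approach quoted above. Every alias in the expected results is a run of adjacent words, so the phrases can be generated directly as contiguous word slices instead of building every combination and letting the substring count filter most of them out. This is only a sketch and not necessarily what anyone else in the thread suggested. The helper names `contiguous_phrases` and `unique_aliases` are made up for illustration. Joining with a newline rather than a space is an assumption on my part, so that a phrase cannot match across the boundary between two company names. Company names are also assumed to be distinct.

```python
companies = [
    "Aerosonde Ltd",
    "Amcor",
    "ANCA",
    "Austal Ships",
    "Australia Post",
    "Australian Air Express",
    "Australian Defence Industries",
    "Australian Railroad Group",
    "Australian Submarine Corporation",
]


def contiguous_phrases(name):
    """Yield every run of adjacent words in name, shortest runs first."""
    words = name.split()
    for size in range(1, len(words) + 1):
        for start in range(len(words) - size + 1):
            yield " ".join(words[start:start + size])


def unique_aliases(companies):
    """Map each company to the phrases that occur exactly once in all names."""
    # Newline separator (an assumption, the original used a space) so that
    # a phrase cannot straddle two neighbouring company names.
    haystack = "\n".join(companies)
    return {
        company: [p for p in contiguous_phrases(company) if haystack.count(p) == 1]
        for company in companies
    }


aliases = unique_aliases(companies)
for company, phrases in aliases.items():
    print("Company: '{0}'\n Aliases: {1}\n".format(company, ",".join(phrases)))
```

Like the original, this counts plain substrings, which is why "Australia" is rejected for Australia Post (it also occurs inside "Australian"). For the nine companies above it should give the same alias lists as the expected results.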

Great reply, Peter, thank you. Lots to think about.

Cheers,

Nick