On Monday 21 December 2015 15:22, Chris Angelico wrote:

> On Mon, Dec 21, 2015 at 2:01 PM, Steven D'Aprano <st...@pearwood.info>
> wrote:
>> I have a large number of strings (originally file names) which tend to
>> fall into two groups. Some are human-meaningful, but not necessarily
>> dictionary words e.g.:
[...]

> The first thing that comes to my mind is poking the string into a
> search engine and seeing how many results come back. You might need to
> do some preprocessing to recognize multi-word forms (maybe a handful
> of recognized cases like snake_case, CamelCase,
> CamelCasewiththeLittleWordsLeftUnchanged, etc),

I could possibly split the string into "words", based on CamelCase, spaces, 
hyphens or underscores. That would cover most of the cases.
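
Something along these lines is what I have in mind for the splitting. It is only a rough sketch: the name split_words is made up for illustration, and it assumes ASCII letters and digits only, treating anything else (spaces, hyphens, underscores, punctuation) as a separator.

```python
import re

# Alternatives are tried in order:
#   1. a run of capitals that is not followed by a lowercase letter
#      (an acronym such as "HTTP" in "HTTPServer"),
#   2. an optional capital followed by lowercase letters (an ordinary word),
#   3. a run of digits.
# Assumption: ASCII only; non-ASCII letters are treated as separators.
_WORD = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+")


def split_words(name):
    """Split a file-name-like string into candidate words.

    Hypothetical helper for illustration, not a tested library function.
    """
    return _WORD.findall(name)


if __name__ == "__main__":
    for sample in ("CamelCase", "snake_case", "HTTPServer", "my-file name2"):
        print(sample, "->", split_words(sample))
```

Note that this does nothing for the CamelCasewiththeLittleWordsLeftUnchanged case: with no capitals or separators to go on, "Casewiththe" comes back as a single word, and splitting it further would need a word list of some sort.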

> How many of these keywords would you be looking up, and would a
> network transaction (a search engine API call) for each one be too
> expensive?

Tens or hundreds of thousands of strings, and yes, a network transaction
for each one would probably be a bit much. I'd rather not have Google or
Bing be a dependency :-)


-- 
Steve

