Re: Fastest way to detect a non-ASCII character in a list of strings.

2010-10-29 Thread Stefan Behnel
Dun Peal, 28.10.2010 09:10:
> I find myself surprised at the relatively little use that Cython is seeing.
I don't think it's being used that little. It just doesn't show that easily. We get a lot of feedback on the mailing list that suggests that it's actually used by all sorts of people in all

Re: Fastest way to detect a non-ASCII character in a list of strings.

2010-10-28 Thread Dun Peal
On Wed, Oct 20, 2010 at 6:52 AM, Stefan Behnel wrote:
> Well, the estimate is about one man-month, so it would be doable in about
> three months time if we had the money to work on it. So far, no one has made
> a serious offer to support that project, though.
I find myself surprised at the relati

Re: Fastest way to detect a non-ASCII character in a list of strings.

2010-10-19 Thread Stefan Behnel
Dun Peal, 20.10.2010 02:07:
> On Mon, Oct 18, 2010 at 1:41 AM, Stefan Behnel wrote:
>> Or, a bit shorter, using Cython 0.13:
>>
>>     def only_allowed_characters(list strings):
>>         cdef unicode s
>>         return any((c < 31 or c > 127)
>>                    for s in strings for c in s)
>
> Very cool, this

Re: Fastest way to detect a non-ASCII character in a list of strings.

2010-10-19 Thread Dun Peal
On Mon, Oct 18, 2010 at 1:41 AM, Stefan Behnel wrote:
> Or, a bit shorter, using Cython 0.13:
>
>     def only_allowed_characters(list strings):
>         cdef unicode s
>         return any((c < 31 or c > 127)
>                    for s in strings for c in s)
Very cool, this caused me to look up the

Re: Fastest way to detect a non-ASCII character in a list of strings.

2010-10-18 Thread Felipe Bastos Nunes
Printable on the screen, all of them are, except for blank spaces ehhehehe
2010/10/18, Tim Chase:
> On 10/18/10 09:28, Grant Edwards wrote:
>> There's no easy way to even define what "printable" means. Ask three
>> different people, and you'll get at least four different answers.
>

Re: Fastest way to detect a non-ASCII character in a list of strings.

2010-10-18 Thread Tim Chase
On 10/18/10 09:28, Grant Edwards wrote:
> There's no easy way to even define what "printable" means. Ask three
> different people, and you'll get at least four different answers.
I don't have a printer...that makes *all* characters unprintable, right? Now I can convert the algorithm to O

Re: Fastest way to detect a non-ASCII character in a list of strings.

2010-10-18 Thread Grant Edwards
On 2010-10-18, Steven D'Aprano wrote:
> Neither is accurate. all_ascii would be:
>
>     all(ord(c) <= 127 for c in string for string in L)
Definitely.
> all_printable would be considerably harder. As far as I can tell, there's
> no simple way to tell if a character is printable.
There's no easy
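A minimal Python 3 sketch of the `all_ascii` check quoted above, assuming the list holds `str` objects. The `for` clauses of a generator expression run left to right, so the loop over the list has to come first; the one-liner as quoted lists them the other way round and would not run as intended. The parameter name `strings` is chosen here only for illustration.

```python
def all_ascii(strings):
    """Return True if every character of every string is in the 0-127 range."""
    # Outer loop first (over the list), inner loop second (over characters).
    # all() stops at the first character that fails the test.
    return all(ord(c) <= 127 for s in strings for c in s)


print(all_ascii(["spam", "eggs\t\n"]))   # True
print(all_ascii(["spam", "caf\u00e9"]))  # False
```

Control characters such as tab and newline pass this test, which is why it answers "all ASCII" rather than "all printable".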

Re: Fastest way to detect a non-ASCII character in a list of strings.

2010-10-17 Thread Stefan Behnel
Dun Peal, 17.10.2010 21:59:
> `all_ascii(L)` is a function that accepts a list of strings L, and returns True if all of those strings contain only ASCII chars, False otherwise.
>
> What's the fastest way to implement `all_ascii(L)`?
>
> My ideas so far are:
>
> 1. Match against a regexp with a character ran

Re: Fastest way to detect a non-ASCII character in a list of strings.

2010-10-17 Thread Albert Hopkins
On Sun, 2010-10-17 at 14:59 -0500, Dun Peal wrote:
> `all_ascii(L)` is a function that accepts a list of strings L, and
> returns True if all of those strings contain only ASCII chars, False
> otherwise.
>
> What's the fastest way to implement `all_ascii(L)`?
>
> My ideas so far are:
>
> 1. Matc

Re: Fastest way to detect a non-ASCII character in a list of strings.

2010-10-17 Thread Steven D'Aprano
On Mon, 18 Oct 2010 01:04:09 +0100, Rhodri James wrote:
> On Sun, 17 Oct 2010 20:59:22 +0100, Dun Peal wrote:
>
>> `all_ascii(L)` is a function that accepts a list of strings L, and
>> returns True if all of those strings contain only ASCII chars, False
>> otherwise.
>>
>> What's the fastest w

Re: Fastest way to detect a non-ASCII character in a list of strings.

2010-10-17 Thread Tim Chase
On 10/17/10 19:04, Rhodri James wrote:
>     import string
>     return set("".join(L)) <= set(string.printable)
>
> I've no idea whether this is faster or slower than any of your suggestions.
For set("".join(L)) to return, it has to scan the entire input list/string. Imagine s = UNPRINTABLE_CHAR
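To illustrate the point about scanning the whole input, here is a rough Python 3 sketch that puts the set-based check next to a short-circuiting one. The helper names `printable_via_set` and `printable_short_circuit` are made up for this example, and "printable" is taken to mean membership in `string.printable`, which is only one of several possible definitions.

```python
import string

PRINTABLE = frozenset(string.printable)


def printable_via_set(strings):
    # Joins everything and builds a set, so the whole input is always read,
    # even when the very first character already decides the answer.
    return set("".join(strings)) <= PRINTABLE


def printable_short_circuit(strings):
    # all() returns as soon as one offending character is found.
    return all(c in PRINTABLE for s in strings for c in s)


# An unprintable character right at the start, followed by a long tail.
data = ["\x00" + "a" * 100000]
print(printable_via_set(data), printable_short_circuit(data))  # False False
```

Both functions agree on the result; they differ only in how much of the input they touch before returning, so which one is faster in practice depends on where (and whether) an offending character occurs.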

Re: Fastest way to detect a non-ASCII character in a list of strings.

2010-10-17 Thread Carl Banks
On Oct 17, 12:59 pm, Dun Peal wrote:
> `all_ascii(L)` is a function that accepts a list of strings L, and
> returns True if all of those strings contain only ASCII chars, False
> otherwise.
>
> What's the fastest way to implement `all_ascii(L)`?
>
> My ideas so far are:
>
> 1. Match against a rege

Re: Fastest way to detect a non-ASCII character in a list of strings.

2010-10-17 Thread Rhodri James
On Sun, 17 Oct 2010 20:59:22 +0100, Dun Peal wrote:
> `all_ascii(L)` is a function that accepts a list of strings L, and returns True if all of those strings contain only ASCII chars, False otherwise.
>
> What's the fastest way to implement `all_ascii(L)`?
>
> My ideas so far are:
>
> 1. Match against a r

Re: Fastest way to detect a non-ASCII character in a list of strings.

2010-10-17 Thread Seebs
On 2010-10-17, Dun Peal wrote:
> What's the fastest way to implement `all_ascii(L)`?
Start by defining it.
> 1. Match against a regexp with a character range: `[ -~]`
What about tabs and newlines? For that matter, what about DEL and BEL? Seems to me that the entire 0-127 range are "ASCII char
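The gap between the `[ -~]` range and the full 0-127 range can be checked directly. A small Python 3 sketch, with variable names chosen purely for illustration:

```python
import re

# [ -~] covers space (0x20) through tilde (0x7E).
printable_range = re.compile(r"[ -~]")

# ASCII code points that the range does NOT match.
excluded = [cp for cp in range(128) if not printable_range.fullmatch(chr(cp))]

print(len(excluded))                    # 33
print(9 in excluded, 10 in excluded)    # tab, newline: True True
print(7 in excluded, 127 in excluded)   # BEL, DEL: True True
```

So the regexp accepts 95 of the 128 ASCII code points; the 32 control characters below space, plus DEL, are ASCII but fall outside the range.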

Fastest way to detect a non-ASCII character in a list of strings.

2010-10-17 Thread Dun Peal
`all_ascii(L)` is a function that accepts a list of strings L, and returns True if all of those strings contain only ASCII chars, False otherwise.
What's the fastest way to implement `all_ascii(L)`?
My ideas so far are:
1. Match against a regexp with a character range: `[ -~]`
2. Use s.decode('a
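As a rough sketch of the first idea in Python 3 (the thread itself predates it), one way is to search each string for a character outside the range instead of matching every character inside it. The function names below are hypothetical. `str.isascii()` is a later addition to the language (Python 3.7+) and is shown only for comparison; it is not one of the approaches proposed in the post.

```python
import re

# Matches any single character outside the range space..tilde.
NOT_IN_RANGE = re.compile(r"[^ -~]")


def all_in_printable_range(strings):
    # search() stops at the first offending character in a string,
    # and all() stops at the first offending string.
    return all(NOT_IN_RANGE.search(s) is None for s in strings)


def all_ascii_builtin(strings):
    # str.isascii() accepts the whole 0-127 range, control characters included.
    return all(s.isascii() for s in strings)


print(all_in_printable_range(["abc", "x y"]))  # True
print(all_in_printable_range(["abc\n"]))       # False: newline is outside [ -~]
print(all_ascii_builtin(["abc\n"]))            # True
```

The two functions answer different questions, so the choice between them depends on whether control characters should count as "ASCII chars".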