On Mon, Mar 11, 2013 at 4:53 AM, Will Crawford <[email protected]> wrote:
> While it's not Catalyst's fault, I've found over the years that
> interacting with underlying libraries, databases and legacy systems is
> generally easier when I *don't* try to force anything. I have custom code
> in place to deal with known sources of inconsistent encodings (check to see
> if it's valid UTF8, look up the remainder in a table painstakingly
> assembled over a period of time that catches a few odd MacRoman characters
> that show up in some of our contributors' data, fall back to latin1 or
> cp1252 for the remainder, leave anything else as \xNN). Everywhere else,
> UTF8 can be passed through quite transparently, so I don't really see the
> point of adding extra decoding and encoding all over the place to switch
> from utf8, to some internal wide character encoding, then back to utf8
> again for output. One of the positive features of UTF8 has always been that
> code that doesn't need to identify any of those fancy accented characters
> can just treat it the same as ASCII, Latin-$WHATEVER or cp1252 without any
> overhead. Overall I can't see the point of forcing everything to be
> converted multiple times ...

I think we can all agree that historically encoding has been confusing, misunderstood, and frequently ignored. And very often just done plain wrong.

I suspect that since this is currently a plugin, it's often ignored, especially by newer developers. That means "out of the box" Catalyst, as a web framework, for its typical use, is broken.

One of the typical uses for a Catalyst application is building a web app that outputs character data. This character data must be encoded when sent over the wire. Likewise, request data that is character data must be decoded. Doing these as close to the "edge" of the application as possible is the best approach. That's what the plugin does.

As t0m says, if you are ignoring encoding your app is broken. Sure, it may not seem so.

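To make "decode on the way in, encode on the way out" concrete, here is a minimal sketch using Perl's core Encode module. It is not the plugin's code; the sample bytes and variable names are made up for illustration, and the CHECK value shown (FB_CROAK with LEAVE_SRC) is just one reasonable choice.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical request body: "café" as UTF-8 bytes (the "é" takes two bytes).
my $request_bytes = "caf\xC3\xA9";

# Decode once at the edge. FB_CROAK makes malformed input die rather than
# slip through; LEAVE_SRC keeps decode() from modifying $request_bytes.
my $check = Encode::FB_CROAK() | Encode::LEAVE_SRC();
my $text  = decode('UTF-8', $request_bytes, $check);

# length() counts bytes on the undecoded string, characters on the decoded one.
my $byte_len = length $request_bytes;
my $char_len = length $text;
print "bytes: $byte_len, characters: $char_len\n";

# Be prepared for bad data: a lone \xE9 (Latin-1 "é") is not valid UTF-8.
my $bad_bytes = "caf\xE9";
my $decoded   = eval { decode('UTF-8', $bad_bytes, $check) };
my $bad_input_rejected = defined $decoded ? 0 : 1;
print "malformed input rejected: $bad_input_rejected\n";

# Encode once, on the way out, just before the bytes go over the wire.
my $response_bytes = encode('UTF-8', $text);
```

Inside the application everything between the decode() and the encode() works on characters, so length(), regexes and the like see "café" as four characters rather than five bytes.
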
Sure, you can ignore those "fancy accented characters", and if your app only works in something like ASCII you will never notice -- it's just like before Unicode support was added to Perl. And you still should set a charset on the Content-Type when you send the response, so what are you going to set it to? Plus, once you do get some of those fancy characters (used by billions of people) into your app, length() and every other thing that works with characters (hey, this is Perl) will be broken.

My wild guess from your description above is that you are not handling encoding correctly. But in the real world you get character data thrown at you that is broken in some way. Perhaps your input is so broken you have to do what you described. (Still, I think the correct approach is to decode() with a useful CHECK value.) If you are "passing through" UTF8 undecoded, then unless you are not touching that input (as character data), that's broken.

You say it's best not to force anything, by which I assume you mean force it to some encoding. If you have character input then by nature it's encoded. You have to know what the encoding is, decode it as such, and be prepared for bad data. You wouldn't ignore it if it was base64 or gzipped, right? Those are not character encodings, but it's essentially the same issue.

I have never considered any performance aspect of this. It never shows up when we profile "slow" responses. Plus, it's never been an optional operation. We manipulate characters and we exchange data as bytes. You have to convert between those.

The plugin should be core to Catalyst. I think it's pretty safe to add it if it only encodes when the utf8 flag is set on the body -- that should prevent double-encodings. And having a config option to disable it is easy. And if the plugin is found on the app, issue a warning. It's possible that someone has their own modified version of the plugin using the same name.

-- 
Bill Moseley
[email protected]
_______________________________________________
List: [email protected]
Listinfo: http://lists.scsys.co.uk/cgi-bin/mailman/listinfo/catalyst
Searchable archive: http://www.mail-archive.com/[email protected]/
Dev site: http://dev.catalyst.perl.org/
