On Friday 03 September 2010 14:04:26, JP Moresmau wrote:
> Hello all
>
> After reading the modules docs and some other discussions, I'm still not
> sure what's the best choice of tools for my problem. I'm looking at the
> scion server code base. At the moment, it's reading and writing on
> sockets using Lazy ByteStrings, then converting them to Haskell Strings
> using utf8-string. The Haskell Strings are then parsed as JSON using the
> JSon package. The response is in JSON, translated back with utf8-string
> to ByteStrings.
> This is efficient for small strings, but as I'm extending the API I have
> calls with much more data, and performance degrades significantly.
> Timings seem to point to the encoding of the String to UTF8.
> I have replaced JSon by AttoJson (there was also JSONb, which seems
> quite similar), which allows me to work solely with ByteStrings,
> bypassing the calls to utf8-string completely. Performance has improved
> noticeably. I'm worried that I've lost full UTF8 compatibility, though,
> haven't I? No double byte characters will work in that setup?
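You can check directly whether multi-byte characters survive a ByteString-only pipeline. In UTF-8, every byte of a non-ASCII character is 0x80 or above. An ASCII delimiter such as `,` therefore can never match inside one. The sketch below encodes a sample string, splits the raw bytes on a comma, and compares the result with splitting the decoded text. It is an illustration only, not code from scion or AttoJson, and it assumes the `bytestring` and `text` packages.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- Sample payload mixing 2-, 3- and 4-byte UTF-8 characters with plain ASCII.
payload :: T.Text
payload = T.pack "café,日本,😀,plain"

-- The raw UTF-8 bytes, as they would arrive on a socket.
wire :: B.ByteString
wire = TE.encodeUtf8 payload

-- Split on the ASCII comma at the byte level, without decoding first.
fields :: [B.ByteString]
fields = BC.split ',' wire

-- True if every byte of every non-ASCII character has its high bit set,
-- i.e. can never be confused with an ASCII delimiter.
nonAsciiBytesHighBit :: T.Text -> Bool
nonAsciiBytesHighBit t =
  and [ B.all (>= 0x80) (TE.encodeUtf8 (T.singleton c))
      | c <- T.unpack t, c > '\DEL' ]

main :: IO ()
main = do
  -- Byte-level split, decoded afterwards, matches a text-level split.
  print (map TE.decodeUtf8 fields == T.splitOn (T.pack ",") payload)
  print (nonAsciiBytesHighBit payload)
```

Both lines should print `True`. This property is what lets a byte-oriented JSON parser pass non-ASCII string contents through untouched.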
That depends. I'm not familiar with JSON, but IIRC, all delimiters are
ASCII characters. In UTF8, every byte of a multi-byte character is 0x80
or above, so none of them can be mistaken for a delimiter. So it could
just work.

> Is Data.Text an alternative? Can I use that everywhere, including for
> dealing with sockets (the API only mentions Handle)? Should I use
> Data.ByteString.UTF8 everywhere, rewriting the JSON parser to deal with
> this instead of the Word8 ByteStrings?

Data.ByteString.UTF8 uses the ordinary Word8 ByteStrings; it just offers
some functions to deal with UTF8 encoding.

> In short, what's the fastest way to implement receiving/sending UTF8
> text across sockets?

The fastest way of receiving/sending UTF8 text across sockets is, I
strongly believe, ByteString. After all, UTF8 text is just a sequence of
bytes (with special properties). It's in what you do between receiving
and sending that other methods might prove better.
If you use Data.Text, you have to decode/encode between UTF8 and UTF16 on
receiving/sending. That won't be much faster than decoding/encoding
between UTF8 and String, but Data.Text offers a better API for
manipulating text than ByteString, so overall it could be better. It
depends on what your needs are; you'll have to try it out.

>
> Thanks for any pointer,

_______________________________________________
Haskell-Cafe mailing list
http://www.haskell.org/mailman/listinfo/haskell-cafe
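Here is a minimal sketch of the mixed approach described in the reply, assuming the `bytestring` and `text` packages. Bytes are read and written as ByteString at the Handle boundary. They are decoded with `decodeUtf8'` only for the processing step, and that function returns `Left` on invalid input instead of throwing. The result is then encoded back with `encodeUtf8`. The `process` function is a hypothetical placeholder for real request handling. stdin/stdout stand in for a socket, which the `network` package can wrap in a Handle via `socketToHandle`.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import System.IO (Handle, hIsEOF, stdin, stdout)

-- Hypothetical stand-in for real request handling; uses the Text API.
process :: T.Text -> T.Text
process = T.toUpper

-- Decode only here, where text manipulation is needed; encode the result back.
respond :: B.ByteString -> B.ByteString
respond raw = case TE.decodeUtf8' raw of
  Left _    -> BC.pack "error: invalid UTF-8"
  Right txt -> TE.encodeUtf8 (process txt)

-- Read newline-terminated requests as raw bytes and answer on another Handle.
serve :: Handle -> Handle -> IO ()
serve hIn hOut = do
  eof <- hIsEOF hIn
  if eof
    then pure ()
    else do
      line <- B.hGetLine hIn
      B.hPut hOut (respond line <> BC.pack "\n")
      serve hIn hOut

main :: IO ()
main = serve stdin stdout
```

Piping a UTF-8 line such as `café` into this program should print `CAFÉ`. The decode/encode inside `respond` is the cost the reply weighs against Data.Text's richer API.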
