I have a similar issue, I think. The problem with attoparsec is that it only covers the unmarshalling side; writing data back to disk still requires manually marshalling values into ByteStrings. Data.Binary together with Data.Derive provides a clean, proven (decode . encode == id) way of doing both.
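A minimal sketch of that round-trip law, with a hand-written Binary instance for a hypothetical Sample type (Data.Derive can generate the same instance automatically; the type and values here are made up for illustration):

```haskell
import Data.Binary (Binary (..), decode, encode)

-- Hypothetical record type; Data.Derive could derive this instance,
-- but it is written out by hand so the example is self-contained.
data Sample = Sample Int String
  deriving (Eq, Show)

instance Binary Sample where
  put (Sample n s) = put n >> put s    -- marshal each field in order
  get = Sample <$> get <*> get         -- unmarshal in the same order

main :: IO ()
main = do
  let x = Sample 42 "hello"
  -- decode . encode == id: serialising and re-reading gives the value back
  print (decode (encode x) == x)
```

Since `encode` produces a lazy ByteString, the same value can be written straight to disk with `Data.ByteString.Lazy.writeFile`, which is the marshalling side attoparsec does not cover.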
If there's a way to accomplish this with attoparsec, I'd love to know.

Max

On Jul 28, 2010, at 10:32 PM, Gregory Collins wrote:

> Conrad Parker <con...@metadecks.org> writes:
>
>> Hi,
>>
>> I am reading data from a file as strict bytestrings and processing
>> them in an iteratee. As the parsing code uses Data.Binary, the
>> strict bytestrings are then converted to lazy bytestrings (using
>> fromWrap which Gregory Collins posted here in January:
>>
>> -- | wrapped bytestring -> lazy bytestring
>> fromWrap :: I.WrappedByteString Word8 -> L.ByteString
>> fromWrap = L.fromChunks . (:[]) . I.unWrap
>
> This just makes a 1-chunk lazy bytestring:
>
> (L.fromChunks . (:[])) :: S.ByteString -> L.ByteString
>
>> ). The parsing is then done with the library function
>> Data.Binary.Get.runGetState:
>>
>> -- | Run the Get monad applies a 'get'-based parser on the input
>> -- ByteString. Additional to the result of get it returns the number of
>> -- consumed bytes and the rest of the input.
>> runGetState :: Get a -> L.ByteString -> Int64 -> (a, L.ByteString, Int64)
>>
>> The issue I am seeing is that runGetState consumes more bytes than the
>> length of the input bytestring, while reporting an apparently
>> successful get (ie. it does not call error/fail). I was able to work
>> around this by checking if the bytes consumed > input length, and if
>> so to ignore the result of get and simply prepend the input
>> bytestring to the next chunk in the continuation.
>
> Something smells fishy here. I have a hard time believing that binary is
> reading more input than is available? Could you post more code please?
>
>> However I am curious as to why this apparent lack of bounds checking
>> happens. My guess is that Get does not check the length of the input
>> bytestring, perhaps to avoid forcing lazy bytestring inputs; does that
>> make sense?
>>
>> Would a better long-term solution be to use a strict-bytestring binary
>> parser (like cereal)?
>> So far I've avoided that as there is
>> not yet a corresponding ieee754 parser.
>
> If you're using iteratees you could try attoparsec + attoparsec-iteratee
> which would be a more natural way to bolt parsers together. The
> attoparsec-iteratee package exports:
>
> parserToIteratee :: (Monad m) =>
>                     Parser a
>                  -> IterateeG WrappedByteString Word8 m a
>
> Attoparsec is an incremental parser so this technique allows you to
> parse a stream in constant space (i.e. without necessarily having to
> retain all of the input). It also hides the details of the annoying
> buffering/bytestring twiddling you would be forced to do otherwise.
>
> Cheers,
> G
> --
> Gregory Collins <g...@gregorycollins.net>
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe@haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe
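Conrad's workaround quoted above (when the reported byte count exceeds the chunk length, discard the result and carry the chunk over to be prepended to the next one) might be sketched like this. `tryGetChunk` is a hypothetical helper, not part of binary, and the scheme only matters under the anomalous behaviour he describes; recent binary versions raise an error on short input instead:

```haskell
import qualified Data.ByteString as S
import qualified Data.ByteString.Lazy as L
import Data.Binary.Get (Get, getWord32be, runGetState)

-- Hypothetical helper: run a Get parser on one strict chunk. If
-- runGetState claims to have consumed more bytes than the chunk
-- actually holds, ignore the result and hand the whole chunk back
-- so the caller can prepend it to the next chunk from the iteratee.
tryGetChunk :: Get a -> S.ByteString -> Either S.ByteString (a, S.ByteString)
tryGetChunk g chunk
  | consumed > fromIntegral (S.length chunk) = Left chunk
  | otherwise = Right (x, S.concat (L.toChunks rest))
  where
    (x, rest, consumed) = runGetState g (L.fromChunks [chunk]) 0

main :: IO ()
main = case tryGetChunk getWord32be (S.pack [0, 0, 0, 1, 0xAB]) of
  Right (w, leftover) -> print (w, S.length leftover)
  Left _              -> putStrLn "chunk too short; carry it over"
```

The `Left` case is what gets prepended to the next chunk in the continuation; on well-behaved input, as in `main` above, the parse succeeds and the unconsumed suffix is returned for the next call.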
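To illustrate the incremental interface Gregory mentions: attoparsec's `parse` returns a `Partial` continuation when it runs out of input, and `feed` supplies the next chunk, so a stream can be parsed without holding it all in memory. This is what parserToIteratee automates; the chunk boundary here is arbitrary, and the module name follows the current attoparsec layout, which is newer than this 2010 thread:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Attoparsec.ByteString.Char8 as A

main :: IO ()
main = do
  let p  = A.decimal :: A.Parser Int
      r1 = A.parse p "12"      -- input exhausted mid-number: Partial
      r2 = A.feed r1 "34 rest" -- resume the suspended parse with more input
  print (A.maybeResult r2)     -- Just 1234
```

Note that the parser correctly treats "12" as a prefix rather than a complete number, which is exactly the chunk-boundary bookkeeping that runGetState was mishandling above.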