ekmett:

> Hi Anakim,
>
> Nice to see someone else working in this space.
>
> I have also been working on a set of parallel parsing techniques, which can
> use small Parsec parsers for local context sensitivity.
>
> See the second set of slides at
> http://comonad.com/reader/2009/iteratees-parsec-and-monoid/ for an overview
> of how I'm doing something similar to feed Parsec independent chunks. Note
> that this approach bypasses the need for a separate sequential scan, which
> would otherwise flood your cache, and lets you get closer to the performance
> limit imposed by Amdahl's law.
>
> The code in the second set of slides can be adapted to your case: load
> everything into a lazy bytestring or a fingertree of strict bytestrings;
> then, for each strict bytestring chunk in parallel, scan it for the first
> newline and start an iteratee-based Parsec parser from that point. I use
> iteratee-based Parsec parsers so that when I glue the partial parses
> together I can feed the unparsed data to the left of the first newline in
> each chunk to the parser I'm joining on the left. I provide a monoid for
> gluing these partial parses together, which encapsulates this behavior.
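The chunk-and-glue scheme described above can be sketched in miniature. The code below is my own simplified illustration, not the code from the slides: it replaces the iteratee-based Parsec parsers with plain newline splitting, and uses a sequential `foldMap` where a real implementation would reduce the per-chunk results in parallel (e.g. with `parMap` from the `parallel` package). The names `Partial`, `scanChunk`, and `chunksBS` are mine. The point it demonstrates is the monoid: each chunk yields an unparsed left fragment, a run of complete lines, and an unparsed right fragment, and `(<>)` joins the right fragment of one partial parse to the left fragment of the next.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC

-- Result of scanning one chunk in isolation.
data Partial
  = NoNewline B.ByteString
    -- ^ The chunk contained no newline at all: pure fragment.
  | Partial B.ByteString [B.ByteString] B.ByteString
    -- ^ Unparsed prefix (left of first newline), complete lines,
    --   unparsed suffix (right of last newline).

instance Semigroup Partial where
  NoNewline a      <> NoNewline b      = NoNewline (a <> b)
  NoNewline a      <> Partial l m r    = Partial (a <> l) m r
  Partial l m r    <> NoNewline b      = Partial l m (r <> b)
  -- Gluing: the left side's trailing fragment and the right side's
  -- leading fragment together form one complete line.
  Partial l1 m1 r1 <> Partial l2 m2 r2 = Partial l1 (m1 ++ (r1 <> l2) : m2) r2

instance Monoid Partial where
  mempty = NoNewline B.empty

-- Scan a single chunk independently of its neighbours.
scanChunk :: B.ByteString -> Partial
scanChunk bs = case BC.split '\n' bs of
  []     -> NoNewline B.empty
  [x]    -> NoNewline x
  (x:xs) -> Partial x (init xs) (last xs)

-- Split input into fixed-size chunks (stand-in for lazy bytestring chunks).
chunksBS :: Int -> B.ByteString -> [B.ByteString]
chunksBS n bs
  | B.null bs = []
  | otherwise = let (h, t) = B.splitAt n bs in h : chunksBS n t

-- Recover all complete lines from the glued result.
finish :: Partial -> [B.ByteString]
finish (NoNewline b)   = [b | not (B.null b)]
finish (Partial l m r) = l : m ++ [r | not (B.null r)]

main :: IO ()
main = do
  let input  = BC.pack "one\ntwo\nthree\nfour\n"
      -- Chunk boundaries deliberately fall mid-line; the monoid repairs them.
      result = foldMap scanChunk (chunksBS 5 input)
  print (map BC.unpack (finish result))
```

Because `(<>)` is associative, the per-chunk scans can happen in any grouping and in parallel; only the final `mconcat` is sequential, which is what keeps the design close to the Amdahl's-law limit mentioned above.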
You can get quite a long way by using bytestring-mmap and strict bytestrings. The first ensures your IO overhead will be low, and the second ensures you don't migrate work needlessly.

_______________________________________________
Haskell-Cafe mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/haskell-cafe
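For concreteness, here is one way the bytestring-mmap suggestion might look, as a hedged sketch rather than anything from the thread: `unsafeMMapFile` (from `System.IO.Posix.MMap` in the bytestring-mmap package) maps a file in as a single strict `ByteString`, which is then cut into chunks for parallel per-chunk work via `parMap` from the `parallel` package. The file name `input.txt`, the chunk size, and the per-chunk workload (`B.length` as a toy stand-in for a real parser) are all my assumptions.

```haskell
import qualified Data.ByteString as B
import System.IO.Posix.MMap (unsafeMMapFile)      -- from bytestring-mmap
import Control.Parallel.Strategies (parMap, rdeepseq)  -- from parallel

-- Split a strict bytestring into fixed-size chunks; B.splitAt is O(1)
-- slicing, so no bytes are copied and no work migrates between chunks.
chunksBS :: Int -> B.ByteString -> [B.ByteString]
chunksBS n bs
  | B.null bs = []
  | otherwise = let (h, t) = B.splitAt n bs in h : chunksBS n t

main :: IO ()
main = do
  bs <- unsafeMMapFile "input.txt"  -- hypothetical input file
  let chunks = chunksBS (256 * 1024) bs
      -- Toy per-chunk work; a real use would run scanChunk-style
      -- partial parses here and mconcat the results.
      counts = parMap rdeepseq B.length chunks
  print (sum counts)
```

Because the mapping is lazy at the OS level, pages are only faulted in as chunks are actually consumed, which is where the low IO overhead comes from.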
