2011/2/22 Michael A Baikov <pa...@bk.ru>

> > -----Original Message-----
>
> > Hi Maciej,
> >
> > Thanks for looking in to this.
> >
> > > After looking into problem (or rather onto your code) - the problem
> > > has nothing to do with iteratee-compress I believe. I get similar
> > > behaviour and results when I replace "joinIM $ enumInflate GZip
> > > defaultDecompressParams chunkedRead" by chunkedRead. (The memory is
> > > smaller but it is due to decompression not iteratee fault).
> > >
> >
> > This is due to "printLines". Whether it's a bug depends on what the
> > correct behavior of "printLines" should be.
> >
> > "printLines" currently only prints lines that are terminated by an EOL
> > (either "\n" or "\r\n"). This means that it needs to hold on to the
> > entire stream received until it finds EOL, and then prints the stream,
> > or drops it if it reaches EOF first. In your case, the stream generated
> > by "convStream consChunk printLines" is just a stream of numbers without
> > any EOL, where the length is dependent on the specified block size.
> > This causes the space leak.
> >
> > If I change the behavior of "printLines" to print lines that aren't
> > terminated by EOL, the leak could be fixed. Whether that behavior is
> > more useful than the present, I don't know. Alternatively, if you insert
> > some newlines into your stream this could be improved as well.
> >
> > As a result of investigating this, I realized that
> > Data.Iteratee.ListLike.break can be very inefficient in cases where the
> > predicate is not satisfied relatively early. I should actually provide
> > an enumeratee interface for it. So thanks very much for (indirectly)
> > suggesting that.
>
> Actually I can give you the full source code - it also uses
> attoparsec-iteratee. It leaks with iteratee-compress and works fine
> without it.
>
> Whole idea - get a bytestring from access.log, convert it to a stream of
> data objects with usernames and bytes downloaded, and then feed this
> stream into an iteratee which will collect all data into one big
> Map ByteString Integer.
>
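The buffering described in the quoted text can be illustrated without the iteratee library at all. The following is only a rough sketch in plain ByteString code, not the actual "printLines" implementation: the name `feed` is made up for this example, and it only splits on "\n". It shows why input that never contains an EOL makes the retained buffer grow with every chunk.

```haskell
module Main where

import qualified Data.ByteString.Char8 as B

-- Hypothetical helper (not part of iteratee): combine the carried-over
-- partial line with a new chunk, return the complete lines and the new
-- leftover. Only '\n' is treated as EOL here.
feed :: B.ByteString -> B.ByteString -> ([B.ByteString], B.ByteString)
feed carry chunk
  | B.null buf = ([], B.empty)
  | otherwise  = (init parts, last parts)
  where
    buf   = carry `B.append` chunk
    -- B.split yields a non-empty list for non-empty input, so init/last
    -- are safe in this branch.
    parts = B.split '\n' buf

main :: IO ()
main = do
  -- A stream of digits with no EOL anywhere, as in the case discussed.
  let chunks = map B.pack ["123", "456", "789"]
      step (ls, carry) c =
        let (new, carry') = feed carry c in (ls ++ new, carry')
      (ls, rest) = foldl step ([], B.empty) chunks
  print ls               -- no complete line was ever produced
  print (B.length rest)  -- the leftover holds every byte seen so far
```

With a newline somewhere in the input, the leftover is cut back to whatever follows the last "\n", which is why inserting newlines into the stream helps.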
I'm not familiar with iteratee-compress, but you could be getting hit by
Map's laziness. Instead of a map, could you use something like hashmap[1],
bytestring-trie[2], or Johan's new containers library[3]?

Also, I've recently posted a minor update to iteratee which includes an
enumeratee version of break and an alternative to printLines that doesn't
retain data, which you may find useful.

Cheers,
John

[1] http://hackage.haskell.org/package/hashmap
[2] http://hackage.haskell.org/package/bytestring-trie
[3] http://hackage.haskell.org/package/unordered-containers
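
To make the Map laziness point concrete, here is a minimal sketch of a strict accumulation into a Map ByteString Integer. It is not the code from this thread: the `Hit` record and the `tally` function are hypothetical, and it assumes a containers version that ships Data.Map.Strict, which is not one of the packages named above. The idea is that a lazy map can pile up unevaluated (+) thunks per user, while a strict left fold with a value-strict insertWith forces each running total as it goes.

```haskell
module Main where

import qualified Data.ByteString.Char8 as B
import           Data.List (foldl')
import qualified Data.Map.Strict as M

-- Hypothetical record for one parsed access.log entry.
data Hit = Hit
  { hitUser  :: !B.ByteString  -- user name
  , hitBytes :: !Integer       -- bytes downloaded
  }

-- Strict left fold; Data.Map.Strict.insertWith evaluates the combined
-- value to WHNF, so no chain of (+) thunks builds up per key.
tally :: [Hit] -> M.Map B.ByteString Integer
tally = foldl' step M.empty
  where
    step m (Hit u n) = M.insertWith (+) u n m

main :: IO ()
main = do
  let hits = [ Hit (B.pack "alice") 100
             , Hit (B.pack "bob") 50
             , Hit (B.pack "alice") 25
             ]
  print (M.toList (tally hits))
  -- prints [("alice",125),("bob",50)]
```

The same step function could serve as the accumulator of a folding iteratee, provided the fold itself is strict in the map.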
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe