Peter Green wrote: > Heinrich, thanks for some great help and food for thought!
My pleasure. :) >> Thanks for describing the background of this problem in detail! I was >> mainly asking because I'm always looking for interesting Haskell topics >> that can be turned into a tutorial of sorts, and this problem makes a >> great example. > > Another interesting problem is starting from a file of single wagers and > trying to compress them (i.e. inverse of 'explosion'. I believe this > problem is analogous to Set Cover and therefore NP-Complete. Heuristics > are the order of the day here. Interesting indeed! What kind of heuristics do you employ there? It seems quite difficult to me, somehow. >> [...] >> main = interact (unlines . toCSV . sortExplode . fromCSV . lines) >> > Thank you *very* much for this code! I will try to get my head around > it. I understand the broad outline you posted elsewhere, but will take > me a while to fully grasp the above as I'm only up to ~p200 in Real > World Haskell :). > > As for performance of your code above on my file of compressed wagers > which expands to 3.7M single wagers: > > (My original version posted here and using vanilla sort) > 541 MB total memory in use (5 MB lost due to fragmentation) > INIT time 0.00s ( 0.01s elapsed) > MUT time 13.98s ( 15.82s elapsed) > GC time 8.69s ( 9.64s elapsed) > EXIT time 0.00s ( 0.00s elapsed) > Total time 22.67s ( 25.47s elapsed) > > (Your much improved version) > 10 MB total memory in use (1 MB lost due to fragmentation) > INIT time 0.00s ( 0.00s elapsed) > MUT time 7.61s ( 9.38s elapsed) > GC time 3.48s ( 3.58s elapsed) > EXIT time 0.00s ( 0.00s elapsed) > Total time 11.08s ( 12.95s elapsed) > > Very impressive and thanks again! Category theory saves the day. ;) The code is based on three observations: 1) Sorting and exploding can be interleaved to create version that can stream results in an on-line fashion. 2) There is a standard formulation of this pattern in terms of folds and unfolds (= catamorphisms and anamorphisms). 3) Lazy evaluation can be used to make this look like an off-line algorithm. To my surprise, I don't have good references for these, but maybe the following can help a bit. Something similar to 1) can be found in Graham Hutton. The countdown problem. http://www.cs.nott.ac.uk/~gmh/bib.html#countdown and Richard Bird. A program to solve Sudoku. http://www.cs.tufts.edu/~nr/comp150fp/archive/richard-bird/sudoku.pdf The standard reference for 2) is Meijer et al. Functional programming with bananas, lenses, envelopes and barbed wire. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.41.125 and 3) is folklore, maybe John Hughes. Why functional programming matters. http://www.cs.chalmers.se/~rjmh/Papers/whyfp.html could serve as an introduction to lazy evaluation. >> Note that there's also the possibility of not expanding the compressed >> wagers at all, and perform set operations directly. For instance, it is >> straightforward to intersect two such sets of size n in O(n^2) time. >> Since n ~ 5000 , n^2 is about the same ballpark as the exploded single >> wager combinations. > > Not sure I quite understand you here. In my mind, set elements *are* > single combinations. It is possible for two quite different-looking > files of compressed wagers to contain exactly the same single wager > elements. So I'm not sure how to compare without some notion of > explosion to single combinations. What I mean is that there are two ways to represent sets of wagers: * A list of single combinations -> [Row a] * A list of compressed wagers -> [Row [a]] To perform set operations, you first convert the latter into the former. But the idea is that there might be a way of performing set operations without doing any conversion. Namely, the compressed wagers are cartesian products which behave well under intersection: the intersection of two compressed wagers is again a compressed wager intersectC :: Eq a => Row [a] -> Row [a] -> Row [a] intersectC [] [] = [] intersectC (xs:xrow) (ys:yrow) = intersect xs ys : intersectC xrow yrow In formulas: intersect (cartesian x) (cartesian y) = cartesian (intersectC x y) for compressed wagers x, y :: Row [a] with intersect = list intersection, for instance Data.List.intersect cartesian = converts a compressed wager to a list of single combinations This allows you to intersect two lists of compressed wagers directly intersect' :: [Row [a]] -> [Row [a]] -> [Row [a]] intersect' xs ys = filter (not . null) $ [intersectC x y | x<-xs, y<-ys] Unfortunately, the result will have quadratic size; but if you have significantly less compressed wagers than single combinations, this may be worth a try. Not to mention that you might be able to compress the result again. Mathematically, this exploits the equations (A × B) ∩ (C × D) = (A ∩ C) × (B ∩ D) "The intersection of two rectangles is another rectangle" and (A ∪ B) ∩ (C ∪ D) = (A ∩ C) ∪ (A ∩ D) ∪ (B ∩ C) ∪ (B ∩ D) where A,B,C,D are sets. Regards, Heinrich Apfelmus -- http://apfelmus.nfshost.com _______________________________________________ Haskell-Cafe mailing list [email protected] http://www.haskell.org/mailman/listinfo/haskell-cafe
