This came up as I was doing homework for natural language processing. I'm constructing a trigram model from training data, but I also need the bigram and unigram counts. I could, for each triple of characters, add the 3 tuple to a trigram map and increment its count (I know I'm not actually mutating anything), add the last two letters to a bigram map, and add the last character to a unigram map.
But this looks very much like a tree... each node contains the character (the key) and list of characters that can appear after it. (I think this is a trie, but I don't know very much about tries.) The problem is that lookup is more horribly inefficient than I can stand, if I use lists. As for what constitutes convenient, I haven't thought about it that much; I just noted that my naive attempt was full of boilerplate and decided that since there was already python code provided I used that. I'm planning to port the code *after* I have the assignment finished. On Fri, Oct 8, 2010 at 11:18 PM, wren ng thornton <w...@freegeek.org> wrote: > On 10/8/10 5:46 PM, Thomas DuBuisson wrote: > >> Alex, >> >> The containers library can do this already - there are no constraints >> on the elements of a Map. For example: >> >> type TripleNestedMap a = Map Int (Map Char (Map String a)) >>> >> >> But this is rather silly as you can just do: >> >> type MapOfTriples a = Map (Int ,Char, String) a >>> >> >> for most uses. >> > > However, Map is a lot less efficient than IntMap when dealing with Ints. > And the IntMap (IntMap ... a) type requires you to write (lookup m <=< ... > <=< lookup n) instead of just one lookup. Unfortunately, when you're > interested in performance issues, the standard tricks for implementing > polyvariadic functions aren't very useful. > > FWIW, the monadic combinators are usually sufficient to create your own > functions legiblely (e.g., using (<=<) for lookup), but it's still a lot > noiser than it could be--- especially if you want a trie instead of a > product map, since you have to add fst and snd everywhere. I've been playing > around with some ad-hoc tries like these a lot lately, both for HMM tagging > and for hunting down performance issues in bytestring-trie. > > -- > Live well, > ~wren > > _______________________________________________ > Haskell-Cafe mailing list > Haskell-Cafe@haskell.org > http://www.haskell.org/mailman/listinfo/haskell-cafe > -- Alex R
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe