Re: Pimp my algorithm: finding repeating subsequences

2011-02-21 Thread Jules
Sorry that should be: (def represent (memoize represent)) On Feb 20, 5:18 pm, Jules wrote: > Here is my solution that solves the problem optimally. Observe that > given a sequence we can do one of two things: (1) represent it as a > repetition or (2) represent it as a concatenation of repetition

Re: Pimp my algorithm: finding repeating subsequences

2011-02-20 Thread Jules
Here is my solution that solves the problem optimally. Observe that given a sequence we can do one of two things: (1) represent it as a repetition or (2) represent it as a concatenation of repetitions. If we had two functions to calculate all the ways of doing (1) and (2) we could solve the problem

Re: Pimp my algorithm: finding repeating subsequences

2011-02-19 Thread Mark Fredrickson
Very interesting solution. I implemented an LZ77 like solution, which appears to be working, but its good to have alternative solutions. As you point out, different match patterns might be interesting. I tune the matching by disallowing small matches based on length, but the pattern string would be

Re: Pimp my algorithm: finding repeating subsequences

2011-02-19 Thread Tyler Perkins
Interesting. This problem is like compression, but the first thing I thought of was pattern recognition. If your tokens are just characters in a string, you can let regular expressions do all the heavy lifting: (defn rep-groups [s] (re-seq #"(.+)\1+|." s)) Then (rep-groups ".,ababcababc.,cc,.abab

Re: Pimp my algorithm: finding repeating subsequences

2011-02-13 Thread Mark Fredrickson
Thanks for the hint on LZ77. That was exactly what I was looking for. Thank you also to the others who provided this and other answers. Thanks! -Mark On Feb 13, 1:00 pm, Mark Engelberg wrote: > This is essentially a compression problem.  I think you want to research the > topic of "Run-length

Re: Pimp my algorithm: finding repeating subsequences

2011-02-13 Thread Fred Concklin
DEFLATE is a compression method that combines LZ77 with Huffman coding. It is implemented in java.util.zip. Read about it here: http://download.oracle.com/javase/1.4.2/docs/api/java/util/zip/package-summary.html fpc -- You received this message because you are subscribed to the Google Groups "

Re: Pimp my algorithm: finding repeating subsequences

2011-02-13 Thread Andy Fingerhut
On Feb 13, 2011, at 10:11 AM, Mark Fredrickson wrote: Hello friends, I am writing a program to generate directions for humans to read. The directions are composed of a collection with a series of steps, some of which may be duplicated: [:a :b :c :a :b :c] I would like to compress them by ind

Re: Pimp my algorithm: finding repeating subsequences

2011-02-13 Thread Mark Engelberg
This is essentially a compression problem. I think you want to research the topic of "Run-length encoding", and then look at algorithms like LZ77 which handle runs of repeated characters. LZ77 finds repeated blocks that are some distance apart; so, for example, you could restrict the algorithm to

Pimp my algorithm: finding repeating subsequences

2011-02-13 Thread Mark Fredrickson
Hello friends, I am writing a program to generate directions for humans to read. The directions are composed of a collection with a series of steps, some of which may be duplicated: [:a :b :c :a :b :c] I would like to compress them by indicating repeats, using this notation: [[2 [:a :b :c]] Si