This is essentially a compression problem. I think you'll want to read up on run-length encoding, and then look at algorithms like LZ77, which handle repeated substrings, including runs of repeated characters. LZ77 finds repeated blocks that are some distance apart, so you could, for example, restrict the search to distances equal to the block size. That way it only finds blocks that immediately follow one another.
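Here is a minimal Python sketch of that idea, assuming the input is a string (or list) and that a "run" means a block repeated back to back. The function name `run_length_encode` and its `block_size` parameter are hypothetical, just for illustration. With `block_size=1` it is ordinary run-length encoding. With a larger size it only compares each block against the copy that starts exactly `block_size` positions later, which is the restricted-distance search described above. It is not LZ77 itself, just the narrow special case.

```python
def run_length_encode(seq, block_size=1):
    """Collapse back-to-back repeats of fixed-size blocks into (count, block) pairs.

    Hypothetical helper: with block_size=1 this is plain run-length encoding;
    with a larger block_size it only looks for a match at a distance of exactly
    block_size, i.e. blocks that immediately follow one another.
    """
    out = []
    i, n = 0, len(seq)
    while i < n:
        block = seq[i:i + block_size]  # may be shorter than block_size at the tail
        count = 1
        # Keep comparing against the next adjacent copy; an out-of-range slice
        # is empty (or shorter), so it never equals a non-empty full block.
        while seq[i + count * block_size:i + (count + 1) * block_size] == block:
            count += 1
        out.append((count, block))
        i += count * block_size
    return out


print(run_length_encode("aaabcc"))      # [(3, 'a'), (1, 'b'), (2, 'c')]
print(run_length_encode("abababc", 2))  # [(3, 'ab'), (1, 'c')]
```

Note that the block size has to be chosen up front here; a real compressor would search over distances rather than fixing one.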
I think you're right that finding the truly optimal instruction set would be hopelessly time-consuming, but it should be possible to draw inspiration from compression algorithms and write a program that finds a satisfyingly good instruction set in polynomial time. I hope that gives you some ideas to explore!
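As one illustration of a "good enough" polynomial-time approach, here is a greedy Python sketch. It assumes an "instruction" is a hypothetical `(count, block)` pair meaning "repeat `block` `count` times"; the names `greedy_encode`, `decode` and `max_block`, and the scoring rule (prefer the block whose repeats save the most input characters), are all my own assumptions rather than anything from your original problem. At each position it tries every block size up to `max_block`, keeps the best one, and moves on. Greedy choices like this are not guaranteed optimal, but the work is bounded by roughly n² · `max_block` character comparisons for input length n.

```python
def greedy_encode(seq, max_block=8):
    """Greedily emit hypothetical (count, block) 'repeat' instructions.

    At each position, try every block size up to max_block and keep the one
    whose back-to-back repeats save the most characters, i.e. (count - 1) * size.
    Ties go to the smaller block. Not optimal, but polynomial time.
    """
    out = []
    i, n = 0, len(seq)
    while i < n:
        best_size, best_count, best_saving = 1, 1, 0
        for size in range(1, min(max_block, n - i) + 1):
            block = seq[i:i + size]
            count = 1
            # Count how many copies of block follow immediately (distance == size).
            while seq[i + count * size:i + (count + 1) * size] == block:
                count += 1
            saving = (count - 1) * size
            if saving > best_saving:  # strict '>' keeps the smaller block on ties
                best_size, best_count, best_saving = size, count, saving
        out.append((best_count, seq[i:i + best_size]))
        i += best_count * best_size
    return out


def decode(pairs):
    """Expand (count, block) string instructions back into the original string."""
    return "".join(block * count for count, block in pairs)


pairs = greedy_encode("abababxxxxy")
print(pairs)          # [(3, 'ab'), (4, 'x'), (1, 'y')]
print(decode(pairs))  # abababxxxxy
```

Swapping in a different scoring rule (for example, the actual cost of each instruction in your target format) is where most of the tuning would happen.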