Here's an interesting problem that we are discussing at comp.lang.lisp: 〈Parallel Programing Problem: asciify-string〉 http://xahlee.org/comp/parallel_programing_exercise_asciify-string.html
Here's the plain text. The code example is Emacs Lisp, but the problem is general. For a bit of Python relevancy: is there any Python compiler that's parallel-algorithm aware?

-----------------------------------------------

Parallel Programming Problem: asciify-string

Here's an interesting parallel programming problem.

Problem Description

The task is to change this function so that it's parallelizable. (code example in Emacs Lisp)

(defun asciify-string (inputStr)
  "Make a Unicode string into an equivalent ASCII one."
  (setq inputStr (replace-regexp-in-string "á\\|à\\|â\\|ä" "a" inputStr))
  (setq inputStr (replace-regexp-in-string "é\\|è\\|ê\\|ë" "e" inputStr))
  (setq inputStr (replace-regexp-in-string "í\\|ì\\|î\\|ï" "i" inputStr))
  (setq inputStr (replace-regexp-in-string "ó\\|ò\\|ô\\|ö" "o" inputStr))
  (setq inputStr (replace-regexp-in-string "ú\\|ù\\|û\\|ü" "u" inputStr))
  inputStr)

Here's a more general description of the problem. You are given a Unicode text file that's a few petabytes. Certain characters in the file need to be changed to different characters. (For examples of practical application, see: IDN homograph attack ◇ Duplicate characters in Unicode.)

One easy solution is to simply use a regex, as in the sample code above, to search through the file sequentially and transform one particular set of characters, then repeat for each character that needs to be changed.

But your task is to use a parallelizable algorithm. That is, in a parallel-algorithm-aware language (e.g. Fortress), the compiler will automatically spread the computation across multiple processors. Refer to Guy Steele's video talk if you haven't seen it already. See: Guy Steele on Parallel Programming.

Solution Suggestions

A better way to write it for parallel programming is to map a char-transform function over each char in the string. Here's pseudo-code in Lisp by Helmut Eller:

(defun asciify-char (c)
  (case c
    ((?á ?à ?â ?ä) ?a)
    ((?é ?è ?ê ?ë) ?e)
    ((?í ?ì ?î ?ï) ?i)
    ((?ó ?ò ?ô ?ö) ?o)
    ((?ú ?ù ?û ?ü) ?u)
    (t c)))

(defun asciify-string (string)
  (map 'string #'asciify-char string))

One problem with this is that the function “asciify-char” itself is sequential, and not 100% parallelizable. (We might assume here that there are billions of chars in Unicode that need to be transformed.)

It would be an interesting small project if someone actually used a parallel-algorithm-aware language to work on this problem and reported on the break-even file size of the parallel algorithm vs the sequential algorithm. Would anyone try it? Perhaps in Fortress, Erlang, Ease, Alice, X10, or another? Is Clojure parallel-aware?

Xah
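Since the post asks whether anything in Python is parallel-algorithm aware, here is a minimal sketch of the chunk-and-map idea using only the standard multiprocessing module. It is an illustration, not a statement about Python compilers: the function names (asciify_chunk, asciify_string), the deliberately partial transform table, and the chunk size are all made up for the example. Because the transform is strictly per-character, chunks never interact, so the work splits cleanly across processes.

from multiprocessing import Pool

# Per-character transform table (deliberately partial, for illustration only).
ASCIIFY = {
    'á': 'a', 'à': 'a', 'â': 'a', 'ä': 'a',
    'é': 'e', 'è': 'e', 'ê': 'e', 'ë': 'e',
    'í': 'i', 'ì': 'i', 'î': 'i', 'ï': 'i',
    'ó': 'o', 'ò': 'o', 'ô': 'o', 'ö': 'o',
    'ú': 'u', 'ù': 'u', 'û': 'u', 'ü': 'u',
}

def asciify_chunk(chunk):
    """Apply the per-character transform to one independent chunk."""
    return ''.join(ASCIIFY.get(c, c) for c in chunk)

def asciify_string(text, chunk_size=1000000, processes=4):
    """Split TEXT into fixed-size chunks and map asciify_chunk over them
    in parallel. Chunk boundaries are safe because the transform never
    looks at more than one character at a time."""
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    with Pool(processes) as pool:
        return ''.join(pool.map(asciify_chunk, chunks))

if __name__ == '__main__':
    print(asciify_string('très élégant, naïve café'))

On small inputs this will of course be slower than the sequential regex version because of process startup and pickling overhead; the interesting measurement, as the post suggests, is the input size at which the parallel version starts to win.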