Hello

>>> On Thu, Dec 29, 2011 at 5:32 PM, Nala Ginrut <nalagin...@gmail.com>
>>> wrote:
>>>>
>>>> hi guilers!
>>>> It seems like there's no "regexp-split" procedure in Guile.
>>>> What we have is "string-split" which accepted Char only.
>>>> So I wrote one for myself.
>>>>
>>>> ------python code-----
>>>> >>> import re
>>>> >>> re.split("([^0-9])", "123+456*/")
>>>> ['123', '+', '456', '*', '', '/', '']
>>>> --------code end-------
>>>>
>>>> The Guile version:
>>>>
>>>> ----------guile code-------
>>>> (regexp-split "([^0-9])" "123+456*/")
>>>> ==>("123" "+" "456" "*" "" "/" "")
>>>> ----------code end--------
>>>>
>>>> Anyone interested in it?
>>>>
Nice work! I have a couple of comments :-)

The matched pattern/delimiter is included in the output, whether or not
the pattern contains a capturing group:

scheme@(guile-user)> (regexp-split "(\\W+)" "Words, words, words.")
$21 = ("Words" ", " "words" ", " "words" "." "")
scheme@(guile-user)> (regexp-split "\\W+" "Words, words, words.")
$22 = ("Words" ", " "words" ", " "words" "." "")

However, a user is not always interested in the delimiter. Consider the
example given for string-split:

scheme@(guile-user)> (string-split "root:x:0:0:root:/root:/bin/bash" #\:)
$23 = ("root" "x" "0" "0" "root" "/root" "/bin/bash")

This behaviour can be obtained with list-matches on the complement of
REGEXP (here "\\w+" instead of "\\W+"):

scheme@(guile-user)> (map match:substring (list-matches "\\w+" "Words, words, words."))
$24 = ("Words" "words" "words")

I would like to see your version support the Python semantics [1]:

> If capturing parentheses are used in pattern, then the text of
> all groups in the pattern are also returned as part of the resulting
> list. [...]
>
> >>> re.split('\W+', 'Words, words, words.')
> ['Words', 'words', 'words', '']
> >>> re.split('(\W+)', 'Words, words, words.')
> ['Words', ', ', 'words', ', ', 'words', '.', '']
> >>> re.split('((,)?\W+?)', 'Words, words, words.')
> ['Words', ', ', ',', 'words', ', ', ',', 'words', '.', None, '']

For the sake of consistency with the rest of the module, perhaps support
the `flags' option (just pass it to fold-matches) and use the same
variable names, etc.:

  (define* (regexp-split regexp string #:optional (flags 0)) ...

instead of:

  (define regexp-split (lambda (regex str) ...

Also, to me the name seems unintuitive -- it is STR being split, not
RE -- perhaps this can be folded into the existing string-split
function.

A nice patch nonetheless!

[1] http://docs.python.org/library/re.html#re.split
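
P.S. To make the semantics I am asking for concrete, here is a minimal sketch in Python of how re.split decides what goes into the result: the delimiter appears only when the pattern captures it. The function name `regexp_split` and its `flags` parameter are my own illustrative choices mirroring the signature suggested above, not Nala's code, and I have not tried to cover every corner case (e.g. patterns that can match the empty string).

```python
import re


def regexp_split(regexp, string, flags=0):
    """Split STRING around matches of REGEXP, in the style of re.split.

    Hypothetical illustration only: the text between matches is always
    returned; the delimiter shows up only through capturing groups.
    """
    pieces = []
    start = 0  # index just past the previous match
    for m in re.finditer(regexp, string, flags):
        pieces.append(string[start:m.start()])  # text before this delimiter
        pieces.extend(m.groups())  # one entry per group, None if it did not match
        start = m.end()
    pieces.append(string[start:])  # trailing text, possibly ""
    return pieces


if __name__ == "__main__":
    text = "Words, words, words."
    for pattern in (r"\W+", r"(\W+)", r"((,)?\W+?)"):
        print(pattern, regexp_split(pattern, text))
```

With no group in the pattern the delimiters are dropped, which gives the string-split style result; wrapping the pattern in parentheses brings them back, so one procedure covers both uses.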