Re: [PATCH] add regexp-split

2011-12-30 Thread Nala Ginrut
hi Daniel! Very glad to see your reply.
1. I also think the order (regexp str) is strange, but it follows the Python
version.
And I think `string-match' also puts the regexp before the string. Anyway,
that's an easy fix.
2. I think it's a little more difficult to implement flags as in the Python
version, since the "ignorecase" flag must be passed to make-regexp, so we
can't use fold-matches.
Hmm... let me see what I can do...

On Fri, Dec 30, 2011 at 1:34 PM, Daniel Hartwig  wrote:

> Hello
>
> >>> On Thu, Dec 29, 2011 at 5:32 PM, Nala Ginrut 
> >>> wrote:
> 
>  hi guilers!
>  It seems like there's no "regexp-split" procedure in Guile.
>  What we have is "string-split" which accepted Char only.
>  So I wrote one for myself.
> 
>  --python code-
>  >>> import re
>  >>> re.split("([^0-9])", "123+456*/")
>  ['123', '+', '456', '*', '', '/', '']
>  code end---
> 
>  The Guile version:
> 
>  --guile code---
>  (regexp-split "([^0-9])"  "123+456*/")
>  ==>("123" "+" "456" "*" "" "/" "")
>  --code end
> 
>  Anyone interested in it?
> 
>
> Nice work!  I have a couple of comments :-)
>
>
> The matched pattern/delimiter is included in the output:
>
> scheme@(guile-user)> (regexp-split "(\\W+)" "Words, words, words.")
> $21 = ("Words" ", " "words" ", " "words" "." "")
> scheme@(guile-user)> (regexp-split "\\W+" "Words, words, words.")
> $22 = ("Words" ", " "words" ", " "words" "." "")
>
> However, a user is not always interested in the delimiter.  Consider
> the example given for string-split:
>
> scheme@(guile-user)> (string-split "root:x:0:0:root:/root:/bin/bash" #\:)
> $23 = ("root" "x" "0" "0" "root" "/root" "/bin/bash")
>
> This behaviour can be obtained with list-matches on the complement of
> REGEXP.
>
> scheme@(guile-user)> (map match:substring
>  (list-matches "\\w+" "Words, words, words."))
> $24 = ("Words" "words" "words")
>
> I would like to see your version support the Python semantics [1]:
>
> > If capturing parentheses are used in pattern, then the text of
> > all groups in the pattern are also returned as part of the resulting
> > list.
> [...]
> > >>> re.split('\W+', 'Words, words, words.')
> > ['Words', 'words', 'words', '']
> > >>> re.split('(\W+)', 'Words, words, words.')
> > ['Words', ', ', 'words', ', ', 'words', '.', '']
>
> >>> re.split('((,)?\W+?)', 'Words, words, words.')
> ['Words', ', ', ',', 'words', ', ', ',', 'words', '.', None, '']
>
>
> For the sake of consistency with the rest of the module perhaps
> support the `flags' option (just pass it to fold-matches) and use the
> same variable names, etc.:
>
> (define* (regexp-split regexp string #:optional (flags 0))
>  ...
>
> instead of:
>
> (define regexp-split
>  (lambda (regex str)
>  ...
>
>
> Also, to me the name seems unintuitive -- it is STR being split, not
> RE -- perhaps this can be folded into the existing string-split
> function.
>
>
> A nice patch nonetheless!
>
>
> [1] http://docs.python.org/library/re.html#re.split
>


Re: [PATCH] add regexp-split

2011-12-30 Thread Nala Ginrut
Well, I realized it's a mistake. We can use fold-matches anyway.
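A minimal sketch of why this works, assuming Guile 2.0's (ice-9 regex): `fold-matches' accepts either a pattern string or an already-compiled regexp, while its own FLAGS argument is handed to `regexp-exec'. A compile-time flag such as `regexp/icase' can therefore go through `make-regexp' first. The helper name `matches-ci' is hypothetical.

```scheme
(use-modules (ice-9 regex))

;; Hypothetical helper: collect every case-insensitive match of PATTERN in STR.
;; regexp/icase is a compile-time flag, so it is given to make-regexp; the
;; compiled regexp is then passed to fold-matches in place of a pattern string.
(define (matches-ci pattern str)
  (reverse
   (fold-matches (make-regexp pattern regexp/icase) str '()
                 (lambda (m acc) (cons (match:substring m) acc)))))

(matches-ci "and" "salt AND pepper and oil")
;; => ("AND" "and")
```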

On Fri, Dec 30, 2011 at 4:46 PM, Nala Ginrut  wrote:



Re: [PATCH] add regexp-split

2011-12-30 Thread Daniel Hartwig
On 30 December 2011 16:46, Nala Ginrut  wrote:
> hi Daniel! Very glad to see your reply.
> 1. I also think the order: (regexp str) is strange. But it's according to
> python version.
> And I think the 'string-match' also put regexp before str. Anyway, that's an
> easy mend.

`regexp string' is also the same order as `list-matches' and
`fold-matches'.  Probably best to keep it that way if this is in the
regex module.


>> I would like to see your version support the Python semantics [1]:
>>
>> > If capturing parentheses are used in pattern, then the text of
>> > all groups in the pattern are also returned as part of the resulting
>> > list.
>> [...]
>> > >>> re.split('\W+', 'Words, words, words.')
>> > ['Words', 'words', 'words', '']
>> > >>> re.split('(\W+)', 'Words, words, words.')
>> > ['Words', ', ', 'words', ', ', 'words', '.', '']
>>
>> >>> re.split('((,)?\W+?)', 'Words, words, words.')
>> ['Words', ', ', ',', 'words', ', ', ',', 'words', '.', None, '']

FYI this can be achieved by changing the inner part to:

   (let* ...
          (s (substring string start end))
          (groups (map (lambda (n) (match:substring m n))
                       (iota (1- (match:count m)) 1))))
     (list `(,@ll ,s ,@groups) (match:end m) tail)))

Note: using srfi-1 iota
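To see what the `iota'-based expression yields, here is a small sketch (the helper name `match-groups' is hypothetical). Starting `iota' at 1 skips group 0, the whole match, and a group that did not take part in the match comes back as #f, which plays the role of Python's None in the output quoted above.

```scheme
(use-modules (ice-9 regex) (srfi srfi-1))

;; Hypothetical helper: the text of every capturing group in match M, in
;; order.  (iota COUNT 1) yields 1 .. COUNT, so group 0 (the whole match)
;; is skipped; a group that did not participate yields #f.
(define (match-groups m)
  (map (lambda (n) (match:substring m n))
       (iota (1- (match:count m)) 1)))

(match-groups (string-match "(,)?( )" "x, y"))  ; => ("," " ")
(match-groups (string-match "(,)?( )" "x y"))   ; => (#f " ")
```

Without the start argument, `(iota (1- (match:count m)))' counts from 0, so the whole match would be included and the last group dropped.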



Re: [PATCH] add regexp-split

2011-12-30 Thread Marijn

On 29-12-11 10:32, Nala Ginrut wrote:
> hi guilers! It seems like there's no "regexp-split" procedure in
> Guile. What we have is "string-split" which accepted Char only. So
> I wrote one for myself.
> 
> --python code-
> >>> import re
> >>> re.split("([^0-9])", "123+456*/")
> ['123', '+', '456', '*', '', '/', '']
> code end---
> 
> The Guile version:
> 
> --guile code---
> (regexp-split "([^0-9])"  "123+456*/")
> ==>("123" "+" "456" "*" "" "/" "")
> --code end
> 
> Anyone interested in it?

Hi there,

I think we're all happy that Guile is getting this support; however, I
couldn't help but notice that the above results look a bit funny and
are indeed incompatible with Racket's implementation:

> (regexp-split "([^0-9])" "123+456*/")
'("123" "456" "" "")

This is apparently because their version doesn't support capturing
groups in this function. I've raised the issue with them as well, but
there are some doubts about whether it is useful/sane to support this.
Perhaps other Schemes' regexp libraries should be compared as well.
Their tests would certainly be useful and may point out other
incompatibilities that no-one is aware of (as well as improve your code(!)).
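For comparison, the Racket result shown above can be reproduced on top of Guile's `fold-matches' by discarding the delimiters and any captured groups. This is only a sketch: `regexp-split/no-groups' is a hypothetical name, and it is only claimed to agree with Racket on this example, not on every edge case such as empty matches.

```scheme
(use-modules (ice-9 regex))

;; Hypothetical helper: split STR at each match of PATTERN, dropping the
;; matched text entirely, including anything captured by groups.
(define (regexp-split/no-groups pattern str)
  (let* ((state
          ;; fold state: (reversed-pieces . index-where-next-piece-starts)
          (fold-matches pattern str (cons '() 0)
                        (lambda (m acc)
                          (cons (cons (substring str (cdr acc) (match:start m))
                                      (car acc))
                                (match:end m)))))
         (rest (substring str (cdr state))))
    (reverse (cons rest (car state)))))

(regexp-split/no-groups "([^0-9])" "123+456*/")
;; => ("123" "456" "" "")
```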

Marijn



bug#10410: guile: uri module confused by domain names starting with numbers, ipv6 addresses

2011-12-30 Thread Daniel Hartwig
Package: guile
Version: 2.0.3
Tags: patch
X-Debbugs-CC: guile-devel@gnu.org


Hello

I have noticed that the (web uri) module does not handle domain names
that start with numbers:

scheme@(guile-user)> (string->uri "http://123.com")
$1 = #f
scheme@(guile-user)> (build-uri 'http #:host "123.com")
web/uri.scm:85:6: In procedure build-uri:
web/uri.scm:85:6: Throw to key `uri-error' with args `("Expected valid
host: ~s" ("123.com"))'.


Also, `string->uri' does not handle ipv6 addresses:

scheme@(guile-user)> (string->uri "http://[2001:db8::1]")
$2 = #f


The attached patch implements support for domain names that start with
numbers by correcting the regular expressions used by `valid-host?', and
adds some related tests.
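The effect of the added `$' anchors can be checked directly with `regexp-exec'. Assuming, as the report implies, that `valid-host?' treats any host matched by `ipv4-regexp' as an IPv4 literal, the unanchored pattern matches the "123." prefix of "123.com" and sends it down the wrong branch. The `old-'/`new-' names are only for this comparison; the patterns themselves are the ones from the diff.

```scheme
(use-modules (ice-9 regex))

;; Patterns from the diff, before and after the fix.
(define old-ipv4-regexp (make-regexp "^([0-9.]+)"))
(define new-ipv4-regexp (make-regexp "^([0-9.]+)$"))
(define old-ipv6-regexp (make-regexp "^\\[([0-9a-fA-F:]+)\\]+"))
(define new-ipv6-regexp (make-regexp "^\\[([0-9a-fA-F:]+)\\]$"))

;; Unanchored: "123." is enough for a match, so "123.com" looks like IPv4.
(match:substring (regexp-exec old-ipv4-regexp "123.com"))   ; => "123."
;; Anchored: the whole host must consist of digits and dots.
(regexp-exec new-ipv4-regexp "123.com")                     ; => #f
(match:substring (regexp-exec new-ipv4-regexp "192.0.2.1")) ; => "192.0.2.1"

;; Likewise, trailing text after the closing bracket is now rejected.
(regexp-exec new-ipv6-regexp "[2001:db8::1]junk")           ; => #f
```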

`string->uri' requires similar changes to support IPv6 address
literals.  I have yet to find a very elegant way to do this, though it
is easy enough to simply butcher `authority-pat'.
From 9fced395b4afb4e022414a4b451a50b31ceacedd Mon Sep 17 00:00:00 2001
From: Daniel Hartwig 
Date: Fri, 30 Dec 2011 17:49:37 +0800
Subject: [PATCH] support URIs with domain names starting with numbers

* module/web/uri.scm (valid-host?): Fix regexp to support
domain names starting with numbers.
* test-suite/tests/web-uri.test: Add tests for above and
IP literals.
---
 module/web/uri.scm|4 +-
 test-suite/tests/web-uri.test |   49 -
 2 files changed, 50 insertions(+), 3 deletions(-)

diff --git a/module/web/uri.scm b/module/web/uri.scm
index 67ecbae..ff13847 100644
--- a/module/web/uri.scm
+++ b/module/web/uri.scm
@@ -89,9 +89,9 @@ consistency checks to make sure that the constructed URI is valid."
 ;; 3490), and non-ASCII host names.
 ;;
 (define ipv4-regexp
-  (make-regexp "^([0-9.]+)"))
+  (make-regexp "^([0-9.]+)$"))
 (define ipv6-regexp
-  (make-regexp "^\\[([0-9a-fA-F:]+)\\]+"))
+  (make-regexp "^\\[([0-9a-fA-F:]+)\\]$"))
 (define domain-label-regexp
   (make-regexp "^[a-zA-Z0-9]([a-zA-Z0-9-]*[a-zA-Z0-9])?$"))
 (define top-label-regexp
diff --git a/test-suite/tests/web-uri.test b/test-suite/tests/web-uri.test
index 9118eea..4f859e0 100644
--- a/test-suite/tests/web-uri.test
+++ b/test-suite/tests/web-uri.test
@@ -90,6 +90,18 @@
     (uri=? (build-uri 'http #:host "bad.host.1" #:validate? #f)
            #:scheme 'http #:host "bad.host.1" #:path ""))
 
+  (pass-if "http://1.good.host"
+    (uri=? (build-uri 'http #:host "1.good.host")
+           #:scheme 'http #:host "1.good.host" #:path ""))
+
+  (pass-if "http://192.0.2.1"
+    (uri=? (build-uri 'http #:host "192.0.2.1")
+           #:scheme 'http #:host "192.0.2.1" #:path ""))
+
+  (pass-if "http://[2001:db8::1]"
+    (uri=? (build-uri 'http #:host "[2001:db8::1]")
+           #:scheme 'http #:host "[2001:db8::1]" #:path ""))
+
   (pass-if-uri-exception "http://foo:not-a-port"
                          "Expected.*port"
                          (build-uri 'http #:host "foo" #:port "not-a-port"))
@@ -135,6 +147,25 @@
   (pass-if "http://bad.host.1"
     (not (string->uri "http://bad.host.1")))
 
+  (pass-if "http://1.good.host"
+    (uri=? (string->uri "http://1.good.host")
+           #:scheme 'http #:host "1.good.host" #:path ""))
+
+  (pass-if "http://192.0.2.1"
+    (uri=? (string->uri "http://192.0.2.1")
+           #:scheme 'http #:host "192.0.2.1" #:path ""))
+
+  (pass-if "http://[2001:db8::1]"
+    (uri=? (string->uri "http://[2001:db8::1]")
+           #:scheme 'http #:host "[2001:db8::1]" #:path ""))
+
+  (pass-if "http://[2001:db8::1]:80"
+    (uri=? (string->uri "http://[2001:db8::1]")
+           #:scheme 'http
+           #:host "[2001:db8::1]"
+           #:port 80
+           #:path ""))
+
   (pass-if "http://foo:"
     (uri=? (string->uri "http://foo:")
            #:scheme 'http #:host "foo" #:path ""))
@@ -184,6 +215,18 @@
     (equal? "ftp://foo@bar:22/baz"
             (uri->string (string->uri "ftp://foo@bar:22/baz"))))
   
+  (pass-if "http://192.0.2.1"
+    (equal? "http://192.0.2.1"
+            (uri->string (string->uri "http://192.0.2.1"))))
+
+  (pass-if "http://[2001:db8::1]"
+    (equal? "http://[2001:db8::1]"
+            (uri->string (string->uri "http://[2001:db8::1]"))))
+
+  (pass-if "http://[2001:db8::1]:80"
+    (equal? "http://[2001:db8::1]:80"
+            (uri->string (string->uri "http://[2001:db8::1]:80"))))
+
   (pass-if "http://foo:"
     (equal? "http://foo"
             (uri->string (string->uri "http://foo:"))))
@@ -193,7 +236,11 @@
             (uri->string (string->uri "http://foo:/")))))
 
 (with-test-prefix "decode"
-  (pass-if (equal? "foo bar" (uri-decode "foo%20bar"))))
+  (pass-if "foo%20bar"
+    (equal? "foo bar" (uri-decode "foo%20bar")))
+
+  (pass-if "foo+bar"
+    (equal? "foo bar" (uri-decode "foo+bar"))))
 
 (with-test-prefix "encode"
   (pass-if (equal? "foo%20bar" (uri-encode "foo bar"))))
-- 
1.7.5.4



Re: [PATCH] add regexp-split

2011-12-30 Thread Nala Ginrut
Hmm, interesting!
I must confess I'm not familiar with Racket, but I think one of Guile's
aims is practicality.
So I think Guile's regex library should at least do this.
Anyway, I believe an implementation should do its best to provide any
useful mechanism for the user, or it won't be popular anymore.
When I say "useful", I mean "convenient for the user", not "what the
developer thinks is useful".
Just my mumbling, no offense intended.
Thank you for telling us about this issue. ;-)

On Fri, Dec 30, 2011 at 6:14 PM, Marijn  wrote:



Re: [PATCH] add regexp-split

2011-12-30 Thread Nala Ginrut
Great! It's better now.
Here's the brand new patch~

On Fri, Dec 30, 2011 at 5:42 PM, Daniel Hartwig  wrote:

From b738a8b890f41bf684c0556ca79af2d7c14b6df5 Mon Sep 17 00:00:00 2001
From: NalaGinrut 
Date: Fri, 30 Dec 2011 19:38:38 +0800
Subject: [PATCH] ADD regexp-split

---
 module/ice-9/regex.scm |   18 +-
 1 files changed, 17 insertions(+), 1 deletions(-)

diff --git a/module/ice-9/regex.scm b/module/ice-9/regex.scm
index f7b94b7..b5f6149 100644
--- a/module/ice-9/regex.scm
+++ b/module/ice-9/regex.scm
@@ -41,7 +41,7 @@
   #:export (match:count match:string match:prefix match:suffix
regexp-match? regexp-quote match:start match:end match:substring
string-match regexp-substitute fold-matches list-matches
-   regexp-substitute/global))
+   regexp-substitute/global regexp-split))
 
 ;; References:
 ;;
@@ -226,3 +226,19 @@
 (begin
   (do-item (car items)) ; This is not.
   (next-item (cdr items)))
+  
+(define* (regexp-split regex str #:optional (flags 0))
+  (let ((ret (fold-matches
+              regex str (list '() 0 '(""))
+              (lambda (m prev)
+                (let* ((ll (car prev))
+                       (start (cadr prev))
+                       (tail (match:suffix m))
+                       (end (match:start m))
+                       (s (substring/shared str start end))
+                       (groups (map (lambda (n) (match:substring m n))
+                                    (iota (1- (match:count m))))))
+                  (list `(,@ll ,s ,@groups) (match:end m) tail)))
+              flags)))
+    `(,@(car ret) ,(caddr ret))))
+
-- 
1.7.0.4



Re: [PATCH] add regexp-split

2011-12-30 Thread Marijn

On 30-12-11 11:56, Nala Ginrut wrote:
> Hmm, interesting! I must confess I'm not familiar with Racket, but
> I think the aim of Guile contains practicality.

Not sure what you're trying to imply here or to which of my points
you're responding.

> So I think regex-lib of Guile does this at least. Anyway, I believe
> an implementation should do its best to provide any useful
> mechanism for the user. Or it won't be popular anymore. When I talk
> about "useful", I mean "it brings the user convenient", not "the
> developer think it's useful".

Idem ditto.

> Just my mumble, no any offense. Thank you for telling us this
> issue. ;-)

Marijn



Re: [PATCH] add regexp-split

2011-12-30 Thread Nala Ginrut
Forgot to load `iota' from (srfi srfi-1), again...

On Fri, Dec 30, 2011 at 7:40 PM, Nala Ginrut  wrote:

From 27aa85d56766d152eced21cd0d2915c70a99dcc7 Mon Sep 17 00:00:00 2001
From: NalaGinrut 
Date: Fri, 30 Dec 2011 19:46:01 +0800
Subject: [PATCH] ADD regexp-split

---
 module/ice-9/regex.scm |   19 ++-
 1 files changed, 18 insertions(+), 1 deletions(-)

diff --git a/module/ice-9/regex.scm b/module/ice-9/regex.scm
index f7b94b7..e9b01ea 100644
--- a/module/ice-9/regex.scm
+++ b/module/ice-9/regex.scm
@@ -38,10 +38,11 @@
  POSIX regex support functions.
 
 (define-module (ice-9 regex)
+  #:autoload (srfi srfi-1) (iota)
   #:export (match:count match:string match:prefix match:suffix
regexp-match? regexp-quote match:start match:end match:substring
string-match regexp-substitute fold-matches list-matches
-   regexp-substitute/global))
+   regexp-substitute/global regexp-split))
 
 ;; References:
 ;;
@@ -226,3 +227,19 @@
 (begin
   (do-item (car items)) ; This is not.
   (next-item (cdr items)))
+  
+(define* (regexp-split regex str #:optional (flags 0))
+  (let ((ret (fold-matches
+              regex str (list '() 0 '(""))
+              (lambda (m prev)
+                (let* ((ll (car prev))
+                       (start (cadr prev))
+                       (tail (match:suffix m))
+                       (end (match:start m))
+                       (s (substring/shared str start end))
+                       (groups (map (lambda (n) (match:substring m n))
+                                    (iota (1- (match:count m))))))
+                  (list `(,@ll ,s ,@groups) (match:end m) tail)))
+              flags)))
+    `(,@(car ret) ,(caddr ret))))
+
-- 
1.7.0.4



Re: [PATCH] add regexp-split

2011-12-30 Thread Nala Ginrut
I was just saying: "I think group capturing is useful, and someone didn't
think so."
If that's not what your last mail meant, I think it's better to ignore it.

On Fri, Dec 30, 2011 at 7:48 PM, Marijn  wrote:



Re: [PATCH] add regexp-split

2011-12-30 Thread Neil Jerram
Nala Ginrut  writes:

> hi guilers!
> It seems like there's no "regexp-split" procedure in Guile.
> What we have is "string-split" which accepted Char only.
> So I wrote one for myself.

We've had this topic before, and it only needs a search for
"regex-split guile" to find it:
http://old.nabble.com/regex-split-for-Guile-td31093245.html.

   Neil



Re: [PATCH] add regexp-split

2011-12-30 Thread Marijn

On 30-12-11 12:52, Nala Ginrut wrote:
> I just expressed "I think group capturing is useful and someone 
> didn't think that's true". If this is not what your last mail
> mean, I think it's better to ignore it.

Group capturing is useful, but the question is whether it is useful in
the context of regexp-split. Maybe it is, maybe it isn't. Racket seems
to be doing it differently from Python, so I think that is reason to
look more closely. Certainly Guile should follow Racket over Python,
everything else being equal, but usually everything isn't equal once one
actually takes a look, and I'm saying that we should at least look at
other Schemes for inspiration.
If you're so convinced that Python is doing it right here and should
be followed, then perhaps you can give some examples of how capturing
groups are useful in a function that is supposed to split strings at
regexps.

Another data point:

[14:17]  what does chicken return for (irregex-split "([^0-9])"
 "123+456*/")  ?
[14:18]  ("123" "456")

Looks like chicken doesn't do capturing groups in their version, but
they don't have the empty matches either. How about that...

Surely by now you can see that the semantics of regexp-split are worth
discussing.

Marijn






Re: [PATCH] add regexp-split

2011-12-30 Thread Daniel Hartwig
On 30 December 2011 21:23, Marijn  wrote:
> Group capturing is useful, but the question is whether it is useful in
> the context of regexp-split. Maybe it is, maybe it isn't. Racket seems
> to be doing it differently than python, so I think that constitutes
> reason to look more closely. Certainly guile should follow racket over
> python, everything else being equal, but usually everything isn't
> equal if only one has a look and I'm saying that we should look at
> least at other schemes for inspiration.
> If you're so convinced that python is doing it right here and should
> be followed, then perhaps you can give some examples of how capturing
> groups are useful in a function that is supposed to split strings at
> regexps.

Having the *option* to return the captured groups in `regexp-split' is
certainly useful -- consider implementing a parser [1].  If the
captured groups are not desired, then simply omit the grouping parens
from the expression.

[1] http://80.68.89.23/2003/Oct/26/reSplit/
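As a sketch of that parser use case: the version below is not the patch from this thread but a reworking of the same `fold-matches' approach that accumulates pieces in reverse instead of appending to a list on every match. Because the splitting pattern captures the operators, they come back as tokens alongside the operands; a pattern without groups gives plain splitting.

```scheme
(use-modules (ice-9 regex) (srfi srfi-1))

;; Sketch of a Python-style regexp-split built on fold-matches.  Captured
;; groups are spliced into the result; a group that did not participate
;; contributes #f (Python's None).
(define* (regexp-split regexp string #:optional (flags 0))
  ;; Fold state: (reversed-pieces . index-where-the-next-piece-starts)
  (let* ((state
          (fold-matches
           regexp string (cons '() 0)
           (lambda (m acc)
             (let ((piece (substring string (cdr acc) (match:start m)))
                   (groups (map (lambda (n) (match:substring m n))
                                (iota (1- (match:count m)) 1))))
               (cons (append (reverse groups) (cons piece (car acc)))
                     (match:end m))))
           flags))
         (rest (substring string (cdr state))))
    (reverse (cons rest (car state)))))

;; Tokenising a tiny arithmetic expression: operators are kept as tokens.
(regexp-split "([-+*/])" "12+3*4")
;; => ("12" "+" "3" "*" "4")
```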

>
> Another data point:
>
> [14:17]  what does chicken return for (irregex-split "([^0-9])"
>  "123+456*/")  ?
> [14:18]  ("123" "456")
>
> Looks like chicken doesn't do capturing groups in their version, but
> they don't have the empty matches either. How about that...

For tokenizing I think you want to keep any empty strings, otherwise
you lose track of which `field' you are in (consider /etc/passwd
entries).  This also matches the existing behaviour of `string-split'.
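A small illustration of the field-tracking point, using `string-split' on a made-up passwd-style entry whose fifth (GECOS) field is empty. Dropping the empty strings, as in the Chicken result quoted above, shifts every later field one position to the left.

```scheme
(use-modules (srfi srfi-1))   ; for remove

;; name:password:UID:GID:GECOS:home:shell, with an empty GECOS field
(define entry "daemon:x:1:1::/usr/sbin:/bin/sh")

(define fields (string-split entry #\:))
(list-ref fields 5)                          ; => "/usr/sbin" (home directory)

;; Discarding empty strings loses track of which field is which:
(list-ref (remove string-null? fields) 5)    ; => "/bin/sh" (actually the shell)
```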



Re: [PATCH] add regexp-split

2011-12-30 Thread Nala Ginrut
Well, I see.
So the previous discussion didn't get this procedure into Guile?
Now that so many people are interested in this topic...

On Fri, Dec 30, 2011 at 9:03 PM, Neil Jerram wrote:



Re: [PATCH] add regexp-split

2011-12-30 Thread Daniel Hartwig
On 30 December 2011 19:47, Nala Ginrut  wrote:
> Forget to load (srfi-1 iota) ,again...
>
>
> On Fri, Dec 30, 2011 at 7:40 PM, Nala Ginrut  wrote:
>>
>> Great! It's better now.
>> Here's the brand new patch~

I notice that this does not handle the case where there are no matches:

scheme@(guile-user)> (regexp-split "[^0-9]" "123")
$26 = ((""))


  (let ((ret (fold-matches
  regex str (list '() 0 '(""))

becomes:

  (let ((ret (fold-matches
  regex str (list '() 0 str)

and the result:

scheme@(guile-user)> (regexp-split "[^0-9]" "123")
$28 = ("123")
scheme@(guile-user)> (string-split "123" #\!)
$29 = ("123")


I also note that you are using `substring/shared' when I think you are
after `substring'.  Both of these are efficient and use shared memory
when they can, but there is a difference.
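A small script showing that difference, based on the semantics described in the Guile manual: a `substring' result initially shares storage but is copied as soon as either string is modified, whereas a `substring/shared' result keeps sharing, so changes to the original show through. `string-copy' is used because string literals may be read-only.

```scheme
;; A fresh, mutable string (literals may be immutable).
(define s (string-copy "hello world"))

(define shared (substring/shared s 0 5))  ; keeps sharing storage with S
(define copied (substring s 0 5))         ; copy-on-write

(string-set! s 0 #\J)

shared  ; => "Jello"  the modification shows through
copied  ; => "hello"  copied on write, unaffected
```

For regexp-split, this means pieces built with `substring/shared' would stay tied to the input string.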



Re: [PATCH] add regexp-split

2011-12-30 Thread Daniel Hartwig
On 30 December 2011 21:03, Neil Jerram  wrote:
> Nala Ginrut  writes:
>
>> hi guilers!
>> It seems like there's no "regexp-split" procedure in Guile.
>> What we have is "string-split" which accepted Char only.
>> So I wrote one for myself.
>
> We've had this topic before, and it only needs a search for
> "regex-split guile" to find it:
> http://old.nabble.com/regex-split-for-Guile-td31093245.html.
>

Good to see that there is continuing interest in this feature.

IMO, the implementation here is more elegant and readable for its use
of `fold-matches'.  The first implementation from the thread you
mention effectively rolls its own version of `fold-matches' over the
result of `list-matches' (which is itself implemented using
`fold-matches'!).



Re: [PATCH] add regexp-split

2011-12-30 Thread Nala Ginrut
Now that we have a previous thread on this topic, I don't think there's
any need to format a patch.

Maybe this will solve the problem:
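;; (ice-9 regex) for fold-matches, (srfi srfi-1) for iota;
;; needed when this is evaluated outside the (ice-9 regex) module.
(use-modules (ice-9 regex) (srfi srfi-1))

(define* (regexp-split regex str #:optional (flags 0))
  (let ((ret (fold-matches
              regex str (list '() 0 str)
              ;; PREV is (pieces-so-far start-of-next-piece remaining-text)
              (lambda (m prev)
                (let* ((ll (car prev))
                       (start (cadr prev))
                       (tail (match:suffix m))
                       (end (match:start m))
                       (s (substring/shared str start end))
                       ;; groups 1..n; #f for a group that did not match
                       (groups (map (lambda (n) (match:substring m n))
                                    (iota (1- (match:count m)) 1))))
                  (list `(,@ll ,s ,@groups) (match:end m) tail)))
              flags)))
    ;; add the text after the last match (all of STR if nothing matched)
    `(,@(car ret) ,(caddr ret))))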


On Fri, Dec 30, 2011 at 11:33 PM, Daniel Hartwig  wrote:



Re: [PATCH] add regexp-split

2011-12-30 Thread Neil Jerram
Nala Ginrut  writes:

> Well, I see.
> So the previous discussion didn't make this proc put into Guile?
> Now that so many people interested in this topic. 

I'm afraid I can't recall what happened following that thread.

What feels important to me, though, is the elegance of the overall API.
There are already _some_ regex-related APIs in the core Guile library
(ice-9 regex) and I would guess that there are many many possible
variations of these and other string + regex processing APIs that one
might propose.  Also we've now demonstrated that regex-split can be
implemented, on top of the existing library, with only a few lines of
code.  Therefore I'd say (speaking only as an observer) that you need to
make a case for how your regex-split beautifully complements what's
already there in (ice-9 regex), or alternatively for replacing (ice-9
regex) with a more beautiful set of operations including regex-split.

Alternatively^2, you could package regex-split outside the core library,
as a test case for the guild hall.  Then it doesn't need to be justified
in relation to (ice-9 regex), it can just be a convenient module that
provides a more Python-like API.
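To make the "few lines of code" point concrete, here is a minimal sketch of such a standalone module built only on (ice-9 regex). It assumes the documented `fold-matches' calling convention (PROC receives the match and the previous value); the module name (regex split) is hypothetical. This version drops the delimiters and any captured groups, and keeps empty fields as `string-split' does.

```scheme
;; Hypothetical module name; this is not part of Guile.
(define-module (regex split)
  #:use-module (ice-9 regex)
  #:export (regexp-split))

;; Split STR at each match of RE (a regexp string or compiled regexp),
;; dropping the matched delimiters but keeping empty fields.
(define (regexp-split re str)
  (let ((acc (fold-matches
              re str
              (cons '() 0)                ; (reversed-fields . field-start)
              (lambda (m acc)
                ;; The field runs from the previous match end to this match.
                (cons (cons (substring str (cdr acc) (match:start m))
                            (car acc))
                      (match:end m))))))
    ;; The text after the last match is the final field.
    (reverse (cons (substring str (cdr acc)) (car acc)))))
```

If saved as regex/split.scm on the load path, `(use-modules (regex split))' followed by `(regexp-split "[^0-9]" "123+456*/")' should return ("123" "456" "" "").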

Regards,
 Neil



Re: bug#10410: guile: uri module confused by domain names starting with numbers, ipv6 addresses

2011-12-30 Thread Daniel Hartwig
On 30 December 2011 18:14, Daniel Hartwig  wrote:
>
> `string->uri' requires similar changes to support the ipv6 address
> literals.  I have yet to find a very elegant way to do this, though it is
> easy enough to simply butcher `authority-pat'.

So the issue was really with `parse-authority'.

The attached patch cleans this up with support for IPv6 (including
dotted-quad notation), fixes some typos in the tests, and adds new
tests.
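As an illustration of what the host part now has to accept, here is a simplified, hypothetical `split-authority' with its own pattern (no userinfo handling); it is not the patch's `parse-authority'. It assumes `make-regexp' compiles POSIX extended syntax, as it does by default, and relies on `match:substring' returning #f for a group that did not participate.

```scheme
(use-modules (ice-9 regex))

;; Hypothetical, simplified authority pattern (no userinfo part):
;;   group 1: host, either a "[...]" IPv6 literal (hex digits, colons,
;;            and dots for a dotted-quad tail) or a plain name
;;   group 3: port digits, when a ":port" suffix is present
(define authority-rx
  (make-regexp "^(\\[[0-9a-fA-F:.]+\\]|[a-zA-Z0-9.-]+)(:([0-9]*))?$"))

;; Return (HOST PORT-STRING-OR-#f), or #f if STR does not match.
(define (split-authority str)
  (let ((m (regexp-exec authority-rx str)))
    (and m
         (list (match:substring m 1)
               (match:substring m 3)))))
```

With this sketch, `(split-authority "[2001:db8::1]:80")' should give ("[2001:db8::1]" "80"), and `(split-authority "123.com")' should give ("123.com" #f).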

With both patches applied the web-uri.test now passes for all tests
and I can finally do:

scheme@(guile-user)> (string->uri "http://[:::192.0.2.1]/foo")
$2 = #<<uri> scheme: http userinfo: #f host: "[:::192.0.2.1]"
port: #f path: "/foo" query: #f fragment: #f>
scheme@(guile-user)> (string->uri "http://123.com")
$3 = #<<uri> scheme: http userinfo: #f host: "123.com" port: #f path:
"" query: #f fragment: #f>
From b839aa909c61ef2ee68ea652e6e0095afc3f2f24 Mon Sep 17 00:00:00 2001
From: Daniel Hartwig 
Date: Sat, 31 Dec 2011 00:16:42 +0800
Subject: [PATCH 2/2] enhance IPv6 support

* module/web/uri.scm (valid-host?): Support dotted-quad notation
  in IPv6 addresses.
  (parse-authority): Support IPv6 literals.
* test-suite/tests/web-uri.test: Add and fix tests.
---
 module/web/uri.scm            |    4 ++--
 test-suite/tests/web-uri.test |   16 ++++++++++++----
 2 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/module/web/uri.scm b/module/web/uri.scm
index ff13847..b8a6951 100644
--- a/module/web/uri.scm
+++ b/module/web/uri.scm
@@ -91,7 +91,7 @@ consistency checks to make sure that the constructed URI is valid."
 (define ipv4-regexp
   (make-regexp "^([0-9.]+)$"))
 (define ipv6-regexp
-  (make-regexp "^\\[([0-9a-fA-F:]+)\\]$"))
+  (make-regexp "^\\[([0-9a-fA-F:.]+)\\]$"))
 (define domain-label-regexp
   (make-regexp "^[a-zA-Z0-9]([a-zA-Z0-9-]*[a-zA-Z0-9])?$"))
 (define top-label-regexp
@@ -116,7 +116,7 @@ consistency checks to make sure that the constructed URI is valid."
 (define userinfo-pat
   "[a-zA-Z0-9_.!~*'();:&=+$,-]+")
 (define host-pat
-  "[a-zA-Z0-9.-]+")
+  "[a-zA-Z0-9.-]+|\\[[0-9a-fA-F:.]+\\]")
 (define port-pat
   "[0-9]*")
 (define authority-regexp
diff --git a/test-suite/tests/web-uri.test b/test-suite/tests/web-uri.test
index 4f859e0..cd6a944 100644
--- a/test-suite/tests/web-uri.test
+++ b/test-suite/tests/web-uri.test
@@ -102,6 +102,10 @@
     (uri=? (build-uri 'http #:host "[2001:db8::1]")
            #:scheme 'http #:host "[2001:db8::1]" #:path ""))
 
+  (pass-if "http://[:::192.0.2.1]"
+    (uri=? (build-uri 'http #:host "[:::192.0.2.1]")
+           #:scheme 'http #:host "[:::192.0.2.1]" #:path ""))
+
   (pass-if-uri-exception "http://foo:not-a-port"
                          "Expected.*port"
                          (build-uri 'http #:host "foo" #:port "not-a-port"))
@@ -160,12 +164,16 @@
            #:scheme 'http #:host "[2001:db8::1]" #:path ""))
 
   (pass-if "http://[2001:db8::1]:80"
-    (uri=? (string->uri "http://[2001:db8::1]")
+    (uri=? (string->uri "http://[2001:db8::1]:80")
            #:scheme 'http
            #:host "[2001:db8::1]"
            #:port 80
            #:path ""))
 
+  (pass-if "http://[:::192.0.2.1]"
+    (uri=? (string->uri "http://[:::192.0.2.1]")
+           #:scheme 'http #:host "[:::192.0.2.1]" #:path ""))
+
   (pass-if "http://foo:"
     (uri=? (string->uri "http://foo:")
            #:scheme 'http #:host "foo" #:path ""))
@@ -223,9 +231,9 @@
     (equal? "http://[2001:db8::1]"
             (uri->string (string->uri "http://[2001:db8::1]"))))
 
-  (pass-if "http://[2001:db8::1]:80"
-    (equal? "http://[2001:db8::1]:80"
-            (uri->string (string->uri "http://[2001:db8::1]:80"))))
+  (pass-if "http://[:::192.0.2.1]"
+    (equal? "http://[:::192.0.2.1]"
+            (uri->string (string->uri "http://[:::192.0.2.1]"))))
 
   (pass-if "http://foo:"
     (equal? "http://foo"
-- 
1.7.5.4



Re: [PATCH] add regexp-split

2011-12-30 Thread Nala Ginrut
OK, I'll put this proc in my own lib, since Guile has only a core regexp
module rather than a fuller regexp library. Anyway, it's almost complete
now; anyone may copy the final version if needed.

On Sat, Dec 31, 2011 at 12:26 AM, Neil Jerram wrote:

> Nala Ginrut  writes:
>
> > Well, I see.
> > So the previous discussion didn't get this proc into Guile?
> > Now that so many people are interested in this topic...
>
> I'm afraid I can't recall what happened following that thread.
>
> What feels important to me, though, is the elegance of the overall API.
> There are already _some_ regex-related APIs in the core Guile library
> (ice-9 regex) and I would guess that there are many many possible
> variations of these and other string + regex processing APIs that one
> might propose.  Also we've now demonstrated that regex-split can be
> implemented, on top of the existing library, with only a few lines of
> code.  Therefore I'd say (speaking only as an observer) that you need to
> make a case for how your regex-split beautifully complements what's
> already there in (ice-9 regex), or alternatively for replacing (ice-9
> regex) with a more beautiful set of operations including regex-split.
>
> Alternatively^2, you could package regex-split outside the core library,
> as a test case for the guild hall.  Then it doesn't need to be justified
> in relation to (ice-9 regex), it can just be a convenient module that
> provides a more Python-like API.
>
> Regards,
> Neil
>


Re: [PATCH] add regexp-split

2011-12-30 Thread Daniel Hartwig
Hello

On 31 December 2011 04:11, Eli Barzilay  wrote:
> [I don't think that I'm subscribed to the Guile list, but feel free to
> forward it there.]

Copying back the list and Marijn.

> 5 hours ago, Marijn wrote:
>> On 30-12-11 12:52, Nala Ginrut wrote:
>> > I just expressed "I think group capturing is useful and someone
>> > didn't think that's true". If this is not what your last mail
>> > mean, I think it's better to ignore it.
>>
>> Group capturing is useful, but the question is whether it is useful
>> in the context of regexp-split.
>
> Yes, that's exactly the point.  What I'm worried about is someone
> defining a regexp for several uses, for example:
>
>  (define rx "foo([0-9]*)")
>
> with the intention of using it for both splitting and other
> extraction.  (This is a bad example but it's a common case for
> regexps.)  The problem is that if you really want to just *split* with
> this pattern, you're stuck in bad-code-land...  Two possible
> solutions:
>
>  Do the split, then filter out the even-numbered items from the
>  result.
>
> This is bad not only because it's inefficiently allocating substrings
> that will get discarded (investing work redundantly which will get
> trashed by more work) -- it's also bad because such code is sensitive
> to the number of groups.  Eg, if the pattern is changed to have two
> groups, then you need to modify the filtering now.  Another solution:
>
>  Tweak the regexp and turn all groups into non-capturing groups.
>
> This is something that I've run into several times, and IME it is a
> very bad solution.  Usually, you end up doing some half-assed job of
> this tweaking: you don't bother to cache the compiled expressions for
> speed, and you tend to introduce assumptions by mistake -- like
> assuming that all "("s that are not preceded by a backslash or
> followed by a "?:" are groups -- and fail miserably when the input is
> something like "...(..." or "...[0-9()]...".  Actually, that leads
> into yet another solution:
>
>  Explicitly say that your functions expect patterns without groups.
>
> That fails since it propagates the problem up for users of your code
> (they might need to maintain two versions of regexps too).  And since
> many of them are likely to skim the docs and just do whatever works
> for them, they can easily write code that fails to satisfy these
> assumptions -- and the fun part is that when this happens, the result
> of such bugs is utterly confusing...
>
> Four hours ago, Daniel Hartwig wrote:
>> Having the *option* to return the captured groups in `regexp-split' is
>> certainly useful -- consider implementing a parser [1].  If the
>> captured groups are not desired, then simply omit the grouping parens
>> from the expression.
>
> Hopefully the above explains why I think that that "simply omit" can
> turn out to be a disaster...
>
> In any case, that's my reason for disliking that added functionality
> even if it "can be more useful".  Lucky for me, in Racket we also have
> the existing behavior with code that will break if we change it, so I
> don't need to argue my point much...
>
> And BTW, all of that is *not* to say that this functionality is
> useless -- just arguing for it to be provided under a different name.
>


How about having an optional argument to control the behaviour?  The
default could be to not include the groups, thus mimicking the output
of Guile's `string-split' and `regexp-split' in other Schemes.

If two procedures are implemented they will be almost verbatim copies
of each other.  The changes required in the body would be minimal:

 (groups (if incl-groups?
             (map (lambda (n) (match:substring m n))
                  (iota (1- (match:count m)) 1))
             '()))
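A sketch of how the optional argument could look in a complete procedure, assuming Guile 2.0's `define*', `fold-matches', and SRFI-1 `iota' (for its start argument). Only the argument name is taken from the fragment above; the rest is an illustration, not the patch under discussion. When INCL-GROUPS? is true, each separator contributes the text of its captured groups, with #f where Python would give None.

```scheme
(use-modules (ice-9 regex) (srfi srfi-1))

;; Sketch: split STR at matches of RE.  When INCL-GROUPS? is true, the text
;; of each captured group (#f if the group did not participate) follows the
;; field that precedes the match, as in Python's re.split.
(define* (regexp-split re str #:optional (incl-groups? #f))
  (define (groups m)
    ;; Group 0 is the whole match; captured groups are 1 .. count-1.
    (if incl-groups?
        (map (lambda (n) (match:substring m n))
             (iota (1- (match:count m)) 1))
        '()))
  (let ((acc (fold-matches
              re str
              (cons '() 0)                ; (reversed-items . field-start)
              (lambda (m acc)
                ;; Push the field, then the groups, onto the reversed list.
                (cons (append (reverse (groups m))
                              (list (substring str (cdr acc) (match:start m)))
                              (car acc))
                      (match:end m))))))
    (reverse (cons (substring str (cdr acc)) (car acc)))))
```

`(regexp-split "([^0-9])" "123+456*/" #t)' should give ("123" "+" "456" "*" "" "/" ""), and without the #t the result is ("123" "456" "" ""). With two groups, `(regexp-split "([^0-9](4)?)" "123+456*/" #t)' gives ("123" "+4" "4" "56" "*" #f "" "/" #f ""), the same shape as Python's output with #f in place of None.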

>
>> [...] If you're so convinced that python is doing it right here and
>> should be followed, then perhaps you can give some examples of how
>> capturing groups are useful in a function that is supposed to split
>> strings at regexps.
>
> I don't think that such examples will help.  It's obvious how it can
> be useful to have this feature -- the main issue is the kind of bugs
> that it will lead to.  (And in the above I tried to give some examples
> of how that's bad.)
>
>
>> Another data point:
>>
>> [14:17]  what does chicken return for (irregex-split "([^0-9])"
>>  "123+456*/")  ?
>> [14:18]  ("123" "456")
>>
>> Looks like chicken doesn't do capturing groups in their version, but
>> they don't have the empty matches either. How about that...
>
> Yeah, we've considered these things for a while.  There is
> inconsistency between different languages and regexp libraries on how
> to deal with empty strings -- some drop them at the edges, some drop
> all of them, and IIRC, some even drop all empty strings.  Oh,
> and things get infinitely more amusing when you consider look-ahead
> and look-back patterns (including \b patterns)...
>
> You can see our tests here:
>
>  
> https://github.com/plt/racket/blob/master/collects/tests

Re: [PATCH] add regexp-split

2011-12-30 Thread Eli Barzilay
40 minutes ago, Daniel Hartwig wrote:
> 
> How about having an optional argument to control the behaviour?  The
> default could be to not include the groups, thus mimicking the
> output of Guile's `string-split' and `regexp-split' in other
> Schemes.

That can work, though I personally prefer a separate name.  (But
obviously, my personal taste has zero weight for guile...)


> If two procedures are implemented they will be almost verbatim copies
> of each other.

Yeah, but that's not an argument in favor or against -- since you can
switch between:

  (define (foo x [other-behavior? #f]) ...code..)

and

  (define (foo-internal x other-behavior?) ...same code...)
  (define (foo x) (foo-internal x #f))
  (define (foo-other x) (foo-internal x #t))

where the internal function is not exported from the library.


> No comment on Perl's handling.
> 
> I think Racket does the right thing by keeping *all* the empty
> strings in place.

Well, I do think that Perl (as well as other libraries & languages)
are a good reference point to compare against...  If anything, you
should at least be aware of other design choices and why you went in a
different direction.  (And we did not follow perl in all aspects, as
those tests clarify.)

-- 
  ((lambda (x) (x x)) (lambda (x) (x x)))  Eli Barzilay:
http://barzilay.org/   Maze is Life!



Re: [PATCH] add regexp-split

2011-12-30 Thread Daniel Hartwig
On 31 December 2011 10:32, Eli Barzilay  wrote:
> 40 minutes ago, Daniel Hartwig wrote:
>> If two procedures are implemented they will be almost verbatim copies
>> of each other.
>
> Yeah, but that's not an argument in favor or against -- since you can
> switch between:
>
>  (define (foo x [other-behavior? #f]) ...code..)
>
> and
>
>  (define (foo-internal x other-behavior?) ...same code...)
>  (define (foo x) (foo-internal x #f))
>  (define (foo-other x) (foo-internal x #t))
>
> where the internal function is not exported from the library.

Ah, I did not think of that :-)

>
>
>> No comment on Perl's handling.
>>
>> I think Racket does the right thing by keeping *all* the empty
>> strings in place.
>
> Well, I do think that Perl (as well as other libraries & languages)
> are a good reference point to compare against...  If anything, you
> should at least be aware of other design choices and why you went in a
> different direction.  (And we did not follow perl in all aspects, as
> those tests clarify.)
>

A good point.  I'm interested to find out the reasoning behind Perl's
decision to drop empty strings..  Seems a strange thing to do IMO.



Re: [PATCH] add regexp-split

2011-12-30 Thread Eli Barzilay
Just now, Daniel Hartwig wrote:
> On 31 December 2011 10:32, Eli Barzilay  wrote:
> > 40 minutes ago, Daniel Hartwig wrote:
> >>
> >> I think Racket does the right thing by keeping *all* the empty
> >> strings in place.
> >
> > Well, I do think that Perl (as well as other libraries &
> > languages) are a good reference point to compare against...  If
> > anything, you should at least be aware of other design choices and
> > why you went in a different direction.  (And we did not follow
> > perl in all aspects, as those tests clarify.)
> 
> A good point.  I'm interested to find out the reasoning behind
> Perl's decision to drop empty strings..  Seems a strange thing to do
> IMO.

I think that there's a general tendency to make things "nice" and
dropping these things for cases where what the user wants is
"obvious".  And then you realize that making the function behave
differently sometimes is a bad idea, but you can't back off from the
earlier version without breaking a ton of code.  In any case, look
also at the Emacs solution of an optional argument to drop all empty
strings, with a weird behavior when no regexp is given...




Re: [PATCH] add regexp-split

2011-12-30 Thread Daniel Hartwig
On 31 December 2011 11:21, Eli Barzilay  wrote:
>> A good point.  I'm interested to find out the reasoning behind
>> Perl's decision to drop empty strings..  Seems a strange thing to do
>> IMO.
>
> I think that there's a general tendency to make things "nice" and
> dropping these things for cases where what the user wants is
> "obvious".  And then you realize that making the function behave
> differently sometimes is a bad idea, but you can't back off from the
> earlier version without breaking a ton of code.  In any case, look
> also at the Emacs solution of an optional argument to drop all empty
> strings, with a weird behavior when no regexp is given...

In Scheme it is easy for the user to remove the empty strings if
desired.  In Perl I'd say that this at least involves writing a loop
each time, hence their choice for the default "nice" behaviour.

The ease of using `filter' is a good case for keeping the empty
strings in the Scheme version.
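To show how small that post-processing is, here is a sketch of both variants: removing every empty string with `filter', and removing only trailing empty strings, roughly what Perl's `split' does by default. The procedure names are hypothetical, and `negate' is assumed to be the Guile 2.0 core procedure.

```scheme
(use-modules (srfi srfi-1))

;; Remove every empty string from FIELDS.
(define (drop-empty-strings fields)
  (filter (negate string-null?) fields))

;; Remove only the empty strings at the end of FIELDS (roughly Perl's
;; default); empty fields at the start or in the middle are kept.
(define (drop-trailing-empty-strings fields)
  (reverse
   (let loop ((rev (reverse fields)))
     (if (and (pair? rev) (string-null? (car rev)))
         (loop (cdr rev))
         rev))))
```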

I could not find any mention of this optional Emacs arg. you talk
about; have a pointer for me?



add regexp-split: a summary and new proposal

2011-12-30 Thread Daniel Hartwig
An attempt to summarize the pertinent points of the thread [1].

[1] http://lists.gnu.org/archive/html/guile-devel/2011-12/msg00241.html

* Semantics, generally

  `regexp-split' is similar to `string-split'.  However, between
  various implementations the semantics vary over the following two
  points.  It is important to consider appropriate compatibility with
  these other implementations whilst still offering the user a good
  set of functionality.

* Captured groups

  The Python [2] implementation contains unique semantics whereby the
  text of any captured groups in the pattern is included in the
  result:

  >>> re.split('\W+', 'Words, words, words.')
  ['Words', 'words', 'words', '']
  >>> re.split('(\W+)', 'Words, words, words.')
  ['Words', ', ', 'words', ', ', 'words', '.', '']

  This is considered useful functionality to have [3], though not
  necessarily by default.  Consider a simple parser [4] which will need
  access to the tokens for processing.

  Other implementations such as Racket [3] and Chicken [5] do not
  return the captured groups in their result.  (Perl's `split', like
  Python's, does return them.)

  If there were two separate functions (or one function with an
  optional argument controlling the output) then the user could have a
  single regexp perform both the task of just splitting and the task
  of extracting the tokens. [6]

  [2] http://docs.python.org/library/re.html#re.split
  [3] http://lists.gnu.org/archive/html/guile-devel/2011-12/msg00257.html
  [4] http://80.68.89.23/2003/Oct/26/reSplit/
  [5] http://lists.gnu.org/archive/html/guile-devel/2011-12/msg00249.html
  [6] http://lists.gnu.org/archive/html/guile-devel/2011-12/msg00266.html

* Empty strings

  Some implementations (e.g. Chicken and Perl) drop (some) empty
  strings from their result.  In the case of Perl this is likely due
  to making things "nice" for the user in the majority case, but it is
  hard to revert this. [7]

  As per the example of `string-split', having empty strings in the
  result is useful to keep track of which "field" is which.

  In Scheme, if the empty strings are not desired, it is trivial to
  remove them:
   (filter (negate string-null?) lst)

  [7] http://lists.gnu.org/archive/html/guile-devel/2011-12/msg00269.html

* Naming

  > Also, to me the name seems unintuitive -- it is STR being split, not
  > RE -- perhaps this can be folded in to the existing string-split
  > function.

  [8] http://lists.gnu.org/archive/html/guile-devel/2011-12/msg00245.html


Hopefully I have not missed out anything important :-)


Anyway, what do people think of this proposal which tries to address
that whole discussion:

* [Vanilla `string-split' expanded to support the CHAR_PRED
  semantics of `string-index' et al.]

* New function `string-explode' similar to `string-split' but returns
  the delimiters in its result.

* Regex module replaces both of these with regexp-enhanced versions.

Thus:

scheme@(guile-user)> ;; with a char predicate
scheme@(guile-user)> (string-split "123+456*/" (negate char-numeric?))
$8 = ("123" "456" "" "")
scheme@(guile-user)> (string-explode "123+456*/" (negate char-numeric?))
$9 = ("123" "+" "456" "*" "" "/" "")
scheme@(guile-user)> ;; with a regular expression
scheme@(guile-user)> (use-modules (ice-9 regex))
scheme@(guile-user)> (define rx (make-regexp "([^0-9])"))
scheme@(guile-user)> (string-split "123+456*/" rx)
$10 = ("123" "456" "" "")
scheme@(guile-user)> ;; didn't want empty strings
scheme@(guile-user)> (filter (negate string-null?) $10)
$11 = ("123" "456")
scheme@(guile-user)> (string-explode "123+456*/" rx)
$12 = ("123" "+" "456" "*" "" "/" "")

and so on.
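For the char-predicate case, the proposed `string-explode' can be sketched in a few lines of plain Scheme. The procedure is only proposed here, so this definition is hypothetical; it treats each character satisfying PRED as a one-character delimiter and returns it as a string between the fields.

```scheme
;; Hypothetical `string-explode': like splitting on PRED, but each
;; delimiter character is kept, as a one-character string, between fields.
(define (string-explode str pred)
  (let loop ((start 0) (i 0) (acc '()))   ; ACC holds items in reverse
    (cond ((= i (string-length str))
           ;; The remaining text is the final (possibly empty) field.
           (reverse (cons (substring str start i) acc)))
          ((pred (string-ref str i))
           (loop (1+ i) (1+ i)
                 (cons* (string (string-ref str i))
                        (substring str start i)
                        acc)))
          (else (loop start (1+ i) acc)))))
```

With this sketch, `(string-explode "123+456*/" (negate char-numeric?))' should return ("123" "+" "456" "*" "" "/" ""), matching $9 above.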

I'm happy to throw together a patch for the above, however, would like
some feedback first :-)


Regards



Re: [PATCH] add regexp-split

2011-12-30 Thread Eli Barzilay
Two hours ago, Daniel Hartwig wrote:
> 
> I could not find any mention of this optional Emacs arg. you talk
> about; have a pointer for me?

It's the last optional argument of `split-string'.  The last paragraph
in the documentation notes that when you call this function with just
a single argument, then that last optional flag is t -- which is an
unconventional thing for elisp functions...

(BTW, I'm subscribed to the list now, so this should go through.)




Re: add regexp-split: a summary and new proposal

2011-12-30 Thread Eli Barzilay
An hour ago, Daniel Hartwig wrote:
> 
> Anyway, what do people think of this proposal which tries to address
> that whole discussion:
> 
> * [Vanilla `string-split' expanded to support the CHAR_PRED
>   semantics of `string-index' et al.]
> 
> * New function `string-explode' similar to `string-split' but returns
>   the delimiters in its result.
> 
> * Regex module replaces both of these with regexp-enhanced versions.

Aha -- I was looking for a new name, and `-explode' sounds good and
not misleading like `-split' (misleading in that I wouldn't have
expected a "split" function to return stuff from the gaps).

But there's one more point that bugs me about the python thing: the
resulting list has both the matches and the non-matching gaps, and
knowing which is which is tricky.  For example, if you do this (I'll
use our syntax here, so note the minor differences):

  (define (foo rx)
(regexp-split rx "some string"))

then you can't tell which is which in its output without knowing how
many grouping parens are in the input regexp.  It therefore makes
sense to me to have this instead:

  > (regexp-explode #rx"([^0-9])" "123+456*/")
  '("123" ("+") "456" ("*") "" ("/") "")

and now it's easy to know which is which.  This is of course a simple
example with a single group so it doesn't look like much help, but
with more than one group things can get confusing otherwise: for
example, in python you can get `None's in the result:

  >>> re.split('([^0-9](4)?)', '123+456*/')
  ['123', '+4', '4', '56', '*', None, '', '/', None, '']

but with the above, this becomes:

  > (regexp-explode #rx"([^0-9](4)?)" "123+456*/")
  '("123" ("+4" "4") "56" ("*" #f) "" ("/" #f) "")

so you can rely on the odd-numbered elements to be strings.  This is
probably going to be different for you, since you allow string
predicates instead of regexps.
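A Guile rendering of the nested shape, as a sketch: it assumes `fold-matches' and SRFI-1 `iota', and puts each separator's captured groups 1..n in a sublist (#f for a group that did not participate), rather than Racket's full-match-first lists. The name `regexp-explode' comes from this message and is not a Guile procedure.

```scheme
(use-modules (ice-9 regex) (srfi srfi-1))

;; Sketch: fields alternate with one list per separator match, holding the
;; text of groups 1 .. n (#f for a group that did not participate), so the
;; fields are always the elements at even (0-based) positions.
(define (regexp-explode re str)
  (let ((acc (fold-matches
              re str
              (cons '() 0)                ; (reversed-items . field-start)
              (lambda (m acc)
                (let ((groups (map (lambda (n) (match:substring m n))
                                   (iota (1- (match:count m)) 1))))
                  (cons (cons* groups
                               (substring str (cdr acc) (match:start m))
                               (car acc))
                        (match:end m)))))))
    (reverse (cons (substring str (cdr acc)) (car acc)))))
```

`(regexp-explode "([^0-9](4)?)" "123+456*/")' should give ("123" ("+4" "4") "56" ("*" #f) "" ("/" #f) "").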

Finally, the Racket implementation will probably be a little different
still -- our `regexp-match' returns a list with the matched substring
first, and then the matches for the capturing groups.  Following this,
a more uniform behavior for a `regexp-explode' would be to return
these lists, so we'd actually get:

  > (regexp-explode #rx"[^0-9]" "123+456*/")
  '("123" ("+") "456" ("*") "" ("/") "")
  > (regexp-explode #rx"([^0-9])" "123+456*/")
  '("123" ("+" "+") "456" ("*" "*") "" ("/" "/") "")

And again, this looks silly in this simple example, but would be more
useful in more complex ones.  We would also have a similar
`regexp-explode-positions' function that returns position pairs for
cases where you don't want to allocate all substrings.



One last not-too-related note: this is IMO all a by-product of a bad
choice of common regexp practices where capturing groups always refer
to the last match only.  In a world that would have made a better
choice, I'd expect:

  > (regexp-match #rx"(foo+)+ bar" "blah foof bar")
  '("foof bar" ("foo" "f"))

and, of course:

  > (regexp-match #rx"(fo(o)+)+ bar" "blah foof bar")
  '("foof bar" (("foo" ("o")) ("f" ("o" "o" "o"

But my guess is that many people wouldn't like that much...  (Probably
similar to disliking sexprs which are needed for the results of these
things.)  With such a thing, many of these additional constructs
wouldn't be necessary -- for example, we have `regexp-match*' that
returns all matches, and that wouldn't have been necessary.
`regexp-split' would probably not have been necessary too.




Re: Syntax Parameters documentation for guile

2011-12-30 Thread Eli Barzilay
More than a week ago, Ian Price wrote:
> 
> Eli,
> I'd especially appreciate it if you could clear up any misconceptions I
> may have, or may be unintentionally imparting on others.

(And now that I'm kind of on the list, I can reply...)

> * Syntax Parameters
> 
> Syntax parameters[fn:1] are a mechanism for rebinding a macro
> definition within the dynamic extent of a macro expansion. It
> provides a convenient solution to one of the most common types of
> unhygienic macro: those that introduce a special binding each time

I'd explicitly say "unhygienic" here rather than "special".


> the macro is used. Examples include an 'if' form that binds the
> result of the test to an 'it' binding, or class macros that
> introduce a special 'self' binding.

The `abort' example is also popular, probably even more than `it'.  I
think that there are practical uses of that (eg, a function with a
`return' keyword), whereas anaphoric conditionals are more of an
academic exercise that I don't think gets used in practice (at least
in Schemes).

[As a sidenote, when I worked on that paper I asked our local Perl
guru about the problem of shadowing the implicit `it' in Perl -- he
said that in practice it's considered bad style to use it in perl
code, and referred me to some book that talks about the pitfalls of
using it...  I found it amusing that this perl-ism has become such a
popular example for unhygienic macros where perl hackers actually try
to avoid it.]
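A small sketch of the `return' case, assuming a Guile that provides `define-syntax-parameter' and `syntax-parameterize' as described in the draft. The macro name `lambda/return' is hypothetical. The default transformer signals a syntax error when `return' is used outside the macro, and `syntax-parameterize' adjusts `return' only within the expansion of the body, so no binding is introduced unhygienically.

```scheme
;; Default meaning of `return': a syntax error outside lambda/return.
(define-syntax-parameter return
  (lambda (stx)
    (syntax-violation 'return "return used outside of lambda/return" stx)))

;; Hypothetical macro: a lambda whose body may use (return v ...) to exit.
(define-syntax lambda/return
  (syntax-rules ()
    ((_ formals body body* ...)
     (lambda formals
       (call-with-current-continuation
        (lambda (escape)
          ;; Adjust (not shadow) `return' for the body's expansion only;
          ;; (... ...) produces a literal ellipsis in the inner macro.
          (syntax-parameterize ((return (syntax-rules ()
                                          ((_ v (... ...))
                                           (escape v (... ...))))))
            body body* ...)))))))

(define sign
  (lambda/return (x)
    (when (negative? x) (return 'negative))
    'non-negative))
```

Here `(sign -5)' should return negative and `(sign 3)' non-negative; a top-level `(return 1)' should be rejected at expansion time.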


> With syntax parameters, instead of introducing the binding
> unhygienically each time, we instead create one binding for the
> keyword, which we can then adjust later when we want the keyword to
> have a different meaning. As no new bindings are introduced hygiene
> is preserved. This is similar to the dynamic binding mechanisms we
> have at run-time like parameters[fn:2] or fluids[fn:3].

An important note to add here is that there is no "dynamic scope" in
the usual sense here -- it's rather a dynamic scope during macro
expansion, and for macro-bound identifiers.  The resulting expanded
code is of course as lexical as always.  (We've had some discussions
at #scheme where this was a confusing point.)


> ** define-syntax-parameter keyword transformer [syntax]
> Binds keyword to the value obtained by evaluating transformer as a
> syntax-parameter.

The keyword is bound to the value of the `transformer' expression.
(Evaluated at the syntax level in Racket's case; I don't know if
Guile has separate phases yet...)  It's not evaluated as a syntax
parameter, just like parameters.


> The transformer provides the default expansion for the syntax
> parameter, and in the absence of syntax-parameterize, is
> functionally equivalent to define-syntax.

A good note to add here is that it is usually bound to a transformer
that throws a syntax error like "`foo' must be used inside a `bar'".
It immediately clarifies the use of syntax parameters in the common
case.


> ** syntax-parameterize ((keyword transformer) ...) exp ... [syntax]
> (note, each keyword must be bound to a syntax-parameter)
> 
> Adjusts each of the keywords to use the value obtained by evaluating
> their respective transformer, in the expansion of the exp forms. It
> differs from let-syntax, in that the binding is not shadowed, but
> adjusted, and so uses of the keyword in the expansion of exp forms
> use the new transformers.

A possibly useful analogy is with `fluid-let' which doesn't create new
bindings, but rather `set!'s them.  But IMO `fluid-let' should die, so
using parameters is a better example...
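The run-time analogy can be seen with core `make-parameter' and `parameterize'. In this sketch (the parameter name is hypothetical), `parameterize' adjusts the existing binding for its dynamic extent, so a procedure defined elsewhere sees the new value, whereas a `let' only shadows the name lexically.

```scheme
;; Hypothetical parameter with a default value of 2.
(define indent-width (make-parameter 2))

;; Defined outside any parameterize: it reads the value current at call time.
(define (current-indent)
  (indent-width))

;; Adjusted binding: current-indent sees 8 during the dynamic extent.
(define (indent-inside-parameterize)
  (parameterize ((indent-width 8))
    (current-indent)))

;; Shadowed name: the local binding is invisible to current-indent.
(define (indent-inside-let)
  (let ((indent-width (lambda () 8)))
    (current-indent)))
```

`(indent-inside-parameterize)' should return 8, while `(indent-inside-let)' and `(current-indent)' return 2.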
