Some less good news: I found out that the https stuff is not working right for all sites. I tested though... the code works *before* I wrapped it in custom-binary-input/output-port.
After being wrapped though, strange things happen. For some sites (eg "https://webmention.net/") things seem fine: scheme@(guile-user)> (http-get (string->uri "https://webmention.net/")) $7 = #<<response> version: (1 . 1) code: 200 reason-phrase: "OK" headers: ((server . "nginx/1.9.10") (date . #<date nanosecond: 0 second: 46 minute: 15 hour: 18 day: 6 month: 11 year: 2016 zone-offset: 0>) (content-type text/html (charset . "UTF-8")) (transfer-encoding (chunked)) (connection close) (x-powered-by . "PHP/5.6.21-1+donate.sury.org~trusty+4")) port: #<closed: file f25310>> $8 = "<!DOCTYPE html>\n<html>\n<head>\n <title>Webmention</title>\n <link rel=\"stylesheet\" href=\"/styles.css\">\n</head>\n<body>\n\n<div class=\"page\">\n \n <h1>Webmention</h1>\n \n <div class=\"subtitle\">Webmention is a simple way to notify any URL when you link to it from your site.</div>\n \n <ul class=\"links\">\n <li>The Webmention specification is being developed under the <a href=\"https://www.w3.org/wiki/Socialwg\">W3C Social Web Working Group</a>.</li>\n <li class=\"main\"><a href=\"https://www.w3.org/TR/webmention/\">Latest published version</a></li>\n <li class=\"main\"><a href=\"http://webmention.net/draft/\">Latest editor's draft</a></li>\n <li class=\"main\"><a href=\"http://webmention.net/implementations/\">Implementations</a></li>\n <li>The specification was contributed to the W3C by the IndieWeb community. More information and history of the spec can be found on the <a href=\"https://indieweb.org/webmention\">IndieWeb wiki</a>.</li>\n </ul>\n \n</div>\n\n</body>\n</html>" For other sites, especially ones where the pages are larger, things are broken. For example, let's try to pull down the site of friend Joey Hess: scheme@(guile-user)> (http-get (string->uri "https://joeyh.name/")) $9 = #<<response> version: (1 . 1) code: 200 reason-phrase: "OK" headers: ((date . #<date nanosecond: 0 second: 58 minute: 15 hour: 18 day: 6 month: 11 year: 2016 zone-offset: 0>) (server . "Apache/2.4.10 (Debian)") (last-modified . #<date nanosecond: 0 second: 43 minute: 34 hour: 14 day: 24 month: 10 year: 2016 zone-offset: 0>) (etag "195c-53f9d4af683f3" . #t) (accept-ranges bytes) (content-length . 6492) (vary accept-encoding) (cache-control (max-age . 0)) (expires . #<date nanosecond: 0 second: 58 minute: 15 hour: 18 day: 6 month: 11 year: 2016 zone-offset: 0>) (connection close) (content-type text/html)) port: #<closed: file f4c070>> $10 = "moz-background-size: cover;\n -o-background-size: cover;\n background-size: cover;\n}\n.sidebar {\n background: none;\n border: none;\n}\ninput#searchbox {\n display: none;\n}\n#pageinfo {\n display: none;\n}\n.pageheader .actions ul {\n border-bottom: none;\n}\n#pagebody {\n margin-left: 20%;\n}\n.archivepagedate {\n font-size: 0.5em;\n}\n.actions {\n display: none;\n}\n</style>\n</div>\n\n\n\n\n<table>\n<tr>\n<td width=\"33%\" valign=top><h3>personal</h3>\n\n<p><a href=\"./blog/\">blog</a><br/>\n<a href=\"./pics/\">pics</a><br/>\n<a href=\"./contact/\">contact me</a><br/>\n<a href=\"./todo/\">todo</a></p>\n\n\n\n</td>\n<td width=\"33%\" valign=top><h3>technical</h3>\n\n<p><a href=\"./code/\">code</a><br/>\n<a href=\"./vcshome/\">vcshome</a><br/>\n<a href=\"./talks/\">talks</a><br/>\n<a href=\"./screencasts/\">screencasts</a><br/>\n<a href=\"./termcast/\">termcasts</a><br/>\n<a href=\"./rfc/\">rfcs</a><br/>\n<a href=\"./boxen/\">boxen</a></p>\n\n\n\n</td>\n<td width=\"33%\" valign=top><h3>fun</h3>\n\n<p><a href=\"./learnstofly/\">Joey Learns to Fly</a><br/>\n<a href=\"http://olduse.net/\">oldusenet</a><br/>\n<a href=\"./languages/\">languages</a><br/>\n<a href=\"./yurt/\">yurt</a><br/>\n<a href=\"./caving/\">caving</a><br/>\n<a href=\"./grep/\">grep</a><br/>\n<a href=\"./meta/\">meta</a></p>\n\n\n\n</td>\n</tr>\n</table>\n\n\n\n\n<h3>interviews</h3>\n\n<p><a href=\"http://joey.hess.usesthis.com\">2012: The Setup</a></p>\n\n<blockquote><p>\"When power is low, I often hack in the evenings by lantern light.\"</p></blockquote>\n\n<p><a href=\"http://zgrimshell.github.io/posts/interviews-with-floss-developers-joey-hess.html\">2015: Life after Debian</a></p>\n\n<blockquote><p>\"I want to build worthwhile things that might last.\"</p></blockquote>\n\n<p><a href=\"http://lwn.net/Articles/672352/\">2016: Linux Weekly News</a></p>\n\n<blockquote><p>\"I still see myself as a beginner, and certainly not an exemplar.\"</p></blockquote>\n\n\n\n\n\n\n</section>\n\n\n\n\n\n\n\n</div>\n\n<footer id=\"footer\" class=\"pagefooter\" role=\"contentinfo\">\n\n<nav id=\"pageinfo\">\n\n\n\n\n\n\n\n\n\n\n\n<div class=\"pagedate\">\nLast edited <time datetime=\"2015-03-02T15:14:09Z\" class=\"relativedate\" title=\"Mon, 02 Mar 2015 10:14:09 -0500\">mid-morning Monday, March 2nd, 2015</time>\n<!-- Created <time datetime=\"2006-03-19T23:58:19Z\" pubdate=\"pubdate\" class=\"relativedate\" title=\"Sun, 19 Mar 2006 18:58:19 -0500\">Sunday evening, March 19th, 2006</time> -->\n</div>\n\n</nav>\n\n\n<!-- from joey -->\n</footer>\n\n</article>\n\n</body>\n</html>\n\" title=\"Thu, 22 Sep 2016 16:13:21 -0400\">at teatime on Thursday, September 22nd, 2016</time>\n\n</span>\n</div>\n<div class=\"archivepage\">\n\n<a href=\"./blog/entry/PoW_bucket_bloom/\">PoW bucket bloom: throttling anonymous clients with proof of work, token buckets, and bloom filters</a><br />\n\n<span class=\"archivepagedate\">\nPosted <time datetime=\"2016-09-13T05:14:47Z\" pubdate=\"pubdate\" class=\"relativedate\" title=\"Tue, 13 Sep 2016 01:14:47 -0400\">late Monday night, September 13th, 2016</time>\n\n</span>\n</div>\n<div class=\"archivepage\">\n\n<a href=\"./blog/entry/late_summer/\">late summer</a><br />\n\n<span class=\"archivepagedate\">\nPosted <time datetime=\"2016-08-31T01:15:40Z\" pubdate=\"pubdate\" class=\"relativedate\" title=\"Tue, 30 Aug 2016 21:15:40 -0400\">late Tuesday evening, August 30th, 2016</time>\n\n</span>\n</div>\n\n\n\n\n\n</aside>\n\n\n\n<div id=\"pagebody\">\n\n<section id=\"content\" role=\"main\">\n\n\n<div>\n<style>\nhtml { \n background: url(joeykite.jpg) no-repeat center center fixed; \n -webkit-background-size: cover;\n -\x00r\x00\x00\x00e\x00\x00\x00t\x00\x00\x00u\x00\x00\x00r\x00\x00\x00n\x00\x00\x00e\x00\x00\x00d\x00\x00\x00 \x00\x00\x00a\x00\x00\x00s\x00\x00\x00 \x00\x00\x00a\x00\x00\x00 \x00\x00\x00b\x00\x00\x00y\x00\x00\x00t\x00\x00\x00e\x00\x00\x00v\x00\x00\x00e\x00\x00\x00c\x00\x00\x00t\x00\x00\x00o\x00\x00\x00r\x00\x00\x00.\x00\x00\x00\n\x00\x00\x00\n\x00\x00\x00H\x00\x00\x00o\x00\x00\x00w\x00\x00\x00e\x00\x00\x00v\x00\x00\x00e\x00\x00\x00r\x00\x00\x00,\x00\x00\x00 \x00\x00\x00i\x00\x00\x00f\x00\x00\x00 \x00\x00\x00S\x00\x00\x00T\x00\x00\x00R\x00\x00\x00E\x00\x00\x00A\x00\x00\x00M\x00\x00\x00I\x00\x00\x00N\x00\x00\x00G\x00\x00\x00?\x00\x00\x00 \x00\x00\x00i\x00\x00\x00s\x00\x00\x00 \x00\x00\x00t\x00\x00\x00r\x00\x00\x00u\x00\x00\x00e\x00\x00\x00,\x00\x00\x00 \x00\x00\x00i\x00\x00\x00n\x00\x00\x00s\x00\x00\x00t\x00\x00\x00e\x00\x00\x00a\x00\x00\x00d\x00\x00\x00 \x00\x00\x00o\x00\x00\x00f\x00\x00\x00 \x00\x00\x00e\x00\x00\x00a\x00\x00\x00g\x00\x00\x00e\x00\x00\x00r\x00\x00\x00l\x00\x00\x00y\x00\x00\x00 \x00\x00\x00r\x00\x00\x00e\x00\x00\x00a\x00\x00\x00d\x00\x00\x00i\x00\x00\x00n\x00\x00\x00g\x00\x00\x00 \x00\x00\x00t\x00\x00\x00h\x00\x00\x00e\x00\x00\x00 \x00\x00\x00r\x00\x00\x00e\x00\x00\x00s\x00\x00\x00p\x00\x00\x00o\x00\x00\x00n\x00\x00\x00s\x00\x00\x00e\x00\x00\x00\n\x00\x00\x00b\x00\x00\x00o\x00\x00\x00d\x00\x00\x00y\x00\x00\x00 \x00\x00\x00f\x00\x00\x00r\x00\x00\x00o\x00\x00\x00m\x00\x00\x00 \x00\x00\x00t\x00\x00\x00h\x00\x00\x00e\x00\x00\x00 \x00\x00\x00s\x00\x00\x00e\x00\x00\x00r\x00\x00\x00v\x00\x00\x00e\x00\x00\x00r\x00\x00\x00,\x00\x00\x00 \x00\x00\x00t\x00\x00\x00h\x00\x00\x00i\x00\x00\x00s\x00\x00" (truncated a bit) First of all, the response body starts in the wrong place... it should start with "<!DOCTYPE html>". Then, somewhere in the middle it switches to garbage output. I'm not sure why. Again, it's fine before being wrapped in the custom-binary-input/output-port. So either it's my fault (could well be) or there's a bug in the custom-binary-input/output-port implementation. I feel like I don't know enough to be sure. I would assume it's on my end, but since I think this is the first major use of that interface, a bug seems hardly impossible. Anyway, to test this bug you'll need to have gnutls compiled with a newer Guile. I've attached the hacky guix package I'm using to test this. Then you'll want to do: $ guix environment --ad-hoc gnutls-with-guile-next guile-next (You need guile-next even if doing gnutls-with-guile-next in the environment to enable the Guile 2.2 paths.) I could use some help on this... I'm afraid that if I've done something wrong, I'm not knowledgeable enough to know how to get out of the problem.
>From 91c0a4a728ca4bf2e9468cdc849c350dd3f7380f Mon Sep 17 00:00:00 2001 From: Christopher Allan Webber <cweb...@dustycloud.org> Date: Thu, 17 Sep 2015 15:14:54 -0500 Subject: [PATCH] web: Add https support through gnutls. Since importing gnutls directly would result in a dependency cycle, we load gnutls lazily. This uses code originally written for Guix by Ludovic * module/web/client.scm: (%http-receive-buffer-size) (warn-no-gnutls-return-false, gnutls-module, ensure-gnutls) (gnutls-ref, tls-wrap): New variables. (open-socket-for-uri): Wrap in tls when uri scheme is https. * doc/ref/web.texi (open-socket-for-uri): Document gnutls usage. --- doc/ref/web.texi | 6 +- module/web/client.scm | 175 +++++++++++++++++++++++++++++++++++++++++++------- 2 files changed, 158 insertions(+), 23 deletions(-) diff --git a/doc/ref/web.texi b/doc/ref/web.texi index becdc28..c2f3f61 100644 --- a/doc/ref/web.texi +++ b/doc/ref/web.texi @@ -1422,7 +1422,11 @@ the lower-level HTTP, request, and response modules. @end example @deffn {Scheme Procedure} open-socket-for-uri uri -Return an open input/output port for a connection to URI. +Return an open input/output port for a connection to URI. Guile +dynamically loads gnutls for https support; for more information, see +@xref{Guile Preparations, +how to install the GnuTLS bindings for Guile,, gnutls-guile, +GnuTLS-Guile}. @end deffn @deffn {Scheme Procedure} http-get uri arg... diff --git a/module/web/client.scm b/module/web/client.scm index f24a4d7..f0fba49 100644 --- a/module/web/client.scm +++ b/module/web/client.scm @@ -1,6 +1,6 @@ ;;; Web client -;; Copyright (C) 2011, 2012, 2013, 2014 Free Software Foundation, Inc. +;; Copyright (C) 2011, 2012, 2013, 2014, 2015, 2016 Free Software Foundation, Inc. ;; This library is free software; you can redistribute it and/or ;; modify it under the terms of the GNU Lesser General Public @@ -43,8 +43,11 @@ #:use-module (srfi srfi-1) #:use-module (srfi srfi-9) #:use-module (srfi srfi-9 gnu) + #:use-module ((rnrs io ports) + #:prefix rnrs-ports:) #:export (current-http-proxy open-socket-for-uri + open-connection-for-uri http-get http-get* http-head @@ -54,11 +57,113 @@ http-trace http-options)) +(define %http-receive-buffer-size + ;; Size of the HTTP receive buffer. + 65536) + +;; Provide access to the gnutls-module, but fail gracefully if not available. +;; Why take this route and not just straight up import the module? +;; Guile can't depend on gnutls because gnutls includes Guile as a dependency. +;; There's some risk of dependency cycles, so lazily resolving things only +;; once needed helps! + +(define warn-no-gnutls-return-false + (lambda _ + (format (current-error-port) + "warning: (gnutls) module not available\n") + #f)) + +(define gnutls-module + (delay + (catch 'misc-error + (lambda () + (let ((module (resolve-interface '(gnutls)))) + ;; In some 2.1/2.2 installations installed alongside Guile 2.0, gnutls + ;; can be imported but the bindings are broken as "unknown type". + ;; Here we check that gnutls-version is the right type (a procedure) + ;; to make sure the bindings are ok. + (if (procedure? (module-ref module 'gnutls-version)) + module + (warn-no-gnutls-return-false)))) + warn-no-gnutls-return-false))) + +(define (ensure-gnutls) + (if (not (force gnutls-module)) + (throw 'gnutls-not-available "(gnutls) module not available"))) + +(define (gnutls-ref symbol) + "Fetch method-symbol from the gnutls module" + (module-ref (force gnutls-module) symbol)) + (define current-http-proxy (make-parameter (let ((proxy (getenv "http_proxy"))) (and (not (equal? proxy "")) proxy)))) +(define (tls-wrap port server) + "Return PORT wrapped in a TLS connection to SERVER. SERVER must be a DNS +host name without trailing dot." + (define (log level str) + (format (current-error-port) + "gnutls: [~a|~a] ~a" (getpid) level str)) + + (ensure-gnutls) + + (let ((session ((gnutls-ref 'make-session) + (gnutls-ref 'connection-end/client)))) + + ;; Some servers such as 'cloud.github.com' require the client to support + ;; the 'SERVER NAME' extension. However, 'set-session-server-name!' is + ;; not available in older GnuTLS releases. See + ;; <http://bugs.gnu.org/18526> for details. + (if (module-defined? (force gnutls-module) + 'set-session-server-name!) + ((gnutls-ref 'set-session-server-name!) + session (gnutls-ref 'server-name-type/dns) server) + (format (current-error-port) + "warning: TLS 'SERVER NAME' extension not supported~%")) + + ((gnutls-ref 'set-session-transport-fd!) session (fileno port)) + ((gnutls-ref 'set-session-default-priority!) session) + + ;; The "%COMPAT" bit allows us to work around firewall issues (info + ;; "(gnutls) Priority Strings"); see <http://bugs.gnu.org/23311>. + ;; Explicitly disable SSLv3, which is insecure: + ;; <https://tools.ietf.org/html/rfc7568>. + ((gnutls-ref 'set-session-priorities!) session "NORMAL:%COMPAT:-VERS-SSL3.0") + + ((gnutls-ref 'set-session-credentials!) session + ((gnutls-ref 'make-certificate-credentials))) + + ;; Uncomment the following lines in case of debugging emergency. + ;;(set-log-level! 10) + ;;(set-log-procedure! log) + + ((gnutls-ref 'handshake) session) + (let ((record ((gnutls-ref 'session-record-port) session))) + (define (read! bv start count) + (define read-bv (get-bytevector-n record count)) + (if (eof-object? read-bv) + 0 ; read! returns 0 on eof-object + (let ((read-bv-len (bytevector-length read-bv))) + (bytevector-copy! read-bv 0 bv 0 read-bv-len) + read-bv-len))) + (define (write! bv start count) + (put-bytevector record bv start count) + count) + (define (get-position) + (rnrs-ports:port-position record)) + (define (set-position! new-position) + (rnrs-ports:set-port-position! record new-position)) + (define (close) + (unless (port-closed? port) + (close-port port)) + (unless (port-closed? record) + (close-port record))) + (make-custom-binary-input/output-port "gnutls wrapped port" read! write! + get-position set-position! + close)))) + (define (ensure-uri uri-or-string) (cond ((string? uri-or-string) (string->uri uri-or-string)) @@ -81,27 +186,53 @@ 0)) (lambda (ai1 ai2) (equal? (addrinfo:addr ai1) (addrinfo:addr ai2)))))) - - (let loop ((addresses addresses)) - (let* ((ai (car addresses)) - (s (with-fluids ((%default-port-encoding #f)) - ;; Restrict ourselves to TCP. - (socket (addrinfo:fam ai) SOCK_STREAM IPPROTO_IP)))) - (catch 'system-error - (lambda () - (connect s (addrinfo:addr ai)) - - ;; Buffer input and output on this port. - (setvbuf s 'block) - ;; If we're using a proxy, make a note of that. - (when http-proxy (set-http-proxy-port?! s #t)) - s) - (lambda args - ;; Connection failed, so try one of the other addresses. - (close s) - (if (null? (cdr addresses)) - (apply throw args) - (loop (cdr addresses)))))))) + (define https? + (eq? 'https (uri-scheme uri))) + (define (open-socket) + (let loop ((addresses addresses)) + (let* ((ai (car addresses)) + (s (with-fluids ((%default-port-encoding #f)) + ;; Restrict ourselves to TCP. + (socket (addrinfo:fam ai) SOCK_STREAM IPPROTO_IP)))) + (catch 'system-error + (lambda () + (connect s (addrinfo:addr ai)) + + ;; Buffer input and output on this port. + (setvbuf s 'block) + ;; If we're using a proxy, make a note of that. + (when http-proxy (set-http-proxy-port?! s #t)) + s) + (lambda args + ;; Connection failed, so try one of the other addresses. + (close s) + (if (null? (cdr addresses)) + (apply throw args) + (loop (cdr addresses)))))))) + + (let-syntax ((with-https-proxy + (syntax-rules () + ((_ exp) + ;; For HTTPS URIs, honor 'https_proxy', not 'http_proxy'. + ;; FIXME: Proxying is not supported for https. + (let ((thunk (lambda () exp))) + (if (and https? + current-http-proxy) + (parameterize ((current-http-proxy #f)) + (when (and=> (getenv "https_proxy") + (negate string-null?)) + (format (current-error-port) + "warning: 'https_proxy' is ignored~%")) + (thunk)) + (thunk))))))) + (with-https-proxy + (let ((s (open-socket))) + ;; Buffer input and output on this port. + (setvbuf s _IOFBF %http-receive-buffer-size) + + (if https? + (tls-wrap s (uri-host uri)) + s))))) (define (extend-request r k v . additional) (let ((r (set-field r (request-headers) -- 2.10.2