On Wed, Feb 19, 2014 at 8:16 AM, Matthew Eric Bassett <[email protected]> wrote:
> Hello fellow racketeers,
>
> So we have a few web crawlers written in Racket that for the most part work
> quite well - however, we often have them running behind our corporate proxy.
> We were quite bemused today when one stopped working when it found an https
> link.
>
> A quick check of the docs revealed:
>
>>> (current-proxy-servers)
>>>   → (listof (list/c string? string? (integer-in 0 65535)))
>>> (current-proxy-servers mapping) → void?
>>>   mapping : (listof (list/c string? string? (integer-in 0 65535)))
>>>
>>> A parameter that determines a mapping of proxy servers used for
>>> connections. Each mapping is a list of three elements:
>>>   * the URL scheme, such as "http";
>>>   * the proxy server address; and
>>>   * the proxy server port number.
>>>
>>> Currently, **the only proxiable scheme is "http"**. The default mapping
>>> is the empty list (i.e., no proxies).
>
>
> Are there any plans for a proxy implementation for https?

As far as I know, there are no plans. That code has not been touched in a
long time, and the last time I touched it, a major thing that I did was
make the proxy implementation more decoupled from the HTTP connection code.

> Or any major
> obstacles preventing such? I know very little about the dark inner workings
> of ssl.

I believe that the main challenge is that proxying defeats the
no-eavesdropping promises of HTTPS by definition. This page about Squid (a
popular proxy) talks about some of the problems and how it deals with them:

http://wiki.squid-cache.org/Features/SslBump

The key piece that would need to be implemented is "HTTP CONNECT":

http://en.wikipedia.org/wiki/HTTP_tunnel#HTTP_CONNECT_Tunneling

In our code, that would mean that these functions

https://github.com/plt/racket/blob/master/racket/collects/net/url.rkt#L117
https://github.com/plt/racket/blob/master/racket/collects/net/url.rkt#L130

would need to see that (a) a proxy is around and (b) the url's scheme is
HTTPS, and then send a CONNECT request and set things up transparently to
the rest of the Racket code.

It is totally do-able and my guess is that it would be less than 50 lines
of code to change. My worry is that it would be a beast to test, as I
don't know how reliable the RFC is on this matter.

I'm willing to help, but would prefer if it were easy to get a high-level
test of the proxy and https site that you were working with.

Jay

> Thanks,
>
> Matthew Eric
>
>
>
> --
> Matthew Eric Bassett | http://mebassett.info
>
> ____________________
>   Racket Users list:
>   http://lists.racket-lang.org/users

-- 
Jay McCarthy <[email protected]>
Assistant Professor / Brigham Young University
http://faculty.cs.byu.edu/~jay

"The glory of God is Intelligence" - D&C 93
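
For reference, the CONNECT handshake described above can be sketched roughly
as follows. This is only an illustration under stated assumptions, not the
code in net/url.rkt: the names make-connect-request, connect-succeeded? and
open-https-tunnel are hypothetical, and error handling, proxy
authentication and certificate verification are all left out.

```racket
#lang racket/base
;; Hypothetical sketch of HTTPS-over-proxy via HTTP CONNECT.
;; None of these function names exist in net/url.
(require racket/tcp openssl)

;; Build the request that asks the proxy to open a raw tunnel to host:port.
(define (make-connect-request host port)
  (format "CONNECT ~a:~a HTTP/1.1\r\nHost: ~a:~a\r\n\r\n" host port host port))

;; The tunnel is considered established when the proxy replies with a 2xx status.
(define (connect-succeeded? status-line)
  (regexp-match? #rx"^HTTP/[0-9]+\\.[0-9]+ 2[0-9][0-9]" status-line))

;; Connect to the proxy, request a tunnel, then start SSL over the tunnel.
;; Returns two values: an SSL input port and an SSL output port.
(define (open-https-tunnel proxy-host proxy-port host port)
  (define-values (in out) (tcp-connect proxy-host proxy-port))
  (write-string (make-connect-request host port) out)
  (flush-output out)
  (define status (read-line in 'return-linefeed))
  (unless (and (string? status) (connect-succeeded? status))
    (close-input-port in)
    (close-output-port out)
    (error 'open-https-tunnel "proxy refused CONNECT: ~a" status))
  ;; Discard the remaining response headers, up to the blank line.
  (let loop ()
    (define line (read-line in 'return-linefeed))
    (unless (or (eof-object? line) (string=? line ""))
      (loop)))
  ;; From here on the proxy only relays bytes, so the SSL handshake
  ;; happens end-to-end with the target server.
  (ports->ssl-ports in out #:mode 'connect))
```

The two small helpers are pure and can be checked without a network; only
open-https-tunnel needs a real proxy and https site, which is the part that
would be hard to test.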

