José,

Very good. As you suggest, allowing for the manual creation of URI structs is 
the only *strictly* required thing on my wish list - everything else can be 
done externally. 
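
For illustration, this is roughly what I mean by building the struct by hand. It is only a sketch: the host, path and query values are made up, and the deprecated authority field is simply left unset.

```elixir
# Build a URI struct directly, without going through URI.parse/1 or URI.new/1.
# The deprecated :authority field is left unset on purpose.
uri = %URI{
  scheme: "https",
  host: "example.com",
  port: 443,
  path: "/status",
  query: "verbose=1"
}

# URI.to_string/1 omits the port when it is the default for the scheme.
IO.puts(URI.to_string(uri))
# prints "https://example.com/status?verbose=1"
```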

I will build out the validation & normalization logic in a standalone library 
separate from Bandit, as I still believe that the URI module is the correct 
place for this logic. Perhaps we can revisit this once I’ve had a chance to 
shake out the API structure & refine the various use cases. 
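
As a very rough sketch of where I expect that library to start, here is the four-step construction from my earlier message (quoted below). The module and function names are hypothetical, the error atoms are placeholders, and it assumes Elixir 1.13+ for URI.new/1 and OTP 21+ for :uri_string.normalize/1. It is not what Bandit does today.

```elixir
defmodule RequestURISketch do
  # Hypothetical helper, not part of Bandit or the URI module.
  # Assumes Elixir 1.13+ (URI.new/1) and OTP 21+ (:uri_string.normalize/1).

  # Steps 1-4: build each half separately, validate it, then merge.
  def build(scheme, host, port, request_target)
      when scheme in ["http", "https"] and is_integer(port) do
    with {:ok, base} <- authority_uri(scheme, host, port),
         {:ok, rel} <- target_uri(request_target) do
      {:ok, URI.merge(base, rel)}
    end
  end

  # Step 2: scheme, host & port must not smuggle in userinfo, a path or a query.
  defp authority_uri(scheme, host, port) do
    case normalize_and_parse("#{scheme}://#{host}:#{port}") do
      {:ok, %URI{userinfo: nil, path: path, query: nil, fragment: nil} = uri}
      when path in [nil, "", "/"] ->
        {:ok, uri}

      _ ->
        {:error, :invalid_authority}
    end
  end

  # Step 3: the request target must not carry a scheme, host or port.
  defp target_uri(request_target) do
    case normalize_and_parse(request_target) do
      {:ok, %URI{scheme: nil, userinfo: nil, host: nil, port: nil} = uri} -> {:ok, uri}
      _ -> {:error, :invalid_request_target}
    end
  end

  # Syntax-based normalization via Erlang's :uri_string, then a strict parse.
  defp normalize_and_parse(string) do
    case :uri_string.normalize(string) do
      normalized when is_binary(normalized) -> URI.new(normalized)
      _error -> {:error, :invalid_uri}
    end
  end
end
```

With this, RequestURISketch.build("https", "example.com", 443, "/a/../b?x=1") should yield a struct whose string form has the dot segments resolved, while a host value such as "example.com/evil" or a request target such as "//evil.example/x" is rejected, because a component shows up in a half of the URI where it has no business being.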

I’ll cut a PR against elixir-lang/elixir to update URI's documentation as you 
suggest.

Thanks again!

m.


> On Feb 21, 2022, at 3:54 PM, José Valim <[email protected]> wrote:
> 
> I see, in your case then it sounds like running your own custom 
> validation is best, because URI can't provide it out of the box. So it 
> seems creating the URI directly from %URI{...} is the best option. We can 
> document that this is possible, with a warning not to set the deprecated 
> authority field.
> 
> 
> José Valim
> https://dashbit.co/
> 
> On Mon, Feb 21, 2022 at 9:44 PM Mat Trudel <[email protected]> wrote:
> José,
> 
> You’re correct insofar as the various components in an HTTP request all come 
> from well defined sources (with the possible exception of determining the 
> hostname of a request, which is a bit tricky). What isn’t so obvious, 
> however, is how these may be combined by bad actors to create undesired 
> request URIs. There are a number of attack vectors which can exploit server 
> URI parsing as a basis for further downstream exploits (see [1], [2], [3]).
> 
> My planned approach to manage this in Bandit is to build URIs roughly as 
> follows:
> 
> 1. Figure out the scheme used for the request - from the perspective of 
> Bandit, this is either http or https depending on the underlying transport. 
> Situations where this may be overridden by forwarding proxies (for example 
> via `X-` headers) are explicitly outside the scope of Bandit; we’re only 
> concerned with explicit HTTP semantics.
> 
> 2. Determine the hostname & port used for the request (by consulting a 
> specific list of sources in Host headers, authority pseudo headers, and other 
> sources). Construct a URI from scheme, host & port & normalize it. Validate 
> that the resulting path is “/” and that the query string is empty.
> 
> 3. Determine the path & query string from the request by analyzing the 
> request line / path pseudo header. Construct a URI from this & normalize it. 
> Validate that the resulting scheme, host & port are empty.
> 
> 4. Merge these two URIs together resulting in one where all fields are known 
> to come from specific sources as above.
> 
> In truth I suspect that the full answer here is a lot longer and more 
> nuanced than I’m able to appreciate. My (possibly naive) hope here is to be 
> able to apply some well-defined heuristics to build & normalize a request as 
> early as possible in the request lifecycle, so as to ensure that Plug users 
> can rely on their request parameters at least being valid & sanitized at a 
> protocol level.
> 
> In terms of specific validations, I would propose that each field be 
> validated against the grammars defined in RFC 3986 [4]. Concerning 
> normalization heuristics, a number are described in section 6 of the same 
> RFC, though I can think of a few others which would likely be good to 
> include. Specific normalization heuristics used should be called out in 
> documentation.
> 
> The question of whether we would want to expose validation and normalization 
> as discrete functions against a URI isn’t one I have a strong opinion on. My 
> hunch is that expectations vary widely between use cases, so it’s probably 
> better to leave them separate.
> 
> m.
> 
> 
> [1] https://samcurry.net/abusing-http-path-normalization-and-cache-poisoning-to-steal-rocket-league-accounts/
> [2] https://i.blackhat.com/USA-19/Thursday/us-19-Birch-HostSplit-Exploitable-Antipatterns-In-Unicode-Normalization.pdf
> [3] https://community.cloudflare.com/t/faq-url-normalization/259183
> [4] https://datatracker.ietf.org/doc/html/rfc3986
> 
> 
>> On Feb 20, 2022, at 6:02 AM, José Valim <[email protected]> wrote:
>> 
>> Hi Mat, thanks for starting this discussion!
>> 
>> Quick question: don't you want to normalize the URI? I assume they already 
>> have to follow a strict format in the HTTP case that is ready to use as is. 
>> So doing any sort of normalization would be additional work. We could 
>> perform some minimal validation but, if so, what should it be?
>> 
>> 
>> On Fri, Feb 18, 2022 at 6:29 PM Mat Trudel <[email protected]> wrote:
>> When implementing an HTTP server, one of the most underspecified parts of 
>> handling a request is the building and canonicalization of the requested 
>> URI. The constituent parts of a request URI are spread out across multiple 
>> sources. For example, the hostname of a request can be any of (possibly 
>> multiple!) Host header(s), an authority pseudo-header in HTTP/2, a 
>> statically configured value for IP-based hosting, or even something derived 
>> from upstream X- headers. Assembling these parts into a canonical request 
>> URI is non-trivial.
>> 
>> The URI module as currently implemented does not provide supported ways to 
>> construct a URI from constituent parts (though that is changing [1]). Nor 
>> does it provide methods to validate or meaningfully normalize an extant URI 
>> struct. Without these methods, HTTP servers need to resort to ad hoc methods 
>> to build and canonicalize request URIs (see [2], [3]). 
>> 
>> To help alleviate this, it is proposed to add the following changes to the 
>> URI module:
>> 
>> 1. Explicitly allow for the building of URI structs directly in the module 
>> documentation (subject to warnings about the use of the authority field).
>> 
>> 2. Add a normalize(%{})/2 function which will return a normalized version of 
>> an existing URI struct (this can plumb through to :uri_string.normalize/2 
>> [4]).
>> 
>> 3. Add an absolute?/1 function which returns whether or not the URI is 
>> absolute (that is, does it contain sufficient information to discretely 
>> represent a complete, unambiguous request)
>> 
>> Along with the existing new/1 and merge/2 functions, I believe that this 
>> should be sufficient to cleanly implement request URI construction within a 
>> web server such as Bandit. This will allow the web server to determine where 
>> to source the various components of a URI from, while deferring assembly, 
>> normalization and validation of those components to the URI module where it 
>> belongs.
>> 
>> Subject to debate and approval I'm happy to work this up.
>> 
>> m.
>> 
>> [1] https://twitter.com/josevalim/status/1494208355732275200
>> [2] https://github.com/mtrudel/bandit/blob/main/lib/bandit/http2/stream_task.ex#L101-L113
>> [3] https://github.com/ninenines/cowboy/blob/8795233c57f1f472781a22ffbf186ce38cc5b049/src/cowboy_http.erl#L490-L553
>> [4] https://www.erlang.org/doc/man/uri_string.html#normalize-2
>> 
>> 
>> 
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "elixir-lang-core" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected] 
>> <mailto:[email protected]>.
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/elixir-lang-core/8c4e9d5d-f83a-43dc-82e7-171730f19724n%40googlegroups.com
>>  
>> <https://groups.google.com/d/msgid/elixir-lang-core/8c4e9d5d-f83a-43dc-82e7-171730f19724n%40googlegroups.com?utm_medium=email&utm_source=footer>.
