The problem arises on the proxy request, not the inbound request. TSUrlRawSchemeGet should return an empty string for the first case.
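
A minimal sketch of the raw versus effective distinction, using a simplified model rather than the real ATS types. ModelUrl, UrlType, raw_scheme and effective_scheme are hypothetical names made up for illustration; they only mirror the semantics proposed for TSUrlRawSchemeGet and TSUrlSchemeGet and are not the actual implementation.

```cpp
#include <optional>
#include <string>
#include <string_view>

// Hypothetical stand-in for the URL type the core tracks internally
// (the thread calls it m_url_type); not the real ATS definition.
enum class UrlType { None, Http, Https };

// Hypothetical model of a request URL: the scheme as literally written
// (absent when the request line carries only a path) plus the tracked type.
struct ModelUrl {
  std::optional<std::string> literal_scheme;
  UrlType tracked_type = UrlType::None;
};

// Models the "raw" getter: only what is literally present in the URL.
// Returns an empty view when no scheme was written.
std::string_view raw_scheme(const ModelUrl &url) {
  return url.literal_scheme ? std::string_view(*url.literal_scheme)
                            : std::string_view();
}

// Models the "effective" getter: the literal scheme if present,
// otherwise the well-known string for the internally tracked type.
std::string_view effective_scheme(const ModelUrl &url) {
  if (url.literal_scheme) {
    return *url.literal_scheme;
  }
  switch (url.tracked_type) {
  case UrlType::Http:
    return "http";
  case UrlType::Https:
    return "https";
  default:
    return {};
  }
}
```

In this model a proxy request whose URL holds only a path has an empty raw scheme, while the effective scheme still reports what the core tracked.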

On Tue, Oct 6, 2020 at 3:22 PM Walt Karas <wka...@verizonmedia.com.invalid> wrote:

> So, when I call the current TSUrlSchemeGet() for these two requests:
>
> printf "GET / HTTP/1.1\r\nHost: mYhOsT.teSt:61000\r\n\r\n" | nc localhost 61001
> printf "GET http://mYhOsT.teSt:61000/ HTTP/1.1\r\n\r\n" | nc localhost 61001
>
> I get "http" for both. Does that mean it's already working as desired?
> Should TSUrlRawSchemeGet() return an empty string for the first request?
>
> On Wed, Sep 30, 2020 at 11:14 AM Alan Carroll
> <solidwallofc...@verizonmedia.com.invalid> wrote:
>
> > There has been a lot of discussion on this and the related change for the
> > URL port. You can see some of this on the ASF slack channel, but I will
> > summarize in this note.
> >
> > Leif objected to this change and the current compromise is to
> >
> > 1. Change the current TSUrlSchemeGet to do what was proposed for
> >    TSHttpHdrSchemeGet, that is if the scheme is not literally in the URL,
> >    the value in the internal member is used to return the WKS for the value.
> > 2. Add TSUrlRawSchemeGet which does what TSUrlSchemeGet does now.
> >
> > In practice, this would be renaming TSUrlSchemeGet to TSUrlRawSchemeGet,
> > and add TSUrlSchemeGet to do the "clever" thing.
> >
> > In the same way, do this for TSUrlPortGet and TSUrlRawPortGet.
> >
> > In essence TSUrlSchemeGet and TSUrlPortGet return the effective value, and
> > the "Raw" variants get the literal value.
> >
> > This may need to be updated in the future for HTTP/2 outbound where the
> > scheme can be set in an HTTP/2 field. OTOH if it is a field that could be
> > handled the same way as for "Host" currently, which is by leaving the
> > burden of checking that to the plugin.
> >
> > On Tue, Sep 29, 2020 at 9:38 AM Alan Carroll
> > <solidwallofc...@verizonmedia.com> wrote:
> >
> > > I don't see how this would depend on a cache hit or miss. If two requests
> > > map to the same object, that's the cache key, not the request scheme. This
> > > returns the scheme in hdr->m_http->req.m_url_impl.scheme or
> > > hdr->m_http->req.m_url_impl.m_url_type if the former is nullptr.
> > >
> > > The point here is to provide access to data that is in the core but not
> > > currently available to a plugin, that is
> > > hdr->m_http->req.m_url_impl.m_url_type. Consider the case where a user
> > > agent sends a request for "http://delain.nl/lucidity.html". When the
> > > proxy request is created, it will have only "lucidity.html" in the request
> > > URL. Yet, unless the scheme was explicitly changed via a plugin or remap,
> > > the core still knows it's an HTTP request. But how could a plugin know?
> > > TSUrlSchemeGet will return a nullptr. In this case, however,
> > > TSHttpHdrSchemeGet would return "http".
> > >
> > > This is very similar to TSHttpHdrHostGet, and is useful for the same
> > > reasons.
> > >
> > >
> > > On Mon, Sep 28, 2020 at 9:13 PM Leif Hedstrom <zw...@apache.org> wrote:
> > >
> > >> Also what’s the semantic here when both http:// and https:// URLs map
> > >> to the same cached object? The first cached request specifies the scheme?
> > >> This seems confusing at best... or are we talking about the scheme as it
> > >> goes to origin (which would have to be the same for both).
> > >>
> > >> Seems like a remap plugin could just look at the FromURL (or ToURL) which
> > >> should have the scheme, rather than the cached data. And no new APIs
> > >> needed. For global plugins it’s less obvious, but same issues I think?
> > >>
> > >> — Leif
> > >>
> > >> > On Sep 28, 2020, at 20:05, Leif Hedstrom <zw...@apache.org> wrote:
> > >> >
> > >> > The point here being to make a new API that replaces the old, without
> > >> > breaking compatibility? And this new API has special semantics on a cache
> > >> > hit vs cache miss?
> > >> >
> > >> > This seems pretty convoluted, making it difficult for plugin writers to
> > >> > use the right API...
> > >> >
> > >> > — Leif
> > >> >
> > >> >> On Sep 28, 2020, at 19:49, Brian Neradt <brian.ner...@gmail.com> wrote:
> > >> >>
> > >> >> +1
> > >> >>
> > >> >> Traffic Dump can make use of this.
> > >> >>
> > >> >>> On Mon, Sep 28, 2020 at 7:38 PM Walt Karas
> > >> >>> <wka...@verizonmedia.com.invalid> wrote:
> > >> >>>
> > >> >>> This should get the scheme for the request. This differs from
> > >> >>> `TSUrlSchemeGet` in that it gets the scheme even if it is not in the
> > >> >>> URL of the request. For most proxy requests, the ATS core will remove
> > >> >>> the host and scheme in the request while tracking it internally. In
> > >> >>> such a case a plugin cannot discover that information, a problem this
> > >> >>> API would fix.
> > >> >>>
> > >> >>> If the scheme is in the request URL, return that. Otherwise return a
> > >> >>> scheme that corresponds to the internally stored scheme.
> > >> >>>
> > >> >>
> > >> >>
> > >> >> --
> > >> >> "Come to Me, all who are weary and heavy-laden, and I will
> > >> >> give you rest. Take My yoke upon you and learn from Me, for
> > >> >> I am gentle and humble in heart, and you will find rest for
> > >> >> your souls. For My yoke is easy and My burden is light."
> > >> >>
> > >> >> ~ Matthew 11:28-30