Re: [users@httpd] Port-based vhosts

Charles Sprickman Wed, 11 Mar 2009 22:13:37 -0700

On Wed, 11 Mar 2009, André Warnier wrote:

Charles Sprickman wrote:
[...]
 Under what conditions does Apache then get involved and alter the URL? Just redirects? I understand a common redirect is just adding a trailing slash when the user does not supply it. What are some other common cases? Whose call is it when a simple static site uses non-absolute URLs for all the links? Is the browser building the fully-qualified links or Apache (I suspect the former)?
If you suspect that it is the browser, you suspect correctly. But the explanation is somewhat messy (and lengthy) unless you really understand the basics. Let me try a not entirely correct but hopefully didactic explanation.
Say the browser retrieves a first HTML page from a server, using the URL "http://server.company.com/mydir/mypage.html". This URL, from which the browser retrieved the current page, is now for the browser the "base URL" of the currently displayed document.
Now say that this page contains a relative link like <img src="images/myface.gif" />.
When the browser follows this link (for an image, it does so on its own while rendering the page), it will construct a new URL by
- removing the last component of the base URL (in this case "mypage.html"), leaving "http://server.company.com/mydir/"
- re-adding to that the relative link "images/myface.gif", giving "http://server.company.com/mydir/images/myface.gif"
- retrieving this new URL
None of that happens on the server side. It's all done at the browser level, in any browser.
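
As a rough illustration of the steps above, Python's standard urllib.parse.urljoin performs the same kind of relative-link resolution. This is only a sketch using the example URL and link from this message; it is not what Apache or any particular browser runs internally.

```python
from urllib.parse import urljoin

# Base URL: the URL the current page was retrieved from.
base_url = "http://server.company.com/mydir/mypage.html"

# Relative link found inside that page.
relative_link = "images/myface.gif"

# urljoin drops the last component of the base ("mypage.html")
# and appends the relative link to what remains.
resolved = urljoin(base_url, relative_link)
print(resolved)
```

The printed URL is the one the browser would then request, and the server never takes part in building it.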
In reality, what happens is a bit different, because in a URL like "http://server.company.com/mydir/mypage.html", there are several parts which are processed differently and independently, and an HTTP request is not really to "http://server.company.com/mydir/mypage.html". The real HTTP request sequence is more like this:
a) the browser opens a TCP connection to port 80 of the host which has the IP address corresponding to the DNS resolution of the hostname "server.company.com"
b) on that connection, the browser writes an HTTP request like
GET /mydir/mypage.html HTTP/1.1
Host: server.company.com
then it switches to read mode and waits for the server's response to arrive on that same connection.
So in my first explanation above, you have to leave out the "protocol" and "host:port" from the current page's base URL, but the general idea remains.
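
The split described above can be sketched in Python with the standard urllib.parse.urlsplit. The helper name build_request is made up for this example, and it only handles plain "http" URLs without user/password parts; it builds the request text but does not open any connection.

```python
from urllib.parse import urlsplit

def build_request(url):
    """Return (host, port, request_text) for a plain-HTTP GET of url.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(url)
    host = parts.hostname
    port = parts.port or 80              # port 80 is the default for http
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    # The Host header carries the port only when the URL named one explicitly.
    host_header = host if parts.port is None else f"{host}:{parts.port}"
    request_text = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host_header}\r\n"
        "\r\n"
    )
    return host, port, request_text

host, port, request_text = build_request(
    "http://server.company.com/mydir/mypage.html"
)
print(host, port)      # where the TCP connection goes
print(request_text)    # what is written on that connection
```

The hostname and port decide where the connection is opened; only the path (plus the Host header) appears in the request itself.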
Now about the redirects, re-using the above logic.
(This is what is called "external redirects", see later).

b) the browser sends a request to the server, like
GET /mydir HTTP/1.1
Host: server.company.com

c) the server sends a response to the browser, like
301 (this thing has moved, permanently)
Location: /mydir/  (here is the new location)
d) now the browser, automatically, re-sends a new request on the same connection:
GET /mydir/ HTTP/1.1
Host: server.company.com

e) and, presumably, the server now responds with the requested content.
In addition, if the browser is smart, it will remember that the URL "/mydir" has moved to "/mydir/", and the next time it will request it directly, even if the forgetful user would request "/mydir" again. It will also show the "/mydir/" in the URL bar for that page, because that is the real URL it got the page from (and in the vain hope of educating the user about the fact that the URL "/mydir" is the wrong one and should not be used anymore).
So, the penalty of using a 301 re-direct is that there is one more round-trip server-browser-server (see c and d above). But it is a relatively small one, because the content is very short, and because nowadays with keep-alive connections the same TCP connection browser-server can be used for all of it.
The benefit is that the browser has the correct idea of what the "base URL" is at all times, and thus that it can correctly interpret relative URLs and compose the correct follow-up requests.
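
To make the extra round-trip concrete, here is a small Python simulation of steps b) to e). Both fake_server and fetch are invented for this sketch; fake_server is an in-memory stand-in, not Apache, and nothing goes over the network.

```python
def fake_server(path):
    """Hypothetical stand-in for the server: returns (status, headers, body)."""
    if path == "/mydir":
        # Missing trailing slash: answer with an external redirect.
        return 301, {"Location": "/mydir/"}, ""
    if path == "/mydir/":
        return 200, {}, "directory index"
    return 404, {}, ""

def fetch(path, max_redirects=5):
    """Request path, following 301 responses.

    Returns (final_path, status, round_trips).
    """
    round_trips = 0
    status = None
    for _ in range(max_redirects + 1):
        status, headers, _body = fake_server(path)
        round_trips += 1
        if status != 301:
            break
        # The client, not the server, issues the follow-up request.
        path = headers["Location"]
    return path, status, round_trips

print(fetch("/mydir"))    # redirected once, so two round-trips
print(fetch("/mydir/"))   # correct URL, a single round-trip
```

The final path returned by fetch is what the browser would treat as the base URL for any relative links in the page.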
"Internal" redirects :
These are things that the server does internally, without telling the browser about it. mod_rewrite allows you to internally modify a request URL before the rest of the server will make an attempt at finding and serving the requested resource. In that case thus, the browser sends a request like
GET /mydir HTTP/1.1
Host: server.company.com
and the server, internally, modifies this "/mydir" to "/anotherdir/", then proceeds to immediately serve the content of "/anotherdir/", without sending a redirect to the browser, and without telling the browser about anything. The browser gets a response:
200 OK
...
.. content of "/anotherdir/"
This is obviously faster, because you avoid a round-trip to the browser and back, through a potentially slow connection.
But now the browser does not know about the substitution, and genuinely believes that what it got was the content corresponding to the "/mydir" URL. So now if in this content it finds relative links like "images/myface.gif", it will interpret them relative to the base URL "/mydir", and that may cause further problems.
So by doing this, you may be saving one round-trip for the original "/mydir", but at best forcing subsequent round-trips for other links, at worst potentially confusing the browser into requesting further invalid URLs.
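
The difference in how the relative link gets interpreted can be checked with Python's standard urllib.parse.urljoin, again only as an illustration using the example names from this message:

```python
from urllib.parse import urljoin

link = "images/myface.gif"

# After an external redirect, the browser's base URL ends in a slash.
after_redirect = urljoin("http://server.company.com/mydir/", link)

# After an internal rewrite, the browser still believes the base is "/mydir",
# so "mydir" is treated as the last component and dropped.
after_rewrite = urljoin("http://server.company.com/mydir", link)

print(after_redirect)
print(after_rewrite)
```

The second result points at "/images/myface.gif" directly under the server root, which is probably not where the image lives.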
Whether one or the other scenario is better in your case depends on many factors, and you have to evaluate those yourself based on your website and what is really going on there.

Wow. Thank you so much for the thorough explanation. I really appreciate the time you and everyone else put into laying out how all this stuff interacts.

Is there any chance you could put a version of the above somewhere in the Apache wiki? Lots of stuff there is applicable to rewrites and ServerName. It really fills in a ton of blanks in the core documentation since it deals with the very basics of how the browser and server work out redirects.


Thanks again,

Charles






---------------------------------------------------------------------
The official User-To-User support forum of the Apache HTTP Server Project.
See <URL:http://httpd.apache.org/userslist.html> for more info.
To unsubscribe, e-mail: users-unsubscr...@httpd.apache.org
 "   from the digest: users-digest-unsubscr...@httpd.apache.org
For additional commands, e-mail: users-h...@httpd.apache.org

