[Can we start putting Keywords: headers in longer reports, for the
archive server? (It is SmartList, right?)]
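[A hypothetical example, with illustrative terms only:

    Keywords: WWW server, boa, apache, gzip, documentation
]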
(If you use Gnus, press {W e} now.)

----------------------------------------------------------------------

*Boa may not be our best option.*

_Gzipped HTML files:_

Christian> One question remains: Is it possible to browse
Christian> "html.gz" files _without_ a CGI script with the usual
Christian> HTML browsers (Netscape, lynx)? If so, we'll make it
Christian> policy to gzip all html files and to adapt the
Christian> references. If not, we'll have to install all html
Christian> files gzipped--or add a cgi capable web server to the
Christian> base system.

With `boa', none of the links need to be rewritten. They can point at
".../file.html", and the `boa' server will look for that. If it
cannot find it, it tries again with a ".gz" extension tacked on, and
sends that file if it exists.

*`apache' can be made to do this search/rewrite also.* It can be done
using mod_rewrite, which I am reading about now.

The browser's ability to view compressed html depends on correct
entries in "/etc/mailcap". Browsers all use that file to find the
program they need to run to handle a particular MIME type. From
"/etc/mailcap":
# ----- User Section Begins ----- #
application/x-gzip; /bin/gunzip -c %s; test=true; description=GNU zip; nametemplate=%s.gz
application/x-compress; /bin/gunzip -c %s; test=true; description=UNIX compress; nametemplate=%s.Z
# ----- User Section Ends ----- #
This works fine with apache (running from xinetd), but *NOT* with
boa. With the identical mailcap, and the same file served from `boa',
I get an error dialog. (Why?) I tested with both Netscape 3 and W3.
For some reason, serving gzipped files from `boa' (Version: 0.92-5)
does not work, given mailcap entries that function fine with
apache-served documents. Can anyone verify this?

If we want to uncompress things /before/ transmission, it would be
simple for `dwww' or any other doc-serving CGI engine to run things
through `gunzip' prior to sending the page to the browser. The better
way to do that, though, might be to use mod_actions in Apache. If you
grab the apache-dev from "bo/source/web" and have a look in the
Configuration file, you'll find:
## The asis module implemented ".asis" file types, which allow the embedding
## of HTTP headers at the beginning of the document. mod_imap handles internal
## imagemaps (no more cgi-bin/imagemap/!). mod_actions is used to specify
## CGI scripts which act as "handlers" for particular files, for example to
## automatically convert every GIF to another file type.

# Module asis_module mod_asis.o
# Module imap_module mod_imap.o
Module action_module mod_actions.o
... so it looks like it won't be too difficult to set up a perl
script that turns ".html.gz" files into ".html" output to the
browser, sent as text/html rather than application/x-gzip. I don't
think that's the best solution, though, since sending gzipped data
will save bandwidth.

*Now we need to figure out how to make apache check for a file with a
".gz" extension when the one without was not found, the way `boa'
does.* I am reading about mod_rewrite; it looks like exactly what is
needed. (A sketch follows the xinetd example below.) It could also
solve the doc team's problem with URLs in documentation, since it can
test for the existence of a file. (This must have been discussed... I
should look in the archive.) There can be .htaccess files in the
/usr/doc directories.

_Running the server from `inetd' or `xinetd'._

*`boa' /cannot/ be run from `inetd' or `xinetd' in its present
incarnation.* It needs a configuration option added, like `apache'
has, so that it will exit after serving a series of requests when it
is launched from inetd. Right now, it must run as a daemon. I don't
think there is a way to ask it not to answer requests from outside an
authorized realm either, which is another advantage of running the
httpd from inetd.

*Apache can be run in `inetd' mode*, by setting an option in the
configuration file and adding an entry to "/etc/inetd.conf" or
"/etc/xinetd.conf". With the tcpwrapper or `xinetd', it is possible
to disallow access to the server from outside your domain. If you are
concerned about that, and many users will be, then *use xinetd* and
set the `only_from' option for the WWW service. *It works very
well.* Alternatively, use the tcpwrapper.

Here's an example of the logging produced; the "/etc/xinetd.conf"
setup that produced it follows.
Jun 29 00:51:45 bittersweet xinetd[252]: START: www pid=25479 from=206.129.216.38
Jun 29 00:51:45 bittersweet xinetd[252]: START: ident pid=25480 from=206.129.216.38
Jun 29 00:51:45 bittersweet xinetd[25479]: USERID: www UNIX : karlheg
Jun 29 00:51:45 bittersweet xinetd[252]: EXIT: ident status=0 pid=25480 duration=0(sec)
Jun 29 00:51:46 bittersweet xinetd[252]: EXIT: www status=0 pid=25479 duration=1(sec)
service www
{
        socket_type     = stream
        protocol        = tcp
        wait            = no
        instances       = 8
        user            = www-data
        flags           = IDONLY
        only_from       = 127.0.0.1 206.129.216.38 206.129.216.1
        log_type        = SYSLOG daemon
        log_on_success  = PID HOST USERID EXIT DURATION
        log_on_failure  = HOST USERID
        server          = /usr/sbin/apache
}
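For plain `inetd', the equivalent would be something like this (a
minimal sketch, untested here; `ServerType' is the apache
configuration option referred to above, and the "www-data" user and
"/usr/sbin/apache" path are per the Debian package):

# In httpd.conf, so apache exits after serving each connection:
ServerType inetd

# In /etc/inetd.conf ("www" is port 80 in /etc/services); the
# tcpwrapper (tcpd) can be interposed here for access control:
www  stream  tcp  nowait  www-data  /usr/sbin/apache  apache

And here is roughly what the `boa'-style ".gz" fallback could look
like with mod_rewrite, going by its documentation (untested, a sketch
only): it serves "file.html.gz" whenever "file.html" is missing,
still compressed, so the bandwidth saving is kept.

# In a .htaccess under /usr/doc (or a <Directory> block), assuming
# mod_rewrite is compiled in:
RewriteEngine on
# If the requested file does not exist...
RewriteCond %{REQUEST_FILENAME} !-f
# ...but the same name with ".gz" tacked on does...
RewriteCond %{REQUEST_FILENAME}.gz -f
# ...then serve the compressed file instead.
RewriteRule ^(.*)$ $1.gz [L]

# In srm.conf: ".gz" is an *encoding*, not a type, so "foo.html.gz"
# goes out as text/html with a gzip Content-Encoding:
AddEncoding x-gzip gz

# Or, for the mod_actions route (uncompress /before/ transmission):
# "/cgi-bin/ungz" is a hypothetical perl CGI that would gunzip
# PATH_TRANSLATED and print it as text/html.
AddType application/x-httpd-ungz .gz
Action application/x-httpd-ungz /cgi-bin/ungz

If that RewriteRule behaves, the doc links can all stay pointing at
plain ".html", just as with `boa'.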
Perhaps we can put together a minimal Apache for the base set,
similar to the way a minimal Perl is provided for it? It could be
replaced by the full-blown package at the installer's option. The
minimal version could have the modules it needs statically compiled
in, or load them as modules, the way it's configured now.

Fernando> A web server adds a lot of flexibility. Boa adds very
Fernando> little overhead. Try it and only then say something.

Try running Apache from `inetd' and see what you think of that, also.
I set `top' to redisplay every second, and then hit:

    http://localhost/doc/

... which is a very large directory. Apache starts right up, even
configured with dynamically loaded modules, and goes away again right
after it's done, freeing all of the resources it borrowed. The disk
didn't seem to get hit any more than with `boa', from a purely
subjective standpoint. (I have lots of RAM. On a low-RAM system, it
would maybe page some.) The sizes of the processes are:
USER       PID %CPU %MEM  VSZ  RSS  TT STAT START  TIME COMMAND
root       252  0.0  0.6 1040  420   ? S    Jun 22 0:00 /usr/sbin/xinetd
www-data 25056  0.0  1.1 1232  756   ? R    23:54  0:00  \_ apache
USER       PID %CPU %MEM  VSZ  RSS  TT STAT START  TIME COMMAND
www-data 25738  0.0  0.7  908  456  p9 S    01:54  0:00 /usr/sbin/boa
... here you can see that `boa' really isn't /that/ much smaller than
`apache', and it must run all the time. On a system with only 4Mb of
RAM, apache will need an extra bit of swap space, and grind just a
bit. No big deal. And when the page has been served, it frees that
memory up again.

Apache has several very nice features that make it our best choice.
It works: gzipped files served by apache are displayed by the
browsers I tested. You can put a HEADER.html and a README.html file
in a directory, and apache will put those at the top and bottom of a
server-generated index. It deals with content negotiation and
languages other than English. (Does `boa' do that? I only speak
English.) It is the de facto industry-standard server software.
Apache is very popular; the commercial `StrongHold' SSL server is
based upon it. It is very flexible, modular, extensible, and quite
configurable. It can do the URL rewriting (much as sendmail does
address rewriting) that will be required to link all of the SPI
documentation.

Fernando> In all cases mentioned, the overhead of on-the-fly
Fernando> conversion is acceptable enough, just a little slower
Fernando> than formatting man pages.

And cached, like manual pages and `dwww' pages, with a cron job to
remove old ones. The caching behaviour should be optional, for low
disk space setups.

Fernando> 3) Documents in markup format for which no on-the-fly
Fernando> conversion is available will be included in both
Fernando> pre-processed HTML and original format. This is a last
Fernando> resort measure.

Agreed.

Fernando> Original format should always be included. Reasons: 1)
Fernando> To produce printed copies. 2) Because I hate
Fernando> Ghostscript and Xdvi. I prefer reading the markup
Fernando> directly (and I am not alone.) 3) Because users might
Fernando> want to process the documents automatically (search
Fernando> engines for example) HTML should be included so that
Fernando> the documents are cleanly integrated with the rest of
Fernando> the documentation and for serving them to remote (W95)
Fernando> systems if necessary via http.

`gv' and Aladdin Ghostscript rule! (And DEC SRC's Virtual Paper, if
you own a Zip drive or have plenty of hard disk.)

Fernando> 4) Documents originally in binary format (PS, DVI, PDF,
Fernando> MS-WORD) for which no conversion is possible should be
Fernando> packaged separately. A file explaining how to get the
Fernando> documentation (including which programs the user will
Fernando> need: ghostscript, xdvi, MS Word) and a brief summary of
Fernando> the document should be included in the binary package in
Fernando> HTML format, or a convertible one.

Ok with me...

Fernando> Binary documents are useful mainly for printing and
Fernando> they usually have a huge size to information ratio. I
Fernando> hate storing junk in my systems. Online viewing of
Fernando> binary documents is awful. I hate downloading a 1MB
Fernando> file just to find out it does not answer my
Fernando> questions. Since developers must read the document
Fernando> anyway, they could make a brief summary. Sometimes that
Fernando> would be enough to decide whether it is worth
Fernando> downloading the full document or not. Binary docs can
Fernando> not be integrated with the rest, they should be
Fernando> discouraged. Authors should be encouraged to give out
Fernando> the document in the original format in which they wrote
Fernando> it, which seldom is a binary one, except for MS Word.

So we should package large doc files in separate -doc packages. Ok.
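Concretely, a split -doc package's control file might contain
something like this (a hypothetical stanza; the package name and
field values are only illustrative):

Package: sendmail-doc
Section: doc
Priority: optional
Architecture: all
Suggests: sendmail
Description: documentation for sendmail, in HTML and original format
 The manuals and papers for sendmail, installable without installing
 sendmail itself.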
That way, if I want to know what-in-the-heck-is-sendmail, I can grab
the doc package for it, and read that, without installing the program
itself. I think that's the Right Way to do things. Not just large
`binary' docs, but all docs, *especially for major programs.*

Maybe it would be good to be able to extract just the conffiles from
a .deb too, for having a look at them prior to installation... a
suggested and documented way. (How to get started learning Linux.)

Fernando> 1) The default format for online documentation is
Fernando> HTML. A web browser (lynx) and a very small web server
Fernando> (boa) will be in the core distribution, marked
Fernando> important.

[...]

Fernando> 5) The man program should be marked optional. When a
Fernando> user types "man something" and man is not installed,
Fernando> lynx would automatically be invoked and it would present
Fernando> the HTML-converted man page the user requested.

Perhaps the standalone `info' reader could also launch `lynx', in a
similar way to how it can grab regular man pages now? (Or should.
The one that comes with RedHat 3.0.3 seems to work; my copy does not.
On the redhat machine at my ISP, I can type `info man' and get the
man page for `man'.)

I suppose what everyone may say is that lynx can hit `dwww', which
will also _display_ info files. Display is about all, though. It
removes much of the functionality of Info.

The *best* way to read info and manuals is inside XEmacs. :-) Try it
long enough to really find out what it's capable of, and only then
say something. It has a web browser inside it too, and works just
fine on a tty. You don't /have/ to have X to use it. (If it's
compiled right. (AFAIK.) It cannot be linked against the X libs if
they aren't installed.) Emacs is great as well, for machines without
the resources to run XEmacs, which needs at the very least 12Mb of
RAM under X, which itself needs a minimum of 8Mb to run well.

Fernando> This is almost as fast as original man but much more
Fernando> powerful. As it has little overhead, it can replace the
Fernando> man program. But if a user still thinks he can't stand
Fernando> the small overhead, man can optionally be installed.
Fernando> Man depends on groff. That's a huge overhead in both
Fernando> size and speed. Nowadays no one writes groff documents
Fernando> other than man pages. However, groff is needed for
Fernando> printing man pages, but it is a bloated solution for
Fernando> online consultation.

In Info, Emacs, or XEmacs, you type {M-x man} and enter a man page
name. It runs the man command and displays the formatted result.
`man' can be used like that, as can `lynx'. You can follow links to
other man pages with the mouse; they highlight when you fly over
them. (I like info the best. W3 complements it quite nicely.)

Fernando> 6) The info program should be marked optional. When it
Fernando> is installed, it would compile the texinfo files and
Fernando> place the output in /usr/info. When it is deinstalled,
Fernando> it would erase the /usr/info directory. There will be a
Fernando> hook in dwww to register texinfo pages so that the info
Fernando> directory is kept always current.

No! You cannot just erase that directory if Emacs or XEmacs is
installed. I suggest looking over info once again, and finding out
more about its design and contents. Try `libc-mode' in emacs, and
info lookup in cperl mode. They are invaluable.

Fernando> The preferred online way of viewing texinfo files is
Fernando> through the texinfo to HTML on-the-fly converter. Info
Fernando> fans who prefer the crappy info interface should still
Fernando> be able to install info files, but without imposing
Fernando> them on everyone. The info format is awful. Texinfo is
Fernando> nicer. Texinfo->HTML is optimal. Emacs fans can use the
Fernando> w3 mode for viewing texinfo files. Or they can install
Fernando> info and use the info mode if they want. For other
Fernando> people, just texinfo is enough.

You don't know what you're talking about, IMO. I suggest you read the
help and tutorial for using `info'. The interface is *not* "crappy".
It is very powerful. Sometimes the keys seem a little odd; they work
on any keyboard ever invented, though. It is much more powerful than
lynx or any web browser viewing html. I'll bet that info is less
resource-intensive as well, being preformatted. Preformatted,
searchable, a hyperlinked tree structure, programmable (inside
emacs), and texinfo can be printed or made into info files... Texinfo
was designed by men who have studied and used computers for a long
time. While I'm browsing C or perl source, I can press a few keys and
have the info manual for a libc or perl function in a second window
in under a second. (I can put the cursor on a word and, with a few
keystrokes, have W3 fetch its definition from the online dictionary,
too.)

Fernando> 7) A default searching/indexing engine should be
Fernando> chosen. It would be marked standard, but not
Fernando> important. Caching would be an option too.

Yes. Jim Pick's `dwww' is the best one going. There's no reason why
info files cannot be indexed also. With `gnuclient', info URIs could
be opened in an Emacs if that is the user's preference. (Koalatalk
someday. ;-) )

--
mailto:[EMAIL PROTECTED]  (Karl M. Hegbloom)
http://www.inetarena.com/~karlheg
Portland, OR  USA
Debian GNU 1.3  Linux 2.1.36  AMD K5 PR-133