Konstantin,

Thanks so much for explaining the strengths and weaknesses of each approach.
I've implemented the <h3> parsing which is working fine at the moment, but I share your concerns about it. What wasn't clear to me was -- is there any downside to the RDF-based approach? Would that be better in every way?

On Fri, Sep 5, 2014 at 7:18 PM, Konstantin Kolinko <knst.koli...@gmail.com> wrote:
> 2014-09-04 22:46 GMT+04:00 Daniel Mikusa <dmik...@pivotal.io>:
>> On Thu, Sep 4, 2014 at 1:48 PM, David P. Caldwell <da...@code.davidpcaldwell.com> wrote:
>>
>>> I have a small program that downloads and installs an arbitrary
>>> version of Tomcat, using the API provided by Apache to select the
>>> proper mirror, and so forth.
>>>
>>> The script currently takes the Tomcat version as an argument. My
>>> script provides a default (which in my case is the latest version of
>>> Tomcat 7), but I have to manually update that default whenever I
>>> notice a new version has been released.
>>>
>>> What would be the best way for the script itself to determine the
>>> latest available version? Obviously I would give points for "easy" and
>>> points for "robust," knowing that those two things might be in
>>> conflict.
>>>
>>> I can think of many horrifying ways to do it:
>>>
>>> * loop through integers starting with the last known version,
>>> attempting to download 7.0.x, until getting a 404
>>> * scraping and parsing the HTML at
>>> http://archive.apache.org/dist/tomcat/tomcat-7/, which I expect is
>>> rather stable
>>>
>>
>> I did this recently for Tomcat 8. Here's the command I used, which works
>> on my Mac.
>>
>> LATEST_VERSION=$(curl -s http://tomcat.apache.org/download-80.cgi | grep "<h3 id=\"8.0." | xpath '/h3/text()' 2>/dev/null)
>>
>> A slight variation works on Ubuntu if you install xpath.
>>
>> LATEST_VERSION=$(curl -s http://tomcat.apache.org/download-80.cgi | grep "<h3 id=\"8.0." | xpath -e '/h3/text()' 2>/dev/null)
>>
>> I'm sure there are other ways to do it, this was just the first one I put
>> together that worked for me.
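For comparison, here is a minimal sketch of the same <h3>-based extraction that avoids the xpath dependency and its differing flags between platforms. It assumes the download page keeps emitting headings of the form <h3 id="8.0.NN">, which is only what the grep pattern above implies; the function name latest_from_h3 is hypothetical, and the sketch inherits the same markup-stability concern.

```shell
#!/bin/sh
# Hypothetical helper: print the first version found in an
# <h3 id="PREFIX.NN"> heading. Reads download-page HTML on stdin.
# $1 is the version prefix to look for, e.g. "8.0".
# Assumption: each heading sits on its own line, as the grep-based
# commands above already require.
latest_from_h3() {
    # Escape the dots in the prefix so sed treats them literally.
    prefix=$(printf '%s' "$1" | sed 's/\./\\./g')
    # Capture the id attribute value and keep only the first match.
    sed -n "s/.*<h3 id=\"\(${prefix}\.[0-9][0-9]*\)\".*/\1/p" | head -n 1
}
```

Possible usage: LATEST_VERSION=$(curl -s http://tomcat.apache.org/download-80.cgi | latest_from_h3 8.0)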
>>
>
> There also exist the following XML file and the following page:
> http://tomcat.apache.org/doap_Tomcat.rdf
> http://tomcat.apache.org/whichversion.html
>
>>> So my challenge isn't coming up with *a* way to do it, but coming up
>>> with the best way.
>
> I would say that download-nn.cgi is the most reliable from the above
> ones. A version cannot be released without updating the download page.
>
> But there are the following concerns:
> - server generates the page dynamically,
> (but as a bonus it gives you a list of best mirrors for your IP address)
> - stability of markup.
> (I would prefer to parse a download link, instead of <h3> header tag)
>
> If those are of concern, then the doap_Tomcat.rdf XML file would be a
> better source of information.
>
>
> Regarding parsing the download page of a mirror (and
> archive.apache.org is one of those mirrors):
> - An announcement and tomcat.a.o site update are usually postponed by
> a day to let the mirrors sync.
>
> If you parse the page of a mirror, you may get version number that
> have already been released to mirrors, but have not yet announced. The
> version may be absent from other mirrors.
>
> - If you parse the page of a mirror and there are several Tomcat
> versions available (e.g. N and N-1), and your user chooses version
> "N-1". It allows you to download the version "N-1" from the mirror
> instead of archive.a.o site.
>
> By the way, do you verify md5 hash or PGP signature of the files
> downloaded from mirrors?
>
>> * loop through integers starting with the last known version,
>> attempting to download 7.0.x, until getting a 404
>
> There have been several reports of mirrors that did not respond with
> proper 404, but instead produced a redirection to some advertisement
> page. Such behaviour is against ASF mirror policies, and those mirrors
> have been unlisted by ASF infrastructure team, but it may happen again.
>
> Sometimes a MiTM responds with a redirect (e.g. a mobile operator may
> do so when there are some problems with your account).
>
>
> Best regards,
> Konstantin Kolinko
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
> For additional commands, e-mail: users-h...@tomcat.apache.org
>
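
To make the RDF-based option concrete, here is a rough sketch of picking the newest release of one branch out of doap_Tomcat.rdf. It rests on assumptions not confirmed in this thread: that the file lists each release in an unprefixed <revision> element (the DOAP vocabulary's revision property), one per line, and that plain text matching is good enough in place of a real XML parser. The function name latest_from_doap is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the highest PREFIX.NN version found in
# <revision> elements. Reads the DOAP/RDF document on stdin.
# $1 is the version prefix, e.g. "7.0".
# Assumption: elements look like <revision>7.0.55</revision>, one per
# line; this is text matching, not XML parsing.
latest_from_doap() {
    # Escape the dots in the prefix so sed treats them literally.
    prefix=$(printf '%s' "$1" | sed 's/\./\\./g')
    sed -n "s/.*<revision>\(${prefix}\.[0-9][0-9]*\)<\/revision>.*/\1/p" |
        # Sort numerically field by field so 7.0.55 ranks above 7.0.9.
        sort -t . -k 1,1n -k 2,2n -k 3,3n |
        tail -n 1
}
```

Possible usage: LATEST_VERSION=$(curl -s http://tomcat.apache.org/doap_Tomcat.rdf | latest_from_doap 7.0)

Sorting rather than taking the first match means the result does not depend on the order in which releases appear in the file.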