Gerfried Fuchs <[EMAIL PROTECTED]> writes:

> * Mario Lang <[EMAIL PROTECTED]> [2004-04-04 16:07]:
>> Gerfried Fuchs <[EMAIL PROTECTED]> writes:
>>> * Mario Lang <[EMAIL PROTECTED]> [2004-04-04 13:19]:
>>>> AIUI, the description tag is not supposed to contain ordinary HTML markup
>>>> in RSS 1.0.
>>>
>>> Thats why they are escaped and put in there as entities.
>>
>> But then, you are simply hoping for something to interpret this mess.
>> If an aggregator does not, the resulting description text does simply
>> look ugly and is hard to read.
>
> Have you taken a link at *any* of the feeds that are used on
> <http://planet.debian.net/>? They do *all* include escaped HTML tags in
> them.
I admit I don't use planet.debian.net at all, but the comparison is not
as striking for me as it appears to be for you. Most feeds I use do not
put escaped HTML inside description tags. In fact, Debian Security (long)
is the first one I have encountered that does. (I admit I hadn't used
much more than the W3C and RFC feeds until now. Slashdot and LWN, which
have no <description> field at all, do not count here.)

> And I think it is a good thing.

I would prefer the goal (which is indeed a good thing) to be implemented
correctly. Given that RSS is already XML, and that there are ways to
define RSS extension modules, I simply see no need to bloat the existing
<description> field with possibly unreadable cruft.

>>> No, please not. From what I understand it HTML is allowed in there if
>>> it is encoded as entities.
>>
>> I continue quoting from the same page:
>>
>> " If you need to include a a tag in the text of the feed (e.g.,
>> the title of an item is "Ode to <title>"), make sure you escape
>> ampersands and angle brackets (so that it would be "Ode to
>> &lt;title&gt;")."
>
> And this isn't done.

Those tags _are_ escaped, thank you. The point is that we are talking
about special characters like <, >, " and the like: these need to be
escaped when they are part of the description text so that XML parsing
doesn't break. This does not mean that they should be used to embed
other markup languages. You criticized the fact that my patch left this
in. I was simply observing that this is not a problem but a necessity.

>> However, this is not saying "Use ordinary html markup to identify links
>> and paragraphs".
>
> And it doesn't say the contrary, like you insist.

Actually, if you had read both quotes I pasted, you would have seen that
it does say that. Granted, it does not forbid it explicitly either, but
I guess what we are talking about here is common sense.
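To illustrate the escaping point: to any conforming XML parser, entity-escaped markup inside <description> is just character data. The following is a minimal Python sketch, assuming a hypothetical one-item RSS 1.0 fragment (the item content is made up for illustration). After parsing, a standard-compliant client receives a literal string containing angle brackets, not markup; whether that string is ever rendered as HTML is entirely up to the aggregator.

```python
import xml.etree.ElementTree as ET

RSS_NS = "http://purl.org/rss/1.0/"

# Hypothetical RSS 1.0 item: the HTML inside <description> is entity-escaped,
# so the XML parser treats it as plain character data, not as child elements.
ITEM_XML = (
    '<item xmlns="http://purl.org/rss/1.0/">'
    "<description>&lt;p&gt;Ode to &lt;title&gt;&lt;/p&gt;</description>"
    "</item>"
)

item = ET.fromstring(ITEM_XML)
description = item.find(f"{{{RSS_NS}}}description")

# The entities are decoded back into literal characters...
text = description.text
# ...but no <p> element exists in the XML tree: it is only text.
has_child_elements = len(description) > 0

print(text)                # <p>Ode to <title></p>
print(has_child_elements)  # False
```

Note that a plain-text client cannot tell from this string whether "<title>" is part of the text (as in the quoted "Ode to <title>" example) or markup of the same kind as "<p>".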
Of course you can embed all kinds of markup into the description by
using <img src="http://traffic.net"> and the like, but this is contrary
to what the description field was originally meant for. The document I
quoted simply tries to emphasize this, and calls on people to avoid it,
to prevent the mess from growing.

>> The problem is that some aggregators might be able to parse escaped HTML
>> markup, but it is simply not specified in the RSS standard, and so,
>> aggregators are not required too.
>
> Maybe another plaintext feed helps, then. But I am still not convinced
> that this is something that rss wasn't meant to offer, sorry.

I don't think that a separate plain-text feed is the correct way to go
about this. I am still convinced that the original RSS 1.0 <description>
tag was meant to be used for plain text only.

I always thought that Debian should set an example when it comes to
following sensible guidelines when implementing new technology.
Nevertheless, I have meanwhile found a way to convince my aggregator to
"strip" this unnecessary markup (for reference, in the Gnus summary
buffer, use `W h' to "wash html"[1]).

However, I still think that the current implementation is the wrong way
to go, since it requires perfectly standard-compliant RSS 1.0 clients to
contain either a complete escaped-HTML parser or a filter that strips
common HTML tags in order to present the actual information to the user
in a meaningful way, which looks like a completely broken approach to me.

I'd like to leave this bug open for further discussion on this matter if
you don't mind. After all, it is just wishlist.

[1] Something which is also used when one gets an HTML-formatted e-mail
    without proper MIME type information. This strikes me as a very nice
    comparison to further illustrate my case. Such e-mails are also
    broken in a sense; still, one could argue that clients only need to
    parse the content correctly.
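For the "strip common HTML tags" option mentioned above, a minimal Python sketch using the standard library's html.parser.HTMLParser might look like this. It keeps only character data and drops every tag, which is roughly what such a washing step has to do. It is not how Gnus implements `W h', and the helper name strip_html and the sample input are hypothetical.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects character data and silently drops all tags."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # decode &amp; etc. in text
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def strip_html(fragment: str) -> str:
    """Hypothetical helper: reduce an HTML fragment to its plain text."""
    collector = _TextCollector()
    collector.feed(fragment)
    collector.close()  # flush any buffered trailing text
    return "".join(collector.parts)


# Hypothetical decoded <description> text an RSS 1.0 client might receive.
washed = strip_html('<p>Security update for <a href="http://example.org/">foo</a></p>')
print(washed)  # Security update for foo
```

Even this naive approach throws away link targets and paragraph breaks; a client that wants to keep them needs a real HTML renderer, the other option mentioned above.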
-- 
CYa,
  Mario | Debian Developer <URL:http://debian.org/>
        | Get my public key via finger [EMAIL PROTECTED]
        | 1024D/7FC1A0854909BCCDBE6C102DDFFC022A6B113E44