Re: Output of dpkg-scanpackages as XML

2005-01-13 Thread William Ballard
On Thu, Jan 13, 2005 at 10:06:25PM +0100, Frank S. Thomas wrote: > Using Python's DOM is (as far as I can see) the easiest and robust way to > accomplish the task. It is working but slow if it processes a file with more > than 8000 package records. But this was never my goal, I only wanted to >

Re: Output of dpkg-scanpackages as XML

2005-01-13 Thread Frank S. Thomas
On Thursday 13 January 2005 20:42, William Ballard wrote: > On Thu, Jan 13, 2005 at 08:14:31PM +0100, Frank S. Thomas wrote: > > I could successfully convert sid's binary-i386/source package files into > > XML. However, because dctrl2xml builds up a dom tree of all packages in > > these files, this

Re: Output of dpkg-scanpackages as XML

2005-01-13 Thread William Ballard
On Thu, Jan 13, 2005 at 08:14:31PM +0100, Frank S. Thomas wrote: > I could successfully convert sid's binary-i386/source package files into XML. > However, because dctrl2xml builds up a dom tree of all packages in these > files, this is horribly slow. There may also be some cases where dctrl2xml
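One way to avoid building a DOM tree of every package is to write each record out as soon as it has been read. The following is only a sketch of that idea in Python, not how dctrl2xml actually works: the function names and the packages/package/field element layout are assumptions made up for illustration.

```python
import io
from xml.sax.saxutils import escape, quoteattr


def iter_stanzas(lines):
    """Yield one dict per control-file stanza (stanzas are blank-line separated)."""
    fields, last = {}, None
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # Blank line: the current stanza is complete.
            if fields:
                yield fields
            fields, last = {}, None
        elif line[0] in " \t" and last is not None:
            # Continuation line: append to the previous field's value.
            fields[last] += "\n" + line.strip()
        elif ":" in line:
            name, _, value = line.partition(":")
            last = name.strip()
            fields[last] = value.strip()
    if fields:
        yield fields


def write_xml(lines, out):
    """Write one <package> element per stanza; only one stanza is held in memory."""
    out.write("<packages>\n")
    for stanza in iter_stanzas(lines):
        out.write("  <package>\n")
        for name, value in stanza.items():
            # escape() takes care of &, < and >, e.g. in "Depends: bar (>= 2)".
            out.write("    <field name=%s>%s</field>\n"
                      % (quoteattr(name), escape(value)))
        out.write("  </package>\n")
    out.write("</packages>\n")


if __name__ == "__main__":
    sample = "Package: foo\nVersion: 1.0\nDepends: bar (>= 2)\n"
    buf = io.StringIO()
    write_xml(io.StringIO(sample), buf)
    print(buf.getvalue(), end="")
```

Memory use then depends on the largest single record rather than on the size of the whole Packages file.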

Re: Output of dpkg-scanpackages as XML

2005-01-13 Thread Frank S. Thomas
On Thursday 06 January 2005 02:14, Sven Mueller wrote: > I would appreciate it if you could make your scripts available once they > are (nearly;-)) finished. Ok, I nearly finished it :) It is written in Python and is called "dctrl2xml". Binary and source packages are apt-getable from my private r

Re: Output of dpkg-scanpackages as XML

2005-01-07 Thread William Ballard
On Fri, Jan 07, 2005 at 05:36:12PM +0200, Antti-Juhani Kaijanaho wrote: > Not that I'm not flattered by the fact that you use grep-dctrl, but ... > Is it really your intent here to filter out those packages whose > packages record does not contain a literal dot? Sounds like a quite > puzzling requ

Re: Output of dpkg-scanpackages as XML

2005-01-07 Thread Antti-Juhani Kaijanaho
On 20050105T163207-0500, William Ballard wrote: > echo '' > zcat /a/dists/latest/binary-i386/Packages.gz | \ > grep-dctrl . | sed -r > -e 's/(Description): > (.+)/<\1>\2<\/Short-Description><\/Long-Description><\/entry>/' | \ > head -n-1 > echo '' Not that I'm not flattered by the fact th

Re: Output of dpkg-scanpackages as XML

2005-01-06 Thread Sam Watkins
On Wed, Jan 05, 2005 at 11:24:46PM +, David Given wrote: > It's still an ad-hoc solution, though; does anyone know of versions of > the standard textutils that know about Unicode? the Plan 9 ones would use utf-8, but I suppose they're not POSIX.

Re: Output of dpkg-scanpackages as XML

2005-01-05 Thread Frank S. Thomas
On Thursday 06 January 2005 02:14, Sven Mueller wrote: > Frank S. Thomas wrote on 06/01/2005 01:46: > > Thanks so far. I think I'll write my own PHP script that will output a > > more structured XML document, so that I can create hyperlinks from the > > packages in the 'Depends' field and parse the

Re: Output of dpkg-scanpackages as XML

2005-01-05 Thread Sven Mueller
Frank S. Thomas wrote on 06/01/2005 01:46: Thanks so far. I think I'll write my own PHP script that will output a more structured XML document, so that I can create hyperlinks from the packages in the 'Depends' field and parse the quasi field 'Homepage'. I would appreciate it if you could make yo

Re: Output of dpkg-scanpackages as XML

2005-01-05 Thread Frank S. Thomas
On Wednesday 05 January 2005 22:21, William Ballard wrote: > This is trivial with grep-dctrl and sed. For example: Thanks, for making me aware of grep-dctrl. > echo '' > zcat /a/dists/latest/binary-i386/Packages.gz | \ > grep-dctrl -sPackage,Version . | \ > sed -r -e 's/([^:]+):(.+)/<\1>\2<

Re: Output of dpkg-scanpackages as XML

2005-01-05 Thread William Ballard
On Wed, Jan 05, 2005 at 07:11:06PM -0500, William Ballard wrote: > You can leave out the element. Forgot to mention parsers are not obligated to respect whitespace and newlines unless it's in a tag, though they usually do (and they have flags to control this). It's like the tag in HTML in t

Re: Output of dpkg-scanpackages as XML

2005-01-05 Thread William Ballard
On Thu, Jan 06, 2005 at 12:28:47AM +0100, Sven Mueller wrote: >\2<\/CDATA> The only thing it can't contain is "]]>" You didn't use an actual CDATA node you used an element named CDATA. You can leave out the element. I forgot Depends/Recommends/Suggests/Conflicts lines will contain > and < cha

Re: Output of dpkg-scanpackages as XML

2005-01-05 Thread Sven Mueller
William Ballard wrote on 05/01/2005 22:42: On Wed, Jan 05, 2005 at 04:32:07PM -0500, William Ballard wrote: echo '' ^^^ Should have closed the CDATA tag here. The short description tag should probably be wrapped in CDATA too. If any package descriptions contain "]]>", it'll break it. I

Re: Output of dpkg-scanpackages as XML

2005-01-05 Thread David Given
William Ballard wrote: [...] But back to Linux. $ echo hi | iconv -f utf8 -t unicode | grep hi (no output) Not surprised; grep understands ASCII, AFAIK, so what you've just sent to it is: $ echo hi | iconv -f utf8 -t unicode | od -t x1 000 ff fe 68 00 69 00 0a 00 It can't find an 'h' and an 'i'
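The byte dump above can be reproduced in Python. This is just a sketch to illustrate the point: it assumes that iconv's "unicode" target means little-endian UTF-16 with a byte order mark, which is what the od output suggests.

```python
import codecs

# "hi\n" as plain ASCII and as little-endian UTF-16 with a leading BOM.
ascii_bytes = "hi\n".encode("ascii")
utf16_bytes = codecs.BOM_UTF16_LE + "hi\n".encode("utf-16-le")

# Same layout as the od -t x1 dump: every character is followed by a 00 byte.
hex_dump = " ".join("%02x" % b for b in utf16_bytes)
print(hex_dump)                               # ff fe 68 00 69 00 0a 00

# A byte-wise search for "hi" only succeeds on the ASCII form.
print(b"hi" in ascii_bytes)                   # True
print(b"hi" in utf16_bytes)                   # False

# Decoding first makes the match work again.
print("hi" in utf16_bytes.decode("utf-16"))   # True
```

The 'h' (68) and 'i' (69) bytes are never adjacent in the 2-byte form, so a tool that compares bytes cannot match the pattern.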

Re: Output of dpkg-scanpackages as XML

2005-01-05 Thread William Ballard
On Wed, Jan 05, 2005 at 10:36:53PM +, David Given wrote: > iconv is your friend: > > zcat Packages.gz | iconv -f utf8 -t ucs2-le | cscript In our case we're using sed. Is sed unicode-aware? (As an aside a lot of the commands you use in NT are builtins to cmd.exe and under this switch they

Re: Output of dpkg-scanpackages as XML

2005-01-05 Thread David Given
William Ballard wrote: [...] Of course you're right. But building XML with shell commands was always a lot easier when I could count on all shell output being 2-byte Unicode. It was a neat bit of magic, ascii and utf-8 text files would get turned into Unicode and I'd pipe them to cscript.exe and

Re: Output of dpkg-scanpackages as XML

2005-01-05 Thread William Ballard
On Wed, Jan 05, 2005 at 04:44:45PM -0500, Justin Pryzby wrote: > On Wed, Jan 05, 2005 at 04:42:32PM -0500, William Ballard wrote: > > Is there a unicode shell which does all piping in Unicode? > > cmd.exe in NT has a switch that does all piping in Unicode > Does it make a difference? Shouldn't a p

Re: Output of dpkg-scanpackages as XML

2005-01-05 Thread Justin Pryzby
On Wed, Jan 05, 2005 at 04:42:32PM -0500, William Ballard wrote: > On Wed, Jan 05, 2005 at 04:32:07PM -0500, William Ballard wrote: > > echo '' > ^^^ > > Should have closed the CDATA tag here. The short description > tag should probably be wrapped in CDATA too. If any package > descri

Re: Output of dpkg-scanpackages as XML

2005-01-05 Thread William Ballard
On Wed, Jan 05, 2005 at 04:32:07PM -0500, William Ballard wrote: > echo '' ^^^ Should have closed the CDATA tag here. The short description tag should probably be wrapped in CDATA too. If any package descriptions contain "]]>", it'll break it. You should probably wrap maintainer name
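The "]]>" problem can be worked around by ending the CDATA section in the middle of the sequence and opening a new one straight away. A small Python sketch of that trick follows; the helper name cdata and the element name d are made up for illustration and are not taken from the scripts in this thread.

```python
import xml.dom.minidom


def cdata(text):
    """Wrap text in CDATA, splitting any ']]>' so it cannot end the section early."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


if __name__ == "__main__":
    description = "odd ]]> text with <tags> & ampersands"
    document = "<d>" + cdata(description) + "</d>"
    print(document)

    # A parser hands back the original text, possibly as several adjacent nodes.
    dom = xml.dom.minidom.parseString(document)
    parsed = "".join(node.data for node in dom.documentElement.childNodes)
    print(parsed == description)
```

Escaping &, < and > as entities in ordinary character data avoids the problem altogether and is usually simpler than CDATA.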

Re: Output of dpkg-scanpackages as XML

2005-01-05 Thread William Ballard
On Wed, Jan 05, 2005 at 04:21:38PM -0500, William Ballard wrote: > If you want to include the short or long descriptions you'd have to > wrap those fields in CDATA tags, so you'd need an extra sed > expression to handle that. This outputs all fields and splits the short and long descriptions: echo

Re: Output of dpkg-scanpackages as XML

2005-01-05 Thread William Ballard
On Wed, Jan 05, 2005 at 09:57:26PM +0100, Frank S. Thomas wrote: > Hi, > > I want to publish on my homepage a list of packages, that are in my private > package repository. Therefore it would be useful, if I could convert the > output of 'dpkg-scanpackages' and 'dpkg-scansources' into XML, so th

Output of dpkg-scanpackages as XML

2005-01-05 Thread Frank S. Thomas
Hi, I want to publish on my homepage a list of packages, that are in my private package repository. Therefore it would be useful, if I could convert the output of 'dpkg-scanpackages' and 'dpkg-scansources' into XML, so that I only have to write the appropriate XSLT stylesheets. Is there any to