On Thu, Jan 13, 2005 at 10:06:25PM +0100, Frank S. Thomas wrote:
> Using Python's DOM is (as far as I can see) the easiest and most robust way to
> accomplish the task. It works, but it is slow when processing a file with more
> than 8000 package records. But this was never my goal; I only wanted to
>
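The slowness comes from building a DOM tree of every package before writing anything. A streaming writer avoids holding the whole tree in memory. Below is a minimal Python sketch using the standard library's xml.sax.saxutils.XMLGenerator. It assumes the records have already been parsed into dicts, and the element names "packages" and "package" and the helpers write_packages/packages_to_string are hypothetical. This is not dctrl2xml's actual output format.

```python
import io
from xml.sax.saxutils import XMLGenerator


def write_packages(records, out):
    """Stream records (dicts of field -> value) as XML without building a DOM.

    Hypothetical element names: <packages> wraps everything, <package> wraps
    one record, and each field name becomes an element (assumes Debian field
    names are valid XML names, e.g. "Package", "Depends").
    """
    gen = XMLGenerator(out, encoding="utf-8")
    gen.startDocument()
    gen.startElement("packages", {})
    for record in records:
        gen.startElement("package", {})
        for field, value in record.items():
            gen.startElement(field, {})
            gen.characters(value)  # escapes &, < and > for us
            gen.endElement(field)
        gen.endElement("package")
    gen.endElement("packages")
    gen.endDocument()


def packages_to_string(records):
    """Convenience wrapper: stream into an in-memory buffer and return the text."""
    buf = io.StringIO()
    write_packages(records, buf)
    return buf.getvalue()
```

Each record is written and then forgotten, so memory use stays flat no matter how many package records the input contains. For a real Packages file you would pass sys.stdout (or an open file) as out instead of a StringIO.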
On Thursday 13 January 2005 20:42, William Ballard wrote:
> On Thu, Jan 13, 2005 at 08:14:31PM +0100, Frank S. Thomas wrote:
> > I could successfully convert sid's binary-i386/source package files into
> > XML. However, because dctrl2xml builds up a dom tree of all packages in
> > these files, this
On Thu, Jan 13, 2005 at 08:14:31PM +0100, Frank S. Thomas wrote:
> I could successfully convert sid's binary-i386/source package files into XML.
> However, because dctrl2xml builds up a dom tree of all packages in these
> files, this is horribly slow. There may also be some cases where dctrl2xml
On Thursday 06 January 2005 02:14, Sven Mueller wrote:
> I would appreciate it if you could make your scripts available once they
> are (nearly;-)) finished.
Ok, I nearly finished it :) It is written in Python and is called "dctrl2xml".
Binary and source packages are apt-getable from my private r
On Fri, Jan 07, 2005 at 05:36:12PM +0200, Antti-Juhani Kaijanaho wrote:
> Not that I'm not flattered by the fact that you use grep-dctrl, but ...
> Is it really your intent here to filter out those packages whose
> package record does not contain a literal dot? That sounds like quite a
> puzzling requ
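Going by Antti-Juhani's reading, grep-dctrl treats the "." here as a literal string rather than as a regex meaning "any character". A small Python sketch of the difference, treating a record as its raw text (matches_fixed and matches_regex are hypothetical helpers, not grep-dctrl code):

```python
import re


def matches_fixed(record_text, pattern):
    """Fixed-string (substring) match: '.' only matches a literal dot."""
    return pattern in record_text


def matches_regex(record_text, pattern):
    """Regex match: '.' matches any single character except a newline."""
    return re.search(pattern, record_text) is not None
```

With a fixed-string match, a record such as "Package: foo\nVersion: 1\n" is dropped because it contains no dot, while a regex "." would keep every non-empty record.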
On 20050105T163207-0500, William Ballard wrote:
> echo ''
> zcat /a/dists/latest/binary-i386/Packages.gz | \
> grep-dctrl . | sed -r
> -e 's/(Description):
> (.+)/<\1>\2<\/Short-Description><\/Long-Description><\/entry>/' | \
> head -n-1
> echo ''
Not that I'm not flattered by the fact th
On Wed, Jan 05, 2005 at 11:24:46PM +, David Given wrote:
> It's still an ad-hoc solution, though; does anyone know of versions of
> the standard textutils that know about Unicode?
The Plan 9 ones would use UTF-8, but I suppose they're not POSIX.
On Thursday 06 January 2005 02:14, Sven Mueller wrote:
> Frank S. Thomas wrote on 06/01/2005 01:46:
> > Thanks so far. I think I'll write my own PHP script that will output a
> > more structured XML document, so that I can create hyperlinks from the
> > packages in the 'Depends' field and parse the
Frank S. Thomas wrote on 06/01/2005 01:46:
Thanks so far. I think I'll write my own PHP script that will output a more
structured XML document, so that I can create hyperlinks from the packages in
the 'Depends' field and parse the quasi field 'Homepage'.
I would appreciate it if you could make yo
On Wednesday 05 January 2005 22:21, William Ballard wrote:
> This is trivial with grep-dctrl and sed. For example:
Thanks for making me aware of grep-dctrl.
> echo ''
> zcat /a/dists/latest/binary-i386/Packages.gz | \
> grep-dctrl -sPackage,Version . | \
> sed -r -e 's/([^:]+):(.+)/<\1>\2<
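The sed expression maps each "Field: value" line to an element named after the field. The same idea in Python, with one addition: the value is XML-escaped, so version constraints like "(>= 2.3.2)" cannot break the output. This is a sketch; FIELD_RE and field_line_to_xml are hypothetical names.

```python
import re
from xml.sax.saxutils import escape

# Field name up to the first colon (no whitespace allowed in the name),
# then optional spaces, then the value. Values may contain further colons,
# e.g. epoch versions such as "1:2.3-1".
FIELD_RE = re.compile(r"^([^:\s]+):\s*(.*)$")


def field_line_to_xml(line):
    """Turn one 'Field: value' line into '<Field>value</Field>'.

    Returns None for lines that are not field lines, such as description
    continuation lines, which start with a space.
    """
    m = FIELD_RE.match(line)
    if m is None:
        return None
    name, value = m.groups()
    # escape() replaces &, < and > with entity references
    return "<%s>%s</%s>" % (name, escape(value), name)
```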
On Wed, Jan 05, 2005 at 07:11:06PM -0500, William Ballard wrote:
> You can leave out the element.
Forgot to mention: parsers are not obligated to respect whitespace and
newlines unless it's in a tag, though they usually do (and
they have flags to control this). It's like the tag in HTML in
t
On Thu, Jan 06, 2005 at 12:28:47AM +0100, Sven Mueller wrote:
>\2<\/CDATA>
The only thing it can't contain is "]]>"
You didn't use an actual CDATA node; you used an element named CDATA.
You can leave out the element.
I forgot that Depends/Recommends/Suggests/Conflicts lines will contain > and
< cha
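To see the difference between a real CDATA section and an element that merely happens to be called CDATA, you can parse both with Python's minidom and look at the first child node. A sketch; describe_first_child is a hypothetical helper:

```python
from xml.dom import minidom


def describe_first_child(xml_text):
    """Parse xml_text and report what kind of node the root's first child is."""
    root = minidom.parseString(xml_text).documentElement
    child = root.firstChild
    if child.nodeType == child.CDATA_SECTION_NODE:
        return ("cdata", child.data)
    if child.nodeType == child.ELEMENT_NODE:
        return ("element", child.tagName)
    return ("other", child.nodeType)


# A real CDATA section: the parser hands back its raw text, markup characters included.
real = "<Description><![CDATA[uses <b> & co]]></Description>"
# An ordinary element that happens to be named CDATA: just another child element.
fake = "<Description><CDATA>text</CDATA></Description>"
```

describe_first_child(real) reports a CDATA node whose text still contains the literal "<b> & co", while describe_first_child(fake) reports an element named CDATA, so any < or & inside it would still have to be escaped.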
William Ballard wrote on 05/01/2005 22:42:
On Wed, Jan 05, 2005 at 04:32:07PM -0500, William Ballard wrote:
echo ''
^^^
Should have closed the CDATA tag here. The short description
tag should probably be wrapped in CDATA too. If any package
descriptions contain "]]>", it'll break it.
I
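The "]]>" problem has a standard workaround: close the CDATA section between "]]" and ">", then immediately open a new one. A minimal Python sketch (wrap_cdata is a hypothetical name):

```python
def wrap_cdata(text):
    """Wrap text in a CDATA section, even if it contains ']]>'.

    ']]>' is the only sequence a CDATA section cannot contain, so each
    occurrence is split across two adjacent sections: the first ends
    with ']]' and the next one starts with '>'.
    """
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"
```

A parser joins adjacent CDATA sections back into the original text. For example, wrap_cdata("a]]>b") produces "<![CDATA[a]]]]><![CDATA[>b]]>", which reads back as "a]]>b".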
William Ballard wrote:
[...]
But back to Linux.
$ echo hi | iconv -f utf8 -t unicode | grep hi
(no output)
Not surprised; grep understands ASCII, AFAIK, so what you've just sent to
it is:
$ echo hi | iconv -f utf8 -t unicode | od -t x1
0000000 ff fe 68 00 69 00 0a 00
It can't find an 'h' and an 'i'
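The same effect is easy to reproduce in Python. This sketch builds the byte sequence shown in the od dump (a little-endian byte-order mark followed by UTF-16LE code units; we are assuming that is what iconv's "unicode" target produced on that machine) and searches it byte-wise the way an ASCII-only grep would. The helper names are hypothetical.

```python
import codecs


def to_utf16_with_bom(text):
    """Little-endian UTF-16 with a byte-order mark, matching the od dump."""
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")


def byte_grep(needle, data):
    """Byte-oriented search, roughly what an ASCII-only grep does."""
    return needle.encode("ascii") in data


data = to_utf16_with_bom("hi\n")
```

byte_grep("hi", data) is False because every ASCII byte is followed by 00, so 68 and 69 are never adjacent. Decoding first (data.decode("utf-16"), which consumes the BOM) and re-encoding as UTF-8 makes the search succeed.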
On Wed, Jan 05, 2005 at 10:36:53PM +, David Given wrote:
> iconv is your friend:
>
> zcat Packages.gz | iconv -f utf8 -t ucs2-le | cscript
In our case we're using sed. Is sed unicode-aware?
(As an aside, a lot of the commands you use in NT are builtins to cmd.exe
and under this switch they
William Ballard wrote:
[...]
Of course you're right. But building XML with shell commands
was always a lot easier when I could count on all shell output
being 2-byte Unicode. It was a neat bit of magic: ASCII and
UTF-8 text files would get turned into Unicode, and I'd pipe
them to cscript.exe and
On Wed, Jan 05, 2005 at 04:44:45PM -0500, Justin Pryzby wrote:
> On Wed, Jan 05, 2005 at 04:42:32PM -0500, William Ballard wrote:
> > Is there a unicode shell which does all piping in Unicode?
> > cmd.exe in NT has a switch that does all piping in Unicode
> Does it make a difference? Shouldn't a p
On Wed, Jan 05, 2005 at 04:42:32PM -0500, William Ballard wrote:
> On Wed, Jan 05, 2005 at 04:32:07PM -0500, William Ballard wrote:
> > echo ''
> ^^^
>
> Should have closed the CDATA tag here. The short description
> tag should probably be wrapped in CDATA too. If any package
> descri
On Wed, Jan 05, 2005 at 04:32:07PM -0500, William Ballard wrote:
> echo ''
^^^
Should have closed the CDATA tag here. The short description
tag should probably be wrapped in CDATA too. If any package
descriptions contain "]]>", it'll break it.
You should probably wrap maintainer name
On Wed, Jan 05, 2005 at 04:21:38PM -0500, William Ballard wrote:
> If you want to include the short or long descriptions you'd have to
> wrap those fields in CDATA tags, so you'd need an extra sed
> expression to handle that.
This outputs all fields and splits the short and long descriptions:
echo
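Splitting the short and long descriptions follows from the Description field layout. The first line is the synopsis, continuation lines begin with a space, and a line consisting of " ." stands for an empty line. A Python sketch based on a simplified reading of that format (split_description is a hypothetical name; lines indented by two or more spaces keep their extra indentation):

```python
def split_description(value):
    """Split a Description field value into (short, long).

    `value` is the text after 'Description:', including its continuation
    lines joined with newlines (each continuation line starts with a space).
    """
    lines = value.split("\n")
    short = lines[0].strip()
    long_lines = []
    for line in lines[1:]:
        # drop the single leading space that marks a continuation line
        body = line[1:] if line.startswith(" ") else line
        # a lone "." is the placeholder for an empty line
        long_lines.append("" if body == "." else body)
    return short, "\n".join(long_lines)
```

For example, split_description(" tool for X\n Long text\n .\n More") returns ("tool for X", "Long text\n\nMore"). The short part could then go into a Short-Description element and the long part into a Long-Description element.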
On Wed, Jan 05, 2005 at 09:57:26PM +0100, Frank S. Thomas wrote:
> Hi,
>
> I want to publish on my homepage a list of the packages that are in my private
> package repository. It would therefore be useful if I could convert the
> output of 'dpkg-scanpackages' and 'dpkg-scansources' into XML, so th
Hi,
I want to publish on my homepage a list of the packages that are in my private
package repository. It would therefore be useful if I could convert the
output of 'dpkg-scanpackages' and 'dpkg-scansources' into XML, so that I would only
have to write the appropriate XSLT stylesheets.
Is there any to
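The output of dpkg-scanpackages and dpkg-scansources uses the Debian control-file format. Records are separated by blank lines, each field is "Name: value", and lines starting with a space or tab continue the previous field. A minimal Python parser sketch for that format (parse_control is a hypothetical name; it does no validation and ignores stray continuation lines):

```python
def parse_control(text):
    """Split Packages/Sources text into records (dicts of field -> value).

    Continuation lines are appended to the previous field with a newline,
    keeping their leading space, so multi-line fields such as Description
    can be split up later.
    """
    records = []
    current = {}
    last_field = None
    for line in text.splitlines():
        if not line.strip():
            # blank line: end of the current record
            if current:
                records.append(current)
            current, last_field = {}, None
        elif line[0] in " \t":
            # continuation of the previous field
            if last_field is not None:
                current[last_field] += "\n" + line
        else:
            # partition on the first colon only, so "Version: 1:2.3-1" works
            name, _, value = line.partition(":")
            last_field = name.strip()
            current[last_field] = value.strip()
    if current:
        records.append(current)
    return records
```

The resulting dicts can then be written out as XML element by element, and XSLT takes it from there.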