Re: [racket-users] Re: note about parsing speed of xml vs sxml?

Alex Harsanyi Mon, 29 Jun 2020 18:24:59 -0700


On Tuesday, June 30, 2020 at 7:48:14 AM UTC+8 Neil Van Dyke wrote:


> Is even 2x speedup helpful for your purpose? 


Yes it is, and for my purpose `read-xml` is fine even without any speed 
improvement.  In the sports field, XML (via the TCX format) is a legacy 
technology.  Typical TCX files are about 1Mb in size, the 14Mb one is a 
very large one.   Setting ` xml-count-bytes` to #t while calling `read-xml` 
gets me a speed improvement at a low effort, but it is not worth adding 
another package dependency just to support a legacy technology.

3 seconds is one old magic 
> number for user patience in HCI, so I suppose there's still a big 
> difference between 4 seconds and almost 10 seconds? 
>

I am not sure where you got the 3 seconds from, but even 3 seconds is too 
long to wait on a button callback.  For large files, both read-xml and sxml 
would need to have a progress dialog with a cancel button, or some other 
form of user feedback, if one wants to make a "well behaved" GUI.
 

> For large (and absolutely massive) XML... SSAX can shine even better 
> than in this comparison, since you can, say, populate a database *while 
> you're parsing, without first constructing the intermediate 
> representation* of xexpr or SXML.  GC-wise, with the database-populating 
> scenario, you'll probably end up with small, little-referencing, local, 
> short-lived allocations.  Besides GC costs, you'll also use less RAM 
> (possibly lower AWS bill), and be less likely to push into swap (which 
> would be bad for performance). 
>

... if you are willing to deal with the complexity of a SAX interface, that 
is.  I have written code for parsing documents (correctly!) using a SAX 
interface, and the resulting code was so complex that I had to use a code 
generator for it, but yes, the resulting code was very fast.   Would I do 
it again? No.

The complexity of SAX parsing is probably why most people use a DOM style 
interface...
 

> In addition to SSAX's current performance characteristics and 
> opportunities... There might also be opportunity to optimize SSAX 
> significantly for Racket. Oleg is a famously capable Scheme programmer, 
> but he was writing SSAX in fairly portable Scheme code, a couple decades 
> ago, when he wrote SSAX.  I did an initial packaging of SSAX for PLT 
> Scheme, Kirill Lisovsky later did many packagings of various SXML-ish 
> tools (including his own), and then John Clements did more work to 
> package Oleg's SXML-ish tools for Racket... But I don't know that anyone 
> has had motivation to try to optimize Racket's SSAX port, using current 
> Racket features, and tuning for current performance characteristics. 
>

> Side note regarding performance comparison... FWIW, SSAX might be doing 
> some things `read-xml` doesn't, such as namespace resolution, entity 
> reference resolution, and some validation. 
>

You used the phrase "might be doing...", does that mean that it might not 
do those things?

Alex.

 

-- 
You received this message because you are subscribed to the Google Groups 
"Racket Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to racket-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/racket-users/affcfe0e-a5a7-43a6-9019-8876dc40ed03n%40googlegroups.com.

Re: [racket-users] Re: note about parsing speed of xml vs sxml?

Reply via email to