On Tuesday, June 30, 2020 at 7:48:14 AM UTC+8 Neil Van Dyke wrote:
> Is even 2x speedup helpful for your purpose?

Yes it is, and for my purpose `read-xml` is fine even without any speed improvement. In the sports field, XML (via the TCX format) is a legacy technology. Typical TCX files are about 1 MB in size; the 14 MB one is very large. Setting `xml-count-bytes` to `#t` while calling `read-xml` gets me a speed improvement at low effort, but it is not worth adding another package dependency just to support a legacy technology.

> 3 seconds is one old magic
> number for user patience in HCI, so I suppose there's still a big
> difference between 4 seconds and almost 10 seconds?

I am not sure where you got the 3 seconds from, but even 3 seconds is too long to wait on a button callback. For large files, both `read-xml` and sxml would need a progress dialog with a cancel button, or some other form of user feedback, if one wants to make a "well behaved" GUI.

> For large (and absolutely massive) XML... SSAX can shine even better
> than in this comparison, since you can, say, populate a database *while
> you're parsing, without first constructing the intermediate
> representation* of xexpr or SXML. GC-wise, with the database-populating
> scenario, you'll probably end up with small, little-referencing, local,
> short-lived allocations. Besides GC costs, you'll also use less RAM
> (possibly lower AWS bill), and be less likely to push into swap (which
> would be bad for performance).

... if you are willing to deal with the complexity of a SAX interface, that is. I have written code for parsing documents (correctly!) using a SAX interface, and the resulting code was so complex that I had to use a code generator for it, but yes, the resulting code was very fast. Would I do it again? No. The complexity of SAX parsing is probably why most people use a DOM-style interface...

> In addition to SSAX's current performance characteristics and
> opportunities... There might also be opportunity to optimize SSAX
> significantly for Racket.
> Oleg is a famously capable Scheme programmer,
> but he was writing SSAX in fairly portable Scheme code, a couple decades
> ago, when he wrote SSAX. I did an initial packaging of SSAX for PLT
> Scheme, Kirill Lisovsky later did many packagings of various SXML-ish
> tools (including his own), and then John Clements did more work to
> package Oleg's SXML-ish tools for Racket... But I don't know that anyone
> has had motivation to try to optimize Racket's SSAX port, using current
> Racket features, and tuning for current performance characteristics.
>
> Side note regarding performance comparison... FWIW, SSAX might be doing
> some things `read-xml` doesn't, such as namespace resolution, entity
> reference resolution, and some validation.

You used the phrase "might be doing..."; does that mean that it might not do those things?

Alex.