If anyone wants to optimize `read-xml` for particular classes of use, without changing the interface, it might be very helpful to run your representative tests using the statistical profiler.

The profiler's text report takes a little while of manual tracing-through to get a feel for how to read and use it, but it can be tremendously useful, and is worth learning if you need performance.
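
For example, a minimal way to run it, using Racket's `profile` collection (the file name here is just a stand-in for whatever representative input you have):

```racket
#lang racket/base
(require profile xml)

;; Minimal sketch: profile `read-xml` over a representative input.
;; "large-sample.xml" is a hypothetical file; substitute your own.
(profile-thunk
 (lambda () (call-with-input-file "large-sample.xml" read-xml))
 #:delay 0.001) ; sample every 1ms, for finer granularity
```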

After a first pass with that, you might also want to look at how costly allocations/GC are, and maybe do some controlled experiments around that.  For example, force a few GC cycles, run your workload under the profiler, and check both GC time during the run and the time of a forced GC afterward.  If you're dealing with very large graphs coming out of the parser, I don't know whether those are enough to matter with the current GC mechanism, but maybe also check GC time while you're holding onto large graphs, when you release them, and after they've been collected.  At some point, GC gets hard (for me, at least) to reason about; some things make sense, and for the rest you decide when to stop digging. :)  If you record all your measurements, you can compare empirically how different changes to the code affect things, hopefully in representative situations.
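
As a sketch of what I mean (the helper and file names here are made up; `time-apply` reports the GC time that occurred during the run):

```racket
#lang racket/base
(require xml)

;; Sketch of a controlled GC experiment: settle the heap, run the
;; workload, then report GC time during the run, the cost of a
;; forced collection afterward, and how much memory is retained.
(define (measure-gc thunk)
  (collect-garbage) (collect-garbage) (collect-garbage)
  (define mem-before (current-memory-use))
  (define-values (results cpu real gc-during) (time-apply thunk '()))
  (define-values (_ __ ___ gc-forced) (time-apply collect-garbage '()))
  (printf "cpu ~a ms; gc during ~a ms; forced gc after ~a ms; ~a bytes retained\n"
          cpu gc-during gc-forced (- (current-memory-use) mem-before))
  (apply values results))

;; Example: hold onto the large parsed graph, so retention shows up.
(define doc
  (measure-gc
   (lambda () (call-with-input-file "large-sample.xml" read-xml))))
```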

I went through a lot of these exercises to optimize a large system, and sped up dynamic Web page loads dramatically in the usual case (to the point that we were mainly limited by PostgreSQL query cost, not much by the application code in Scheme, nor by our request/response network I/O), and also greatly reduced the pain of intermittent request latency spikes due to GC.

For one of the hotspots, I did half a dozen very different implementations, including a C extension, and found that an old-school pure Scheme implementation was fastest.  I compared the performance of the implementations using something like `shootout` (https://www.neilvandyke.org/racket/shootout/), but there might be better ways now in Racket.  I also found we could be much faster if we made a change to what the algorithm guarantees, since much of the cost was a consistency check that turned out to be very expensive and very redundant, given all the ways that utility code ended up being used.
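
A rough sketch of that kind of head-to-head comparison harness (not the shootout tool itself, and the names here are made up):

```racket
#lang racket/base

;; Rough sketch of comparing candidate implementations.  `impls` is
;; a list of (label . thunk) pairs, each thunk running one candidate
;; on the same workload.
(define (compare-impls impls #:repeat [n 100])
  (for ([impl (in-list impls)])
    (collect-garbage) ; start each candidate from a settled heap
    (define-values (_ cpu real gc)
      (time-apply (lambda ()
                    (for ([i (in-range n)]) ((cdr impl))))
                  '()))
    (printf "~a: ~a ms cpu (~a ms gc) over ~a runs\n"
            (car impl) cpu gc n)))
```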

In addition to contrived experiments, I also rigged up a runtime option so that the server would save data from the statistical profiler for each request the Web server handled in production.  That was tremendously useful, since it gave us real-world examples that were difficult to synthesize (e.g., complex dynamic queries), and we could go from Web logs and user feedback to exactly what happened.
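
Roughly like the following sketch, using the profiler's sampler API directly (the option and handler names are hypothetical; wire them to your own server):

```racket
#lang racket/base
(require profile/sampler
         profile/analyzer
         profile/render-text)

;; Sketch of per-request profiling behind a runtime flag.
;; `profile-requests?` and `handle-request` are made-up names.
(define profile-requests? (make-parameter #f))

(define ((maybe-profiled handle-request) req)
  (cond
    [(profile-requests?)
     (define sampler (create-sampler (current-thread) 0.005))
     (define result (handle-request req))
     (sampler 'stop)
     ;; Save the text report for this request to its own file.
     (with-output-to-file
       (format "request-profile-~a.txt" (current-inexact-milliseconds))
       (lambda () (render (analyze-samples (sampler 'get-snapshots)))))
     result]
    [else (handle-request req)]))
```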

(In that system I optimized, we used Oleg's SXML tools very heavily throughout, plus some bespoke SXML tools for HTML and XML.  There was one case in which someone had accidentally used the `xml` module, not knowing it was incompatible with the rest of the system, which caused some strange failures (there being no static checking) before it was discovered, and we changed that code to use SXML.)
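
For anyone who hasn't hit this: the two representations look deceptively similar, which is part of why the failures were strange.  E.g.:

```racket
#lang racket/base
(require xml)

;; The `xml` module's x-expressions wrap attributes in a plain list,
;; while SXML marks them with `(@ ...)`; code written for one will
;; quietly mis-traverse the other.
(xml->xexpr
 (document-element
  (read-xml (open-input-string "<p class=\"x\">hi</p>"))))
;; => '(p ((class "x")) "hi")

'(p (@ (class "x")) "hi") ; the same element as SXML
```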
