On Sun, Feb 27, 2022 at 12:13 PM Han-Wen Nienhuys <hanw...@gmail.com> wrote:
> On Sun, Feb 27, 2022 at 10:39 AM Luca Fascione <l.fasci...@gmail.com> wrote:
> > is it true that if you double the source size you double the
> > compilation time?
>
> it should be, but we have rather complicated page breaking code that
> is so hairy that nobody understands it fully. I'm not sure there is an
> NP-complete snake in the proverbial grass there.

Understood. As a user of both LilyPond and LaTeX I have been idly
wondering for years whether modern computers can afford global
line/page-breaking algorithms that would avoid some of the shortcomings
of TeX's approach. A discussion for a different thread, of course.

> accessing data (eg. offsets):
> * on first access, the callback gets executed. This is just evaluating
>   (<function> <grob-object>).
> * on second access, the value is cached in an alist, and looking it up
>   is extremely cheap.

This is cool. :-)

I don't know enough about this program to even begin to have a gut
feeling, but I imagine there would be tons of these reads, and I hear
you saying that, in an eventual sense, all data access is an alist
access. I don't know how the alists are actually implemented under the
hood, but they feel like they would be a linear scan with symbol-typed
keys. So to pull out one float you're doing what, 5-10 64-bit compares
(checking the keys) and just as many pointer jumps? (I'm assuming the
alist is a list of symbol/value pairs in the implementation as well.)
That cost strongly dominates the float dereference itself, and then
there is the question of how much extra work happens around it. (In my
mind I'm comparing it to member access in C++, which is one pointer
(this), one offset for the member (compiled into an immediate), and one
load of this+offset, which the hardware helps you with.) I feel for the
moment I can't provide any concrete insight into any of this, because I
don't know the specifics well enough. (I sketch the access pattern I
have in mind in code below.)

> > Code following a simple pattern like this, once compiled, will
> > largely be dominated by the scripting language runtime overhead
>
> From the outside this may seem plausible, but I doubt your intuition here:
>
> * the callback tables are also alists. They are cheap (they aren't
>   sped up if you swap them for hash tables)

Not presuming to know your program better than you do, but I'd just
bring up that this observation says your lists are short (likely around
20 entries on average): what you report is that hashing, which buys you
direct access into an array, is not faster than several pointer
compares and pointer chases. The hash you'd use here would be something
like FNV, and I'd expect it to break even somewhere in the 10-20
comparison range. (A measurement sketch for this is below, too.)

> * Scheme has no marshaling: objects are either direct (scheme -> C++
>   is bit twiddling), or they are indirect (a tagged pair with a C++
>   pointer in the cdr)

Half of that I expected (more specifically, for various reasons, a
number of which are probably not accurate, I expected the Scheme APIs
to be similar to the Tcl APIs, where you also get handed straight what
Tcl has in hand, with no marshalling involved). One thing I didn't know
is that the client calls to extract the machine representation of the
value would be super cheap. Still, if the Guile compiler translates
Scheme values into native ones and in some cases is able to leave them
there for "long" stretches of code, while our use case prevents that,
it seems it could eventually add up.
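To make the access pattern I have in mind concrete, here is a minimal
Guile sketch of the first-access/second-access scheme as I understood
your description (all the names here are hypothetical stand-ins, not
LilyPond's actual internals):

    (use-modules (srfi srfi-9))

    ;; Hypothetical stand-in for a grob: a callback table (alist of
    ;; symbol -> procedure) plus a cache alist of computed values.
    (define-record-type <grob>
      (make-grob callbacks cache)
      grob?
      (callbacks grob-callbacks)
      (cache grob-cache set-grob-cache!))

    ;; First access: run the callback, i.e. evaluate
    ;; (<function> <grob-object>), and memoize the result in the cache.
    ;; Later accesses: assq walks the cache, one eq? compare on the
    ;; symbol key and one pointer chase per entry.
    (define (grob-get grob prop)
      (let ((hit (assq prop (grob-cache grob))))
        (if hit
            (cdr hit)
            (let* ((callback (assq-ref (grob-callbacks grob) prop))
                   (value (callback grob)))
              (set-grob-cache! grob (acons prop value (grob-cache grob)))
              value))))

If that's roughly the shape of it, each cached read costs a handful of
eq? compares (symbols compare as pointers) plus a cdr, which is where
my 5-10 compares guess above comes from.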
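And on the hash-table break-even point, this is the kind of throwaway
measurement I'd run (again just a sketch; the table size and iteration
count are made up):

    ;; Rough timing of assq vs hashq-ref on the same n-entry table.
    (define (time-thunk thunk)
      (let ((t0 (get-internal-real-time)))
        (thunk)
        (/ (- (get-internal-real-time) t0)
           (exact->inexact internal-time-units-per-second))))

    (define n 20)   ; made-up size, around the suspected break-even
    (define keys
      (map (lambda (i) (string->symbol (format #f "k~a" i))) (iota n)))
    (define alist (map (lambda (k) (cons k 1.0)) keys))
    (define table (make-hash-table n))
    (for-each (lambda (k) (hashq-set! table k 1.0)) keys)
    (define probe (list-ref keys (- n 1)))  ; worst case: last key

    (define (bench lookup)
      (time-thunk (lambda ()
                    (let loop ((i 0))
                      (when (< i 1000000)
                        (lookup)
                        (loop (+ i 1)))))))

    (format #t "assq:  ~as\n" (bench (lambda () (assq probe alist))))
    (format #t "hashq: ~as\n" (bench (lambda () (hashq-ref table probe))))

If your lists really average around 20 entries, I'd expect the two
numbers to come out close, matching what you observe.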
Again, I do need to learn the source better before you give these
thoughts any real weight.

> IMO the real problem is that we don't have good tooling to see what is
> going on in the Scheme side of things: C++ has decent perf analysis,
> but the Guile side of things just looks like a lot of time spent in
> scm_eval(). Some of it is overhead, some of it might be a poorly
> implemented Scheme function (which is often easier to fix than
> reducing overhead.)

Very much agreed that poorly conceived code is the first thing to
address, no doubt. I'd think the way to gain insight into what's going
on is actually to inspect the bytecode, and to gain familiarity with
the APIs that execute it. Is the bytecode translated into native
executable code, or does it run on a VM? I would assume they don't
provide a decompiler of any sort, do they?

Thanks for a most interesting discussion,
L
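P.S. If I find the time I may poke at this myself. Assuming LilyPond
ends up on Guile 2.x or later (where, as far as I understand, the
compiler targets a bytecode VM, and 3.0 adds a JIT on top of it), the
stock REPL should already get part of the way there, if I read the
Guile manual right:

    ;; In a Guile 2.x/3.x REPL: compile a toy callback and inspect its
    ;; VM bytecode with the ,disassemble meta-command.
    scheme@(guile-user)> (define (y-offset grob) (* 2.0 (car grob)))
    scheme@(guile-user)> ,disassemble y-offset

    ;; Statistical profiling of the Scheme side via the bundled
    ;; statprof module.
    scheme@(guile-user)> (use-modules (statprof))
    scheme@(guile-user)> (statprof
                          (lambda ()
                            (let loop ((i 0))
                              (when (< i 100000)
                                (y-offset (list 1.0))
                                (loop (+ i 1))))))

If that works the way I hope, it would at least turn "a lot of time
spent in scm_eval()" into per-procedure numbers on the Scheme side.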