I'm aware that something similar can be achieved through different approaches. But note how these IMs are a nice compromise between two extremes: (1) doing no inlining at all, leaving all function calls intact, or (2) inlining absolutely everything.
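To make that middle ground concrete, here is a minimal, hypothetical sketch (in Python rather than Scheme, purely for illustration; all names are invented and this is not the GOOPS implementation) of a generic function that caches one instantiated method per tuple of argument types:

```python
class GenericFunction:
    """Toy PCL-style generic function with an IM cache keyed by
    argument types. Illustrative only."""

    def __init__(self, generic_body):
        self.generic_body = generic_body  # fully polymorphic fallback
        self.im_cache = {}                # (type, ...) -> specialized callable

    def __call__(self, *args):
        key = tuple(type(a) for a in args)
        im = self.im_cache.get(key)
        if im is None:
            # First invocation with these argument types: "instantiate"
            # a method specialized to them and remember it.
            im = self.specialize(key)
            self.im_cache[key] = im
        return im(*args)

    def specialize(self, key):
        # A real implementation would compile a type-specialized body;
        # this sketch simply reuses the generic body unchanged.
        return lambda *args: self.generic_body(*args)

add = GenericFunction(lambda a, b: a + b)
add(1, 2)        # fills the (int, int) cache entry
add("a", "b")    # fills the (str, str) cache entry
```

On every later call with the same argument types, dispatch is just the one dictionary lookup; the interesting work (and, in the scheme described below, the type-dispatch elimination) happens inside `specialize`.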
Of course, such an IM-based scheme could be supplemented by run-time invocation statistics to make better choices about what to instantiate and compile---as is being done in several interpreters/compilers.

On 9 Jan 2018 15:30, "Mikael Djurfeldt" <mik...@djurfeldt.com> wrote:

> Hi,
>
> I think this is a marvelous development and, for what it's worth, in the
> right direction. Many, many thanks!
>
> Maybe this is all completely obvious to you, but I want to remind you,
> again, about the plans and ideas I had for GOOPS before I had to leave it
> in its rather prototypical and unfinished state:
>
> As you recall, generic functions (GFs) then carried a cache (ideas taken
> from PCL) with "instantiated methods" (IMs; there is probably a better
> term, and I might have called them "compiled methods" or "cmethods"
> before)---method code where the types of the arguments are known, since
> each instance comes from an actual invocation of the GF. Given a new
> invocation, the dispatch mechanism would just use the argument types to
> look up the correct IM.
>
> I then had the idea that since we know the types of the arguments of the
> IM, a lot of the type dispatch could be eliminated within the IM based on
> flow information---very much in line with what you are doing here. If we
> now add one more piece, things get really exciting: wherever there is a
> call to another GF within an IM and the types of the arguments can be
> deduced at the point of the call, the polymorphic type dispatch of that
> GF can be eliminated and we can replace the GF call with a direct call to
> the corresponding IM.
>
> Given that most of the standard Scheme functions can be regarded as
> polymorphic GFs, I then imagined that most of the type dispatch in a
> program could be eliminated. Actual execution would mostly be direct
> calls from IMs to IMs, something which the optimizer could continue to
> work on, especially if it was all represented as CPS.
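The call-site rewriting described above can be sketched as follows (again an illustrative Python toy with invented names, not Guile code): when an IM is instantiated and the argument types at an inner GF call site are known, the dispatch for that site can be performed once, at instantiation time, leaving a direct IM-to-IM call in the body.

```python
# Toy model: a "generic function" is just a dict from argument-type
# tuples to specialized implementations (IMs).

# A GF for `+`, with IMs for the type combinations we care about.
gf_add = {
    (int, int): lambda a, b: a + b,      # IM specialized to (int, int)
    (float, float): lambda a, b: a + b,  # IM specialized to (float, float)
}

def instantiate_double(arg_type):
    """Instantiate (double x) = (+ x x) for a known argument type.

    Because x's type is known here, the inner GF call to `+` is
    resolved once, at instantiation time, and the body holds a
    direct reference to the chosen IM."""
    im_plus = gf_add[(arg_type, arg_type)]  # dispatch done once, up front
    def im_double(x):
        return im_plus(x, x)                # direct IM-to-IM call
    return im_double

double_int = instantiate_double(int)
```

Every call to `double_int` now skips the `gf_add` lookup entirely; in a compiler setting the direct call is additionally visible to inlining and further specialization.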
>
> Given your work here, I think that something like this could now rather
> easily be implemented. That is, we re-introduce IMs, make them directly
> callable, and substitute IM calls for GF calls when compiling them. I
> gather that the code of IMs does not necessarily have to be hung onto
> GFs but could be handled by some separate manager/data structures.
>
> Happy new year!
> Mikael
>
>
> On Mon, Jan 8, 2018 at 4:01 PM, Andy Wingo <wi...@pobox.com> wrote:
>
>> Hey all!
>>
>> This is an update along the road to Guile 3. Check
>> https://lists.gnu.org/archive/html/guile-devel/2017-11/msg00016.html for
>> the previous entry.
>>
>> Since 25 November there have been around 100 commits. First I merged in
>> patches from stable-2.0, including patches corresponding to the
>> improvements in the Guile 2.2.3 stable series release.
>>
>> Then I started to look at "instruction explosion" for vector-ref et al.
>> Basically the idea is to transform operations like vector-ref into
>> their constituent parts. In the concrete case of vector-ref, we have to
>> check that the vector is a heap object and that its heap object tag is
>> "vector"; we have to extract the length from the heap object; then we
>> have to check that the index is an integer between 0 and length-1; and
>> finally we dereference the field in the vector. Instruction explosion
>> turns all of these into separate primcalls and branches.
>>
>> One thing that became apparent was that with instruction explosion,
>> we'd have a lot more control flow. Information that the optimizer would
>> learn in a specific way (e.g. via specialized type inference / effects
>> analysis handlers for vector-ref) would instead be learned from generic
>> control flow.
>>
>> Concretely --
>>
>> scheme@(guile-user)> ,x (lambda (v i) (vector-ref v i))
>> Disassembly of #<procedure 29f5e50 at <unknown port>:1:3 (v i)> at #x29f5b4c:
>>
>>    0    (assert-nargs-ee/locals 3 1)  ;; 4 slots (2 args)    at (unknown file):1:3
>>    1    (immediate-tag=? 2 7 0)       ;; heap-object?        at (unknown file):1:17
>>    3    (jne 23)                      ;; -> L3
>>    4    (heap-tag=? 2 127 13)         ;; vector?
>>    6    (jne 20)                      ;; -> L3
>>    7    (word-ref/immediate 3 2 0)
>>    8    (ursh/immediate 3 3 8)
>>    9    (immediate-tag=? 1 3 2)       ;; fixnum?
>>   11    (jne 13)                      ;; -> L2
>>   12    (untag-fixnum 0 1)
>>   13    (s64-imm<? 0 0)
>>   14    (jl 8)                        ;; -> L1
>>   15    (s64<? 0 3)
>>   16    (jnl 6)                       ;; -> L1
>>   17    (mov 3 0)
>>   18    (uadd/immediate 3 3 1)
>>   19    (scm-ref 2 2 3)
>>   20    (handle-interrupts)
>>   21    (return-values 2)             ;; 1 value
>> L1:
>>   22    (throw/value+data 1 201)      ;; #(out-of-range "vector-ref" "Argument 2 out of range: ~S")
>> L2:
>>   24    (throw/value+data 1 225)      ;; #(wrong-type-arg "vector-ref" "Wrong type argument in position 2 (expecting small integer): ~S")
>> L3:
>>   26    (throw/value+data 2 239)      ;; #(wrong-type-arg "vector-ref" "Wrong type argument in position 1 (expecting vector): ~S")
>>
>> So this is a bit horrible and I need to make the disassembler do a
>> better job, but anyway. Instructions 1 through 6 check that V is a
>> vector; instructions 7 and 8 extract the length; 9 and 11 check that
>> the index is a fixnum; 12 extracts the fixnum value as an untagged
>> 64-bit integer; and 13 through 16 check that the index is in range.
>>
>> L1, L2, and L3 are bailouts. The idea here is that if this vector-ref
>> is followed by some other operation on the vector, we'll at least get
>> to elide the vector? checks, and maybe we can reuse the length
>> extraction too. Et cetera.
>>
>> The optimizer makes decisions like when to elide redundant checks based
>> on flow information.
>> However, for historical reasons, the "throw"
>> terms unfortunately did "continue" from the optimizer's perspective;
>> whereas the information flowing to e.g. L3 shouldn't flow at all to
>> instruction 7, the IR didn't have a way to denote terms that don't
>> continue at all.
>>
>> To fix this I had to make some changes to the IR. On the plus side,
>> $throw is now its own term kind that doesn't have a continuation.
>> Also, $branch is a term instead of an expression shoehorned into
>> $continue; $prompt is a term too. Maybe one day we'll get a $select
>> term for proper case compilation.
>>
>> Finally, the instruction explosion. I refactored the Tree-IL->CPS
>> compiler to allow individual primcalls to have custom lowering
>> routines, and tightened up $primcall so that it now always continues to
>> $kargs. Then I added custom lowering routines for vector-ref et al,
>> and cons and other things. The CPS IR refactors allowed me to remove
>> some useless passes (elide-values, prune-bailouts). There were some
>> optimizer bugs, but generally things were already in a good state;
>> e.g. here's a vector sum routine:
>>
>> scheme@(guile-user)> (define (v-sum v)
>>                       (let lp ((n 0) (sum 0))
>>                         (if (< n (vector-length v))
>>                             (lp (+ n 1) (+ sum (vector-ref v n)))
>>                             sum)))
>> scheme@(guile-user)> ,x v-sum
>> Disassembly of #<procedure v-sum (v)> at #x1fa2750:
>>
>>    0    (assert-nargs-ee/locals 2 3)  ;; 5 slots (1 arg)     at (unknown file):1:0
>>    1    (make-short-immediate 4 2)    ;; 0
>>    2    (immediate-tag=? 3 7 0)       ;; heap-object?        at (unknown file):1:51
>>    4    (jne 39)                      ;; -> L5
>>    5    (heap-tag=? 3 127 13)         ;; vector?
>>    7    (jne 36)                      ;; -> L5
>>    8    (word-ref/immediate 2 3 0)
>>    9    (ursh/immediate 2 2 8)
>>   10    (imm-s64<? 2 0)                                      at (unknown file):1:46
>>   11    (jnl 29)                      ;; -> L4
>>   12    (load-s64 1 0 0)                                     at (unknown file):1:89
>>   15    (s64<? 1 2)
>>   16    (jnl 22)                      ;; -> L3
>>   17    (mov 4 1)
>>   18    (uadd/immediate 4 4 1)
>>   19    (scm-ref 4 3 4)
>>   20    (add/immediate 4 4 0)                                at (unknown file):1:82
>>   21    (load-s64 1 0 1)
>>   24    (s64<? 1 2)                                          at (unknown file):1:46
>>   25    (jnl 11)                      ;; -> L2
>> L1:
>>   26    (handle-interrupts)
>>   27    (mov 0 1)                                            at (unknown file):1:74
>>   28    (uadd/immediate 0 0 1)
>>   29    (uadd/immediate 1 1 1)                               at (unknown file):1:89
>>   30    (scm-ref 1 3 1)
>>   31    (add 4 4 1)                                          at (unknown file):1:82
>>   32    (s64<? 0 2)                                          at (unknown file):1:46
>>   33    (jnl 7)                       ;; -> L4
>>   34    (mov 1 0)                                            at (unknown file):1:70
>>   35    (j -9)                        ;; -> L1
>> L2:
>>   36    (mov 0 1)                                            at (unknown file):1:46
>>   37    (j 3)                         ;; -> L4
>> L3:
>>   38    (throw/value+data 4 204)      ;; #(out-of-range "vector-ref" "Argument 2 out of range: ~S") at (unknown file):1:89
>> L4:
>>   40    (handle-interrupts)
>>   41    (mov 3 4)
>>   42    (return-values 2)             ;; 1 value
>> L5:
>>   43    (throw/value+data 3 233)      ;; #(wrong-type-arg "vector-length" "Wrong type argument in position 1 (expect…") at (unknown file):1:51
>>
>> The bit between L1 and L2 is the loop. Note that the loop does no
>> dynamic checks besides checking that the unboxed index is within bounds
>> -- notably, the vector? check, the length computation, and the index
>> untagging were hoisted out. The "scm-ref" instruction is the raw,
>> unchecked memory-access instruction that will correspond to a load when
>> compiled to machine code.
>>
>> This code runs a little slower than it did a month ago, because
>> instruction explosion generally means more instructions than what we
>> had before. But this is expected. I expect the corresponding amount of
>> machine code to be lower when we emit machine code, sometimes
>> significantly so. Doing things this way takes away the temptation for
>> an eventual JIT to do optimization-like tasks -- we keep it all in
>> generic CPS, and the JIT will be easy to write and generate good code.
>> Until then, though, hacking on Guile is a bit slow in terms of compile
>> time.
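The effect of the hoisting described above can be mimicked by hand (an illustrative Python analogue of v-sum, not Guile output; the list/isinstance checks merely stand in for the exploded vector checks): the type check and length extraction run once before the loop, and the loop body is left with only the bounds comparison and the raw element access.

```python
def v_sum(v):
    # Checks hoisted out of the loop, as the optimizer does for v-sum:
    if not isinstance(v, list):   # stands in for the vector? heap-tag check
        raise TypeError(
            "Wrong type argument in position 1 (expecting vector)")
    length = len(v)               # length extracted once, outside the loop
    n = 0
    total = 0
    while n < length:             # the only per-iteration check
        total += v[n]             # raw element load (cf. scm-ref)
        n += 1
    return total
```

Written this way, nothing inside the loop re-derives facts that are invariant across iterations, which is exactly what the hoisted disassembly above achieves.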
>>
>> Next up is instruction explosion for other data structures (boxes,
>> structs, strings, etc.); then I have to figure out what to do for
>> arithmetic that I can't specialize (like that "add" at IP 31). I guess
>> polymorphic inline caches are the real solution there; we'll see.
>>
>> OK, that's the update. Happy new year and happy hacking with Guile :)
>>
>> Cheers,
>>
>> Andy