On Sat, May 17, 2014 at 09:36:00AM +0100, Nicholas Clark wrote:
> On Fri, May 16, 2014 at 10:05:36PM +0100, Nicholas Clark wrote:
>
> > Both the 32 and 64 bit code paths tested on x86_64 (with some editing of
> > configs). Builds on real x86. The Raspberry Pi has passed all of NQPs
> > tests, which means that it should work (without this it would SEGV on the
> > first thing the NQP Makefile tried to run)
>
> With this:
>
> Stage start      :     0.000
> Stage parse      :  6995.769
> Stage syntaxcheck:     0.034
> Stage ast        :     0.028
> Stage optimize   :  7349.666
> Stage mast       : 25152.209
> Stage mbc        :   456.322
>
> and
>
> All tests successful.
> Files=22, Tests=271, 243 wallclock secs ( 1.74 usr 0.65 sys + 193.70 cusr
> 19.76 csys = 215.85 CPU)
> Result: PASS
> Smaller is better. (Less is MOAR)

Attached patches simply re-order the members of structs to reduce their size,
by avoiding holes caused by C ABI alignment constraints. pahole makes finding
these savings easy:

https://git.kernel.org/cgit/devel/pahole/pahole.git/

[The libuv folks could probably benefit from looking at their structs]

For the Pi (which is swapping heavily) the results are quite noticeable.

Stage start      :     0.000
Stage parse      :  6293.815
Stage syntaxcheck:     0.115
Stage ast        :     0.051
Stage optimize   :  6429.193
Stage mast       : 19842.892
Stage mbc        :   436.754

All tests successful.
Files=22, Tests=271, 235 wallclock secs ( 1.83 usr 0.64 sys + 197.40 cusr 17.62 csys = 217.49 CPU)
Result: PASS

real 570m19.475s
user 53m4.540s
sys 18m2.170s

Supplied as one patch per re-ordering. It shouldn't break anything (tested on
x86_64, x86 and PPC64 Linux), but it might be nicer to be able to bisect into
arbitrary points.

I'd quite like to take struct MVMOpInfo and swap the first two members:

/* Information about an opcode. */
struct MVMOpInfo {
    MVMuint16 opcode;
    const char *name;
    char mark[2];
    MVMuint8 num_operands;
    MVMuint8 pure;
    MVMuint8 deopt_point;
    MVMuint8 operands[MVM_MAX_OPERANDS];
};

That would save 4 bytes per struct on x86 and ARM, 8 bytes on x86_64. Which I
think works out as about 4K saving on the 32 bit platforms, 8K on 64, which
doesn't seem much, but it's one or two disk blocks that no longer need to be
loaded for every interpreter startup, which takes things in a good direction.

Nicholas Clark
From fad9e289487f4af486848dddcb39730eed31554a Mon Sep 17 00:00:00 2001
From: Nicholas Clark <n...@ccl4.org>
Date: Mon, 19 May 2014 11:51:45 +0200
Subject: [PATCH 01/11] Re-order members of MVMCompUnitBody to avoid alignment holes.

Saves 4 bytes on ARM and x86, 32 bytes on x86_64.
---
 src/6model/reprs/MVMCompUnit.h | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/src/6model/reprs/MVMCompUnit.h b/src/6model/reprs/MVMCompUnit.h
index 0015515..4699e10 100644
--- a/src/6model/reprs/MVMCompUnit.h
+++ b/src/6model/reprs/MVMCompUnit.h
@@ -32,9 +32,9 @@ struct MVMCompUnitBody {

     /* The various static frames in the compilation unit, along with a
      * code object for each one. */
+    MVMuint32 num_frames;
     MVMStaticFrame **frames;
     MVMObject **coderefs;
-    MVMuint32 num_frames;
     MVMStaticFrame *main_frame;
     MVMStaticFrame *load_frame;
     MVMStaticFrame *deserialize_frame;
@@ -45,22 +45,25 @@ struct MVMCompUnitBody {
     MVMuint16 max_callsite_size;

     /* The extension ops used by the compilation unit. */
-    MVMExtOpRecord *extops;
     MVMuint16 num_extops;
+    MVMExtOpRecord *extops;

     /* The string heap and number of strings. */
     MVMString **strings;
     MVMuint32 num_strings;

     /* Serialized data, if any. */
-    char *serialized;
     MVMint32 serialized_size;
+    char *serialized;

     /* Array of the resolved serialization contexts, and how many we
      * have. A null in the list indicates not yet resolved */
     MVMSerializationContext **scs;
     MVMuint32 num_scs;

+    /* How we should deallocate data_start. */
+    MVMDeallocate deallocate;
+
     /* List of serialization contexts in need of resolution. This is an
      * array of string handles; its length is determined by num_scs above.
      * once an SC has been resolved, the entry on this list is NULLed. If
@@ -79,9 +82,6 @@ struct MVMCompUnitBody {

     /* Handle, if any, associated with a mapped file. */
     void *handle;
-
-    /* How we should deallocate data_start. */
-    MVMDeallocate deallocate;
 };
 struct MVMCompUnit {
     MVMObject common;
-- 
2.0.0-rc3-456-gfa0cd67
From 64687a7bdba9ee1eaf00df4809e51f7359012e82 Mon Sep 17 00:00:00 2001
From: Nicholas Clark <n...@ccl4.org>
Date: Mon, 19 May 2014 12:51:13 +0200
Subject: [PATCH 02/11] Re-order members of MVMSerializationRoot to avoid alignment holes.

Saves 16 bytes on x86_64.
---
 src/6model/serialization.h | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/src/6model/serialization.h b/src/6model/serialization.h
index 437f1c1..a6eae4f 100644
--- a/src/6model/serialization.h
+++ b/src/6model/serialization.h
@@ -6,17 +6,17 @@ struct MVMSerializationRoot {
     /* The version of the serialization format. */
     MVMint32 version;

-    /* The number of dependencies, as well as a pointer to the
-     * dependencies table. */
-    MVMint32 num_dependencies;
-    char *dependencies_table;
-
     /* The SC we're serializing/deserializing. */
     MVMSerializationContext *sc;

     /* List of the serialization context objects that we depend on. */
     MVMSerializationContext **dependent_scs;

+    /* The number of dependencies, as well as a pointer to the
+     * dependencies table. */
+    char *dependencies_table;
+    MVMint32 num_dependencies;
+
     /* The number of STables, as well as pointers to the STables
      * table and data chunk. */
     MVMint32 num_stables;
@@ -25,9 +25,9 @@ struct MVMSerializationRoot {

     /* The number of objects, as well as pointers to the objects
      * table and data chunk. */
-    MVMint32 num_objects;
     char *objects_table;
     char *objects_data;
+    MVMint32 num_objects;

     /* The number of closures, as we as a pointer to the closures
      * table. */
@@ -36,9 +36,9 @@ struct MVMSerializationRoot {

     /* The number of contexts (e.g. frames), as well as pointers to
      * the contexts table and data chunk. */
-    MVMint32 num_contexts;
     char *contexts_table;
     char *contexts_data;
+    MVMint32 num_contexts;

     /* The number of repossessions and pointer to repossessions table. */
     MVMint32 num_repos;
-- 
2.0.0-rc3-456-gfa0cd67
From 07160f91e5c0811b371bb265dd367fee96740c55 Mon Sep 17 00:00:00 2001
From: Nicholas Clark <n...@ccl4.org>
Date: Mon, 19 May 2014 13:34:26 +0200
Subject: [PATCH 03/11] Re-order members of MVMThreadContext to avoid alignment holes.

Saves 8 bytes on ARM, and 16 bytes on x86_64.
---
 src/core/threadcontext.h | 39 ++++++++++++++++++++-------------------
 1 file changed, 20 insertions(+), 19 deletions(-)

diff --git a/src/core/threadcontext.h b/src/core/threadcontext.h
index c80f85b..35199f1 100644
--- a/src/core/threadcontext.h
+++ b/src/core/threadcontext.h
@@ -67,6 +67,15 @@ struct MVMThreadContext {
     /* Where we're allocating. */
     MVMAllocationTarget allocate_in;

+    /* Internal ID of the thread. */
+    MVMuint32 thread_id;
+
+    /* Thread object representing the thread. */
+    MVMThread *thread_obj;
+
+    /* The frame lying at the base of the current thread. */
+    MVMFrame *thread_entry_frame;
+
     /* Pointer to where the interpreter's current opcode is stored. */
     MVMuint8 **interp_cur_op;

@@ -81,10 +90,6 @@ struct MVMThreadContext {
      * is stored. */
     MVMCompUnit **interp_cu;

-    /* Jump buffer, used when an exception is thrown from C-land and we need
-     * to fall back into the interpreter. */
-    jmp_buf interp_jump;
-
     /* The frame we're currently executing. */
     MVMFrame *cur_frame;

@@ -122,14 +127,10 @@ struct MVMThreadContext {
     /* The second GC generation allocator. */
     MVMGen2Allocator *gen2;

-    /* Internal ID of the thread. */
-    MVMuint32 thread_id;
-
-    /* Thread object representing the thread. */
-    MVMThread *thread_obj;
-
-    /* The frame lying at the base of the current thread. */
-    MVMFrame *thread_entry_frame;
+    /* Memory buffer pointing to the last thing we serialized, intended to go
+     * into the next compilation unit we write. */
+    char *serialized;
+    MVMint32 serialized_size;

     /* Temporarily rooted objects. This is generally used by code written in
      * C that wants to keep references to objects. Since those may change
@@ -171,19 +172,19 @@ struct MVMThreadContext {
      * index 0. */
     MVMObject *compiling_scs;

-    /* Memory buffer pointing to the last thing we serialized, intended to go
-     * into the next compilation unit we write. */
-    char *serialized;
-    MVMint32 serialized_size;
-
     /* Dispatcher set for next invocation to take. */
     MVMObject *cur_dispatcher;

+    /* Cache of native code callback data. */
+    MVMNativeCallback *native_callback_cache;
+
     /* Random number generator state. */
     MVMuint64 rand_state[2];

-    /* Cache of native code callback data. */
-    MVMNativeCallback *native_callback_cache;
+    /* Jump buffer, used when an exception is thrown from C-land and we need
+     * to fall back into the interpreter. These things are huge, so put it
+     * near the end to keep the hotter stuff on the same cacheline. */
+    jmp_buf interp_jump;

 #if MVM_HLL_PROFILE_CALLS
     /* storage of profile timings */
-- 
2.0.0-rc3-456-gfa0cd67
From eb8e1a732fbe9ad2e3586b24bb5a5cd3d933a82b Mon Sep 17 00:00:00 2001
From: Nicholas Clark <n...@ccl4.org>
Date: Mon, 19 May 2014 14:30:45 +0200
Subject: [PATCH 04/11] Re-order members of MVMSpeshBB to avoid alignment holes.

Saves 8 bytes on ARM and x86, 24 bytes on x86_64.
---
 src/spesh/graph.h | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/src/spesh/graph.h b/src/spesh/graph.h
index 372ea9b..a0b389e 100644
--- a/src/spesh/graph.h
+++ b/src/spesh/graph.h
@@ -76,18 +76,20 @@ struct MVMSpeshBB {

     /* Basic blocks we may go to after this one. */
     MVMSpeshBB **succ;
-    MVMuint16 num_succ;

     /* Basic blocks that we may arrive into this one from. */
     MVMSpeshBB **pred;
-    MVMuint16 num_pred;

     /* Children in the dominator tree. */
     MVMSpeshBB **children;
-    MVMuint16 num_children;

     /* Dominance frontier set. */
     MVMSpeshBB **df;
+
+    /* Counts for the above, grouped together to avoid alignment holes. */
+    MVMuint16 num_succ;
+    MVMuint16 num_pred;
+    MVMuint16 num_children;
     MVMuint16 num_df;

     /* The next basic block in original linear code order. */
-- 
2.0.0-rc3-456-gfa0cd67
From f5233b096b91394e903dd72fcd87d9be71207a34 Mon Sep 17 00:00:00 2001
From: Nicholas Clark <n...@ccl4.org>
Date: Mon, 19 May 2014 15:05:53 +0200
Subject: [PATCH 05/11] Re-order members of MVMStaticFrameBody to avoid alignment holes.

Saves 8 bytes on x86_64.
---
 src/6model/reprs/MVMStaticFrame.h | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/6model/reprs/MVMStaticFrame.h b/src/6model/reprs/MVMStaticFrame.h
index cdd8443..d8d3edc 100644
--- a/src/6model/reprs/MVMStaticFrame.h
+++ b/src/6model/reprs/MVMStaticFrame.h
@@ -52,15 +52,15 @@ struct MVMStaticFrameBody {
     /* Count of lexicals. */
     MVMuint32 num_lexicals;

-    /* The number of exception handlers this frame has. */
-    MVMuint32 num_handlers;
-
     /* Frame exception handlers information. */
     MVMFrameHandler *handlers;

+    /* The number of exception handlers this frame has. */
+    MVMuint32 num_handlers;
+
     /* Lexotics cache. */
-    MVMLexotic **lexotics;
     MVMint32 num_lexotics;
+    MVMLexotic **lexotics;

     /* The compilation unit unique ID of this frame. */
     MVMString *cuuid;
-- 
2.0.0-rc3-456-gfa0cd67
From 9c76829c4d59c9f21aed9c4cf6aff543b96485ad Mon Sep 17 00:00:00 2001
From: Nicholas Clark <n...@ccl4.org>
Date: Mon, 19 May 2014 15:43:09 +0200
Subject: [PATCH 06/11] Re-order members of MVMP6opaqueREPRData to avoid alignment holes.

Saves 4 bytes on ARM and x86, 16 bytes on x86_64.
---
 src/6model/reprs/P6opaque.h | 44 ++++++++++++++++++++++----------------------
 1 file changed, 22 insertions(+), 22 deletions(-)

diff --git a/src/6model/reprs/P6opaque.h b/src/6model/reprs/P6opaque.h
index 25e158f..158a6ea 100644
--- a/src/6model/reprs/P6opaque.h
+++ b/src/6model/reprs/P6opaque.h
@@ -41,18 +41,11 @@ struct MVMP6opaqueREPRData {
      * slots can vary in size. */
     MVMuint16 num_attributes;

-    /* Maps attribute position numbers to the byte offset in the object. */
-    MVMuint16 *attribute_offsets;
-
-    /* If the attribute was actually flattened in to this object from another
-     * representation, this is the s-table of the type of that attribute. NULL
-     * for attributes that are just reference types. */
-    MVMSTable **flattened_stables;
+    /* Slot containing object to delegate for positional things. */
+    MVMint16 pos_del_slot;

-    /* Instantiated objects are just a blank piece of memory that needs to
-     * be set up. However, in some cases we'd like them to magically turn in
-     * to some container type. */
-    MVMObject **auto_viv_values;
+    /* Slot containing object to delegate for associative things. */
+    MVMint16 ass_del_slot;

     /* Flags if we are MI or not. */
     MVMuint16 mi;
@@ -66,24 +59,31 @@ struct MVMP6opaqueREPRData {
     /* Slot to delegate to when we need to unbox to a native string. */
     MVMint16 unbox_str_slot;

-    /* If we have any other boxings, this maps repr ID to slot. */
-    MVMP6opaqueBoxedTypeMap *unbox_slots;
+    /* Offsets into the object that are eligible for GC marking, and how
+     * many of them we have. */
+    MVMuint16 gc_obj_mark_offsets_count;
+    MVMuint16 *gc_obj_mark_offsets;

-    /* Slot containing object to delegate for positional things. */
-    MVMint16 pos_del_slot;
+    /* Maps attribute position numbers to the byte offset in the object. */
+    MVMuint16 *attribute_offsets;

-    /* Slot containing object to delegate for associative things. */
-    MVMint16 ass_del_slot;
+    /* If the attribute was actually flattened in to this object from another
+     * representation, this is the s-table of the type of that attribute. NULL
+     * for attributes that are just reference types. */
+    MVMSTable **flattened_stables;
+
+    /* Instantiated objects are just a blank piece of memory that needs to
+     * be set up. However, in some cases we'd like them to magically turn in
+     * to some container type. */
+    MVMObject **auto_viv_values;
+
+    /* If we have any other boxings, this maps repr ID to slot. */
+    MVMP6opaqueBoxedTypeMap *unbox_slots;

     /* A table mapping attribute names to indexes (which can then be looked
      * up in the offset table). Uses a final null entry as a sentinel. */
     MVMP6opaqueNameMap *name_to_index_mapping;

-    /* Offsets into the object that are eligible for GC marking, and how
-     * many of them we have. */
-    MVMuint16 *gc_obj_mark_offsets;
-    MVMuint16 gc_obj_mark_offsets_count;
-
     /* Slots holding flattened objects that need another REPR to initialize
      * them; terminated with -1. */
     MVMint16 *initialize_slots;
-- 
2.0.0-rc3-456-gfa0cd67
From 1f7e3fd9e7ba13867909f9bec58ddefab25c6730 Mon Sep 17 00:00:00 2001
From: Nicholas Clark <n...@ccl4.org>
Date: Mon, 19 May 2014 16:38:27 +0200
Subject: [PATCH 07/11] Re-order members of MVMInstance to avoid alignment holes.

Saves 4 bytes on ARM, and 8 bytes on x86_64.
---
 src/core/instance.h | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/src/core/instance.h b/src/core/instance.h
index abb8af8..a02438b 100644
--- a/src/core/instance.h
+++ b/src/core/instance.h
@@ -135,6 +135,16 @@ struct MVMInstance {
     /* Set of string constants. */
     MVMStringConsts str_consts;

+    /* Specialization installation mutex (global, as it's low contention, so
+     * no real motivation to have it more fine-grained at present). */
+    uv_mutex_t mutex_spesh_install;
+
+    /* Log file for specializations, if we're to log them. */
+    FILE *spesh_log_fh;
+
+    /* Flag for if spesh is enabled. */
+    MVMint32 spesh_enabled;
+
     /* Number of representations registered so far. */
     MVMuint32 num_reprs;

@@ -174,10 +184,10 @@ struct MVMInstance {
     /* note: used atomically */
     MVMThread *threads;

-    /* Number of passed command-line args */
-    MVMint64 num_clargs;
     /* raw command line args from APR */
     char **raw_clargs;
+    /* Number of passed command-line args */
+    MVMint64 num_clargs;
     /* executable name */
     const char *exec_name;
     /* program name; becomes first clargs entry */
@@ -246,16 +256,6 @@ struct MVMInstance {
     MVMCallsiteInterns *callsite_interns;
     uv_mutex_t mutex_callsite_interns;

-    /* Specialization installation mutex (global, as it's low contention, so
-     * no real motivation to have it more fine-grained at present). */
-    uv_mutex_t mutex_spesh_install;
-
-    /* Log file for specializations, if we're to log them. */
-    FILE *spesh_log_fh;
-
-    /* Flag for if spesh is enabled. */
-    MVMint32 spesh_enabled;
-
     /* Standard file handles. */
     MVMObject *stdin_handle;
     MVMObject *stdout_handle;
-- 
2.0.0-rc3-456-gfa0cd67
From ab8cc23c5a13dabb6fbc4a116d9a41f4c601165a Mon Sep 17 00:00:00 2001
From: Nicholas Clark <n...@ccl4.org>
Date: Mon, 19 May 2014 20:01:48 +0200
Subject: [PATCH 08/11] Re-order members of MVMSpeshCandidate to avoid alignment holes.

Saves 8 bytes on x86_64.
---
 src/spesh/candidate.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/spesh/candidate.h b/src/spesh/candidate.h
index 89748bc..409d1bd 100644
--- a/src/spesh/candidate.h
+++ b/src/spesh/candidate.h
@@ -24,12 +24,12 @@ struct MVMSpeshCandidate {
     /* Number of spesh slots. */
     MVMuint32 num_spesh_slots;

-    /* Deoptimization mappings. */
-    MVMint32 *deopts;
-
     /* The number of deoptimization mappings we have. */
     MVMuint32 num_deopts;

+    /* Deoptimization mappings. */
+    MVMint32 *deopts;
+
     /* Atomic integer for the number of times we've entered the code so far
      * for the purpose of logging, in the trace phase. We used this as an
      * index into the log slots when running logging code. Once it hits the
-- 
2.0.0-rc3-456-gfa0cd67
From 29c94eab4ada8da1678dc7a5fa0ca04772f6924c Mon Sep 17 00:00:00 2001
From: Nicholas Clark <n...@ccl4.org>
Date: Mon, 19 May 2014 20:33:05 +0200
Subject: [PATCH 09/11] Re-order members of MVMSTable to avoid alignment holes.

Saves 8 bytes on x86_64.
---
 src/6model/6model.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/6model/6model.h b/src/6model/6model.h
index 9f433b6..1edadbf 100644
--- a/src/6model/6model.h
+++ b/src/6model/6model.h
@@ -219,6 +219,9 @@ struct MVMSTable {
     /* The type-object. */
     MVMObject *WHAT;

+    /* The underlying package stash. */
+    MVMObject *WHO;
+
     /* By-name method dispatch cache. */
     MVMObject *method_cache;

@@ -273,9 +276,6 @@ struct MVMSTable {
     /* Information - if any - about how we can turn something of this type
      * into a boolean. */
     MVMBoolificationSpec *boolification_spec;
-
-    /* The underlying package stash. */
-    MVMObject *WHO;

     /* The HLL that this type is owned by, if any. */
     MVMHLLConfig *hll_owner;
-- 
2.0.0-rc3-456-gfa0cd67
From 1417f81561422e36862a95257cb16f004fb2e6ca Mon Sep 17 00:00:00 2001
From: Nicholas Clark <n...@ccl4.org>
Date: Mon, 19 May 2014 21:08:12 +0200
Subject: [PATCH 10/11] Re-order members of MVMSerializationContextBody to avoid alignment holes.

Saves 4 bytes on ARM.
---
 src/6model/reprs/SCRef.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/6model/reprs/SCRef.h b/src/6model/reprs/SCRef.h
index b03ab87..312fc5d 100644
--- a/src/6model/reprs/SCRef.h
+++ b/src/6model/reprs/SCRef.h
@@ -10,9 +10,9 @@ struct MVMSerializationContextBody {
     MVMString *description;

     /* The root set of objects that live in this SC. */
-    MVMObject **root_objects;
     MVMuint64 num_objects;
     MVMuint64 alloc_objects;
+    MVMObject **root_objects;

     /* The root set of STables that live in this SC. */
     MVMSTable **root_stables;
-- 
2.0.0-rc3-456-gfa0cd67
From 17ce308459771a3edd198bfa672f4b999c2bd931 Mon Sep 17 00:00:00 2001
From: Nicholas Clark <n...@ccl4.org>
Date: Mon, 19 May 2014 21:36:18 +0200
Subject: [PATCH 11/11] Re-order members of MVMInvocationSpec to avoid alignment holes.

Saves 4 bytes on ARM.
---
 src/core/frame.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/core/frame.h b/src/core/frame.h
index 22196ab..763accf 100644
--- a/src/core/frame.h
+++ b/src/core/frame.h
@@ -174,8 +174,8 @@ struct MVMInvocationSpec {
     MVMObject *md_class_handle;
     MVMString *md_cache_attr_name;
     MVMint64 md_cache_hint;
-    MVMString *md_valid_attr_name;
     MVMint64 md_valid_hint;
+    MVMString *md_valid_attr_name;
 };

 void MVM_frame_invoke(MVMThreadContext *tc, MVMStaticFrame *static_frame,
-- 
2.0.0-rc3-456-gfa0cd67