Hi, The patch below does two things:
1. It introduces `scm_take_locale_symbol ()'. 2. It modifies `scm_from_locale_symbol ()' so that it doesn't create a Scheme string to do the job. This second modification has a nice effect: it can significantly reduce the number of objects created at load-time. Unfortunately, Guile's built-in reader always produces Scheme strings (in `scm_read_token ()') so it cannot benefit from this optimization. Using a slightly modified version of `guile-reader' which does not create Scheme strings when reading a symbol, I tried to measure the improvement compared to Guile's built-in reader. Basically, I had Guile load a program that defines 20000 variables[*], first with Guile's built-in reader, then with `guile-reader'. With Guile's built-in reader: % cumulative self self total time seconds seconds calls s/call s/call name 20.85 17.86 17.86 41385 0.00 0.00 ceval 14.05 29.89 12.03 49963 0.00 0.00 scm_i_sweep_card 12.50 40.60 10.71 2191128 0.00 0.00 scm_gc_mark_dependencies 8.49 47.87 7.27 3149 0.00 0.01 scm_i_mark_weak_vector_non_weaks 5.50 52.58 4.72 4729157 0.00 0.00 scm_cell 5.43 57.23 4.65 5710240 0.00 0.00 scm_gc_mark ... 0.08 83.46 0.07 20687 0.00 0.00 scm_i_make_string With `guile-reader': % cumulative self self total time seconds seconds calls s/call s/call name 23.59 17.66 17.66 41385 0.00 0.00 ceval 14.34 28.39 10.73 46707 0.00 0.00 scm_i_sweep_card 11.24 36.80 8.41 1810700 0.00 0.00 scm_gc_mark_dependencies 7.99 42.78 5.98 2712 0.00 0.01 scm_i_mark_weak_vector_non_weaks 6.32 47.51 4.73 4729153 0.00 0.00 scm_cell 5.75 51.81 4.31 4767765 0.00 0.00 scm_gc_mark ... 0.00 74.83 0.00 687 0.00 0.00 scm_i_make_string The timings observed are around 15 s. (w/ Guile's built-in reader) vs. 13 s. on my 500 MHz G4. Clearly, the mark phase is much quicker as fewer strings were created in the second case. Of course, it would be nice if the built-in reader could benefit from this as well, but this requires a fair amount of (tedious) work. Besides, `scm_take_locale_symbol ()' could be beneficial to application writers as well. Thanks, Ludovic. [*] Produced by: (with-output-to-file "t.scm" (lambda () (for-each (lambda (x) (format #t "(define sym~a ~a)~%" x x)) (iota 20000)))) libguile: 2005-12-19 Ludovic Courtès <[EMAIL PROTECTED]> * strings.c (scm_i_take_stringbufn): New. (scm_i_c_take_symbol): New. (scm_take_locale_stringn): Use `scm_i_take_stringbufn ()'. * strings.h (scm_i_c_take_symbol): New. (scm_i_take_stringbufn): New. * symbols.c (lookup_interned_symbol): New function. (scm_i_c_mem2symbol): New function. (scm_i_mem2symbol): Use `lookup_symbol ()'. (scm_from_locale_symbol): Use `scm_i_c_mem2symbol ()'. This avoids creating a new Scheme string. (scm_from_locale_symboln): Likewise. (scm_take_locale_symbol): New. (scm_take_locale_symboln): New. * symbols.h (scm_take_locale_symbol): New. (scm_take_locale_symboln): New. doc/ref: 2005-12-19 Ludovic Courtès <[EMAIL PROTECTED]> * api-data.texi (Operations Related to Symbols): Documented `scm_take_locale_symbol ()'. --- orig/doc/ref/api-data.texi +++ mod/doc/ref/api-data.texi @@ -4551,6 +4551,16 @@ specified explicitly by @var{len}. @end deffn [EMAIL PROTECTED] {C Function} SCM scm_take_locale_symbol (char *str) [EMAIL PROTECTED] {C Function} SCM scm_take_locale_symboln (char *str, size_t len) +Like @code{scm_from_locale_symbol} and @code{scm_from_locale_symboln}, +respectively, but also frees @var{str} with @code{free} eventually. +Thus, you can use this function when you would free @var{str} anyway +immediately after creating the Scheme string. In certain cases, Guile +can then use @var{str} directly as its internal representation. [EMAIL PROTECTED] deftypefn + + Finally, some applications, especially those that generate new Scheme code dynamically, need to generate symbols for use in the generated code. The @code{gensym} primitive meets this need: --- orig/libguile/strings.c +++ mod/libguile/strings.c @@ -122,6 +122,17 @@ } } +/* Return a new stringbuf whose underlying storage consists of the LEN octets + pointed to by STR. */ +SCM_C_INLINE SCM +scm_i_take_stringbufn (char *str, size_t len) +{ + scm_gc_register_collectable_memory (str, len, "stringbuf"); + + return scm_double_cell (STRINGBUF_TAG, (scm_t_bits) str, + (scm_t_bits) len, (scm_t_bits) 0); +} + SCM scm_i_stringbuf_mark (SCM buf) { @@ -412,6 +423,29 @@ (scm_t_bits) hash, SCM_UNPACK (props)); } +SCM +scm_i_c_make_symbol (const char *name, size_t len, + scm_t_bits flags, unsigned long hash, SCM props) +{ + SCM buf = make_stringbuf (len); + memcpy (STRINGBUF_CHARS (buf), name, len); + + return scm_double_cell (scm_tc7_symbol | flags, SCM_UNPACK (buf), + (scm_t_bits) hash, SCM_UNPACK (props)); +} + +/* Return a new symbol that uses the LEN bytes pointed to by NAME as its + underlying storage. */ +SCM +scm_i_c_take_symbol (char *name, size_t len, + scm_t_bits flags, unsigned long hash, SCM props) +{ + SCM buf = scm_i_take_stringbufn (name, len); + + return scm_double_cell (scm_tc7_symbol | flags, SCM_UNPACK (buf), + (scm_t_bits) hash, SCM_UNPACK (props)); +} + size_t scm_i_symbol_length (SCM sym) { @@ -842,12 +876,10 @@ str[len] = '\0'; } - buf = scm_double_cell (STRINGBUF_TAG, (scm_t_bits) str, - (scm_t_bits) len, (scm_t_bits) 0); + buf = scm_i_take_stringbufn (str, len); res = scm_double_cell (STRING_TAG, SCM_UNPACK (buf), (scm_t_bits) 0, (scm_t_bits) len); - scm_gc_register_collectable_memory (str, len+1, "string"); return res; } --- orig/libguile/strings.h +++ mod/libguile/strings.h @@ -124,6 +124,12 @@ SCM_API SCM scm_i_make_symbol (SCM name, scm_t_bits flags, unsigned long hash, SCM props); +SCM_API SCM +scm_i_c_make_symbol (const char *name, size_t len, + scm_t_bits flags, unsigned long hash, SCM props); +SCM_API SCM +scm_i_c_take_symbol (char *name, size_t len, + scm_t_bits flags, unsigned long hash, SCM props); SCM_API const char *scm_i_symbol_chars (SCM sym); SCM_API size_t scm_i_symbol_length (SCM sym); SCM_API SCM scm_i_symbol_substring (SCM sym, size_t start, size_t end); @@ -144,6 +150,7 @@ SCM_API void scm_i_get_substring_spec (size_t len, SCM start, size_t *cstart, SCM end, size_t *cend); +SCM_API SCM scm_i_take_stringbufn (char *str, size_t len); /* deprecated stuff */ --- orig/libguile/symbols.c +++ mod/libguile/symbols.c @@ -85,43 +85,79 @@ } static SCM -scm_i_mem2symbol (SCM str) +lookup_interned_symbol (const char *name, size_t len, + unsigned long raw_hash) { - const char *name = scm_i_string_chars (str); - size_t len = scm_i_string_length (str); + /* Try to find the symbol in the symbols table */ + SCM l; + unsigned long hash = raw_hash % SCM_HASHTABLE_N_BUCKETS (symbols); + + for (l = SCM_HASHTABLE_BUCKET (symbols, hash); + !scm_is_null (l); + l = SCM_CDR (l)) + { + SCM sym = SCM_CAAR (l); + if (scm_i_symbol_hash (sym) == raw_hash + && scm_i_symbol_length (sym) == len) + { + const char *chrs = scm_i_symbol_chars (sym); + size_t i = len; + + while (i != 0) + { + --i; + if (name[i] != chrs[i]) + goto next_symbol; + } + + return sym; + } + next_symbol: + ; + } + + return SCM_BOOL_F; +} +static SCM +scm_i_c_mem2symbol (const char *name, size_t len) +{ + SCM symbol; size_t raw_hash = scm_string_hash ((const unsigned char *) name, len); size_t hash = raw_hash % SCM_HASHTABLE_N_BUCKETS (symbols); + symbol = lookup_interned_symbol (name, len, raw_hash); + if (symbol != SCM_BOOL_F) + return symbol; + { - /* Try to find the symbol in the symbols table */ + /* The symbol was not found - create it. */ + SCM symbol = scm_i_c_make_symbol (name, len, 0, raw_hash, + scm_cons (SCM_BOOL_F, SCM_EOL)); - SCM l; + SCM slot = SCM_HASHTABLE_BUCKET (symbols, hash); + SCM cell = scm_cons (symbol, SCM_UNDEFINED); + SCM_SET_HASHTABLE_BUCKET (symbols, hash, scm_cons (cell, slot)); + SCM_HASHTABLE_INCREMENT (symbols); + if (SCM_HASHTABLE_N_ITEMS (symbols) > SCM_HASHTABLE_UPPER (symbols)) + scm_i_rehash (symbols, scm_i_hash_symbol, 0, "scm_mem2symbol"); - for (l = SCM_HASHTABLE_BUCKET (symbols, hash); - !scm_is_null (l); - l = SCM_CDR (l)) - { - SCM sym = SCM_CAAR (l); - if (scm_i_symbol_hash (sym) == raw_hash - && scm_i_symbol_length (sym) == len) - { - const char *chrs = scm_i_symbol_chars (sym); - size_t i = len; - - while (i != 0) - { - --i; - if (name[i] != chrs[i]) - goto next_symbol; - } - - return sym; - } - next_symbol: - ; - } + return symbol; } +} + +static SCM +scm_i_mem2symbol (SCM str) +{ + SCM symbol; + const char *name = scm_i_string_chars (str); + size_t len = scm_i_string_length (str); + size_t raw_hash = scm_string_hash ((const unsigned char *) name, len); + size_t hash = raw_hash % SCM_HASHTABLE_N_BUCKETS (symbols); + + symbol = lookup_interned_symbol (name, len, raw_hash); + if (symbol != SCM_BOOL_F) + return symbol; { /* The symbol was not found - create it. */ @@ -139,6 +175,7 @@ } } + static SCM scm_i_mem2uninterned_symbol (SCM str) { @@ -348,13 +385,50 @@ SCM scm_from_locale_symbol (const char *sym) { - return scm_string_to_symbol (scm_from_locale_string (sym)); + return scm_i_c_mem2symbol (sym, strlen (sym)); } SCM scm_from_locale_symboln (const char *sym, size_t len) { - return scm_string_to_symbol (scm_from_locale_stringn (sym, len)); + return scm_i_c_mem2symbol (sym, len); +} + +SCM +scm_take_locale_symboln (char *sym, size_t len) +{ + SCM res; + unsigned long raw_hash; + + if (len == (size_t)-1) + len = strlen (sym); + else + { + /* Ensure STR is null terminated. A realloc for 1 extra byte should + often be satisfied from the alignment padding after the block, with + no actual data movement. */ + sym = scm_realloc (sym, len+1); + sym[len] = '\0'; + } + + raw_hash = scm_string_hash ((unsigned char *)sym, len); + res = lookup_interned_symbol (sym, len, raw_hash); + if (res != SCM_BOOL_F) + { + free (sym); + return res; + } + + res = scm_i_c_take_symbol (sym, len, 0, raw_hash, + scm_cons (SCM_BOOL_F, SCM_EOL)); + + return res; +} + +SCM +scm_take_locale_symbol (char *sym) +{ + return scm_take_locale_symboln (sym, (size_t)-1); } void --- orig/libguile/symbols.h +++ mod/libguile/symbols.h @@ -56,6 +56,8 @@ SCM_API SCM scm_from_locale_symbol (const char *str); SCM_API SCM scm_from_locale_symboln (const char *str, size_t len); +SCM_API SCM scm_take_locale_symbol (char *sym); +SCM_API SCM scm_take_locale_symboln (char *sym, size_t len); /* internal functions. */ _______________________________________________ Guile-devel mailing list Guile-devel@gnu.org http://lists.gnu.org/mailman/listinfo/guile-devel