Hi, I wanted to share with the mailing list members here details about the project I've been working on:
https://github.com/jeffreyhorner/R-Array-Hash This is a re-implementation of R's hashed environments, the global variable cache, the global string cache and symbol table with cache-conscious array hash tables. The results are quite encouraging. However, the implementation is a big departure from R's API: "An array hash is a cache-conscious data structure that takes advantage of hardware prefetchers for improved performance on large hash tables, those large enough to fit in main memory and larger than fast fixed size cpu caches. However, their implementation is a radical departure from standard chained hash tables. Rather than using chains of hash buckets for collision resolution, array hashes use segements of contiguous memory called dynamic arrays to store keys and values. Adding and deleting items from the hash involve copying the entire segment to new areas in memory. While this may seem wasteful and slow, it's surprisingly efficient in both time and space. In R, hashed environments are implemented using lists with each list element (a CONS cell) acting as the hash bucket. The CONS cell is the binding agent for a symbol and value. Hashed environments are searched using the pointer address of the symbol rather than the symbol's printed name. R-Array-Hash takes advantage of this by implementing an integer array hash to store addresses of symbols and their associated values. Care is also taken to account for whether or not a binding is locked, active, etc. Similarly, R-Array-Hash re-implements R's string cache using a string array hash. This introduces the most radical change to R's API: CHAR() no longer returns an address that points to the area at the end of the SEXP (containing the string value). Rather it returns an address located in one of the contiguous dynamic arrays of the string hash table. Therefore, care must be taken in C code to use the address immediately since additions and deletions to the string hash could render the result of CHAR() useless. There are many areas of the code that sidestep this by calling translateChar(), which has been changed to always copy the string pointed by CHAR()." Comments, constructive or otherwise are welcome. Best, Jeff ______________________________________________ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel