On 20 June 2012 17:41, Tom Lane <t...@sss.pgh.pa.us> wrote: > Peter Geoghegan <pe...@2ndquadrant.com> writes: >> No, I'm suggesting it would probably be at least a bit of a win here >> to cache the constant, and only have to do a strxfrm() + strcmp() per >> comparison. > > Um, have you got any hard evidence to support that notion? The > traditional advice is that strcoll is faster than using strxfrm unless > the same strings are to be compared repeatedly. I'm not convinced that > saving the strxfrm on just one side will swing the balance.
I've written a very small C++ program, which I've attached, that basically proves that this can still make a fairly large difference - I hope it's okay that that it's C++, but that allowed me to write the program quickly, with no dependencies for anyone who cares to try this out, other than a C++ compiler and the standard library. It just inserts 500,000 elements (the same succession of psuedo-random strings again and again) into a std::set, which is implemented using a red-black tree on my machine. In one case, we just use strcmp() (there is actually no tie-breaker). In the other case, the comparator allocates and fills memory for a strxfrm blob when needed, caches it, and uses strxfrm() to generate blobs for other elements into a dedicated buffer which persists across comparisons. It prints the duration of each run, and a running average for each case until terminated. Here is what the output looks like on my machine: [peter@peterlaptop strxfrm_test]$ g++ -O2 set_test.cpp [peter@peterlaptop strxfrm_test]$ ./a.out Time elapsed with strcoll: 1.485290 seconds Time elapsed with strxfrm optimization: 1.070211 seconds Time elapsed with strcoll: 1.813725 seconds Time elapsed with strxfrm optimization: 1.097950 seconds Time elapsed with strcoll: 1.813381 seconds Time elapsed with strxfrm optimization: 1.102769 seconds Time elapsed with strcoll: 1.826708 seconds Time elapsed with strxfrm optimization: 1.105093 seconds Time elapsed with strcoll: 1.842156 seconds Time elapsed with strxfrm optimization: 1.103198 seconds *****strcoll average: 1.756252 strxfrm average: 1.095844***** Time elapsed with strcoll: 1.834785 seconds Time elapsed with strxfrm optimization: 1.105369 seconds Time elapsed with strcoll: 1.817449 seconds Time elapsed with strxfrm optimization: 1.110386 seconds Time elapsed with strcoll: 1.812446 seconds Time elapsed with strxfrm optimization: 1.101266 seconds Time elapsed with strcoll: 1.813595 seconds Time elapsed with strxfrm optimization: 1.099444 seconds Time elapsed with strcoll: 1.814584 seconds Time elapsed with strxfrm optimization: 1.099542 seconds *****strcoll average: 1.787412 strxfrm average: 1.099523***** Time elapsed with strcoll: 1.820218 seconds Time elapsed with strxfrm optimization: 1.102984 seconds Time elapsed with strcoll: 1.817549 seconds Time elapsed with strxfrm optimization: 1.100526 seconds Time elapsed with strcoll: 1.817020 seconds Time elapsed with strxfrm optimization: 1.099273 seconds Time elapsed with strcoll: 1.820983 seconds Time elapsed with strxfrm optimization: 1.100501 seconds Time elapsed with strcoll: 1.813270 seconds Time elapsed with strxfrm optimization: 1.098904 seconds *****strcoll average: 1.797544 strxfrm average: 1.099828***** Time elapsed with strcoll: 1.815952 seconds Time elapsed with strxfrm optimization: 1.102583 seconds If you'd like to print the contents of the set, there are a couple of lines you can uncomment. It should be straightforward to understand the program, even if you don't know any C++. Note that I still think that the compelling case for strxfrm() caching is sorting tuples, not btree index traversal. All that I ask is that you allow that on the whole, this will pay for the cost of verifying equivalency within _bt_checkkeys() once per index-scan. I am aware that a red-black tree isn't generally an excellent proxy for how a btree will perform, but I don't think that that really matters here. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
/* * set_test.cpp: * * Measure the performance characteristics of using strcoll() as * a string comparator rather than caching strxfrm() blobs. * * Author: Peter Geoghegan * * A std::set is an ascending container of unique elements. The implementation * that I used for any numbers published was the one from Gnu libstdc++, on * Fedora 16. That particular implementation uses a red-black tree, but the * standard's requirement that common operations (search, insert, delete) take * logarithmic time effectively requires the use of some kind of self-balancing * binary search tree, which isn't a bad proxy for a btree for my purposes. */ #include <set> #include <stdlib.h> #include <stdio.h> #include <string.h> #include <string> #include <sys/time.h> #define ORIG_STRING_SIZE 32 #define BLOB_STRING_SIZE 128 double timeval_subtract(struct timeval *x, struct timeval *y) { struct timeval result; /* Compute the time remaining to wait. tv_usec is certainly positive. */ result.tv_sec = x->tv_sec - y->tv_sec; result.tv_usec = x->tv_usec - y->tv_usec; /* return difference in seconds */ return result.tv_sec + ((double) result.tv_usec / 1000000); } void reset_prng_seed() { srand(0); } /* * Enter a psuedo-random sequence of characters into a buffer specified by buf * * This just places lower-case ascii characters into the buffer, so strings * generated by this function should be relatively cheap to do a * collation-aware sort with when LC_COLLATE is set to en_*.UTF-8. I've made a * point of using en_US.UTF-8 in benchmarks. */ void place_random_string(char *buf) { static std::string charset = "abcdefghijklmnopqrstuvwxyz"; for (int i = 0; i < ORIG_STRING_SIZE; i++) buf[i] = charset[rand() % charset.length()]; } class string_wrapper { public: string_wrapper(bool use_strcoll); string_wrapper(const string_wrapper &rhs); virtual ~string_wrapper(); // operator< is our comparator, required by std::set bool operator<(const string_wrapper &rhs) const; // If you want to see the strings contents printed to stdout, call this // function: void print_contents() const; private: // Don't provide an asignment operator const string_wrapper& operator=(const string_wrapper& rhs); // mark these "mutable" so they can be modified in our comparator as part // of the required lazy initialization (std::set in turn requires that it // is a const member function). Not very idiomatic, but the easiest way to // approximate how this might work in Postgres. mutable char *orig_string; mutable char *strxfrm_buf; mutable char *strxfrm_rhs_buf; bool use_strcoll; }; typedef std::set<string_wrapper> string_set; /* * Default constructor */ string_wrapper::string_wrapper(bool use_strcoll): use_strcoll(use_strcoll), orig_string(new char[ORIG_STRING_SIZE]), strxfrm_buf(NULL), strxfrm_rhs_buf(NULL) { // Generate a random string in a buffer. This class conceptually manages // that resource. place_random_string(orig_string); } /* * Copy constructor: Make a deep copy of the class */ string_wrapper::string_wrapper(const string_wrapper &rhs): use_strcoll(rhs.use_strcoll), orig_string(new char[ORIG_STRING_SIZE]), strxfrm_buf(NULL), strxfrm_rhs_buf(NULL) { // Only copy the original string strcpy(orig_string, rhs.orig_string); } /* * Boilerplate destructor */ string_wrapper::~string_wrapper() { delete [] orig_string; if (strxfrm_buf) { delete [] strxfrm_buf; delete [] strxfrm_rhs_buf; } } /* * Comparator required by std::set */ bool string_wrapper::operator<(const string_wrapper &rhs) const { if (use_strcoll) { /* * I might have simulated the extra strcmp() tie-break overhead here, * but that interface isn't supported. */ return strcoll(orig_string, rhs.orig_string) < 0; } else { // Do we already have a buffer allocated? We lazily allocated this on // the first time through. if (!strxfrm_buf) { strxfrm_buf = new char [BLOB_STRING_SIZE]; strxfrm_rhs_buf = new char [BLOB_STRING_SIZE]; // Set up our cached blob strxfrm(strxfrm_buf, orig_string, BLOB_STRING_SIZE); } // Create temporary blob for rhs element as part of tree traversal strxfrm(strxfrm_rhs_buf, rhs.orig_string, BLOB_STRING_SIZE); return strcmp(strxfrm_buf, strxfrm_rhs_buf) < 0; } } void string_wrapper::print_contents() const { printf("String contents: %s\n", orig_string); } void print_set_contents(const string_set &rhs) { // The map is sorted and iterable for (string_set::const_iterator i = rhs.begin(); i != rhs.end(); ++i) { i->print_contents(); } } #define NUM_ELEMS 500000 int main(int argc, const char *argv[]) { double non_opt_tot = 0, opt_tot = 0; int runs = 0; for(;;) { struct timeval begin, end; double secs_result; // Just so there's no ambiguity about what memory is allocated when, // allocate set dynamically: string_set *non_opt = new string_set; /* simple strcoll run */ gettimeofday(&begin, NULL); for (int i = 0; i < NUM_ELEMS; ++i) { string_wrapper wrap(false); non_opt->insert(wrap); } gettimeofday(&end, NULL); secs_result = timeval_subtract(&end, &begin); printf("Time elapsed with strcoll: %f seconds\n", secs_result); non_opt_tot += secs_result; // You may wish to uncomment this to see the sorted elements printed // //print_set_contents(*non_opt); // This delete statement calls the set's destructor, which is turn calls // each string's destructor, freeing all resources used. delete non_opt; // Actual string values should be completely deterministic for both sets of behaviours reset_prng_seed(); string_set *opt = new string_set; /* strxfrm run */ gettimeofday(&begin, NULL); for (int i = 0; i < NUM_ELEMS; ++i) { string_wrapper wrap(true); opt->insert(wrap); } gettimeofday(&end, NULL); secs_result = timeval_subtract(&end, &begin); printf("Time elapsed with strxfrm optimization: %f seconds\n", secs_result); opt_tot += secs_result; // You may wish to uncomment this to see the sorted elements printed // //print_set_contents(*opt); delete opt; ++runs; if (runs % 5 ==0) printf("*****strcoll average: %f strxfrm average: %f*****\n", non_opt_tot / runs, opt_tot / runs); } return 0; }
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers