Hello,

Just for the record: I found the problem and overcame the contention. The key was innodb_thread_concurrency.
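For anyone who wants to reproduce the fix, a minimal sketch follows. These are stock MySQL statements (nothing Percona-specific); keep in mind that SET GLOBAL does not survive a restart, so the value also belongs in my.cnf:

  -- see what the limit currently is
  SHOW GLOBAL VARIABLES LIKE 'innodb_thread_concurrency';

  -- 0 disables the limit entirely: InnoDB stops queueing threads in its FIFO
  SET GLOBAL innodb_thread_concurrency = 0;

And in my.cnf, so the setting persists across restarts:

  [mysqld]
  innodb_thread_concurrency = 0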
Quoting the documentation: "innodb_thread_concurrency is the variable that limits the number of operating system threads that can run concurrently inside the InnoDB engine. Rest of the threads have to wait in a FIFO queue for execution. Also, threads waiting for locks are not counted in the number of concurrently executing threads."

When the tests started, we set it to the recommended value (2x the number of cores). I also tried setting it to a crazy number. Only when I disabled it (SET GLOBAL innodb_thread_concurrency = 0;) did the reads take off, and I was able to do 1.1M reads per second :-)

The CPU is now the bottleneck (which makes sense); it sits at around 93-94%, and at that point I am not able to go over 1.1M r/s. mutex_delay never appeared again after disabling the thread concurrency limit.

Manuel.

2012/12/7 Manuel Arostegui <man...@tuenti.com>

> Hello all,
>
> I am testing HandlerSocket performance on a 5.5.28-29.1-log (Percona)
> server. These are the enabled options:
>
> loose_handlersocket_port = 9998
> loose_handlersocket_port_wr = 9999
> loose_handlersocket_threads = 48
> loose_handlersocket_threads_wr = 1
> innodb_spin_wait_delay = 0
>
> The machine has 24 (Xeon, 2.00GHz) cores and 64GB RAM. We are using
> bonding to make sure the ethernets aren't limiting us here (we get
> around 90Mbps).
>
> We are able to handle around 500K requests per second using the
> HandlerSocket plugin. Even though that is a pretty impressive number,
> it's still not close to the 750K Yoshinori is able to get
> (http://yoshinorimatsunobu.blogspot.com.es/2010/10/using-mysql-as-nosql-story-for.html).
> The machine is acting as a normal slave in a cluster, receiving normal
> traffic from our site (we do this on purpose to see how many requests
> we can handle in a normal workload environment).
>
> Obviously we're not expecting to get similar numbers, as our tests
> aren't the same.
> However, doing a bit of profiling to try to determine the bottleneck,
> we've seen this:
>
> CPU: Intel Architectural Perfmon, speed 2000.26 MHz (estimated)
> Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a
> unit mask of 0x00 (No unit mask) count 100000
>
> samples       %  image name  symbol name
> 2163868  24.2065  mysqld     mutex_delay
>  694540   7.7696  mysqld     build_template(row_prebuilt_struct*, THD*, TABLE*, unsigned int)
>  626405   7.0074  mysqld     buf_page_get_gen
>  607664   6.7978  mysqld     rec_get_offsets_func
>  480937   5.3801  mysqld     cmp_dtuple_rec_with_match
>  412081   4.6098  mysqld     btr_cur_search_to_nth_level
>  365402   4.0876  mysqld     rec_init_offsets
>  356064   3.9832  mysqld     page_cur_search_with_match
>  310819   3.4770  mysqld     row_search_for_mysql
>  274248   3.0679  mysqld     row_sel_store_mysql_rec
>  208466   2.3320  mysqld     my_pthread_fastmutex_lock
>  185853   2.0791  mysqld     ha_innobase::index_read(unsigned char*, unsigned char const*, unsigned int, ha_rkey_function)
>  182939   2.0465  mysqld     pfs_mutex_enter_func
>  154369   1.7269  mysqld     mtr_memo_slot_release
>  138558   1.5500  mysqld     page_check_dir
>  131622   1.4724  mysqld     dict_index_copy_types
>  101162   1.1317  mysqld     srv_conc_force_exit_innodb
>   72754   0.8139  mysqld     ha_innobase::change_active_index(unsigned int)
>   65191   0.7293  mysqld     my_long10_to_str_8bit
>   62574   0.7000  mysqld     btr_pcur_store_position
>   61900   0.6925  mysqld     pfs_mutex_exit_func
>   51889   0.5805  mysqld     Field_long::pack_length() const
>   49073   0.5490  mysqld     srv_conc_enter_innodb
>   44079   0.4931  mysqld     ha_innobase::init_table_handle_for_HANDLER()
>   38998   0.4363  mysqld     Field_tiny::pack_length() const
>   38868   0.4348  mysqld     rec_copy_prefix_to_buf
>   36292   0.4060  mysqld     ha_innobase::innobase_get_index(unsigned int)
>
> That mutex_delay is eating quite a big % of the time, and I have not
> been able to find what it relates to. Does anyone have a clue about
> what it is and whether there's a way to improve and overcome it?
>
> Cheers,
> Manuel.
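For anyone who finds this thread with a similar profile: before touching any variables, two statements built into stock 5.5 give a coarse view of which InnoDB mutexes are actually contended (the level of detail in the output varies by build):

  -- lists mutexes/rw-locks and their OS wait counters
  -- (in non-debug builds, typically only contended ones appear)
  SHOW ENGINE INNODB MUTEX;

  -- the SEMAPHORES section reports current waiters and spin/round/OS-wait totals
  SHOW ENGINE INNODB STATUS\G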