On 2012-12-20 15:51:37 +0100, Andres Freund wrote: > On 2012-12-20 15:45:47 +0100, Andres Freund wrote: > > On 2012-12-20 09:11:46 -0500, Robert Haas wrote: > > > On Thu, Dec 20, 2012 at 8:55 AM, Simon Riggs <si...@2ndquadrant.com> > > > wrote: > > > > On 18 December 2012 22:10, Robert Haas <robertmh...@gmail.com> wrote: > > > >> Well that would be nice, but the problem is that I see no way to > > > >> implement it. If, with a unified parser, the parser is 14% of our > > > >> source code, then splitting it in two will probably crank that number > > > >> up well over 20%, because there will be duplication between the two. > > > >> That seems double-plus un-good. > > > > > > > > I don't think the size of the parser binary is that relevant. What is > > > > relevant is how much of that is regularly accessed. > > > > > > > > Increasing parser cache misses for DDL and increasing size of binary > > > > overall are acceptable costs if we are able to swap out the unneeded > > > > areas and significantly reduce the cache misses on the well travelled > > > > portions of the parser. > > > > > > I generally agree. We don't want to bloat the size of the parser with > > > wild abandon, but yeah if we can reduce the cache misses on the > > > well-travelled portions that seems like it ought to help. My previous > > > hacky attempt to quantify the potential benefit in this area was: > > > > > > http://archives.postgresql.org/pgsql-hackers/2011-05/msg01008.php > > > > > > On my machine there seemed to be a small but consistent win; on a very > > > old box Jeff Janes tried, it didn't seem like there was any benefit at > > > all. Somehow, I have a feeling we're missing a trick here. > > > > I don't think you will see too many cache misses on such a low number of > > extremly simply statements, so its not too surprising not to see a that > > big difference with that. > > > > Are we sure its really cache-misses and not just the actions performed > > in the grammar that make bison code show up in profiles? I remember the > > latter being the case... > > Hm. A very, very quick perf stat -dvvv of pgbench -S -c 20 -j 20 -T 20 later: > > 218350.885559 task-clock # 10.095 CPUs utilized > 1,676,479 context-switches # 0.008 M/sec > 2,396 cpu-migrations # 0.011 K/sec > 796,038 page-faults # 0.004 M/sec > 506,312,525,518 cycles # 2.319 GHz > [20.00%] > 405,944,435,754 stalled-cycles-frontend # 80.18% frontend cycles idle > [30.32%] > 236,712,872,641 stalled-cycles-backend # 46.75% backend cycles idle > [40.51%] > 193,459,120,458 instructions # 0.38 insns per cycle > # 2.10 stalled cycles per > insn [50.70%] > 36,433,144,472 branches # 166.856 M/sec > [51.12%] > 3,623,778,087 branch-misses # 9.95% of all branches > [50.87%] > 50,344,123,581 L1-dcache-loads # 230.565 M/sec > [50.33%] > 5,548,192,686 L1-dcache-load-misses # 11.02% of all L1-dcache hits > [49.69%] > 2,666,461,361 LLC-loads # 12.212 M/sec > [35.63%] > 112,407,198 LLC-load-misses # 4.22% of all LL-cache hits > [ 9.67%] > > 21.629396701 seconds time elapsed > > So there definitely a noticeable rate of cache misses...
L1 misses: # Samples: 997K of event 'L1-dcache-load-misses' # Overhead Command Shared Object Symbol # ........ ........ ............................................................... 6.49% postgres postgres [.] SearchCatCache 3.65% postgres postgres [.] base_yyparse 3.48% postgres postgres [.] hash_search_with_hash_value 3.41% postgres postgres [.] AllocSetAlloc 1.84% postgres postgres [.] LWLockAcquire 1.40% postgres postgres [.] fmgr_info_cxt_security 1.36% postgres postgres [.] nocachegetattr 1.23% postgres libc-2.13.so [.] _int_malloc 1.20% postgres postgres [.] core_yylex 1.15% postgres postgres [.] MemoryContextAllocZeroAligned 0.94% postgres postgres [.] PostgresMain 0.94% postgres postgres [.] MemoryContextAlloc 0.91% postgres libc-2.13.so [.] __memcpy_ssse3_back 0.89% postgres postgres [.] CatalogCacheComputeHashValue 0.86% postgres postgres [.] PinBuffer 0.86% postgres [kernel.kallsyms] [k] __audit_syscall_entry 0.80% postgres libc-2.13.so [.] __strcmp_sse42 0.80% postgres postgres [.] _bt_compare 0.78% postgres postgres [.] FunctionCall2Coll 0.77% postgres libc-2.13.so [.] malloc 0.73% postgres libc-2.13.so [.] __memset_sse2 0.72% postgres postgres [.] GetSnapshotData 0.69% postgres [kernel.kallsyms] [k] fget_light 0.69% postgres postgres [.] DirectFunctionCall1Coll 0.67% postgres postgres [.] hash_search 0.67% postgres libc-2.13.so [.] 0x000000000011a3a5 0.66% postgres postgres [.] pgstat_initstats 0.66% postgres postgres [.] AllocSetFree 0.65% postgres libc-2.13.so [.] __strlen_sse42 0.60% postgres libc-2.13.so [.] _int_free 0.60% postgres [kernel.kallsyms] [k] cpuacct_charge 0.59% postgres postgres [.] heap_getsysattr 0.59% postgres postgres [.] MemoryContextAllocZero 0.58% postgres postgres [.] PopActiveSnapshot 0.53% postgres libc-2.13.so [.] __memcmp_sse4_1 0.51% postgres postgres [.] ReadBuffer_common 0.49% postgres postgres [.] ScanKeywordLookup 0.49% postgres postgres [.] LockAcquireExtended 0.47% postgres [kernel.kallsyms] [k] update_cfs_shares 0.45% postgres postgres [.] SearchCatCacheList 0.45% postgres postgres [.] new_list 0.44% postgres postgres [.] get_relation_info LLC misses: # Samples: 1M of event 'LLC-load-misses' # Event count (approx.): 143379713 # Overhead Command Shared Object Symbol # ........ ........ ............................................................... 25.08% postgres postgres [.] _bt_compare 12.84% postgres postgres [.] PinBuffer 9.18% postgres postgres [.] LWLockAcquire 6.31% postgres postgres [.] GetSnapshotData 6.08% postgres postgres [.] heap_hot_search_buffer 5.13% postgres postgres [.] hash_search_with_hash_value 4.85% postgres postgres [.] _bt_checkpage 3.95% postgres postgres [.] _bt_moveright 3.09% postgres postgres [.] heap_page_prune_opt 2.12% postgres postgres [.] slot_deform_tuple 1.98% postgres postgres [.] LWLockRelease 1.82% postgres libc-2.13.so [.] __memcmp_sse4_1 1.16% postgres postgres [.] ExecProject 1.16% postgres postgres [.] FunctionCall2Coll 0.94% postgres [kernel.kallsyms] [k] copy_user_generic_string 0.94% postgres [kernel.kallsyms] [k] tg_load_down 0.78% postgres [kernel.kallsyms] [k] find_get_page 0.73% postgres postgres [.] ProcArrayEndTransaction 0.73% postgres postgres [.] pfree 0.71% postgres postgres [.] pgstat_report_xact_timestamp 0.69% postgres postgres [.] index_fetch_heap 0.66% postgres postgres [.] LockAcquireExtended 0.56% postgres postgres [.] LockBuffer 0.45% postgres postgres [.] slot_getsomeattrs 0.40% postgres postgres [.] _bt_readpage So it seems L1 misses are the interesting thing wrt to parsing. When doing a source/assembly annotation in SearchCatCache about half of the misses are attributed to the memcpy() directly at the beginning. In base_yyparse the three biggest offenders (15%, 10.5%, 5.58%) really seem to be various kinds of table lookups. So it seems to confirm the various suspicious that the table size might be rather relevant. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers