> I'd strongly suggest doing a "perf record -g -a <wait a bit, ctrl-c>;
> perf report" run to check what's eating up the time.
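For reference, that suggested invocation expands to something like the
following (a sketch only; system-wide profiling with -a generally needs
root, and the 30-second window is an arbitrary choice -- stopping with
ctrl-c works just as well):

    # -a = profile all CPUs, -g = record call graphs into perf.data
    perf record -g -a sleep 30
    # browse the recorded samples, hottest symbols first
    perf report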
Here is one example:

+  38.87%   swapper  [kernel.kallsyms]  [k] hypercall_page
+   9.32%  postgres  [kernel.kallsyms]  [k] hypercall_page
+   6.80%  postgres  [kernel.kallsyms]  [k] xen_set_pte_at
+   5.83%  postgres  [kernel.kallsyms]  [k] copy_user_generic_string
+   2.06%  postgres  [kernel.kallsyms]  [k] file_read_actor
+   1.89%  postgres  postgres           [.] heapgettup_pagemode
+   1.83%  postgres  postgres           [.] hash_search_with_hash_value
+   1.33%  postgres  [kernel.kallsyms]  [k] get_phys_to_machine
+   1.25%  postgres  [kernel.kallsyms]  [k] find_get_page
+   1.00%  postgres  postgres           [.] heapgetpage
+   0.99%  postgres  [kernel.kallsyms]  [k] radix_tree_lookup_element
+   0.98%  postgres  postgres           [.] advance_aggregates
+   0.96%  postgres  postgres           [.] ExecProject
+   0.94%  postgres  postgres           [.] advance_transition_function
+   0.88%  postgres  postgres           [.] ExecScan
+   0.87%  postgres  postgres           [.] HeapTupleSatisfiesMVCC
+   0.86%  postgres  postgres           [.] LWLockAcquire
+   0.82%  postgres  [kernel.kallsyms]  [k] put_page
+   0.82%  postgres  postgres           [.] MemoryContextReset
+   0.80%  postgres  postgres           [.] SeqNext
+   0.78%  postgres  [kernel.kallsyms]  [k] pte_mfn_to_pfn
+   0.69%  postgres  postgres           [.] ExecClearTuple
+   0.57%  postgres  postgres           [.] ExecProcNode
+   0.54%  postgres  postgres           [.] heap_getnext
+   0.53%  postgres  postgres           [.] LWLockRelease
+   0.53%  postgres  postgres           [.] ExecStoreTuple
+   0.51%  postgres  libc-2.12.so       [.] __GI___libc_read
+   0.42%  postgres  [kernel.kallsyms]  [k] xen_spin_lock
+   0.40%  postgres  postgres           [.] ReadBuffer_common
+   0.38%  postgres  [kernel.kallsyms]  [k] __do_fault
+   0.37%  postgres  [kernel.kallsyms]  [k] shmem_fault
+   0.37%  postgres  [kernel.kallsyms]  [k] unmap_single_vma
+   0.35%  postgres  [kernel.kallsyms]  [k] __wake_up_bit
+   0.33%  postgres  postgres           [.] StrategyGetBuffer
+   0.33%  postgres  [kernel.kallsyms]  [k] set_page_dirty
+   0.33%  postgres  [kernel.kallsyms]  [k] handle_pte_fault
+   0.33%  postgres  postgres           [.] ExecAgg
+   0.31%  postgres  postgres           [.] XidInMVCCSnapshot
+   0.31%  postgres  [kernel.kallsyms]  [k] __audit_syscall_entry
+   0.31%  postgres  postgres           [.] CheckForSerializableConflictOut
+   0.29%  postgres  [kernel.kallsyms]  [k] handle_mm_fault
+   0.25%  postgres  [kernel.kallsyms]  [k] shmem_getpage_gfp

On Wed, Dec 4, 2013 at 6:33 PM, Andres Freund <and...@2ndquadrant.com> wrote:

> On 2013-12-04 14:27:10 -0200, Claudio Freire wrote:
> > On Wed, Dec 4, 2013 at 9:19 AM, Metin Doslu <me...@citusdata.com> wrote:
> > >
> > > Here are the results of "vmstat 1" while running 8 parallel TPC-H
> > > Simple (#6) queries. Although there is no need for I/O, "wa"
> > > fluctuates between 0 and 1.
> > >
> > > procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
> > >  r  b  swpd      free  buff      cache  si  so  bi  bo     in      cs  us  sy   id  wa  st
> > >  0  0     0  30093568 84892  38723896   0   0   0   0     22      14   0   0  100   0   0
> > >  8  1     0  30043056 84892  38723896   0   0   0   0  27080   52708  16  14   70   0   0
> > >  8  1     0  30006600 84892  38723896   0   0   0   0  44952  118286  43  44   12   1   0
> > >  8  0     0  29986264 84900  38723896   0   0   0  20  28043   95934  49  42    8   1   0
> > >  7  0     0  29991976 84900  38723896   0   0   0   0   8308   73641  52  42    6   0   0
> > >  0  0     0  30091828 84900  38723896   0   0   0   0   3996   30978  23  24   53   0   0
> > >  0  0     0  30091968 84900  38723896   0   0   0   0     17      23   0   0  100   0   0
> >
> > Notice the huge %sy
>
> My bet is on transparent hugepage defragmentation. Alternatively it's
> scheduler overhead, due to superfluous context switches around the buffer
> mapping locks.
>
> I'd strongly suggest doing a "perf record -g -a <wait a bit, ctrl-c>;
> perf report" run to check what's eating up the time.
>
> Greetings,
>
> Andres Freund
>
> --
>  Andres Freund                     http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
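As a footnote on the transparent-hugepage bet above: the sysfs knobs make
it easy to test. This is a sketch assuming the mainline sysfs path (RHEL
6-era kernels expose it as /sys/kernel/mm/redhat_transparent_hugepage
instead); the change below needs root and does not persist across reboots:

    # show current THP mode and defrag policy (the [bracketed] entry is active)
    cat /sys/kernel/mm/transparent_hugepage/enabled
    cat /sys/kernel/mm/transparent_hugepage/defrag
    # turn off defragmentation, then re-run the benchmark and compare %sy
    echo never > /sys/kernel/mm/transparent_hugepage/defrag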