Re: [PERFORM] Performance on Bulk Insert to Partitioned Table

2012-12-23 Thread Jeff Janes
On Thursday, December 20, 2012, Charles Gomes wrote:

> Without hyperthreading CPU still not a bottleneck, while I/O is only 10%
> utilization.
>
> top - 14:55:01 up 27 min,  2 users,  load average: 0.17, 0.19, 0.14
> Tasks: 614 total,  17 running, 597 sleeping,   0 stopped,   0 zombie
> Cpu(s): 73.8%us,  4.3%sy,  0.0%ni, 21.6%id,  0.1%wa,  0.0%hi,  0.1%si,
> 0.0%st
>


...


> I believe the bottleneck may be that pgsql has to fight with its siblings
> to update the indexes.


I think that should mostly show up as idle or wait, not as user time.
Since you're at 75% user already, you couldn't expect more than a 33%
improvement by fixing that (eliminating the other 25% entirely would take
you from 75% useful CPU to 100%, i.e. about 1.33x), assuming that were the
problem.


> Is there a good way to add probes to check where things are slowing
> down?

What happens if you just drop the indexes as an experiment?  That should
put a hard limit on the amount the indexes can be slowing you down.
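
As a sketch (the index and column names here are invented, not from your
actual schema), the experiment is just:

    -- Hypothetical names; drop, repeat the timed bulk load, rebuild.
    DROP INDEX child_2012_12_20_value_idx;
    -- ... rerun the bulk insert and time it ...
    CREATE INDEX child_2012_12_20_value_idx ON child_2012_12_20 (value);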

I like oprofile for getting a good bottom-level profile of where CPU time is
going.  Unfortunately, it will probably just show you "AllocSetAlloc".
Also, I don't trust it on virtualized systems, if you are on one of those.

There are many other ways of approaching it, but mostly you have to already
have a good theory about what is going on in order to know which one to use
or how to interpret the results, and many of them require you to make custom
compiles of the postgres server code.


Cheers,

Jeff


Re: [PERFORM] Performance on Bulk Insert to Partitioned Table

2012-12-23 Thread Jeff Janes
On Thursday, December 20, 2012, Charles Gomes wrote:

> True, that's how I feel too.  I will be looking to translate the trigger
> to C if I can find good examples; that should accelerate things.
>

I think your performance bottleneck is almost certainly the dynamic SQL.
 Using C to generate that dynamic SQL isn't going to help much, because it
is still the SQL engine that has to parse, plan, and execute it.

Are the vast majority of your inserts done on any given day for records
from that same day or the one before; or are they evenly spread over the
preceding year?  If the former, you could use static SQL in IF and ELSIF
for those days, and fall back on the dynamic SQL for the exceptions in the
ELSE block.  Of course that means you have to update the trigger every day.
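
A sketch of that shape (the partition names and the "ts" timestamp column
are assumptions here, not your actual schema):

    CREATE OR REPLACE FUNCTION insert_router() RETURNS trigger AS $$
    BEGIN
        -- Static SQL for the common, recent days: planned once and cached.
        IF NEW.ts >= DATE '2012-12-20' AND NEW.ts < DATE '2012-12-21' THEN
            INSERT INTO child_2012_12_20 VALUES (NEW.*);
        ELSIF NEW.ts >= DATE '2012-12-19' AND NEW.ts < DATE '2012-12-20' THEN
            INSERT INTO child_2012_12_19 VALUES (NEW.*);
        ELSE
            -- Rare, older dates fall back to the dynamic SQL.
            EXECUTE 'INSERT INTO child_' || to_char(NEW.ts, 'YYYY_MM_DD')
                    || ' SELECT ($1).*' USING NEW;
        END IF;
        RETURN NULL;  -- keep the row out of the parent table
    END;
    $$ LANGUAGE plpgsql;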



> Using rules would be totally bad as I'm partitioning daily and after one
> year having 365 lines of IF won't be fun to maintain.
>

Maintaining 365 lines of IF is what Perl was invented for.  That goes for
triggers w/ static SQL as well as for rules.

If you do the static SQL in a trigger and the dates of the records are
evenly scattered over the preceding year, make sure your IFs are nested
like a binary search, not a linear search.  And if they are mostly for
"today's" date, then make sure you search backwards.

Cheers,

Jeff


Re: [PERFORM] Why is PostgreSQL 9.2 slower than 9.1 in my tests?

2012-12-23 Thread Jeff Janes
On Tuesday, December 11, 2012, Tom Lane wrote:

> Jeff Janes writes:
> > On Tue, Dec 11, 2012 at 2:50 AM, Patryk Sidzina wrote:
> >> The differences come up when you change the "INSERT" to "EXECUTE
> >> 'INSERT'" (and I checked this time on 3 machines, one of which was
> >> Windows):
>
> >> FOR i IN 1..cnt LOOP
> >> EXECUTE 'INSERT INTO test_table_md_speed(n) VALUES (' || i || ')';
> >> END LOOP;
>
> > The culprit is the commit below.  I don't know exactly why this slows
> > down your case.  A preliminary oprofile analysis suggests that most
> > of the slowdown is that it calls AllocSetAlloc more often.  I suspect
> > that this slow-down will be considered acceptable trade-off for
> > getting good parameterized plans.
>
> I'm having a hard time getting excited about optimizing the above case:
> the user can do far more to make it fast than we can, simply by not
> using EXECUTE, which is utterly unnecessary in this example.
>

I assumed his example was an intentionally simplified test-case, not a real
world use-case.
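
In that test, taking Tom's advice just means writing the INSERT directly,
so PL/pgSQL prepares the statement once and reuses the plan on every
iteration:

    FOR i IN 1..cnt LOOP
        INSERT INTO test_table_md_speed(n) VALUES (i);
    END LOOP;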

For a more realistic use, see "[PERFORM] Performance on Bulk Insert to
Partitioned Table".  There too it would probably be best to get rid of the
EXECUTE, but doing so in that case would certainly have a high cost in
trigger-code complexity and maintainability.  (In my test case of loading
1e7 narrow tuples to 100 partitions, the plan cache change led to a 26%
slowdown.)



> Having said that, though, it's not real clear to me why the plancache
> changes would have affected the speed of EXECUTE at all --- the whole
> point of that command is we don't cache a plan for the query.
>


Doing a bottom-level profile isn't helpful because all of the extra time is
in very low-level code that is called from everywhere.  Doing call-counts
with gprof, I see that there is a big increase in the calls to copyObject
(which indirectly leads to a big increase in AllocSetAlloc).  Before the
change, each EXECUTE had one top-level (i.e. nonrecursive) copyObject call,
coming from _SPI_prepare_plan.

After the change, each EXECUTE has 4 such top-level copyObject calls: one
each from CreateCachedPlan and CompleteCachedPlan, and two
from BuildCachedPlan.

Cheers,

Jeff