Re: [HACKERS] Runtime Partition Pruning
Hello Robert,

On Sat, Dec 2, 2017 at 12:34 AM, Robert Haas wrote:
> On Fri, Dec 1, 2017 at 6:20 AM, Beena Emerson wrote:
>> David Q1:
>> postgres=# explain analyse execute ab_q1 (3,3); --const
>>                                  QUERY PLAN
>> ----------------------------------------------------------------------------------
>>  Append  (cost=0.00..43.90 rows=1 width=8) (actual time=0.006..0.006 rows=0 loops=1)
>>    ->  Seq Scan on ab_a3_b3  (cost=0.00..43.90 rows=1 width=8) (actual time=0.005..0.005 rows=0 loops=1)
>>          Filter: ((a = 3) AND (b = 3))
>>  Planning time: 0.588 ms
>>  Execution time: 0.043 ms
>> (5 rows)
>
> I think the EXPLAIN ANALYZE output should show something attached to
> the Append node so that we can tell that partition pruning is in use.
> I'm not sure if that is as simple as "Run-Time Partition Pruning: Yes"
> or if we can give a few more useful details.

The output above is for a Const value, i.e. optimizer pruning, which I
included just to show that the correct partition is chosen for the given
values even during runtime pruning. Taking your suggestion, the output for
runtime pruning could be something like the following:

postgres=# explain analyse execute ab_q1 (3,3);
                                  QUERY PLAN
----------------------------------------------------------------------------------
 Append  (cost=0.00..395.10 rows=9 width=8) (actual time=0.119..0.119 rows=0 loops=1) (run-time partition pruning: on)
   ->  Seq Scan on ab_a1_b1  (cost=0.00..43.90 rows=1 width=8) (never executed)
         Filter: ((a = $1) AND (b = $2))
   ->  Seq Scan on ab_a1_b2  (cost=0.00..43.90 rows=1 width=8) (never executed)
         Filter: ((a = $1) AND (b = $2))
   ->  Seq Scan on ab_a1_b3  (cost=0.00..43.90 rows=1 width=8) (never executed)
         Filter: ((a = $1) AND (b = $2))
   ->  Seq Scan on ab_a2_b1  (cost=0.00..43.90 rows=1 width=8) (never executed)
         Filter: ((a = $1) AND (b = $2))
   ->  Seq Scan on ab_a2_b2  (cost=0.00..43.90 rows=1 width=8) (never executed)
         Filter: ((a = $1) AND (b = $2))
   ->  Seq Scan on ab_a2_b3  (cost=0.00..43.90 rows=1 width=8) (never executed)
         Filter: ((a = $1) AND (b = $2))
   ->  Seq Scan on ab_a3_b1  (cost=0.00..43.90 rows=1 width=8) (never executed)
         Filter: ((a = $1) AND (b = $2))
   ->  Seq Scan on ab_a3_b2  (cost=0.00..43.90 rows=1 width=8) (never executed)
         Filter: ((a = $1) AND (b = $2))
   ->  Seq Scan on ab_a3_b3  (cost=0.00..43.90 rows=1 width=8) (actual time=0.006..0.006 rows=0 loops=1)
         Filter: ((a = $1) AND (b = $2))
 Planning time: 0.828 ms
 Execution time: 0.234 ms
(21 rows)

Beena Emerson
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
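A hypothetical setup consistent with the plans above (the table, partition
and prepared-statement names are inferred from the EXPLAIN output, not taken
verbatim from the thread):

create table ab (a int not null, b int not null) partition by list (a);
create table ab_a1 partition of ab for values in (1) partition by list (b);
create table ab_a2 partition of ab for values in (2) partition by list (b);
create table ab_a3 partition of ab for values in (3) partition by list (b);
create table ab_a1_b1 partition of ab_a1 for values in (1);
create table ab_a1_b2 partition of ab_a1 for values in (2);
create table ab_a1_b3 partition of ab_a1 for values in (3);
create table ab_a2_b1 partition of ab_a2 for values in (1);
create table ab_a2_b2 partition of ab_a2 for values in (2);
create table ab_a2_b3 partition of ab_a2 for values in (3);
create table ab_a3_b1 partition of ab_a3 for values in (1);
create table ab_a3_b2 partition of ab_a3 for values in (2);
create table ab_a3_b3 partition of ab_a3 for values in (3);

prepare ab_q1 (int, int) as select * from ab where a = $1 and b = $2;

-- The ($1, $2) filters only show up once the plan cache switches to a
-- generic plan, which normally takes a handful of executions of the
-- prepared statement.
explain analyse execute ab_q1 (3,3);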
Re: explain analyze output with parallel workers - question about meaning of information for explain.depesz.com
On Wed, Nov 29, 2017 at 2:04 PM, Amit Kapila wrote:
> On Tue, Nov 28, 2017 at 9:42 PM, Robert Haas wrote:
>> On Tue, Nov 28, 2017 at 2:23 AM, Amit Kapila wrote:
>>> That is wrong and I think you have hit a bug. It should be 2974 * 5 =
>>> 14870, as you have seen in other cases. The problem is that during a
>>> rescan we generally reinitialize the required state, but we forgot to
>>> reinitialize the instrumentation-related memory which is used in the
>>> accumulation of stats. Changing that fixes part of the problem, namely
>>> that you won't see wrong values at the Parallel node. However, we also
>>> need to ensure that the per-worker details get accumulated across
>>> rescans. The attached patch should fix the problem you are seeing. I
>>> think this needs some more analysis and testing to see if everything
>>> works in the desired way.
>>>
>>> Is it possible for you to test the attached patch and see if you are
>>> still seeing any unexpected values?
>>
>> FWIW, this looks sensible to me. Not sure if there's any good way to
>> write a regression test for it.
>
> I think so, but not 100% sure. I will give it a try and report back.

The attached patch contains a regression test as well. Note that I have
carefully disabled all variable stats by using (analyze, timing off,
summary off, costs off) and placed a parallel sequential scan on the right
side of the join, so that nloops and rows are the only variable stats left,
and those should remain constant.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

fix_accum_instr_parallel_workers_v2.patch
Description: Binary data
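For illustration, a sketch of the kind of query such a test could use (this
is not the test from the attached patch; it assumes the regression
database's tenk1 table and merely shows the EXPLAIN options and plan shape
being described):

set parallel_setup_cost = 0;
set parallel_tuple_cost = 0;
set min_parallel_table_scan_size = 0;
set enable_hashjoin = off;
set enable_mergejoin = off;

-- With timing/summary/costs disabled, only row counts and loop counts
-- vary, and those should stay stable even when the parallel node under
-- the join is rescanned.
explain (analyze, timing off, summary off, costs off)
  select count(*)
  from tenk1 t1
  join tenk1 t2 on t1.unique1 = t2.tenthous
  where t1.unique1 < 5;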
Re: [HACKERS] Custom compression methods
On 12/01/2017 10:52 PM, Andres Freund wrote:
> On 2017-12-01 16:14:58 -0500, Robert Haas wrote:
>> Honestly, if we can give everybody a 4% space reduction by
>> switching to lz4, I think that's totally worth doing -- but let's
>> not make people choose it, let's make it the default going forward,
>> and keep pglz support around so we don't break pg_upgrade
>> compatibility (and so people can continue to choose it if for some
>> reason it works better in their use case). That kind of improvement
>> is nothing special in a specific workload, but TOAST is a pretty
>> general-purpose mechanism. I have become, through a few bitter
>> experiences, a strong believer in the value of trying to reduce our
>> on-disk footprint, and knocking 4% off the size of every TOAST
>> table in the world does not sound worthless to me -- even though
>> context-aware compression can doubtless do a lot better.
>
> +1. It's also a lot faster, and I've seen way, way too many workloads
> with 50%+ time spent in pglz.

TBH the 4% figure is something I mostly made up (I'm fake news!). On the
mailing list archive (which I believe is pretty compressible) I observed
something like a 2.5% size reduction with lz4 compared to pglz, at least
with the compression levels I've used ...

Other algorithms (e.g. zstd) got significantly better compression (25%)
compared to pglz, but in exchange for longer compression times. I'm sure we
could lower the compression level to make it faster, but that will of
course hurt the compression ratio.

I don't think switching to a different compression algorithm is a way
forward - it was proposed and explored repeatedly in the past, and every
time it failed for a number of reasons, most of which are still valid.

Firstly, it's going to be quite hard (or perhaps impossible) to find an
algorithm that is "universally better" than pglz. Some algorithms do
work better for text documents, some for binary blobs, etc. I don't
think there's a win-win option.

Sure, there are workloads where pglz performs poorly (I've seen such
cases too), but IMHO that's more an argument for the custom compression
method approach. pglz gives you good default compression in most cases,
and you can change it for columns where it matters, and where a
different space/time trade-off makes sense.

Secondly, all the previous attempts ran into some legal issues, i.e.
licensing and/or patents. Maybe the situation has changed since then (no
idea, I haven't looked into that), but in the past the "pluggable"
approach was proposed as a way to address this.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
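As a side note, one can get a rough feel for how much the current pglz
compression saves on a particular kind of value with something like the
following (a minimal sketch using a made-up table; pg_column_size() reports
the stored, possibly compressed, datum size, while octet_length() reports
the uncompressed length):

create table toast_test (payload text);
insert into toast_test
  select repeat('the quick brown fox jumps over the lazy dog ', 500);

-- compare uncompressed vs. stored sizes
select octet_length(payload)  as raw_bytes,
       pg_column_size(payload) as stored_bytes
  from toast_test;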
Re: Transform for pl/perl
On 12/01/2017 11:37 AM, Robert Haas wrote:
> On Fri, Dec 1, 2017 at 12:30 AM, Michael Paquier wrote:
>> On Tue, Nov 28, 2017 at 5:14 PM, Aleksander Alekseev wrote:
>>> The new status of this patch is: Ready for Committer
>>
>> Patch moved to CF 2018-01. Perhaps a committer will look at it at some
>> point.
>
> FWIW, although I like the idea, I'm not going to work on the patch. I
> like Perl, but I neither know its internals nor use plperl. I hope
> someone else will be interested.

I will probably pick it up fairly shortly.

cheers

andrew

--
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Re: documentation is now XML
Hello,

28.11.2017 20:06, Tom Lane wrote:
> Peter Eisentraut writes:
>> On 11/23/17 15:39, Tom Lane wrote:
>>> I think we should have a discussion about whether it'd be smart to
>>> convert the back branches' documentation to XML as well.
>>
>> My short answer to that is, I don't have time for that. I don't know if
>> anyone else wants to investigate it. But it took us years to get to this
>> point, and backpatching and back-testing all of that is just a lot of
>> work that was not planned.
>
> I thought that might be your answer :-(. I can't argue with it --- if
> it's not a simple thing to back-patch, then it's unclear whether the net
> annoyance over the next five years would be enough to justify doing the
> work.

I can prepare such patches (and the scripts to generate them). In fact, we
(Postgres Pro) perform such a conversion (SGML->XML) on the fly when
building the docs, starting from 9.6. So it's not a problem to convert the
*.sgml files and replace the Makefile and *.xsl files. But I would prefer
to perform the conversion after we finish the move on 11devel (renaming
sgml to xml, maybe optimizing the xsl's...).

Best regards,
--
Alexander Lakhin
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: [HACKERS] Custom compression methods
On Dec 2, 2017, at 6:04 PM, Tomas Vondra wrote:

> On 12/01/2017 10:52 PM, Andres Freund wrote:
>> On 2017-12-01 16:14:58 -0500, Robert Haas wrote:
>>> Honestly, if we can give everybody a 4% space reduction by
>>> switching to lz4, I think that's totally worth doing -- but let's
>>> not make people choose it, let's make it the default going forward,
>>> and keep pglz support around so we don't break pg_upgrade
>>> compatibility (and so people can continue to choose it if for some
>>> reason it works better in their use case). That kind of improvement
>>> is nothing special in a specific workload, but TOAST is a pretty
>>> general-purpose mechanism. I have become, through a few bitter
>>> experiences, a strong believer in the value of trying to reduce our
>>> on-disk footprint, and knocking 4% off the size of every TOAST
>>> table in the world does not sound worthless to me -- even though
>>> context-aware compression can doubtless do a lot better.
>>
>> +1. It's also a lot faster, and I've seen way, way too many workloads
>> with 50%+ time spent in pglz.
>
> TBH the 4% figure is something I mostly made up (I'm fake news!). On the
> mailing list archive (which I believe is pretty compressible) I observed
> something like a 2.5% size reduction with lz4 compared to pglz, at least
> with the compression levels I've used ...
>
> Other algorithms (e.g. zstd) got significantly better compression (25%)
> compared to pglz, but in exchange for longer compression times. I'm sure
> we could lower the compression level to make it faster, but that will of
> course hurt the compression ratio.
>
> I don't think switching to a different compression algorithm is a way
> forward - it was proposed and explored repeatedly in the past, and every
> time it failed for a number of reasons, most of which are still valid.
>
> Firstly, it's going to be quite hard (or perhaps impossible) to find an
> algorithm that is "universally better" than pglz. Some algorithms do
> work better for text documents, some for binary blobs, etc. I don't
> think there's a win-win option.
>
> Sure, there are workloads where pglz performs poorly (I've seen such
> cases too), but IMHO that's more an argument for the custom compression
> method approach. pglz gives you good default compression in most cases,
> and you can change it for columns where it matters, and where a
> different space/time trade-off makes sense.
>
> Secondly, all the previous attempts ran into some legal issues, i.e.
> licensing and/or patents. Maybe the situation has changed since then (no
> idea, I haven't looked into that), but in the past the "pluggable"
> approach was proposed as a way to address this.

It may be interesting for you to see the following results of applying
page-level compression (CFS in PgPro-EE) to pgbench data:

 Configuration           Size (GB)   Time (sec)
 ----------------------  ---------   ----------
 vanilla postgres            15.31           92
 zlib (default level)         2.37          284
 zlib (best speed)            2.43          191
 postgres internal lz         3.89          214
 lz4                          4.12           95
 snappy (google)              5.18           99
 lzfse (apple)                2.80         1099
 zstd (facebook)              1.69          125

All algorithms (except zlib) were used with the best-speed option: using a
better compression level usually has a relatively small impact on the
compression ratio (<30%), but can increase compression time significantly
(several times). Certainly pgbench is not the best candidate for testing
compression algorithms: it generates a lot of artificial and redundant
data. But we also measured it on real customers' data, and zstd still seems
to be the best compression method: it provides good compression with the
smallest CPU overhead.
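For anyone who wants to reproduce the general shape of such a measurement
on stock PostgreSQL (CFS itself is not part of the community code), a rough
sketch, with the scale factor and database name picked arbitrarily:

-- shell: load the pgbench data set (scale 1000 is roughly 15 GB of data)
--   createdb bench
--   pgbench -i -s 1000 bench

-- SQL: report the resulting on-disk size of the database
select pg_size_pretty(pg_database_size('bench'));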
Re: [HACKERS] Custom compression methods
Hi,

On 2017-12-02 16:04:52 +0100, Tomas Vondra wrote:
> Firstly, it's going to be quite hard (or perhaps impossible) to find an
> algorithm that is "universally better" than pglz. Some algorithms do
> work better for text documents, some for binary blobs, etc. I don't
> think there's a win-win option.

lz4 is pretty much there.

> Secondly, all the previous attempts ran into some legal issues, i.e.
> licensing and/or patents. Maybe the situation has changed since then (no
> idea, I haven't looked into that), but in the past the "pluggable"
> approach was proposed as a way to address this.

Those were pretty bogus.

I think we're not doing our users a favor if they have to download some
external project and then fiddle with things, just to avoid choosing a
compression algorithm that's been known bad for at least 5+ years. If we
have a decent algorithm in-core *and* then allow extensibility, that's one
thing, but keeping the bad one and telling forks "please take our users
with this code we give you" is ...

Greetings,

Andres Freund
Partition pruning for Star Schema
Hello,

I have a typical star schema, with dimension tables "product", "calendar"
and "country" and a fact table "sales". This fact table is partitioned by
time (range by month) and country (list).

Will a query like:

select product.name, calendar.month, sum(sales.net_price)
from sales
  inner join product  on (product.id = sales.cust_id)
  inner join country  on (country.id = sales.country_id)
  inner join calendar on (calendar.id = sales.calendar_id)
where country.name = 'HERE'
  and calendar.year = '2017'
group by product.name, calendar.month;

be able to identify the needed partitions?

NB: the query has predicates on the dimension tables, not on the columns
used for fact table partitioning:
- country.name vs sales.country_id,
- calendar.year vs sales.calendar_id.

Will this be part of PostgreSQL 11?

Thanks in advance
Regards
PAscal

--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
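Since the post describes the schema but does not show the DDL, here is a
rough sketch of the kind of layout being described (all names, and the
partitioning column sale_month in particular, are assumptions, not taken
from the post):

-- fact table, range-partitioned by month and sub-partitioned by country
create table sales (
    cust_id     int,
    country_id  int,
    calendar_id int,
    sale_month  date,
    net_price   numeric
) partition by range (sale_month);

create table sales_2017_01
    partition of sales
    for values from ('2017-01-01') to ('2017-02-01')
    partition by list (country_id);

create table sales_2017_01_c1
    partition of sales_2017_01
    for values in (1);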
Re: [HACKERS] Parallel Hash take II
On 2017-12-02 15:54:29 +1300, Thomas Munro wrote:
> On Sat, Dec 2, 2017 at 1:55 PM, Andres Freund wrote:
>> - Right now RemovePgTempFilesInDir() will recurse into appropriately
>>   named directories, and when it recurses it doesn't require the same
>>   name pattern checks. I think that's good, but I think it'd be prudent
>>   to be a bit more paranoid and prevent recursing into symlinked
>>   subdirectories.
>
> That's why it uses lstat(), so that it sees symlinks rather than what
> they point to. It only recurses if S_ISDIR(), and it unlinks anything
> else.

Right. I'd somehow confused myself into thinking one would need an explicit
S_ISLNK check...

> Just a reminder: a couple of problems have come up recently in the
> Parallel Hash Join patch itself, so please don't consider that one
> ready for commit quite yet. They are: (1) handling the case where
> there is no DSA area because we're running a parallel-aware plan in
> non-parallel mode due to lack of resources; (2) investigating a rare
> assertion failure. For (1), that may depend on another patch that
> I'll post shortly to kill "es_query_dsa" and, come to think of it, for
> (2) it's possible that the problem is in either one of the remaining
> patches -- SharedTuplestore or Parallel Hash Join -- so please hold
> off on committing either of those until I've got to the bottom of
> that.

I'm a bit tempted to press ahead regardless of these issues. With your
consent, obviously. ISTM we're pretty close to the point where this needs
to be exposed more widely, and that'll surely bring more issues to light.

Greetings,

Andres Freund
Re: Partition pruning for Star Schema
Hi,

On 2017-12-02 13:56:51 -0700, legrand legrand wrote:
> I have a typical star schema, with dimension tables "product", "calendar"
> and "country" and a fact table "sales".
> This fact table is partitioned by time (range by month) and country
> (list).

You posted nearly the same thing a few days ago:
http://archives.postgresql.org/message-id/1511986639005-0.post%40n3.nabble.com

Please don't duplicate threads just because you've not received an answer
after a few days. If you'd done additional research, provided more context,
*and* linked to the last time you sent something, it'd be different.

Greetings,

Andres Freund
Re: [HACKERS] Custom compression methods
On 12/02/2017 09:24 PM, konstantin knizhnik wrote:
> On Dec 2, 2017, at 6:04 PM, Tomas Vondra wrote:
>> On 12/01/2017 10:52 PM, Andres Freund wrote:
>> ...
>>
>> Other algorithms (e.g. zstd) got significantly better compression (25%)
>> compared to pglz, but in exchange for longer compression times. I'm sure
>> we could lower the compression level to make it faster, but that will of
>> course hurt the compression ratio.
>>
>> I don't think switching to a different compression algorithm is a way
>> forward - it was proposed and explored repeatedly in the past, and every
>> time it failed for a number of reasons, most of which are still valid.
>>
>> Firstly, it's going to be quite hard (or perhaps impossible) to
>> find an algorithm that is "universally better" than pglz. Some
>> algorithms do work better for text documents, some for binary
>> blobs, etc. I don't think there's a win-win option.
>>
>> Sure, there are workloads where pglz performs poorly (I've seen
>> such cases too), but IMHO that's more an argument for the custom
>> compression method approach. pglz gives you good default
>> compression in most cases, and you can change it for columns where
>> it matters, and where a different space/time trade-off makes
>> sense.
>>
>> Secondly, all the previous attempts ran into some legal issues, i.e.
>> licensing and/or patents. Maybe the situation has changed since then (no
>> idea, I haven't looked into that), but in the past the "pluggable"
>> approach was proposed as a way to address this.
>
> It may be interesting for you to see the following results of applying
> page-level compression (CFS in PgPro-EE) to pgbench data:

I don't follow. If I understand what CFS does correctly (and I'm mostly
guessing here, because I haven't seen the code published anywhere, and I
assume it's proprietary), it essentially compresses whole 8kB blocks.

I don't know whether it reorganizes the data into a columnar format first,
in some way (to make it more "columnar", and thus more compressible), which
would make it somewhat similar to page-level compression in Oracle.

But it's clearly a very different approach from what the patch aims to
improve (compressing individual varlena values).

> All algorithms (except zlib) were used with the best-speed option: using
> a better compression level usually has a relatively small impact on the
> compression ratio (<30%), but can increase compression time significantly
> (several times). Certainly pgbench is not the best candidate for testing
> compression algorithms: it generates a lot of artificial and redundant
> data. But we also measured it on real customers' data, and zstd still
> seems to be the best compression method: it provides good compression
> with the smallest CPU overhead.

I think this really depends on the dataset, and drawing conclusions based
on a single test is somewhat crazy. Especially when it's synthetic pgbench
data with lots of inherent redundancy - sequential IDs, ...

My takeaway from the results is rather that page-level compression may be
very beneficial in some cases, although I wonder how much of that can be
gained by simply using a compressed filesystem (thus making it transparent
to PostgreSQL).

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Partition pruning for Star Schema
Sorry, I apologize. I thought (most) hackers were not reading the General
list.

Regards
PAscal

--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
Re: [HACKERS] Custom compression methods
On 12/02/2017 09:38 PM, Andres Freund wrote:
> Hi,
>
> On 2017-12-02 16:04:52 +0100, Tomas Vondra wrote:
>> Firstly, it's going to be quite hard (or perhaps impossible) to find an
>> algorithm that is "universally better" than pglz. Some algorithms do
>> work better for text documents, some for binary blobs, etc. I don't
>> think there's a win-win option.
>
> lz4 is pretty much there.

That's a matter of opinion, I guess. It's a solid compression algorithm,
that's for sure ...

>> Secondly, all the previous attempts ran into some legal issues, i.e.
>> licensing and/or patents. Maybe the situation has changed since then (no
>> idea, I haven't looked into that), but in the past the "pluggable"
>> approach was proposed as a way to address this.
>
> Those were pretty bogus.

IANAL, so I don't dare to judge the bogusness of such claims. I assume that
if we made it optional (e.g. a configure/initdb option), it'd be much less
of an issue. Of course, that has disadvantages too (because when you
compile/init with one algorithm and then find something else would work
better for your data, you have to start from scratch).

> I think we're not doing our users a favor if they have to download some
> external project and then fiddle with things, just to avoid choosing a
> compression algorithm that's been known bad for at least 5+ years. If we
> have a decent algorithm in-core *and* then allow extensibility, that's
> one thing, but keeping the bad one and telling forks "please take our
> users with this code we give you" is ...

I don't understand what exactly your issue with external projects is, TBH.
I think extensibility is one of the great strengths of Postgres. It's not
all rainbows and unicorns, of course, and it has costs too.

FWIW I don't think pglz is a "known bad" algorithm. Perhaps there are cases
where other algorithms (e.g. lz4) run circles around it, particularly when
it comes to decompression speed, but I wouldn't say it's "known bad".

Not sure which forks you're talking about ...

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Proposal: Local indexes for partitioned table
David Rowley wrote:

> So, then this patch is only really intended as a syntax shortcut for
> mass adding of indexes to each partition?

This patch is intended to serve as a basis on which to construct further
features, just like every other patch we apply.

> I feel like we could do better here with little extra effort. The
> DETACH index feature does not really seem required for this patch. I
> think just ensuring a matching index exists on each leaf partition and
> creating any which don't exist before creating the index on the target
> partitioned table seems like the correct solution. That way we can
> make that index's indisvalid flag have a valid meaning all the time.
> Some later patch can invent some way to replace a bloated index.

What you're saying is that I've written code for A+B, and you're
"interested in C (which is incompatible with B), so can we have A+C and
drop B". But in reality there exists an (unwritten) D that solves the
incompatibility between B and C. I'm just saying it's essentially the same
to postpone C+D as to postpone B+D, and I already have B written; plus that
way we don't have to come up with some novel way to handle pg_dump support.
So can we get A+B committed and discuss C+D later?

A = partitioned indexes
B = pg_dump support based on ATTACH
C = your proposed planner stuff
D = correct indisvalid setting for partitioned indexes (set to false
    when a partition does not contain the index)

The patch in this thread is A+B.

> Perhaps later we can invent some generic way to replace a physical
> leaf index for a given partitioned index perhaps with the same patch
> that might allow us to replace an index which is used by a constraint,
> which to me seems like a feature we should have had years ago.

This is a hypothetical feature E, which would be nice (for partitioned
indexes and for ordinary indexes too) but is not strictly necessary.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
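To make A and B concrete, a rough sketch of how they might look in SQL (all
names are illustrative, and the ON ONLY / ATTACH PARTITION spelling simply
follows the syntax being discussed in this thread, so the exact grammar is
whatever the committed patch ends up with):

-- A (partitioned indexes): creating an index on the parent cascades to
-- every partition.
create table measurement (logdate date, peaktemp int)
    partition by range (logdate);
create table measurement_y2017 partition of measurement
    for values from ('2017-01-01') to ('2018-01-01');
create index measurement_logdate_idx on measurement (logdate);

-- B (pg_dump support): a dump can recreate the same state by creating the
-- parent index without cascading, building the leaf index separately, and
-- then attaching it.
create index measurement_idx_parent on only measurement (logdate);
create index measurement_y2017_logdate_idx on measurement_y2017 (logdate);
alter index measurement_idx_parent
    attach partition measurement_y2017_logdate_idx;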
Re: [HACKERS] Custom compression methods
Tomas Vondra wrote:

> On 12/01/2017 08:48 PM, Alvaro Herrera wrote:
>> Maybe our dependency code needs to be extended in order to support this.
>> I think the current logic would drop the column if you were to do "DROP
>> COMPRESSION .. CASCADE", but I'm not sure we'd see that as a feature.
>> I'd rather have DROP COMPRESSION always fail instead until no columns
>> use it. Let's hear others' opinions on this bit though.
>
> Why should this behave differently compared to data types? Seems quite
> against POLA, if you ask me ...

OK, DROP TYPE sounds like a good enough precedent, so +1 on that.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re:Re: [HACKERS] [PATCH]make pg_rewind to not copy useless WAL files
At 2017-12-01 12:27:09, "Michael Paquier" wrote:
> On Tue, Oct 3, 2017 at 1:20 AM, chenhj wrote:
>> I had filled in the authors field of this patch in the commitfest, and
>> will rebase this patch if needed. Thank you for your help!
>
> The documentation of the patch needs a rebase, so I am moving it to the
> next CF with "waiting on author" as its status.
>
> $ git diff master --check
> src/bin/pg_rewind/pg_rewind.c:292: trailing whitespace.
> +*
> There are whitespace complaints.
> --
> Michael

Rebased and removed the whitespace.

regards
Chen Huajun

pg_rewind_wal_copy_reduce_v9.patch
Description: Binary data
pg_dumpall -r -c try to drop user postgres
Hi

I am not sure whether the user postgres should be removed, so this is
probably a bug:

pg_dumpall -r -c | grep postgres

DROP ROLE postgres;
CREATE ROLE postgres;

Regards

Pavel