Hi, I have fallen into this trap and others have too. If you run EXPLAIN(ANALYZE) no de-toasting happens. This makes query-runtimes differ a lot. The bigger point is that the average user expects more from EXPLAIN(ANALYZE) than what it provides. This can be suprising. You can force detoasting during explain with explicit calls to length(), but that is tedious. Those of us who are forced to work using java stacks, orms and still store mostly documents fall into this trap sooner or later. I have already received some good feedback on this one, so this is an issue that bother quite a few people out there.
Attached is a patch for addressing the issue in form of adding another parameter to explain. I don't know if that is a good idea, but I got some feedback that a solution to this problem would be appreciated by some people out there. It would also be nice to reflect the detoasting in the "buffers" option of explain as well. The change for detoasting is only a few lines though. So the idea was to allow this EXPLAIN (ANALYZE, DETOAST) SELECT * FROM sometable; and perform the detoasting step additionally during the explain. This just gives a more realistic runtime and by playing around with the parameter and comparing the execution-times of the query one even gets an impression about the detoasting cost involved in a query. Since the parameter is purely optional, it would not affect any existing measures. It is not uncommon that the runtime of explain-analyze is way unrealistic in the real world, where people use PostgreSQL to store larger and larger documents inside tables and not using Large-Objects. Here is a video of the effect (in an exagerated form): https://www.stepanrutz.com/short_detoast_subtitles.mp4 It would be great to get some feedback on the subject and how to address this, maybe in totally different ways. Greetings from cologne, Stepan Stepan Rutz - IT Consultant, Cologne Germany, stepan.rutz AT gmx.de
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c index 8570b14f62..50859e6a1e 100644 --- a/src/backend/commands/explain.c +++ b/src/backend/commands/explain.c @@ -47,6 +47,8 @@ ExplainOneQuery_hook_type ExplainOneQuery_hook = NULL; /* Hook for plugins to get control in explain_get_index_name() */ explain_get_index_name_hook_type explain_get_index_name_hook = NULL; +/* Forward decl for detoasting analyze */ +static const DestReceiver detoastAnalyzeDR; /* OR-able flags for ExplainXMLTag() */ #define X_OPENING 0 @@ -192,6 +194,8 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt, es->settings = defGetBoolean(opt); else if (strcmp(opt->defname, "generic_plan") == 0) es->generic = defGetBoolean(opt); + else if (strcmp(opt->defname, "detoast") == 0) + es->detoast = defGetBoolean(opt); else if (strcmp(opt->defname, "timing") == 0) { timing_set = true; @@ -244,6 +248,12 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt, (errcode(ERRCODE_INVALID_PARAMETER_VALUE), errmsg("EXPLAIN option TIMING requires ANALYZE"))); + /* check that detoast is used with EXPLAIN ANALYZE */ + if (es->detoast && !es->analyze) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("EXPLAIN option DETOAST requires ANALYZE"))); + /* check that GENERIC_PLAN is not used with EXPLAIN ANALYZE */ if (es->generic && es->analyze) ereport(ERROR, @@ -565,9 +575,13 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es, /* * Normally we discard the query's output, but if explaining CREATE TABLE * AS, we'd better use the appropriate tuple receiver. + * Also if we choose to detoast during explain, we need to send the + * tuples to a receiver that performs the detoasting */ if (into) dest = CreateIntoRelDestReceiver(into); + else if (es->analyze && es->detoast) + dest = &detoastAnalyzeDR; else dest = None_Receiver; @@ -5052,3 +5066,98 @@ escape_yaml(StringInfo buf, const char *str) { escape_json(buf, str); } + + +/* ---------------- + * detoasting DestReceiver functions + * ---------------- + */ +/* the idea is that running EXPLAIN (ANALYZE) sometimes gives results that + * are way unrealistic, since during explain there is no detoasting of + * values. That means you get no idea what your actual query time is going + * to be, nor what the actual overhead of detoasting is. + * + * if you run EXPLAIN (ANAYLZE, DETOAST) then you get an impression of + * what is actually happening when you use postgresql as a document-database + * or have large values for some other reason. + */ + +/* this method receives the tuples/records during explain (analyze, detoast) + * and detoasts them. + * this method is otherwise not called and its purpose is to detoast to + * make analyze optionally more realistic + */ +static bool +detoastAnalyzeReceive(TupleTableSlot *slot, DestReceiver *self) +{ + MemoryContext cxt; + MemoryContext oldcxt; + int i; + bool shouldFree = true; + TupleDesc tupdesc = slot->tts_tupleDescriptor; + HeapTuple tuple = ExecFetchSlotHeapTuple(slot, true, &shouldFree); + uint16 t_infomask = tuple->t_data->t_infomask; + uint16 t_infomask2 = tuple->t_data->t_infomask2; + bits8 *t_bits = tuple->t_data->t_bits; + + /* memory context for our work */ + cxt = AllocSetContextCreate(CurrentMemoryContext, "DetoastTupleReceive", + ALLOCSET_DEFAULT_SIZES); + oldcxt = MemoryContextSwitchTo(cxt); + + /* fill all of the slot's attributes, we can now use slot->tts_values */ + slot_getallattrs(slot); + + /* iterate over all attributes of the tuple and potentially detoast them */ + for (i = 0; i < tupdesc->natts; i++) + { + Form_pg_attribute attr; + bool is_null; + char storage; + + attr = TupleDescAttr(tupdesc, i); + storage = attr->attstorage; + if (i >= (t_infomask2 & HEAP_NATTS_MASK)) + is_null = true; + else + is_null = (t_infomask & HEAP_HASNULL) && att_isnull(i, t_bits); + if (!is_null && (storage == 'e' || storage == 'x' || storage == 'm')) { + Datum datum = slot->tts_values[i]; + /* here simple detoast the attribute's value, making + * the explain (analyze) slower, but also more realistic + */ + pg_detoast_datum_packed(datum); + } + } + if (shouldFree) + pfree(tuple); + + /* free our memorycontext and return to previous */ + MemoryContextSwitchTo(oldcxt); + MemoryContextDelete(cxt); + return true; +} + +static void +detoastAnalyzeStartup(DestReceiver *self, int operation, TupleDesc typeinfo) +{ + /* nothing required */ +} + +static void +detoastAnalyzeCleanup(DestReceiver *self) +{ + /* nothing required */ +} + +static void +detoastAnalyzeDestroy(DestReceiver *self) +{ + /* nothing required */ +} + + +static const DestReceiver detoastAnalyzeDR = { + detoastAnalyzeReceive, detoastAnalyzeStartup, detoastAnalyzeCleanup, detoastAnalyzeDestroy, + DestNone +}; diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h index 3d3e632a0c..011dffb890 100644 --- a/src/include/commands/explain.h +++ b/src/include/commands/explain.h @@ -47,6 +47,8 @@ typedef struct ExplainState bool summary; /* print total planning and execution timing */ bool settings; /* print modified settings */ bool generic; /* generate a generic plan */ + bool detoast; /* detoast all toasted values, only applicable + when "analyze" is active. */ ExplainFormat format; /* output format */ /* state for output formatting --- not reset for each new plan tree */ int indent; /* current indentation level */ diff --git a/src/test/regress/expected/explain.out b/src/test/regress/expected/explain.out index 1aca77491b..eef68d1eca 100644 --- a/src/test/regress/expected/explain.out +++ b/src/test/regress/expected/explain.out @@ -559,3 +559,23 @@ select explain_filter('explain (verbose) select * from int8_tbl i8'); Query Identifier: N (3 rows) +-- Test detoast is accepted as a parameter to explain +-- seems hard to test the expected longer runtime of detoasting +-- since one needs to setup enough data to make such a test stable +-- also timing are filtered out by explain_filter anyway +create table test_detoast(id bigserial, val text); +select explain_filter('explain (analyze,detoast) select * from test_detoast'); + explain_filter +------------------------------------------------------------------------------------------------ + Seq Scan on test_detoast (cost=N.N..N.N rows=N width=N) (actual time=N.N..N.N rows=N loops=N) + Planning Time: N.N ms + Execution Time: N.N ms +(3 rows) + +drop table test_detoast; +-- Test detoast is _not_ accepted as a parameter to explain unless analyze is specified +create table test_detoast(id bigserial, val text); +select explain_filter('explain (detoast) select * from test_detoast'); +ERROR: EXPLAIN option DETOAST requires ANALYZE +CONTEXT: PL/pgSQL function explain_filter(text) line 5 at FOR over EXECUTE statement +drop table test_detoast; diff --git a/src/test/regress/sql/explain.sql b/src/test/regress/sql/explain.sql index b6b7beab27..43047f9cce 100644 --- a/src/test/regress/sql/explain.sql +++ b/src/test/regress/sql/explain.sql @@ -154,3 +154,18 @@ select explain_filter('explain (verbose) select * from t1 where pg_temp.mysin(f1 -- Test compute_query_id set compute_query_id = on; select explain_filter('explain (verbose) select * from int8_tbl i8'); + +-- Test detoast is accepted as a parameter to explain +-- seems hard to test the expected longer runtime of detoasting +-- since one needs to setup enough data to make such a test stable +-- also timing are filtered out by explain_filter anyway +create table test_detoast(id bigserial, val text); +select explain_filter('explain (analyze,detoast) select * from test_detoast'); +drop table test_detoast; + +-- Test detoast is _not_ accepted as a parameter to explain unless analyze is specified +create table test_detoast(id bigserial, val text); +select explain_filter('explain (detoast) select * from test_detoast'); +drop table test_detoast; + +