Re: [HACKERS] review: pgbench - aggregation of info written into log

Tomas Vondra Sun, 11 Nov 2012 15:35:12 -0800

Hi,

attached is a v4 of the patch. There are not many changes, mostly some
simple tidying up, except for handling the Windows.

After a bit more investigation, it seems to me that we can't really get
the same behavior as in other systems - basically the timestamp is
unavailable so we can't log the interval start. So it seems to me the
best we can do is to disable this option on Windows (and this is done in
the patch). So when you try to use "--aggregate-interval" on Win it will
complain it's an unknown option.

Now that I think of it, there might be a better solution that would not
mimic the Linux/Unix behaviour exactly - we do have elapsed time since
the start of the benchmark, so maybe we should use this instead of the
timestamp? I mean on systems with reasonable gettimeofday we'd get

1345828501 5601 1542744 483552416 61 2573
1345828503 7884 1979812 565806736 60 1479
1345828505 7208 1979422 567277552 59 1391
1345828507 7685 1980268 569784714 60 1398
1345828509 7073 1979779 573489941 236 1411

but on Windows we'd get

0 5601 1542744 483552416 61 2573
2 7884 1979812 565806736 60 1479
4 7208 1979422 567277552 59 1391
6 7685 1980268 569784714 60 1398
8 7073 1979779 573489941 236 1411

i.e. the first column is "seconds since start of the test". Hmmmm, and
maybe we could call 'gettimeofday' at the beginning, to get the
timestamp of the test start (AFAIK the issue on Windows is the
resolution, but it should be enough). Then we could add it up with the
elapsed time and we'd get the same output as on Linux.

And maybe this could be done in regular pgbench execution too? But I
really need help from someone who knows Windows and can test this.

Comments regarding Pavel's reviews are below:

On 2.10.2012 20:08, Pavel Stehule wrote:
> Hello
> 
> I did review of pgbench patch
> http://archives.postgresql.org/pgsql-hackers/2012-09/msg00737.php
> 
> * this patch is cleanly patched
> 
> * I had a problem with doc
> 
> make[1]: Entering directory `/home/pavel/src/postgresql/doc/src/sgml'
> openjade  -wall -wno-unused-param -wno-empty -wfully-tagged -D . -D .
> -c /usr/share/sgml/docbook/dsssl-stylesheets/catalog -d stylesheet.dsl
> -t sgml -i output-html -V html-index postgres.sgml
> openjade:pgbench.sgml:767:8:E: document type does not allow element
> "TITLE" here; missing one of "ABSTRACT", "AUTHORBLURB", "MSGSET",
> "CALLOUTLIST", "ITEMIZEDLIST", "ORDEREDLIST", "SEGMENTEDLIST",
> "VARIABLELIST", "CAUTION", "IMPORTANT", "NOTE", "TIP", "WARNING",
> "FORMALPARA", "BLOCKQUOTE", "EQUATION", "EXAMPLE", "FIGURE", "TABLE",
> "PROCEDURE", "SIDEBAR", "QANDASET", "REFSECT3" start-tag
> make[1]: *** [HTML.index] Error 1
> make[1]: *** Deleting file `HTML.index'
> make[1]: Leaving directory `/home/pavel/src/postgresql/doc/src/sgml

Hmmm, I've never got the docs to build at all, all I do get is a heap of
some SGML/DocBook related issues. Can you investigate a bit more where's
the issue? I don't remember messing with the docs in a way that might
cause this ... mostly copy'n'paste edits.

> * pgbench is compiled without warnings
> 
> * there is a few issues in source code
> 
> +                     
> +                     /* should we aggregate the results or not? */
> +                     if (use_log_agg)
> +                     {
> +                             
> +                             /* are we still in the same interval? if yes, 
> accumulate the
> +                              * values (print them otherwise) */
> +                             if (agg->start_time + use_log_agg >= 
> INSTR_TIME_GET_DOUBLE(now))
> +                             {
> +                                     

Errr, so what are the issues here?

> 
> * a real time range for aggregation is dynamic (pgbench is not
> realtime application), so I am not sure if following constraint has
> sense
> 
>  +    if ((duration > 0) && (use_log_agg > 0) && (duration % use_log_agg != 
> 0)) {
> +             fprintf(stderr, "duration (%d) must be a multiple of aggregation
> interval (%d)\n", duration, use_log_agg);
> +             exit(1);
> +     }

I'm not sure what might be the issue here either. If the test duration
is set (using "-T" option), then this kicks in and says that the
duration should be a multiple of aggregation interval. Seems like a
sensible assumption to me. Or do you think this is check should be removed?

> * it doesn't flush last aggregated data (it is mentioned by Tomas:
> "Also, I've realized the last interval may not be logged at all - I'll
> take a look into this in the next version of the patch."). And it can
> be significant for higher values of "use_log_agg"

Yes, and I'm still not sure how to fix this in practice. But I do have
this on my TODO.

> 
> * a name of variable "use_log_agg" is not good - maybe "log_agg_interval" ?

I've changed it to agg_interval.

regards
Tomas

diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c
index 090c210..2093edc 100644
--- a/contrib/pgbench/pgbench.c
+++ b/contrib/pgbench/pgbench.c
@@ -150,6 +150,7 @@ char           *index_tablespace = NULL;
 #define naccounts      100000
 
 bool           use_log;                        /* log transaction latencies to 
a file */
+int                    agg_interval;           /* log aggregates instead of 
individual transactions */
 bool           is_connect;                     /* establish connection for 
each transaction */
 bool           is_latencies;           /* report per-command latencies */
 int                    main_pid;                       /* main process id used 
in log filename */
@@ -245,6 +246,18 @@ typedef struct
        char       *argv[MAX_ARGS]; /* command word list */
 } Command;
 
+typedef struct
+{
+
+       long    start_time;                     /* when does the interval start 
*/
+       int     cnt;                            /* number of transactions */
+       double  min_duration;           /* min/max durations */
+       double  max_duration;
+       double  sum;                            /* sum(duration), 
sum(duration^2) - for estimates */
+       double  sum2;
+       
+} AggVals;
+
 static Command **sql_files[MAX_FILES]; /* SQL script files */
 static int     num_files;                      /* number of script files */
 static int     num_commands = 0;       /* total number of Command structs */
@@ -377,6 +390,10 @@ usage(void)
                   "  -l           write transaction times to log file\n"
                   "  --sampling-rate NUM\n"
                   "               fraction of transactions to log (e.g. 0.01 
for 1%% sample)\n"
+#ifndef WIN32
+                  "  --aggregate-interval NUM\n"
+                  "               aggregate data over NUM seconds\n"
+#endif
                   "  -M simple|extended|prepared\n"
                   "               protocol for submitting queries to server 
(default: simple)\n"
                   "  -n           do not run VACUUM before tests\n"
@@ -830,9 +847,22 @@ clientDone(CState *st, bool ok)
        return false;                           /* always false */
 }
 
+static
+void agg_vals_init(AggVals * aggs, instr_time start)
+{
+       aggs->cnt = 0;
+       aggs->sum = 0;
+       aggs->sum2 = 0;
+       
+       aggs->min_duration = 3600 * 1000000.0; /* one hour */
+       aggs->max_duration = 0;
+
+       aggs->start_time = INSTR_TIME_GET_DOUBLE(start);
+}
+
 /* return false iff client should be disconnected */
 static bool
-doCustom(TState *thread, CState *st, instr_time *conn_time, FILE *logfile)
+doCustom(TState *thread, CState *st, instr_time *conn_time, FILE *logfile, 
AggVals * agg)
 {
        PGresult   *res;
        Command   **commands;
@@ -889,7 +919,7 @@ top:
                        instr_time      now;
                        instr_time      diff;
                        double          usec;
-
+                       
                        /*
                         * write the log entry if this row belongs to the 
random sample,
                         * or no sampling rate was given which means log 
everything.
@@ -902,17 +932,64 @@ top:
                                diff = now;
                                INSTR_TIME_SUBTRACT(diff, st->txn_begin);
                                usec = (double) INSTR_TIME_GET_MICROSEC(diff);
-
+                       
+                               /* should we aggregate the results or not? */
+                               if (agg_interval > 0)
+                               {
+                                       
+                                       /* are we still in the same interval? 
if yes, accumulate the
+                                       * values (print them otherwise) */
+                                       if (agg->start_time + agg_interval >= 
INSTR_TIME_GET_DOUBLE(now))
+                                       {
+                                               
+                                               /* accumulate */
+                                               agg->cnt += 1;
+                                               
+                                               agg->min_duration = (usec < 
agg->min_duration) ? usec : agg->min_duration;
+                                               agg->max_duration = (usec > 
agg->max_duration) ? usec : agg->max_duration;
+                                               
+                                               agg->sum  += usec;
+                                               agg->sum2 += usec * usec;
+                                               
+                                       }
+                                       else
+                                       {
+                                               
+                                               /* This is a non-Windows 
branch, so we don't  */
+                                               /* This is more than we really 
ought to know about instr_time */
+                                               fprintf(logfile, "%ld %d %.0f 
%.0f %.0f %.0f\n",
+                                                                       
agg->start_time, agg->cnt, agg->sum, agg->sum2,
+                                                                       
agg->min_duration, agg->max_duration);
+                                               
+                                               /* and now reset the values 
(include the current) */
+                                               agg->cnt = 1;
+                                               agg->min_duration = usec;
+                                               agg->max_duration = usec;
+                                                       
+                                               agg->sum = usec;
+                                               agg->sum2 = usec * usec;
+                                               
+                                               /* move to the next interval 
(there may be transactions longer than
+                                               * the desired interval */
+                                               while (agg->start_time + 
agg_interval < INSTR_TIME_GET_DOUBLE(now))
+                                                       agg->start_time = 
agg->start_time + agg_interval;
+                                       }
+                                       
+                               }
+                               else
+                               {
+                                       /* no, print raw transactions */
 #ifndef WIN32
-                               /* This is more than we really ought to know 
about instr_time */
-                               fprintf(logfile, "%d %d %.0f %d %ld %ld\n",
-                                               st->id, st->cnt, usec, 
st->use_file,
-                                               (long) now.tv_sec, (long) 
now.tv_usec);
+                                       /* This is more than we really ought to 
know about instr_time */
+                                       fprintf(logfile, "%d %d %.0f %d %ld 
%ld\n",
+                                                       st->id, st->cnt, usec, 
st->use_file,
+                                                       (long) now.tv_sec, 
(long) now.tv_usec);
 #else
-                               /* On Windows, instr_time doesn't provide a 
timestamp anyway */
-                               fprintf(logfile, "%d %d %.0f %d 0 0\n",
-                                               st->id, st->cnt, usec, 
st->use_file);
+                                       /* On Windows, instr_time doesn't 
provide a timestamp anyway */
+                                       fprintf(logfile, "%d %d %.0f %d 0 0\n",
+                                                       st->id, st->cnt, usec, 
st->use_file);
 #endif
+                               }
                        }
                }
 
@@ -1943,6 +2020,9 @@ main(int argc, char **argv)
                {"tablespace", required_argument, NULL, 2},
                {"unlogged-tables", no_argument, &unlogged_tables, 1},
                {"sampling-rate", required_argument, NULL, 4},
+#ifndef WIN32
+               {"aggregate-interval", required_argument, NULL, 5},
+#endif
                {NULL, 0, NULL, 0}
        };
 
@@ -2156,6 +2236,16 @@ main(int argc, char **argv)
                                        exit(1);
                                }
                                break;
+#ifndef WIN32
+                       case 5:
+                               agg_interval = atoi(optarg);
+                               if (agg_interval <= 0)
+                               {
+                                       fprintf(stderr, "invalid number of 
seconds for aggregation: %d\n", agg_interval);
+                                       exit(1);
+                               }
+                               break;
+#endif
                        default:
                                fprintf(stderr, _("Try \"%s --help\" for more 
information.\n"), progname);
                                exit(1);
@@ -2198,6 +2288,28 @@ main(int argc, char **argv)
                exit(1);
        }
 
+       /* --sampling-rate may must not be used with --aggregate-interval */
+       if (sample_rate > 0.0 && agg_interval > 0)
+       {
+               fprintf(stderr, "log sampling (--sampling-rate) and aggregation 
(--aggregate-interval) can't be used at the same time\n");
+               exit(1);
+       }
+
+       if (agg_interval > 0 && (! use_log)) {
+               fprintf(stderr, "log aggregation is allowed only when actually 
logging transactions\n");
+               exit(1);
+       }
+
+       if ((duration > 0) && (agg_interval > duration)) {
+               fprintf(stderr, "number of seconds for aggregation (%d) must 
not be higher that test duration (%d)\n", agg_interval, duration);
+               exit(1);
+       }
+
+       if ((duration > 0) && (agg_interval > 0) && (duration % agg_interval != 
0)) {
+               fprintf(stderr, "duration (%d) must be a multiple of 
aggregation interval (%d)\n", duration, agg_interval);
+               exit(1);
+       }
+
        /*
         * is_latencies only works with multiple threads in thread-based
         * implementations, not fork-based ones, because it supposes that the
@@ -2457,7 +2569,10 @@ threadRun(void *arg)
        int                     remains = nstate;               /* number of 
remaining clients */
        int                     i;
 
+       AggVals         aggs;
+
        result = pg_malloc(sizeof(TResult));
+       
        INSTR_TIME_SET_ZERO(result->conn_time);
 
        /* open log file if requested */
@@ -2492,6 +2607,8 @@ threadRun(void *arg)
        INSTR_TIME_SET_CURRENT(result->conn_time);
        INSTR_TIME_SUBTRACT(result->conn_time, thread->start_time);
 
+       agg_vals_init(&aggs, thread->start_time);
+       
        /* send start up queries in async manner */
        for (i = 0; i < nstate; i++)
        {
@@ -2500,7 +2617,7 @@ threadRun(void *arg)
                int                     prev_ecnt = st->ecnt;
 
                st->use_file = getrand(thread, 0, num_files - 1);
-               if (!doCustom(thread, st, &result->conn_time, logfile))
+               if (!doCustom(thread, st, &result->conn_time, logfile, &aggs))
                        remains--;                      /* I've aborted */
 
                if (st->ecnt > prev_ecnt && commands[st->state]->type == 
META_COMMAND)
@@ -2602,7 +2719,7 @@ threadRun(void *arg)
                        if (st->con && (FD_ISSET(PQsocket(st->con), &input_mask)
                                                        || 
commands[st->state]->type == META_COMMAND))
                        {
-                               if (!doCustom(thread, st, &result->conn_time, 
logfile))
+                               if (!doCustom(thread, st, &result->conn_time, 
logfile, &aggs))
                                        remains--;      /* I've aborted */
                        }
 
@@ -2629,7 +2746,6 @@ done:
        return result;
 }
 
-
 /*
  * Support for duration option: set timer_exceeded after so many seconds.
  */
diff --git a/doc/src/sgml/pgbench.sgml b/doc/src/sgml/pgbench.sgml
index 91530ab..3600208 100644
--- a/doc/src/sgml/pgbench.sgml
+++ b/doc/src/sgml/pgbench.sgml
@@ -335,6 +335,21 @@ pgbench <optional> <replaceable>options</> </optional> 
<replaceable>dbname</>
      </varlistentry>
 
      <varlistentry>
+      <term><option>--aggregate-interval</option> 
<replaceable>seconds</></term>
+      <listitem>
+       <para>
+        Length of aggregation interval (in seconds). May be used only together
+        with <application>-l</application> - with this option, the log contains
+        per-interval summary (number of transactions, min/max latency and two
+        additional fields useful for variance estimation).
+       </para>
+       <para>
+        This option is not available on Windows.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
       <term><option>-M</option> <replaceable>querymode</></term>
       <listitem>
        <para>
@@ -730,8 +745,9 @@ END;
   <title>Per-Transaction Logging</title>
 
   <para>
-   With the <option>-l</> option, <application>pgbench</> writes the time
-   taken by each transaction to a log file.  The log file will be named
+   With the <option>-l</> option but without the 
<option>--aggregate-interval</option>,
+   <application>pgbench</> writes the time taken by each transaction
+   to a log file.  The log file will be named
    <filename>pgbench_log.<replaceable>nnn</></filename>, where
    <replaceable>nnn</> is the PID of the pgbench process.
    If the <option>-j</> option is 2 or higher, creating multiple worker
@@ -769,11 +785,50 @@ END;
  0 202 2038 0 1175850569 2663
 </screen></para>
 
+<<<<<<< HEAD
   <para>
    When running a long test on hardware that can handle a lot of transactions,
    the log files can become very large.  The <option>--sampling-rate</> option
    can be used to log only a random sample of transactions.
   </para>
+=======
+  <title>Aggregated Logging</title>
+  
+  <para>
+   With the <option>--aggregate-interval</option> option, the logs use a bit 
different format:
+
+<synopsis>
+<replaceable>interval_start</> <replaceable>num_of_transactions</> 
<replaceable>latency_sum</> <replaceable>latency_2_sum</> 
<replaceable>min_latency</> <replaceable>max_latency</>
+</synopsis>
+
+   where <replaceable>interval_start</> is the start of the interval (UNIX 
epoch
+   format timestamp), <replaceable>num_of_transactions</> is the number of 
transactions
+   within the interval, <replaceable>latency_sum</replaceable> is a sum of 
latencies
+   (so you can compute average latency easily). The following two fields are 
useful
+   for variance estimation - <replaceable>latency_sum</> is a sum of latencies 
and
+   <replaceable>latency_2_sum</> is a sum of 2nd powers of latencies. The last 
two
+   fields are <replaceable>min_latency</> - a minimum latency within the 
interval, and
+   <replaceable>max_latency</> - maximum latency within the interval. A 
transaction is
+   counted into the interval when it was committed.
+  </para>
+
+  <para>
+   Here is example outputs:
+<screen>
+1345828501 5601 1542744 483552416 61 2573
+1345828503 7884 1979812 565806736 60 1479
+1345828505 7208 1979422 567277552 59 1391
+1345828507 7685 1980268 569784714 60 1398
+1345828509 7073 1979779 573489941 236 1411
+</screen></para>
+
+  <para>
+   Notice that while the plain (unaggregated) log file contains index
+   of the custom script files, the aggregated log does not. Therefore if
+   you need per script data, you need to aggregate the data on your own.
+  </para>
+
+>>>>>>> pgbench log aggregation with custom interval
  </refsect2>
 
  <refsect2>

-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] review: pgbench - aggregation of info written into log

Reply via email to