Attached is an updated 64-bit pgbench patch that works as expected for all of the most common pgbench operations, including support for scales above the previous boundary of just over 21,000. Here's the patched version running against a 303GB database with a previously unavailable scale factor:

$ pgbench -T 300 -j 2 -c 4 pgbench
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 25000
query mode: simple
number of clients: 4
number of threads: 2
duration: 300 s
number of transactions actually processed: 21681
tps = 72.249999 (including connections establishing)
tps = 72.250610 (excluding connections establishing)

And some basic QA confirming that the values it touched were in the right range:

$ psql -d pgbench -c "select min(aid),max(aid) from pgbench_accounts";

 min |    max
-----+------------
   1 | 2500000000

$ psql -d pgbench -c "select min(aid),max(aid),count(*) from pgbench_accounts where abalance!=0" &

  min  |    max     | count
-------+------------+-------
 51091 | 2499989587 | 21678

(This system was doing 300MB/s on reads while executing that count, and it still took 19 minutes)

Euler updated the patch in a clever way: you don't pay for the larger on-disk data (bigint columns) unless you use a range that requires it, which greatly reduces the number of ways the test results can suffer from this change. I felt that was coded a bit more elaborately than necessary, though, because the point where the switch happens was computed at runtime based on the true size of the integers. I took that complexity out and just put a hard line in there instead: if scale >= 20000, you get bigints. That's not very different from the real limit, and it makes the point where the switch happens easy to document and to remember.

The main performance concern with this change was whether using int64 more heavily for internal computations would slow things down on a 32-bit system. I thought I'd test that on my few-years-old laptop. It turns out that even though I've been running i386 Linux on it, it actually has a 64-bit CPU. (I think the 32-bit install may be an artifact of Adobe Flash installation issues, sadly.) So this may not be as good a test case as I'd hoped. Regardless, a test aimed at stressing simple SELECTs, the thing I'd expect to suffer most from additional CPU overhead, didn't show any difference in performance:

$ createdb pgbench
$ pgbench -i -s 10 pgbench
$ psql -c "show shared_buffers"
shared_buffers
----------------
256MB
(1 row)
$ pgbench -S -j 2 -c 4 -T 60 pgbench

          i386    x86_64
          6932    6924
          6923    6926
          6923    6922
          6688    6772
          6914    6791
          6902    6916
          6917    6909
          6943    6837
          6689    6744
min       6688    6744
max       6943    6926
average   6870    6860

Given the noise level of pgbench tests, I'm happy saying that's the same speed. I suspect the real overhead in pgbench's processing comes from how it is constantly parsing text to turn it into statements, and that the size of the integers it uses is barely detectable over that.

So...where does that leave this patch? I feel that pgbench will become less relevant very quickly in 9.1 unless something like this is committed, and there don't seem to be significant downsides to this in terms of performance. There are, however, a few rough points left in here that might raise concern:

1) A look into the expected range of the rand() function suggests the glibc implementation normally provides 30 bits of resolution, so about 1 billion numbers. You'll have more than 1 billion rows in a pgbench database once the scale goes over 10,000. So without a major overhaul of how random number generation is handled here, people can expect the distribution of rows touched by a test run to get less even once the database scale gets very large. I added another warning paragraph to the end of the docs in this update to mention this. Long-term, I suspect we may need to adopt a superior 64-bit RNG, something like a Mersenne Twister perhaps. That's a bit more than can be chewed on during 9.1 development though.

2) I'd rate the odds as good that there are one or more corner-case bugs in \setrandom or \setshell I haven't found yet, just from the way that code was converted. Those have some changes I haven't tested exhaustively yet. I don't see any issues when running the two most common pgbench tests, but that doesn't mean every part of the 32 -> 64 bit conversion was done correctly.

Given how I use pgbench, for data generation and rough load testing, I'd say neither of those concerns outweighs the need to expand the size range of this program. I would be happy to see this go in, followed by some alpha and beta testing aimed at seeing whether any of the rough spots I'm concerned about actually appear. Unfortunately I can't fit all of those tests in right now, as throwing around one of these 300GB data sets is painful--when you're only getting 72 TPS, looking for large-scale patterns in the transactions takes a long time to do. For example, if I really wanted a good read on how bad the data distribution skew due to the small random range is, I'd need to let some things run for a week just for a first pass.

I'd like to see this go in, but the problems I've spotted are such that I would completely understand others considering it not ready. Just having this patch available here is a very useful step forward in my mind, because now people can always grab it and do a custom build if they run into a larger system.

Wavering between Returned with Feedback and Ready for Committer here. Thoughts?

--
Greg Smith   2ndQuadrant US    g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books

diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c
index 55ca1e8..5b9b582 100644
--- a/contrib/pgbench/pgbench.c
+++ b/contrib/pgbench/pgbench.c
@@ -60,6 +60,8 @@
 #define INT64_MAX	INT64CONST(0x7FFFFFFFFFFFFFFF)
 #endif
 
+#define MAX_RANDOM_VALUE64	INT64_MAX
+
 /*
  * Multi-platform pthread implementations
  */
@@ -364,15 +366,84 @@ usage(const char *progname)
 		   progname, progname);
 }
 
+/*
+ * strtoint64 -- convert a string to 64-bit integer
+ *
+ * This function is a modified version of scanint8() from
+ * src/backend/utils/adt/int8.c.
+ *
+ */
+static int64
+strtoint64(const char *str)
+{
+	const char *ptr = str;
+	int64		result = 0;
+	int			sign = 1;
+
+	/*
+	 * Do our own scan, rather than relying on sscanf which might be broken
+	 * for long long.
+	 */
+
+	/* skip leading spaces */
+	while (*ptr && isspace((unsigned char) *ptr))
+		ptr++;
+
+	/* handle sign */
+	if (*ptr == '-')
+	{
+		ptr++;
+
+		/*
+		 * Do an explicit check for INT64_MIN.	Ugly though this is, it's
+		 * cleaner than trying to get the loop below to handle it portably.
+		 */
+		if (strncmp(ptr, "9223372036854775808", 19) == 0)
+		{
+			result = -INT64CONST(0x7fffffffffffffff) - 1;
+			ptr += 19;
+			goto gotdigits;
+		}
+		sign = -1;
+	}
+	else if (*ptr == '+')
+		ptr++;
+
+	/* require at least one digit */
+	if (!isdigit((unsigned char) *ptr))
+		fprintf(stderr, "invalid input syntax for integer: \"%s\"\n", str);
+
+	/* process digits */
+	while (*ptr && isdigit((unsigned char) *ptr))
+	{
+		int64		tmp = result * 10 + (*ptr++ - '0');
+
+		if ((tmp / 10) != result)		/* overflow? */
+			fprintf(stderr, "value \"%s\" is out of range for type bigint\n", str);
+		result = tmp;
+	}
+
+gotdigits:
+
+	/* allow trailing whitespace, but not other trailing chars */
+	while (*ptr != '\0' && isspace((unsigned char) *ptr))
+		ptr++;
+
+	if (*ptr != '\0')
+		fprintf(stderr, "invalid input syntax for integer: \"%s\"\n", str);
+
+	return ((sign < 0) ? -result : result);
+}
+
 /* random number generator: uniform distribution from min to max inclusive */
-static int
-getrand(int min, int max)
+static int64
+getrand(int64 min, int64 max)
 {
 	/*
 	 * Odd coding is so that min and max have approximately the same chance of
 	 * being selected as do numbers between them.
 	 */
-	return min + (int) (((max - min + 1) * (double) random()) / (MAX_RANDOM_VALUE + 1.0));
+	return min + (int64) (((max - min + 1) * (double) random()) / (MAX_RANDOM_VALUE + 1.0));
 }
 
 /* call PQexec() and exit() on failure */
@@ -887,7 +958,7 @@ top:
 		if (commands[st->state] == NULL)
 		{
 			st->state = 0;
-			st->use_file = getrand(0, num_files - 1);
+			st->use_file = (int) getrand(0, num_files - 1);
 			commands = sql_files[st->use_file];
 		}
 	}
@@ -1007,7 +1078,7 @@ top:
 		if (pg_strcasecmp(argv[0], "setrandom") == 0)
 		{
 			char	   *var;
-			int			min,
+			int64		min,
 						max;
 			char		res[64];
 
@@ -1019,15 +1090,15 @@ top:
 					st->ecnt++;
 					return true;
 				}
-				min = atoi(var);
+				min = strtoint64(var);
 			}
 			else
-				min = atoi(argv[2]);
+				min = strtoint64(argv[2]);
 
 #ifdef NOT_USED
 			if (min < 0)
 			{
-				fprintf(stderr, "%s: invalid minimum number %d\n", argv[0], min);
+				fprintf(stderr, "%s: invalid minimum number " INT64_FORMAT "\n", argv[0], min);
 				st->ecnt++;
 				return;
 			}
@@ -1041,22 +1112,22 @@ top:
 					st->ecnt++;
 					return true;
 				}
-				max = atoi(var);
+				max = strtoint64(var);
 			}
 			else
-				max = atoi(argv[3]);
+				max = strtoint64(argv[3]);
 
-			if (max < min || max > MAX_RANDOM_VALUE)
+			if (max < min || max > MAX_RANDOM_VALUE64)
 			{
-				fprintf(stderr, "%s: invalid maximum number %d\n", argv[0], max);
+				fprintf(stderr, "%s: invalid maximum number " INT64_FORMAT "\n", argv[0], max);
 				st->ecnt++;
 				return true;
 			}
 
 #ifdef DEBUG
-			printf("min: %d max: %d random: %d\n", min, max, getrand(min, max));
+			printf("min: " INT64_FORMAT " max: " INT64_FORMAT " random: " INT64_FORMAT "\n", min, max, getrand(min, max));
 #endif
-			snprintf(res, sizeof(res), "%d", getrand(min, max));
+			snprintf(res, sizeof(res), INT64_FORMAT, getrand(min, max));
 
 			if (!putVariable(st, argv[0], argv[1], res))
 			{
@@ -1069,7 +1140,7 @@ top:
 		else if (pg_strcasecmp(argv[0], "set") == 0)
 		{
 			char	   *var;
-			int			ope1,
+			int64		ope1,
 						ope2;
 			char		res[64];
 
@@ -1081,13 +1152,13 @@ top:
 					st->ecnt++;
 					return true;
 				}
-				ope1 = atoi(var);
+				ope1 = strtoint64(var);
 			}
 			else
-				ope1 = atoi(argv[2]);
+				ope1 = strtoint64(argv[2]);
 
 			if (argc < 5)
-				snprintf(res, sizeof(res), "%d", ope1);
+				snprintf(res, sizeof(res), INT64_FORMAT, ope1);
 			else
 			{
 				if (*argv[4] == ':')
@@ -1098,17 +1169,17 @@ top:
 						st->ecnt++;
 						return true;
 					}
-					ope2 = atoi(var);
+					ope2 = strtoint64(var);
 				}
 				else
-					ope2 = atoi(argv[4]);
+					ope2 = strtoint64(argv[4]);
 
 				if (strcmp(argv[3], "+") == 0)
-					snprintf(res, sizeof(res), "%d", ope1 + ope2);
+					snprintf(res, sizeof(res), INT64_FORMAT, ope1 + ope2);
 				else if (strcmp(argv[3], "-") == 0)
-					snprintf(res, sizeof(res), "%d", ope1 - ope2);
+					snprintf(res, sizeof(res), INT64_FORMAT, ope1 - ope2);
 				else if (strcmp(argv[3], "*") == 0)
-					snprintf(res, sizeof(res), "%d", ope1 * ope2);
+					snprintf(res, sizeof(res), INT64_FORMAT, ope1 * ope2);
 				else if (strcmp(argv[3], "/") == 0)
 				{
 					if (ope2 == 0)
@@ -1117,7 +1188,7 @@ top:
 						st->ecnt++;
 						return true;
 					}
-					snprintf(res, sizeof(res), "%d", ope1 / ope2);
+					snprintf(res, sizeof(res), INT64_FORMAT, ope1 / ope2);
 				}
 				else
 				{
@@ -1239,9 +1310,9 @@ init(void)
 		"drop table if exists pgbench_tellers",
 		"create table pgbench_tellers(tid int not null,bid int,tbalance int,filler char(84)) with (fillfactor=%d)",
 		"drop table if exists pgbench_accounts",
-		"create table pgbench_accounts(aid int not null,bid int,abalance int,filler char(84)) with (fillfactor=%d)",
+		"create table pgbench_accounts(aid %s not null,bid int,abalance int,filler char(84)) with (fillfactor=%d)",
 		"drop table if exists pgbench_history",
-		"create table pgbench_history(tid int,bid int,aid int,delta int,mtime timestamp,filler char(22))"
+		"create table pgbench_history(tid int,bid int,aid %s,delta int,mtime timestamp,filler char(22))"
 	};
 	static char *DDLAFTERs[] = {
 		"alter table pgbench_branches add primary key (bid)",
@@ -1253,6 +1324,7 @@ init(void)
 	PGresult   *res;
 	char		sql[256];
 	int			i;
+	int64		k;
 
 	if ((con = doConnect()) == NULL)
 		exit(1);
@@ -1263,8 +1335,7 @@ init(void)
 		 * set fillfactor for branches, tellers and accounts tables
 		 */
 		if ((strstr(DDLs[i], "create table pgbench_branches") == DDLs[i]) ||
-			(strstr(DDLs[i], "create table pgbench_tellers") == DDLs[i]) ||
-			(strstr(DDLs[i], "create table pgbench_accounts") == DDLs[i]))
+			(strstr(DDLs[i], "create table pgbench_tellers") == DDLs[i]))
 		{
 			char		ddl_stmt[128];
 
@@ -1272,6 +1343,36 @@ init(void)
 			executeStatement(con, ddl_stmt);
 			continue;
 		}
+		else if (strstr(DDLs[i], "create table pgbench_accounts") == DDLs[i])
+		{
+			char		ddl_stmt[128];
+
+			/*
+			 * Use bigint columns in cases where scale factor is bigger
+			 * than 20000, which gives an account number around the upper
+			 * limit the range for a regular integer (just over 2 billion).
+			 * For smaller scale factors, we still use int columns, to keep
+			 * results more like those given by earlier versions of pgbench.
+			 */
+			if (scale >= 20000)
+				snprintf(ddl_stmt, 128, DDLs[i], "bigint", fillfactor);
+			else
+				snprintf(ddl_stmt, 128, DDLs[i], "int", fillfactor);
+			executeStatement(con, ddl_stmt);
+			continue;
+		}
+		else if (strstr(DDLs[i], "create table pgbench_history") == DDLs[i])
+		{
+			char		ddl_stmt[128];
+
+			/* The accounts value in the history table needs a matching size */
+			if (scale >= 20000)
+				snprintf(ddl_stmt, 128, DDLs[i], "bigint");
+			else
+				snprintf(ddl_stmt, 128, DDLs[i], "int");
+			executeStatement(con, ddl_stmt);
+			continue;
+		}
 		else
 			executeStatement(con, DDLs[i]);
 	}
@@ -1309,11 +1410,11 @@ init(void)
 	}
 	PQclear(res);
 
-	for (i = 0; i < naccounts * scale; i++)
+	for (k = 0; k < (int64) naccounts * scale; k++)
 	{
-		int			j = i + 1;
+		int64			j = k + 1;
 
-		snprintf(sql, 256, "%d\t%d\t%d\t\n", j, i / naccounts + 1, 0);
+		snprintf(sql, 256, INT64_FORMAT "\t" INT64_FORMAT "\t%d\t\n", j, k / naccounts + 1, 0);
 		if (PQputline(con, sql))
 		{
 			fprintf(stderr, "PQputline failed\n");
@@ -1321,7 +1422,7 @@ init(void)
 		}
 
 		if (j % 10000 == 0)
-			fprintf(stderr, "%d tuples done.\n", j);
+			fprintf(stderr, INT64_FORMAT " tuples done.\n", j);
 	}
 	if (PQputline(con, "\\.\n"))
 	{
diff --git a/doc/src/sgml/pgbench.sgml b/doc/src/sgml/pgbench.sgml
index a33ac17..05c9794 100644
--- a/doc/src/sgml/pgbench.sgml
+++ b/doc/src/sgml/pgbench.sgml
@@ -155,6 +155,10 @@ pgbench <optional> <replaceable>options</> </optional> <replaceable>dbname</>
         Multiply the number of rows generated by the scale factor.
         For example, <literal>-s 100</> will create 10,000,000 rows
         in the <structname>pgbench_accounts</> table. Default is 1.
+        When the scale is 20,000 or larger, the columns used to
+        hold account identifiers switch to larger (bigint) integers,
+        so that they are big enough to hold the full range of
+        account identifiers.
        </para>
       </listitem>
      </varlistentry>
@@ -769,6 +773,19 @@ statement latencies in milliseconds:
    instances concurrently, on several client machines, against the same
    database server.
   </para>
+
+  <para>
+   The random number generation done by <application>pgbench</> is limited
+   by the resolution of the underlying operating system's numeric library.
+   Typically this provides a range of approximately 1 billion values.  There
+   will be a billion rows in the <structname>pgbench_accounts</> table once
+   the database scale is increased to 10,000 or above.  When creating large
+   databases above or even near this scale, you should expect the
+   distribution of rows selected by random operations to no longer be as
+   even as when the program is running against a smaller
+   database.
+  </para>
+ 
  </sect2>
 
 </sect1>
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
