Re: [HACKERS] Sequential Scan Read-Ahead
Tom Lane wrote:
> ...
> Curt Sampson <[EMAIL PROTECTED]> writes:
> > 3. Proof by testing. I wrote a little ruby program to seek to a
> > random point in the first 2 GB of my raw disk partition and read
> > 1-8 8K blocks of data. (This was done as one I/O request.) (Using
> > the raw disk partition I avoid any filesystem buffering.)
>
> And also ensure that you aren't testing the point at issue.
> The point at issue is that *in the presence of kernel read-ahead*
> it's quite unclear that there's any benefit to a larger request size.
> Ideally the kernel will have the next block ready for you when you
> ask, no matter what the request is.
> ...

I have to agree with Tom. I think the numbers below show that with kernel
read-ahead, block size isn't an issue.

The big_file1 file used below is 2.0 gig of random data, and the machine has
512 mb of main memory. This ensures that we're not just getting cached data.

    foreach i (4k 8k 16k 32k 64k 128k)
        echo $i
        time dd bs=$i if=big_file1 of=/dev/null
    end

and the results:

    bs      user    kernel  elapsed
    4k:     0.260   7.740   1:27.25
    8k:     0.210   8.060   1:30.48
    16k:    0.090   7.790   1:30.88
    32k:    0.060   8.090   1:32.75
    64k:    0.030   8.190   1:29.11
    128k:   0.070   9.830   1:28.74

so with kernel read-ahead, we have basically the same elapsed (wall time)
regardless of block size. Sure, user time drops to a low at 64k blocksize,
but kernel time is increasing.

You could argue that this is a contrived example, no other I/O is being done.
Well I created a second 2.0g file (big_file2) and did two simultaneous reads
from the same disk. Sure performance went to hell but it shows blocksize is
still irrelevant in a multi I/O environment with sequential read-ahead.

    foreach i (4k 8k 16k 32k 64k 128k)
        echo $i
        time dd bs=$i if=big_file1 of=/dev/null &
        time dd bs=$i if=big_file2 of=/dev/null &
        wait
    end

    bs      user    kernel  elapsed
    4k:     0.480   8.290   6:34.13   bigfile1
            0.320   8.730   6:34.33   bigfile2
    8k:     0.250   7.580   6:31.75
            0.180   8.450   6:31.88
    16k:    0.150   8.390   6:32.47
            0.100   7.900   6:32.55
    32k:    0.190   8.460   6:24.72
            0.060   8.410   6:24.73
    64k:    0.060   9.350   6:25.05
            0.150   9.240   6:25.13
    128k:   0.090   10.610  6:33.14
            0.110   11.320  6:33.31

the differences in read times are basically in the mud. Blocksize just
doesn't matter much with the kernel doing readahead.

-Kyle
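[Editor's note: for readers who want to reproduce this kind of measurement
without dd, here is a minimal C sketch of the same sequential-read timing
test. It is an illustration only, assuming a Unix-like system; the file name
and block size are passed on the command line.]

    #include <stdio.h>
    #include <stdlib.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/time.h>

    /* Read a file sequentially with the given block size and report wall time. */
    int main(int argc, char **argv)
    {
        if (argc != 3)
        {
            fprintf(stderr, "usage: %s file blocksize\n", argv[0]);
            return 1;
        }
        size_t bs = (size_t) atol(argv[2]);
        char *buf = malloc(bs);
        int fd = open(argv[1], O_RDONLY);
        if (buf == NULL || fd < 0)
            return 1;

        struct timeval start, end;
        gettimeofday(&start, NULL);

        ssize_t n;
        long long total = 0;
        while ((n = read(fd, buf, bs)) > 0)   /* kernel read-ahead runs behind this */
            total += n;

        gettimeofday(&end, NULL);
        double secs = (end.tv_sec - start.tv_sec) +
                      (end.tv_usec - start.tv_usec) / 1e6;
        printf("%lld bytes in %.2f s (%.1f MB/s)\n",
               total, secs, total / secs / 1048576.0);

        close(fd);
        free(buf);
        return 0;
    }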
Re: [HACKERS] [mail] Re: 7.4 Wishlist
Without getting into too many details, why not send toast data to non-local
clients? Seems that would be the big win. The data is already compressed, so
the server wouldn't pay cpu time to recompress anything. And since toast data
is relatively large, it's the stuff you'd want to compress before putting it
on the wire anyway.

If this is remotely possible let me know, I might be interested in taking a
look at it.

-Kyle

Bruce Momjian wrote:
>
> I am not excited about per-db/user compression because of the added
> complexity of setting it up, and even set up, I can see cases where some
> queries would want it, and others not. I can see using GUC to control
> this. If you enable it and the client doesn't support it, it is a
> no-op. We have per-db and per-user settings, so GUC would allow such
> control if you wish.
>
> Ideally, it would be a tri-valued parameter, that is ON, OFF, or AUTO,
> meaning it would determine if there was value in the compression and do
> it only when it would help.
Re: [HACKERS] Client/Server compression?
On the subject of client/server compression, does the server decompress toast
data before sending it to the client? If so, why (other than requiring
modifications to the protocol)?

On the flip side, does/could the client toast insert/update data before
sending it to the server?

-Kyle
Re: [HACKERS] Client/Server compression?
Greg Copeland wrote:
> [cut]
> My current thoughts are to allow for enabled/disabled compression and
> variable compression settings (1-9) within a database configuration.
> Worse case, it may be fun to implement and I'm thinking there may
> actually be some surprises as an end result if it's done properly.
>
> [cut]
>
> Greg

Wouldn't Tom's suggestion of riding on top of ssh give similar results?
Anyway, it'd probably be a good proof of concept of whether or not it's worth
the effort. And that brings up the question: how would you measure the
benefit? I'd assume you'd get a good cut in network traffic, but you'll take
a hit in cpu time. What's an acceptable tradeoff?

That's one reason I was thinking about the toast stuff. If the backend could
serve toast, you'd get an improvement in server-to-client network traffic
without the server spending cpu time on compression, since the data has
already been compressed.

Let me know if this is feasible (or slap me if this is how things already
are): when the backend detoasts data, keep both copies in memory. When it
comes time to put data on the wire, instead of putting the whole enchilada
down, give the client the compressed toast instead. And yeah, I guess this
would require a protocol change to flag the compressed data. But it seems
like a way to leverage work already done.

-kf
[HACKERS] libpq Alternate Row Processor
The guts of pqRowProcessor in libpq do a good bit of work to maintain the
internal data structure of a PGresult. There are a few use cases where the
caller doesn't need the ability to access the result set row by row, column
by column using PQgetvalue. Think of an ORM that is just going to copy the
data from the PGresult for each row into its own structures.

I've got a working proof of concept that allows the caller to attach a
callback that pqRowProcessor will call instead of going through its own
routine. This eliminates the copying of data from the PGconn buffer into a
PGresult buffer and, ultimately, the series of PQgetvalue calls by the
client. The callback allows the caller to receive each row's data directly
from the PGconn buffer.

It would require exposing struct pgDataValue in libpq-fe.h. The prototype for
the callback pointer would be:

    int (*PQrowProcessorCB)(PGresult*, const PGdataValue*, int col_count,
                            void *user_data);

My initial testing shows a significant performance improvement. I'd like some
opinions on this before wiring up a performance proof and updating the
documentation for a formal patch submission.

Kyle Gearhart
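[Editor's note: for a sense of how a caller might use such a hook, here is a
rough sketch. The PGdataValue layout and the registration call named
PQsetRowProcessorCB are assumptions based on the description above; they are
not part of released libpq.]

    #include <stdlib.h>
    #include <string.h>
    #include <libpq-fe.h>

    /* Assumed layout the proposal would expose in libpq-fe.h. */
    typedef struct
    {
        int         len;    /* field length in bytes, negative means NULL */
        const char *value;  /* field data inside the PGconn buffer, not terminated */
    } PGdataValue;

    /* Application-side row: an array of malloc'd C strings. A real ORM
     * would append each row to its own growing collection. */
    struct row_store
    {
        char  **fields;
        int     nfields;
    };

    /*
     * Row callback: copy each field once, straight from the connection
     * buffer into application memory, skipping the per-row PGresult storage.
     */
    static int
    copy_row_cb(PGresult *res, const PGdataValue *cols, int col_count,
                void *user_data)
    {
        struct row_store *store = user_data;
        (void) res;

        store->fields = malloc(col_count * sizeof(char *));
        store->nfields = col_count;
        if (store->fields == NULL)
            return 0;                       /* row not consumed */

        for (int i = 0; i < col_count; i++)
        {
            if (cols[i].len < 0)
            {
                store->fields[i] = NULL;    /* SQL NULL */
                continue;
            }
            store->fields[i] = malloc(cols[i].len + 1);
            memcpy(store->fields[i], cols[i].value, cols[i].len);
            store->fields[i][cols[i].len] = '\0';
        }
        return 1;                           /* row handled */
    }

    /* Hypothetical registration call, matching the callback type above:
     *     PQsetRowProcessorCB(conn, copy_row_cb, &my_store);
     */

This is consistent with the observation later in the thread that the callback
collapses pqResultAlloc's many small allocations into roughly one per row.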
Re: [HACKERS] libpq Alternate Row Processor
From: Tom Lane [mailto:t...@sss.pgh.pa.us]:
> Kyle Gearhart writes:
>> The guts of pqRowProcessor in libpq does a good bit of work to maintain
>> the internal data structure of a PGresult. There are a few use cases
>> where the caller doesn't need the ability to access the result set row
>> by row, column by column using PQgetvalue. Think of an ORM that is just
>> going to copy the data from PGresult for each row into its own
>> structures.
>
> It seems like you're sort of reinventing "single row mode":
> https://www.postgresql.org/docs/devel/static/libpq-single-row-mode.html
>
> Do we really need yet another way of breaking the unitary-query-result
> abstraction?

If it's four times faster... then the option should be available in libpq.

I'm traveling tomorrow but will try to get a patch and proof with pgbench
dataset up by the middle of the week. The performance gains are consistent
with Jim Nasby's findings with SPI.

Kyle Gearhart
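[Editor's note: for reference, the single-row mode Tom points to works
roughly as sketched below, using the documented libpq calls; the connection
string and table name are just examples. Each row still gets its own PGresult
allocated and populated by libpq, which is the overhead the callback proposal
tries to avoid.]

    #include <stdio.h>
    #include <libpq-fe.h>

    int main(void)
    {
        PGconn *conn = PQconnectdb("dbname=postgres");
        if (PQstatus(conn) != CONNECTION_OK)
        {
            fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
            return 1;
        }

        if (!PQsendQuery(conn, "SELECT aid, abalance FROM pgbench_accounts"))
        {
            fprintf(stderr, "send failed: %s", PQerrorMessage(conn));
            return 1;
        }
        PQsetSingleRowMode(conn);   /* each PQgetResult now returns one row */

        PGresult *res;
        while ((res = PQgetResult(conn)) != NULL)
        {
            if (PQresultStatus(res) == PGRES_SINGLE_TUPLE)
            {
                /* One row, already copied into a per-row PGresult by libpq. */
                printf("%s %s\n", PQgetvalue(res, 0, 0), PQgetvalue(res, 0, 1));
            }
            PQclear(res);
        }
        PQfinish(conn);
        return 0;
    }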
Re: [HACKERS] libpq Alternate Row Processor
On 2/9/17 7:15 PM, Jim Nasby wrote:
> Can you run a trace to see where all the time is going in the single row
> case? I don't see an obvious time-suck with a quick look through the code.
> It'd be interesting to see how things change if you eliminate the filler
> column from the SELECT.

Traces are attached, these are with callgrind.

    profile_nofiller.txt:        single row without filler column
    profile_filler.txt:          single row with filler column
    profile_filler_callback.txt: callback with filler column

pqResultAlloc looks to hit malloc pretty hard. The callback reduces all of
that to a single malloc for each row.

Without the filler, here is the average over 11 runs:

                  Real    user    sys
    Callback      .133    .033    .035
    Single Row    .170    .112    .029

For the callback case, it's slightly higher than the prior results with the
filler column.


Profile data file 'callgrind.out.14930' (creator: callgrind-3.11.0)
I1 cache:
D1 cache:
LL cache:
Timerange: Basic block 0 - 74120972
Trigger: Program termination
Profiled target:  ./test -m row (PID 14930, part 1)
Events recorded:  Ir
Events shown:     Ir
Event sort order: Ir
Thresholds:       99
Include dirs:
User annotated:
Auto-annotation:  off

Ir
313,455,690  PROGRAM TOTALS

Ir           file:function
61,410,828   ???:_int_malloc [/usr/lib64/libc-2.17.so]
38,321,887   ???:_int_free [/usr/lib64/libc-2.17.so]
25,800,115   ???:pqResultAlloc [/usr/local/pgsql/lib/libpq.so.5.10]
20,611,330   ???:pqParseInput3 [/usr/local/pgsql/lib/libpq.so.5.10]
16,002,817   ???:malloc [/usr/lib64/libc-2.17.so]
14,800,004   ???:pqRowProcessor [/usr/local/pgsql/lib/libpq.so.5.10]
12,604,893   ???:pqGetInt [/usr/local/pgsql/lib/libpq.so.5.10]
10,400,004   ???:PQsetResultAttrs [/usr/local/pgsql/lib/libpq.so.5.10]
10,200,316   main.c:main [/usr/local/src/postgresql-perf/test]
 9,600,000   ???:check_tuple_field_number [/usr/local/pgsql/lib/libpq.so.5.10]
 8,300,631   ???:__strcpy_sse2_unaligned [/usr/lib64/libc-2.17.so]
 7,500,075   ???:pqResultStrdup [/usr/local/pgsql/lib/libpq.so.5.10]
 7,500,000   ???:pqSkipnchar [/usr/local/pgsql/lib/libpq.so.5.10]
 7,017,368   ???:__memcpy_ssse3_back [/usr/lib64/libc-2.17.so]
 6,900,000   ???:PQgetisnull [/usr/local/pgsql/lib/libpq.so.5.10]
 6,401,100   ???:free [/usr/lib64/libc-2.17.so]
 6,200,004   ???:PQcopyResult [/usr/local/pgsql/lib/libpq.so.5.10]
 6,100,959   ???:__strlen_sse2_pminub [/usr/lib64/libc-2.17.so]
 5,700,000   ???:PQgetvalue [/usr/local/pgsql/lib/libpq.so.5.10]
 4,700,045   ???:PQclear [/usr/local/pgsql/lib/libpq.so.5.10]
 4,200,057   ???:PQmakeEmptyPGresult [/usr/local/pgsql/lib/libpq.so.5.10]
 4,103,903   ???:PQgetResult [/usr/local/pgsql/lib/libpq.so.5.10]
 3,400,000   ???:pqAddTuple [/usr/local/pgsql/lib/libpq.so.5.10]
 3,203,437   ???:pqGetc [/usr/local/pgsql/lib/libpq.so.5.10]
 2,600,034   ???:pqPrepareAsyncResult [/usr/local/pgsql/lib/libpq.so.5.10]
 2,500,679   ???:appendBinaryPQExpBuffer [/usr/local/pgsql/lib/libpq.so.5.10]
 2,300,621   ???:enlargePQExpBuffer [/usr/local/pgsql/lib/libpq.so.5.10]
 1,600,016   ???:appendPQExpBufferStr [/usr/local/pgsql/lib/libpq.so.5.10]
   900,270   ???:resetPQExpBuffer [/usr/local/pgsql/lib/libpq.so.5.10]


Profile data file 'callgrind.out.15062' (creator: callgrind-3.11.0)
I1 cache:
D1 cache:
LL cache:
Timerange: Basic block 0 - 84068364
Trigger: Program termination
Profiled target:  ./test -m row (PID 15062, part 1)
Events recorded:  Ir
Events shown:     Ir
Event sort order: Ir
Thresholds:       99
Include dirs:
User annotated:
Auto-annotation:  off

Ir
358,525,458  PROGRAM TOTALS

Ir           file:function
61,410,901   ???:_int_malloc [/usr/lib64/libc-2.17.so]
38,321,887   ???:_int_free [/usr/lib64/libc-2.17.so]
31,400,139   ???:pqResultAlloc [/usr/local/pgsql/lib/libpq.so.5.10]
22,839,505   ???:pqParseInput3 [/usr/local/pgsql/lib/libpq.so.5.10]
17,600,004   ???:pqRowProcessor [/usr/local/pgsql/lib/libpq.so.5.10]
16,002,817   ???:malloc [/usr/lib64/libc-2.17.so]
14,716,359   ???:pqGetInt [/usr/local/pgsql/
Re: [HACKERS] libpq Alternate Row Processor
On Mon, Feb 13, 2017 Merlin Moncure wrote:
> A barebones callback mode ISTM is a complete departure from the classic
> PGresult interface. This code is pretty unpleasant IMO:
>
>     acct->abalance = *((int*)PQgetvalue(res, 0, i));
>     acct->abalance = __bswap_32(acct->abalance);
>
> Your code is faster but foists a lot of the work on the user, so it's kind
> of cheating in a way (although very carefully written applications might
> be able to benefit).

The bit you call out above is for single row mode. Binary mode is a slippery
slope, with or without the proposed callback.

Let's remember that one of the biggest, often overlooked, gains when using an
ORM is that it abstracts all this mess away. The goal here is to prevent all
the ORM/framework folks from having to implement the protocol. Otherwise they
get to wait on libpq to copy from the socket to the PGconn buffer to the
PGresult structure to their buffers.

The callback keeps the slowest guy on the team... on the bench.

Kyle Gearhart
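[Editor's note: for readers without the thread's context, the "binary mode"
burden under discussion is that with a binary result format libpq hands back
raw network-order bytes and the application does its own byte swapping. A
minimal sketch using documented libpq calls; the table and column come from
the pgbench schema and are assumptions here.]

    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>
    #include <arpa/inet.h>   /* ntohl */
    #include <libpq-fe.h>

    int main(void)
    {
        PGconn *conn = PQconnectdb("dbname=postgres");
        if (PQstatus(conn) != CONNECTION_OK)
            return 1;

        /* Ask for binary results: last argument (resultFormat) = 1. */
        PGresult *res = PQexecParams(conn,
                                     "SELECT abalance FROM pgbench_accounts LIMIT 10",
                                     0, NULL, NULL, NULL, NULL, 1);
        if (PQresultStatus(res) == PGRES_TUPLES_OK)
        {
            for (int i = 0; i < PQntuples(res); i++)
            {
                /* An int4 arrives as 4 big-endian bytes; swap to host order. */
                uint32_t raw;
                memcpy(&raw, PQgetvalue(res, i, 0), sizeof(raw));
                int32_t abalance = (int32_t) ntohl(raw);
                printf("%d\n", abalance);
            }
        }
        PQclear(res);
        PQfinish(conn);
        return 0;
    }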
[HACKERS] Re: Large Object problems (was Re: JDBC int8 hack)
Sorry, meant to hit all of these.

On Tue, Apr 10, 2001 at 02:24:24PM +0100, Peter Mount wrote:
> >I'm going to start digging around in the optimizer code so such hacks as
> >mine aren't needed. It's really haenous to find out your production
> >server is freaking out and doing sequential scans for EVERYTHING.
>
> Are you talking about the optimiser in the backend as there isn't one in
> the jdbc driver.

Yeah, in the backend. My patch to the JDBC driver only helps people using
JDBC to get to a database (obviously). From any other access method, a
statement like:

    SELECT * FROM Foo where bar=1234

will do a sequential scan even if there is an index Foo_bar_idx on "bar" if
bar is INT8. It seems to me that the optimizer should be able to notice the
index Foo_bar_idx and convert the argument "1234" to an INT8 in order to use
Foo_bar_idx over doing a sequential scan (in which case, "1234" probably gets
converted to INT8 anyhow to do comparisons). Granted, I'm theorizing. I
should probably shut up and RTFS.

Anyhow, all my patch did was tack the "::int8" cast onto parameters that were
set by PreparedStatement.setLong(). We did this after finding that
EXPLAIN'ing this:

    SELECT * FROM Foo where bar=1234::int8

didn't degrade to a sequential scan like the other SELECT statement (w/o the
cast).

> >Another hack I need to work on (or someone else can) is to squish in a
> >layer of filesystem hashing for large objects. We tried to use large
> >objects and got destroyed. 40,000 rows and the server barely functioned.
> >I think this is because of 2 things:
> >
> >1) Filehandles not being closed. This was an oversite I've seen covered
> >in the list archives somewhere.
>
> Ok, ensure you are closing the large objects within JDBC. If you are then
> this is a backend problem.
>
> One thing to try is to commit the transaction a bit more often (if you are
> running within a single transaction for all 40k objects). Committing the
> transaction will force the backend to close all open large objects on that
> connection.

We were using setBytes(), as we were trying to minimize porting work from the
previous database we were using. And we were committing after every
transaction. We switched to Base64 encoding and storing strings, so we're in
better shape now. I'm going to write some more test code in my evenings and
see if I can get current PostgreSQL to suck up filehandles. I'll post again
if I can put together a coherent bug report or patch.

> >2) The fact that all objects are stored in a the single data directory.
> >Once you get up to a good number of objects, directory scans really take
> >a long, long time. This slows down any subsequent openings of large
> >objects. Is someone working on this problem? Or have a patch already?
>
> Again not JDBC. Forwarding to the hackers list on this one. The naming
> conventions were changed a lot in 7.1, and it was for more flexability.

Right, cool. I'll check out the new codebase. Thanks. This is *so* the curse
of open source: now I'm going to be using up my personal time to look for
bugs I find at work. Oh well, I didn't need to sleep anyhow. ;-)

--
Kyle. "I hate every ape I see, from chimpan-A to chimpan-Z" -- Troy McClure
[HACKERS] Re: Large Object problems (was Re: JDBC int8 hack)
On Tue, Apr 10, 2001 at 02:24:24PM +0100, Peter Mount wrote:
> At 18:30 09/04/01 -0700, Kyle VanderBeek wrote:
> >This is a new feature? Using indecies is "new"? I guess I really beg to
> >differ. Seems like a bugfix to me (in the "workaround" category).
>
> Yes they are. INT8 is not a feature/type yet supported by the driver,
> hence it's "new".
>
> Infact the jdbc driver supports no array's at this time (as PostgreSQL &
> SQL3 arrays are different beasts).
>
> If it's worked in the past, then that was sheer luck.

Alright man, you've got me confused. Are you saying that despite the
existence of INT8 as a column type, and PreparedStatement.setLong(), that
these ought not be used? If so, there is a really big warning missing from
the documentation!

I guess I'm asking this: I've got an enterprise database running 7.0.3 ready
to go using INT8 primary keys and being accessed through my re-touched JDBC
driver. Am I screwed? Is it going to break? If so, I need to fix this all
very, very fast.

--
Kyle. "I hate every ape I see, from chimpan-A to chimpan-Z" -- Troy McClure
[HACKERS] Re: [PATCHES] Re: Large Object problems (was Re: JDBC int8 hack)
On Wed, Apr 11, 2001 at 02:57:16AM +, Thomas Lockhart wrote:
> > Alright man, you've got me confused. Are you saying that despite the
> > existance of INT8 as a column type, and PreparedStatement.setLong(),
> > that these ought not be used? If so, there is a really big warning
> > missing from the documentation!
>
> Ah, it just dawned on me what might be happening: Peter, I'm guessing
> that you are thinking of "INT48" or some such, the pseudo-integer array
> type. Kyle is referring to the "int8" 8 byte integer type.

Yes!

> > I guess I'm asking this: I've got an enterprise database runnign 7.0.3
> > ready to go using INT8 primary keys and being accessed through my
> > re-touched JDBC driver. Am I screwed? Is it going to break? If so, I
> > need to fix this all very, very fast.
>
> btw, it might be better to use a syntax like
>
>   ... where col = '1234';
>
> or
>
>   ... where col = int8 '1234';
>
> If the former works, then that is a bit more generic that slapping a
> "::int8" onto the constant field.

It seems like a wash to me; either way gets the desired result. Tacking on
::int8 was the quickest. It also seems neater than this:

    set(parameterIndex, ("int8 '" + new Long(x)).toString() + "'");

in PreparedStatement.setLong().

> I'd imagine that this could also be coded into the app; if so that may
> be where it belongs since then the driver does not have to massage the
> queries as much and it will be easier for the *driver* to stay
> compatible with applications.

This seems to be the wrong idea to me. The idea is that JDBC allows you to be
a little bit "backend agnostic". It'd be pretty disappointing if this wasn't
true for even the base types. Application programmers should just call
setLong() when they're dealing with an 8-byte (Long or long) integer. It'd be
a shame to have a PostgreSQL-specific call to
setString("int8 '" + x.toString() + "'") littering your code. That seems to
fly in the face of everything that JDBC/DBI/ODBC (etc) are about.

--
Kyle. "I hate every ape I see, from chimpan-A to chimpan-Z" -- Troy McClure
[HACKERS] Re: Large Object problems (was Re: JDBC int8 hack)
On Tue, Apr 17, 2001 at 09:11:54AM -0400, Peter T Mount wrote:
> Erm, int8 isn't long, but an array of 8 int's (unless it's changed).

http://postgresql.readysetnet.com/users-lounge/docs/7.0/user/datatype.htm#AEN942

It is very much an 8-byte integer, the counterpart to Java's Long/long.

--
Kyle. "I hate every ape I see, from chimpan-A to chimpan-Z" -- Troy McClure