Greetings, * Andrey Lepikhov (a.lepik...@postgrespro.ru) wrote: > During the implementation of sharding related improvements i noticed that if > we use a lot of foreign partitions, we have bad plans because of vacuum > don't update statistics of foreign tables.This is done by the ANALYZE > command, but it is very expensive operation for foreign table. > Problem with statistics demonstrates with TAP-test from the first patch in > attachment.
Yes, the way we handle ANALYZE today for FDWs is pretty terrible, since we stream the entire table across to do it. > I implemented some FDW + pg core machinery to reduce weight of the problem. > The ANALYZE command on foreign table executes query on foreign server that > extracts statistics tuple, serializes it into json-formatted string and > returns to the caller. The caller deserializes this string, generates > statistics for this foreign table and update it. The second patch is a > proof-of-concept. Isn't this going to create a version dependency that we'll need to deal with..? What if a newer major version has some kind of improved ANALYZE command, in terms of what it looks at or stores, and it's talking to an older server? When I was considering the issue with ANALYZE and FDWs, I had been thinking it'd make sense to just change the query that's built in deparseAnalyzeSql() to have a TABLESAMPLE clause, but otherwise run in more-or-less the same manner as today. If we don't like the available TABLESAMPLE methods then we could add a new one that's explicitly the 'right' sample for an ANALYZE call and use that when it's available on the remote side. Not sure if it'd make any sense for ANALYZE itself to start using that same TABLESAMPLE code, but maybe? Not that I think it'd be much of an issue if it's independent either, with appropriate comments to note that we should probably try to make them match up for the sake of FDWs. > This patch speedup analyze command and provides statistics relevance on a > foreign table after autovacuum operation. Its effectiveness depends on > relevance of statistics on the remote server, but still. If we do decide to go down this route, wouldn't it mean we'd have to solve the problem of what to do when it's a 9.6 foreign server being queried from a v12 server and dealing with any difference in the statistics structures of the two? Seems like we would... in which case I would say that we should pull that bit out and make it general, and use it for pg_upgrade too, which would benefit a great deal from having the ability to upgrade stats between major versions also. That's a much bigger piece to take on, of course, but seems to be what's implied with this approach for the FDW. Thanks, Stephen
signature.asc
Description: PGP signature