Re: shared-memory based stats collector
At Tue, 06 Oct 2020 10:06:44 +0900 (JST), Kyotaro Horiguchi wrote in > The previous version failed to flush local database stats for certain > condition. That behavior caused useless retries and finally a forced > flush that leads to contention. I fixed that and will measure > performance with this version. I (we) got some performance numbers. - Fetching 1 tuple from 1 of 100 tables from 100 to 800 clients. - Fetching 1 tuple from 1 of 10 tables from 100 to 800 clients. Those showed a speed of over 400,000 TPS at maximum, and no significant difference is seen between patched and unpatched over the whole range of the test. I tried 5 seconds as PGSTAT_MIN_INTERVAL (10s in the patch) but that made no difference. - Fetching 1 tuple from 1 table from 800 clients. No graph is attached for this, but the test shows a speed of over 42 TPS with or without the v39 patch. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Logical replication CPU-bound with TRUNCATE/DROP/CREATE many tables
On Fri, Oct 2, 2020 at 12:26 PM Keisuke Kuroda wrote: > > Hi Dilip, Amit, > > > > 5. Can you please once repeat the performance test done by Keisuke-San > > > to see if you have similar observations? Additionally, see if you are > > > also seeing the inconsistency related to the Truncate message reported > > > by him and if so why? > > > > > > > Okay, I will set up and do this test early next week. Keisuke-San, > > can you send me your complete test script? > > Yes, I've attached a test script(test_pg_recvlogical.sh) > > Sorry, the issue with TRUNCATE not outputting was due to a build miss > on my part. > Even before the patch, TRUNCATE decodes and outputting correctly. > So, please check the performance only. > > I have tested it again and will share the results with you. > > Also, the argument of palloc was still MemoryContextAlloc, > which prevented me from applying the patch, so I've only fixed that part. > > # test script > > Please set PGHOME and CLUSTER_PUB before run. > > sh test_pg_recvlogical.sh > > # perf command > > perf record --call-graph dwarf -p [walsender pid] > perf report -i perf.data --no-children > > # before patch > > decode + invalidation = 222s > > 2020-10-02 14:55:50 BEGIN 509 > 2020-10-02 14:59:42 table nsp_001.tbl_001, nsp_001.part_0001 ... > nsp_001.part_0999, nsp_001.part_1000: TRUNCATE: (no-flags) > 2020-10-02 14:59:42 COMMIT 509 (at 2020-10-02 14:55:50.349219+09) I could not see this issue even without the patch, it is taking less than 1s even without the patch. See below results 2020-10-08 13:00:49 BEGIN 509 2020-10-08 13:00:49 table nsp_001.part_0001: INSERT:... 2020-10-08 13:00:49 COMMIT 509 (at 2020-10-08 13:00:48.741986+05:30) Am I missing something? -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
dynamic result sets support in extended query protocol
I want to progress work on stored procedures returning multiple result sets. Examples of how this could work on the SQL side have previously been shown [0]. We also have ongoing work to make psql show multiple result sets [1]. This appears to work fine in the simple query protocol. But the extended query protocol doesn't support multiple result sets at the moment [2]. This would be desirable to be able to use parameter binding, and also since one of the higher-level goals would be to support the use case of stored procedures returning multiple result sets via JDBC. [0]: https://www.postgresql.org/message-id/flat/4580ff7b-d610-eaeb-e06f-4d686896b93b%402ndquadrant.com [1]: https://commitfest.postgresql.org/29/2096/ [2]: https://www.postgresql.org/message-id/9507.1534370765%40sss.pgh.pa.us (Terminology: I'm calling this project "dynamic result sets", which includes several concepts: 1) multiple result sets, 2) those result sets can have different structures, 3) the structure of the result sets is decided at run time, not declared in the schema/procedure definition/etc.) One possibility I rejected was to invent a third query protocol beside the simple and extended one. This wouldn't really match with the requirements of JDBC and similar APIs because the APIs for sending queries don't indicate whether dynamic result sets are expected or required, you only indicate that later by how you process the result sets. So we really need to use the existing ways of sending off the queries. Also, avoiding a third query protocol is probably desirable in general to avoid extra code and APIs. So here is my sketch on how this functionality could be woven into the extended query protocol. I'll go through how the existing protocol exchange works and then point out the additions that I have in mind. These additions could be enabled by a _pq_ startup parameter sent by the client. Alternatively, it might also work without that because the client would just reject protocol messages it doesn't understand, but that's probably less desirable behavior. So here is how it goes: C: Parse S: ParseComplete At this point, the server would know whether the statement it has parsed can produce dynamic result sets. For a stored procedure, this would be declared with the procedure definition, so when the CALL statement is parsed, this can be noticed. I don't actually plan any other cases, but for the sake of discussion, perhaps some variant of EXPLAIN could also return multiple result sets, and that could also be detected from parsing the EXPLAIN invocation. At this point a client would usually do C: Describe (statement) S: ParameterDescription S: RowDescription New would be that the server would now also respond with a new message, say, S: DynamicResultInfo that indicates that dynamic result sets will follow later. The message would otherwise be empty. (We could perhaps include the number of result sets, but this might not actually be useful, and perhaps it's better not to spent effort on counting things that don't need to be counted.) (If we don't guard this by a _pq_ startup parameter from the client, an old client would now error out because of an unexpected protocol message.) Now the normal bind and execute sequence follows: C: Bind S: BindComplete (C: Describe (portal)) (S: RowDescription) C: Execute S: ... (DataRows) S: CommandComplete In the case of a CALL with output parameters, this "primary" result set contains one row with the output parameters (existing behavior). 
Now, if the client has seen DynamicResultInfo earlier, it should now go into a new subsequence to get the remaining result sets, like this (naming obviously to be refined): C: NextResult S: NextResultReady C: Describe (portal) S: RowDescription C: Execute S: CommandComplete C: NextResult ... C: NextResult S: NoNextResult C: Sync S: ReadyForQuery I think this would all have to use the unnamed portal, but perhaps there could be other uses with named portals. Some details to be worked out. One could perhaps also do without the DynamicResultInfo message and just put extra information into the CommandComplete message indicating "there are more result sets after this one". (Following the model from the simple query protocol, CommandComplete really means one result set complete, not the whole top-level command. ReadyForQuery means the whole command is complete. This is perhaps debatable, and interesting questions could also arise when considering what should happen in the simple query protocol when a query string consists of multiple commands each returning multiple result sets. But it doesn't really seem sensible to cater to that.) One thing that's missing in this sequence is a way to specify the desired output format (text/binary) for each result set. This could be added to the NextResult message, but at that point the client doesn't yet know the number of columns in the
Re: Parallel INSERT (INTO ... SELECT ...)
On Wed, Oct 7, 2020 at 7:25 PM Greg Nancarrow wrote: > > On Wed, Oct 7, 2020 at 12:40 AM Bharath Rupireddy > wrote: > > In parallel, we are not doing anything(due to the same reason > > explained in above comment) to find whether there is a foreign > > partition or not while deciding to go with parallel/non-parallel copy, > > we are just throwing an error during the first tuple insertion into > > the partition. > > > > errmsg("cannot perform PARALLEL COPY if partition has BEFORE/INSTEAD > > OF triggers, or if the partition is foreign partition"), > > errhint("Try COPY without PARALLEL option"))); > > > > I'm wondering whether code similar to the following can safely be used > to detect a foreign partition: > > if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) > { > int i; > PartitionDesc pd = RelationGetPartitionDesc(rel); > for (i = 0; i < pd->nparts; i++) > { > if (get_rel_relkind(pd->oids[i]) == RELKIND_FOREIGN_TABLE) > { > table_close(rel, NoLock); > return false; > } > } > } > Actually, the addition of this kind of check is still not good enough. Partitions can have their own constraints, triggers, column default expressions etc. and a partition itself can be partitioned. I've written code to recursively walk the partitions and do all the various checks for parallel-insert-safety as before, but it's doing a fair bit of work. Any other idea of dealing with this? Seems it can't be avoided if you want to support partitioned tables and partitions. Regards, Greg Nancarrow Fujitsu Australia
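For illustration, the recursive walk being described might look roughly like the sketch below. It reuses only the catalog helpers already shown in the snippet above (RelationGetPartitionDesc, get_rel_relkind, table_open/table_close, from partitioning/partdesc.h, utils/lsyscache.h and access/table.h); the function name is invented, and the other per-partition checks mentioned (triggers, column defaults, constraints) are only hinted at in a comment.

    /*
     * Sketch only: return false if this relation, or any partition below it,
     * is a foreign table.  A real version would also apply the remaining
     * parallel-insert-safety checks (triggers, column default expressions,
     * constraints) to every partition, as described above.
     */
    static bool
    partition_tree_is_parallel_safe(Relation rel)
    {
        if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
        {
            PartitionDesc pd = RelationGetPartitionDesc(rel);
            int         i;

            for (i = 0; i < pd->nparts; i++)
            {
                char        relkind = get_rel_relkind(pd->oids[i]);

                if (relkind == RELKIND_FOREIGN_TABLE)
                    return false;

                /* Recurse into sub-partitioned tables. */
                if (relkind == RELKIND_PARTITIONED_TABLE)
                {
                    Relation    part = table_open(pd->oids[i], AccessShareLock);
                    bool        safe = partition_tree_is_parallel_safe(part);

                    table_close(part, AccessShareLock);
                    if (!safe)
                        return false;
                }
            }
        }

        return true;
    }

Opening every partition this way is the "fair bit of work" referred to above, which is why the reply below argues for not doing the recursive checking for now.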
Re: Parallel INSERT (INTO ... SELECT ...)
On Thu, Oct 8, 2020 at 1:42 PM Greg Nancarrow wrote: > > On Wed, Oct 7, 2020 at 7:25 PM Greg Nancarrow wrote: > > > > On Wed, Oct 7, 2020 at 12:40 AM Bharath Rupireddy > > wrote: > > > In parallel, we are not doing anything(due to the same reason > > > explained in above comment) to find whether there is a foreign > > > partition or not while deciding to go with parallel/non-parallel copy, > > > we are just throwing an error during the first tuple insertion into > > > the partition. > > > > > > errmsg("cannot perform PARALLEL COPY if partition has BEFORE/INSTEAD > > > OF triggers, or if the partition is foreign partition"), > > > errhint("Try COPY without PARALLEL option"))); > > > > > > > I'm wondering whether code similar to the following can safely be used > > to detect a foreign partition: > > > > if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) > > { > > int i; > > PartitionDesc pd = RelationGetPartitionDesc(rel); > > for (i = 0; i < pd->nparts; i++) > > { > > if (get_rel_relkind(pd->oids[i]) == RELKIND_FOREIGN_TABLE) > > { > > table_close(rel, NoLock); > > return false; > > } > > } > > } > > > > Actually, the addition of this kind of check is still not good enough. > Partitions can have their own constraints, triggers, column default > expressions etc. and a partition itself can be partitioned. > I've written code to recursively walk the partitions and do all the > various checks for parallel-insert-safety as before, but it's doing a > fair bit of work. > Any other idea of dealing with this? Seems it can't be avoided if you > want to support partitioned tables and partitions. > IMHO, it's good not to do all of this recursive checking right now, which may complicate the code or may restrict the performance gain. Having said that, in future we may have to do something about it. Others may have better opinions on this point. With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
Re: dynamic result sets support in extended query protocol
Are you proposing to bump up the protocol version (either major or minor)? I am asking because it seems you are going to introduce some new message types. Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp > I want to progress work on stored procedures returning multiple result > sets. Examples of how this could work on the SQL side have previously > been shown [0]. We also have ongoing work to make psql show multiple > result sets [1]. This appears to work fine in the simple query > protocol. But the extended query protocol doesn't support multiple > result sets at the moment [2]. This would be desirable to be able to > use parameter binding, and also since one of the higher-level goals > would be to support the use case of stored procedures returning > multiple result sets via JDBC. > > [0]: > https://www.postgresql.org/message-id/flat/4580ff7b-d610-eaeb-e06f-4d686896b93b%402ndquadrant.com > [1]: https://commitfest.postgresql.org/29/2096/ > [2]: > https://www.postgresql.org/message-id/9507.1534370765%40sss.pgh.pa.us > > (Terminology: I'm calling this project "dynamic result sets", which > includes several concepts: 1) multiple result sets, 2) those result > sets can have different structures, 3) the structure of the result > sets is decided at run time, not declared in the schema/procedure > definition/etc.) > > One possibility I rejected was to invent a third query protocol beside > the simple and extended one. This wouldn't really match with the > requirements of JDBC and similar APIs because the APIs for sending > queries don't indicate whether dynamic result sets are expected or > required, you only indicate that later by how you process the result > sets. So we really need to use the existing ways of sending off the > queries. Also, avoiding a third query protocol is probably desirable > in general to avoid extra code and APIs. > > So here is my sketch on how this functionality could be woven into the > extended query protocol. I'll go through how the existing protocol > exchange works and then point out the additions that I have in mind. > > These additions could be enabled by a _pq_ startup parameter sent by > the client. Alternatively, it might also work without that because > the client would just reject protocol messages it doesn't understand, > but that's probably less desirable behavior. > > So here is how it goes: > > C: Parse > S: ParseComplete > > At this point, the server would know whether the statement it has > parsed can produce dynamic result sets. For a stored procedure, this > would be declared with the procedure definition, so when the CALL > statement is parsed, this can be noticed. I don't actually plan any > other cases, but for the sake of discussion, perhaps some variant of > EXPLAIN could also return multiple result sets, and that could also be > detected from parsing the EXPLAIN invocation. > > At this point a client would usually do > > C: Describe (statement) > S: ParameterDescription > S: RowDescription > > New would be that the server would now also respond with a new > message, say, > > S: DynamicResultInfo > > that indicates that dynamic result sets will follow later. The > message would otherwise be empty. (We could perhaps include the > number of result sets, but this might not actually be useful, and > perhaps it's better not to spent effort on counting things that don't > need to be counted.) 
> > (If we don't guard this by a _pq_ startup parameter from the client, > an old client would now error out because of an unexpected protocol > message.) > > Now the normal bind and execute sequence follows: > > C: Bind > S: BindComplete > (C: Describe (portal)) > (S: RowDescription) > C: Execute > S: ... (DataRows) > S: CommandComplete > > In the case of a CALL with output parameters, this "primary" result > set contains one row with the output parameters (existing behavior). > > Now, if the client has seen DynamicResultInfo earlier, it should now > go into a new subsequence to get the remaining result sets, like this > (naming obviously to be refined): > > C: NextResult > S: NextResultReady > C: Describe (portal) > S: RowDescription > C: Execute > > S: CommandComplete > C: NextResult > ... > C: NextResult > S: NoNextResult > C: Sync > S: ReadyForQuery > > I think this would all have to use the unnamed portal, but perhaps > there could be other uses with named portals. Some details to be > worked out. > > One could perhaps also do without the DynamicResultInfo message and > just put extra information into the CommandComplete message indicating > "there are more result sets after this one". > > (Following the model from the simple query protocol, CommandComplete > really means one result set complete, not the whole top-level > command. ReadyForQuery means the whole command is complete. This is > perhaps debatable, and interesting questions could also ari
Re: Resetting spilled txn statistics in pg_stat_replication
On Thu, 8 Oct 2020 at 14:10, Amit Kapila wrote: > > On Thu, Oct 8, 2020 at 7:46 AM Masahiko Sawada > wrote: > > > > On Wed, 7 Oct 2020 at 17:52, Amit Kapila wrote: > > > > > > On Wed, Oct 7, 2020 at 11:24 AM Masahiko Sawada > > > wrote: > > > > > > > > On Tue, 6 Oct 2020 at 17:56, Amit Kapila > > > > wrote: > > > > > > > > > > On Tue, Oct 6, 2020 at 9:34 AM Masahiko Sawada > > > > > wrote: > > > > > > > > > > > > Looking at pgstat_reset_replslot_counter() in the v8 patch, even if > > > > > > we > > > > > > pass a physical slot name to pg_stat_reset_replication_slot() a > > > > > > PgStat_MsgResetreplslotcounter is sent to the stats collector. I’m > > > > > > okay with not raising an error but maybe we can have it not to send > > > > > > the message in that case. > > > > > > > > > > > > > > > > makes sense, so changed accordingly. > > > > > > > > > > > > > Thank you for updating the patch! > > > > > > > > > > Thanks, I will push the first one tomorrow unless I see more comments > > > and test-case one later. > > > > I thought we could have a test case for the reset function, what do you > > think? > > > > We can write if we want but there are few things we need to do for > that like maybe a new function like wait_for_spill_stats which will > check if the counters have become zero. Then probably call a reset > function, call a new wait function, and then again check stats to > ensure they are reset to 0. Yes. > We can't write any advanced test which means reset the existing stats > perform some tests and again check stats because *slot_get_changes() > function can start from the previous WAL for which we have covered the > stats. We might write that if we can somehow track the WAL positions > from the previous test. I am not sure if we want to go there. Can we use pg_logical_slot_peek_changes() instead to decode the same transactions multiple times? Regards, -- Masahiko Sawadahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Logical replication CPU-bound with TRUNCATE/DROP/CREATE many tables
Hi Dilip, > I could not see this issue even without the patch, it is taking less > than 1s even without the patch. See below results > > 2020-10-08 13:00:49 BEGIN 509 > 2020-10-08 13:00:49 table nsp_001.part_0001: INSERT:... > 2020-10-08 13:00:49 COMMIT 509 (at 2020-10-08 13:00:48.741986+05:30) > > Am I missing something? Thanks for running the tests. It is the TRUNCATE decoding that takes time. INSERT decoding is fast, even before the patch is applied. 2020-10-02 14:55:48 BEGIN 508 2020-10-02 14:55:48 table nsp_001.part_0001: INSERT ... 2020-10-02 14:55:49 COMMIT 508 (at 2020-10-02 14:55:48.744019+09) However, TRUNCATE decoding is slow and takes 222s in my environment. 2020-10-02 14:55:50 BEGIN 509 2020-10-02 14:59:42 table nsp_001.tbl_001 ... nsp_001.part_1000: TRUNCATE: (no-flags) 2020-10-02 14:59:42 COMMIT 509 (at 2020-10-02 14:55:50.349219+09) This script will wait 10 seconds after INSERT exits before executing TRUNCATE, please wait for it to run. When TRUNCATE completes, the walsender process should be at 100% CPU. Best Regards, -- Keisuke Kuroda NTT Software Innovation Center keisuke.kuroda.3...@gmail.com
Fix typos in reorderbuffer.c
@@ -1432,7 +1432,7 @@ ReorderBufferCleanupTXN(ReorderBuffer *rb, ReorderBufferTXN *txn) ReorderBufferCleanupTXN(rb, subtxn); } - /* cleanup changes in the toplevel txn */ + /* cleanup changes in the txn */ dlist_foreach_modify(iter, &txn->changes) { ReorderBufferChange *change; @@ -1533,7 +1533,7 @@ ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn) ReorderBufferTruncateTXN(rb, subtxn); } - /* cleanup changes in the toplevel txn */ + /* cleanup changes in the txn */ dlist_foreach_modify(iter, &txn->changes) { ReorderBufferChange *change; Both the above functions are recursive and will clean the changes for both the top-level transaction and subtransactions. So, I feel the comments should be accordingly updated. -- With Regards, Amit Kapila. v1-0001-Fix-typos-in-reorderbuffer.c.patch Description: Binary data
Re: Logical replication CPU-bound with TRUNCATE/DROP/CREATE many tables
On Thu, Oct 8, 2020 at 2:05 PM Keisuke Kuroda wrote: > > Hi Dilip, > > > I could not see this issue even without the patch, it is taking less > > than 1s even without the patch. See below results > > > > 2020-10-08 13:00:49 BEGIN 509 > > 2020-10-08 13:00:49 table nsp_001.part_0001: INSERT:... > > 2020-10-08 13:00:49 COMMIT 509 (at 2020-10-08 13:00:48.741986+05:30) > > > > Am I missing something? > > Thanks for running the tests. > It is the TRUNCATE decoding that takes time. > INSERT decoding is fast, even before the patch is applied. > > 2020-10-02 14:55:48 BEGIN 508 > 2020-10-02 14:55:48 table nsp_001.part_0001: INSERT ... > 2020-10-02 14:55:49 COMMIT 508 (at 2020-10-02 14:55:48.744019+09) > > However, TRUNCATE decode is slow > and take 222s in my environment. > > 2020-10-02 14:55:50 BEGIN 509 > 2020-10-02 14:59:42 table nsp_001.tbl_001 ... ns p_001.part_1000: > TRUNCATE: (no-flags) > 2020-10-02 14:59:42 COMMIT 509 (at 2020-10-02 14:55:50.349219+09) > > This script will wait 10 seconds after INSERT exits > before executing TRUNCATE, please wait for it to run. > > When TRUNCATE completes, > the walsender process should be at 100% CPU. Okay, thanks for the info, I will run again and see this. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: speed up unicode normalization quick check
On Thu, Oct 8, 2020 at 2:48 AM Michael Paquier wrote: > On Wed, Oct 07, 2020 at 03:18:44PM +0900, Michael Paquier wrote: > I looked at this one again today, and applied it. I looked at what > MSVC compiler was able to do in terms of optimizations with > shift-and-add for multipliers, and it is by far not as good as gcc or > clang, applying imul for basically all the primes we could use for the > perfect hash generation. > Thanks for picking this up! As I recall, godbolt.org also showed MSVC unable to do this optimization. > > I have tested 0002 and 0003, that had better be merged together at the > > end, and I can see performance improvements with MSVC and gcc similar > > to what is being reported upthread, with 20~30% gains for simple > > data sample using IS NFC/NFKC. That's cool. > > For these two, I have merged both together and did some adjustments as > per the attached. Not many tweaks, mainly some more comments for the > unicode header files as the number of structures generated gets > higher. Looks fine overall, but one minor nit: I'm curious why you made a separate section in the pgindent exclusions. The style in that file seems to be one comment per category. -- John Naylor
Re: Resetting spilled txn statistics in pg_stat_replication
On Thu, Oct 8, 2020 at 1:55 PM Masahiko Sawada wrote: > > On Thu, 8 Oct 2020 at 14:10, Amit Kapila wrote: > > > > > > We can write if we want but there are few things we need to do for > > that like maybe a new function like wait_for_spill_stats which will > > check if the counters have become zero. Then probably call a reset > > function, call a new wait function, and then again check stats to > > ensure they are reset to 0. > > Yes. > I am not sure if it is worth but probably it is not a bad idea especially if we extend the existing tests based on your below idea? > > We can't write any advanced test which means reset the existing stats > > perform some tests and again check stats because *slot_get_changes() > > function can start from the previous WAL for which we have covered the > > stats. We might write that if we can somehow track the WAL positions > > from the previous test. I am not sure if we want to go there. > > Can we use pg_logical_slot_peek_changes() instead to decode the same > transactions multiple times? > I think this will do the trick. If we want to go there then I suggest we can have a separate regression test file in test_decoding with name as decoding_stats, stats, or something like that. We can later add the tests related to streaming stats in that file as well. -- With Regards, Amit Kapila.
Re: Logical replication CPU-bound with TRUNCATE/DROP/CREATE many tables
On Thu, Oct 8, 2020 at 2:17 PM Dilip Kumar wrote: > > On Thu, Oct 8, 2020 at 2:05 PM Keisuke Kuroda > wrote: > > > > Hi Dilip, > > > > > I could not see this issue even without the patch, it is taking less > > > than 1s even without the patch. See below results > > > > > > 2020-10-08 13:00:49 BEGIN 509 > > > 2020-10-08 13:00:49 table nsp_001.part_0001: INSERT:... > > > 2020-10-08 13:00:49 COMMIT 509 (at 2020-10-08 13:00:48.741986+05:30) > > > > > > Am I missing something? > > > > Thanks for running the tests. > > It is the TRUNCATE decoding that takes time. > > INSERT decoding is fast, even before the patch is applied. > > > > 2020-10-02 14:55:48 BEGIN 508 > > 2020-10-02 14:55:48 table nsp_001.part_0001: INSERT ... > > 2020-10-02 14:55:49 COMMIT 508 (at 2020-10-02 14:55:48.744019+09) > > > > However, TRUNCATE decode is slow > > and take 222s in my environment. > > > > 2020-10-02 14:55:50 BEGIN 509 > > 2020-10-02 14:59:42 table nsp_001.tbl_001 ... ns p_001.part_1000: > > TRUNCATE: (no-flags) > > 2020-10-02 14:59:42 COMMIT 509 (at 2020-10-02 14:55:50.349219+09) > > > > This script will wait 10 seconds after INSERT exits > > before executing TRUNCATE, please wait for it to run. > > > > When TRUNCATE completes, > > the walsender process should be at 100% CPU. > > Okay, thanks for the info, I will run again and see this. > Now, I can see the truncate time reduced from 5mins to just 1 sec Before patch 2020-10-08 14:18:48 BEGIN 510 2020-10-08 14:23:02 COMMIT 510 (at 2020-10-08 14:18:48.88462+05:30) truncate time: ~5mins After patch : 2020-10-08 14:30:22 BEGIN 510 2020-10-08 14:30:22 COMMIT 510 (at 2020-10-08 14:30:22.766092+05:30) truncate time: < 1s -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: Logical replication CPU-bound with TRUNCATE/DROP/CREATE many tables
On Thu, 8 Oct 2020 at 09:47, Dilip Kumar wrote: > > This script will wait 10 seconds after INSERT exits > > before executing TRUNCATE, please wait for it to run. Has this been tested with anything other than the one test case? It would be good to know how the patch handles a transaction that contains many aborted subtransactions that contain invals. -- Simon Riggs http://www.enterprisedb.com/
RE: Transactions involving multiple postgres foreign servers, take 2
Sorry to be late to respond. (My PC is behaving strangely after upgrading Win10 2004) From: Masahiko Sawada > After more thoughts on Tsunakawa-san’s idea it seems to need the > following conditions: > > * At least postgres_fdw is viable to implement these APIs while > guaranteeing not to happen any error. > * A certain number of FDWs (or majority of FDWs) can do that in a > similar way to postgres_fdw by using the guideline and probably > postgres_fdw as a reference. > > These are necessary for FDW implementors to implement APIs while > following the guideline and for the core to trust them. > > As far as postgres_fdw goes, what we need to do when committing a > foreign transaction resolution is to get a connection from the > connection cache or create and connect if not found, construct a SQL > query (COMMIT/ROLLBACK PREPARED with identifier) using a fixed-size > buffer, send the query, and get the result. The possible place to > raise an error is limited. In case of failures such as connection > error FDW can return false to the core along with a flag indicating to > ask the core retry. Then the core will retry to resolve foreign > transactions after some sleep. OTOH if FDW sized up that there is no > hope of resolving the foreign transaction, it also could return false > to the core along with another flag indicating to remove the entry and > not to retry. Also, the transaction resolution by FDW needs to be > cancellable (interruptible) but cannot use CHECK_FOR_INTERRUPTS(). > > Probably, as Tsunakawa-san also suggested, it’s not impossible to > implement these APIs in postgres_fdw while guaranteeing not to happen > any error, although not sure the code complexity. So I think the first > condition may be true but not sure about the second assumption, > particularly about the interruptible part. Yeah, I expect the commit of the second phase should not be difficult for the FDW developer. As for the cancellation during commit retry, I don't think we necessarily have to make the TM responsible for retrying the commits. Many DBMSs have their own timeout functionality such as connection timeout, socket timeout, and statement timeout. Users can set those parameters in the foreign server options based on how long the end user can wait. That is, TM calls FDW's commit routine just once. If the TM makes efforts to retry commits, the duration would be from a few seconds to 30 seconds. Then, we can hold back the cancellation during that period. > I thought we could support both ideas to get their pros; supporting > Tsunakawa-san's idea and then my idea if necessary, and FDW can choose > whether to ask the resolver process to perform 2nd phase of 2PC or > not. But it's not a good idea in terms of complexity. I don't feel the need for leaving the commit to the resolver during normal operation. seems like if failed to resolve, the backend would return an > acknowledgment of COMMIT to the client and the resolver process > resolves foreign prepared transactions in the background. So we can > ensure that the distributed transaction is completed at the time when > the client got an acknowledgment of COMMIT if 2nd phase of 2PC is > successfully completed in the first attempts. OTOH, if it failed for > whatever reason, there is no such guarantee. 
From an optimistic > perspective, e.g., the failures are unlikely to happen, it will work > well but IMO it’s not uncommon to fail to resolve foreign transactions > due to network issue, especially in an unreliable network environment > for example geo-distributed database. So I think it will end up > requiring the client to check if preceding distributed transactions > are completed or not in order to see the results of these > transactions. That issue exists with any method, doesn't it? Regards Takayuki Tsunakawa
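To make the postgres_fdw side of that description concrete, the resolution step reduces to something like the sketch below. It uses only standard libpq calls (PQexec, PQresultStatus, PQstatus from libpq-fe.h); the function name, the gid parameter, and the retry-flag convention are assumptions standing in for the proposed API, not code from the patch, and a real version would escape the identifier (for example with PQescapeLiteral) and remain cancellable.

    #include <stdbool.h>
    #include <stdio.h>
    #include "libpq-fe.h"

    /*
     * Sketch: resolve one prepared foreign transaction on an existing
     * connection.  Returns true on success; on failure, *retry tells the
     * caller (the transaction manager or a resolver process) whether a
     * later retry could help.
     */
    static bool
    resolve_prepared_foreign_xact(PGconn *conn, const char *gid,
                                  bool commit, bool *retry)
    {
        char        sql[256];       /* fixed-size buffer, as described above */
        PGresult   *res;

        snprintf(sql, sizeof(sql), "%s PREPARED '%s'",
                 commit ? "COMMIT" : "ROLLBACK", gid);

        res = PQexec(conn, sql);
        if (PQresultStatus(res) != PGRES_COMMAND_OK)
        {
            /*
             * A broken connection is worth retrying after a sleep; other
             * errors (e.g. the prepared transaction no longer exists) are
             * not, so the caller should drop the entry instead.
             */
            *retry = (PQstatus(conn) == CONNECTION_BAD);
            PQclear(res);
            return false;
        }

        PQclear(res);
        *retry = false;
        return true;
    }

Whether such a routine can really be guaranteed never to raise an error, and how it stays cancellable without CHECK_FOR_INTERRUPTS(), is exactly the open question in the quoted discussion.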
Re: [HACKERS] Custom compression methods
On Wed, Oct 7, 2020 at 5:00 PM Dilip Kumar wrote: > > On Wed, Oct 7, 2020 at 10:26 AM Dilip Kumar wrote: > > > > On Tue, Oct 6, 2020 at 10:21 PM Tomas Vondra > > wrote: > > > > > > On Tue, Oct 06, 2020 at 11:00:55AM +0530, Dilip Kumar wrote: > > > >On Mon, Oct 5, 2020 at 9:34 PM Tomas Vondra > > > > wrote: > > > >> > > > >> On Mon, Oct 05, 2020 at 07:57:41PM +0530, Dilip Kumar wrote: > > > >> >On Mon, Oct 5, 2020 at 5:53 PM Tomas Vondra > > > >> > wrote: > > > >> >> > > > >> >> On Mon, Oct 05, 2020 at 11:17:28AM +0530, Dilip Kumar wrote: > > > >> >> >On Mon, Oct 5, 2020 at 3:37 AM Tomas Vondra > > > >> >> > wrote: > > > >> >> > > > > >> >> >Thanks, Tomas for your feedback. > > > >> >> > > > > >> >> >> 9) attcompression ... > > > >> >> >> > > > >> >> >> The main issue I see is what the patch does with attcompression. > > > >> >> >> Instead > > > >> >> >> of just using it to store a the compression method, it's also > > > >> >> >> used to > > > >> >> >> store the preserved compression methods. And using NameData to > > > >> >> >> store > > > >> >> >> this seems wrong too - if we really want to store this info, the > > > >> >> >> correct > > > >> >> >> way is either using text[] or inventing charvector or similar. > > > >> >> > > > > >> >> >The reason for using the NameData is the get it in the fixed part > > > >> >> >of > > > >> >> >the data structure. > > > >> >> > > > > >> >> > > > >> >> Why do we need that? It's possible to have varlena fields with > > > >> >> direct > > > >> >> access (see pg_index.indkey for example). > > > >> > > > > >> >I see. While making it NameData I was thinking whether we have an > > > >> >option to direct access the varlena. Thanks for pointing me there. I > > > >> >will change this. > > > >> > > > > >> > Adding NameData just to make > > > >> >> it fixed-length means we're always adding 64B even if we just need a > > > >> >> single byte, which means ~30% overhead for the > > > >> >> FormData_pg_attribute. > > > >> >> That seems a bit unnecessary, and might be an issue with many > > > >> >> attributes > > > >> >> (e.g. with many temp tables, etc.). > > > >> > > > > >> >You are right. Even I did not like to keep 64B for this, so I will > > > >> >change it. > > > >> > > > > >> >> > > > >> >> >> But to me this seems very much like a misuse of attcompression > > > >> >> >> to track > > > >> >> >> dependencies on compression methods, necessary because we don't > > > >> >> >> have a > > > >> >> >> separate catalog listing compression methods. If we had that, I > > > >> >> >> think we > > > >> >> >> could simply add dependencies between attributes and that > > > >> >> >> catalog. > > > >> >> > > > > >> >> >Basically, up to this patch, we are having only built-in > > > >> >> >compression > > > >> >> >methods and those can not be dropped so we don't need any > > > >> >> >dependency > > > >> >> >at all. We just want to know what is the current compression > > > >> >> >method > > > >> >> >and what is the preserve compression methods supported for this > > > >> >> >attribute. Maybe we can do it better instead of using the NameData > > > >> >> >but I don't think it makes sense to add a separate catalog? > > > >> >> > > > > >> >> > > > >> >> Sure, I understand what the goal was - all I'm saying is that it > > > >> >> looks > > > >> >> very much like a workaround needed because we don't have the > > > >> >> catalog. 
> > > >> >> > > > >> >> I don't quite understand how could we support custom compression > > > >> >> methods > > > >> >> without listing them in some sort of catalog? > > > >> > > > > >> >Yeah for supporting custom compression we need some catalog. > > > >> > > > > >> >> >> Moreover, having the catalog would allow adding compression > > > >> >> >> methods > > > >> >> >> (from extensions etc) instead of just having a list of hard-coded > > > >> >> >> compression methods. Which seems like a strange limitation, > > > >> >> >> considering > > > >> >> >> this thread is called "custom compression methods". > > > >> >> > > > > >> >> >I think I forgot to mention while submitting the previous patch > > > >> >> >that > > > >> >> >the next patch I am planning to submit is, Support creating the > > > >> >> >custom > > > >> >> >compression methods wherein we can use pg_am catalog to insert the > > > >> >> >new > > > >> >> >compression method. And for dependency handling, we can create an > > > >> >> >attribute dependency on the pg_am row. Basically, we will create > > > >> >> >the > > > >> >> >attribute dependency on the current compression method AM as well > > > >> >> >as > > > >> >> >on the preserved compression methods AM. As part of this, we will > > > >> >> >add two build-in AMs for zlib and pglz, and the attcompression > > > >> >> >field > > > >> >> >will be converted to the oid_vector (first OID will be of the > > > >> >> >current > > > >> >> >compression method, followed by the preserved compression method's > > > >> >> >oids). > > > >> >> > > > >
Re: Fix typos in reorderbuffer.c
On Thu, 8 Oct 2020 at 17:37, Amit Kapila wrote: > > @@ -1432,7 +1432,7 @@ ReorderBufferCleanupTXN(ReorderBuffer *rb, > ReorderBufferTXN *txn) > ReorderBufferCleanupTXN(rb, subtxn); > } > > - /* cleanup changes in the toplevel txn */ > + /* cleanup changes in the txn */ > dlist_foreach_modify(iter, &txn->changes) > { > ReorderBufferChange *change; > @@ -1533,7 +1533,7 @@ ReorderBufferTruncateTXN(ReorderBuffer *rb, > ReorderBufferTXN *txn) > ReorderBufferTruncateTXN(rb, subtxn); > } > > - /* cleanup changes in the toplevel txn */ > + /* cleanup changes in the txn */ > dlist_foreach_modify(iter, &txn->changes) > { > ReorderBufferChange *change; > > Both the above functions are recursive and will clean the changes for > both the top-level transaction and subtransactions. Right. > So, I feel the > comments should be accordingly updated. +1 for this change. Regards, -- Masahiko Sawadahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
RE: [Patch] Optimize dropping of relation buffers using dlist
On Thursday, October 8, 2020 3:38 PM, Tsunakawa-san wrote: > Hi Kirk san, Thank you for looking into my patches! > (1) > + * This returns an InvalidBlockNumber when smgr_cached_nblocks is not > + * available and when not in recovery path. > > + /* > + * We cannot believe the result from smgr_nblocks is always accurate > + * because lseek of buggy Linux kernels doesn't account for a recent > + * write. > + */ > + if (!InRecovery && result == InvalidBlockNumber) > + return InvalidBlockNumber; > + > > These are unnecessary, because mdnblocks() never returns > InvalidBlockNumber and conseuently smgrnblocks() doesn't return > InvalidBlockNumber. Yes. Thanks for carefully looking into that. Removed. > (2) > +smgrnblocks(SMgrRelation reln, ForkNumber forknum, bool *isCached) > > I think it's better to make the argument name iscached so that camel case > aligns with forknum, which is not forkNum. This is kinda tricky because of the surrounding code which follows inconsistent coding style too. So I just followed the same like below and retained the change. extern void smgrcreate(SMgrRelation reln, ForkNumber forknum, bool isRedo); extern void smgrextend(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum, char *buffer, bool skipFsync); > (3) > + * This is caused by buggy Linux kernels that might not have > accounted > + * the recent write. If a fork's nblocks is invalid, exit loop. > > "accounted for" is the right English? > I think The second sentence should be described in terms of its meaning, not > the program logic. For example, something like "Give up the optimization if > the block count of any fork cannot be trusted." Fixed. > Likewise, express the following part in semantics. > > + * Do explicit hashtable lookup if the total of nblocks of relation's > forks > + * is not invalid and the nblocks to be invalidated is less than the I revised it like below: "Look up the buffer in the hashtable if the block size is known to be accurate and the total blocks to be invalidated is below the full scan threshold. Otherwise, give up the optimization." > (4) > + if (nForkBlocks[i] == InvalidBlockNumber) > + { > + nTotalBlocks = InvalidBlockNumber; > + break; > + } > > Use isCached in if condition because smgrnblocks() doesn't return > InvalidBlockNumber. Fixed. if (!isCached) > (5) > + nBlocksToInvalidate = nTotalBlocks - firstDelBlock[i]; > > should be > > + nBlocksToInvalidate += (nForkBlocks[i] - firstDelBlock[i]); Fixed. > (6) > + bufHdr->tag.blockNum >= > firstDelBlock[j]) > + InvalidateBuffer(bufHdr); /* > releases spinlock */ > > The right side of >= should be cur_block. Fixed. Attached are the updated patches. Thank you again for the reviews. Regards, Kirk Jamison 0001-v1-Prevent-invalidating-blocks-in-smgrextend-during-recovery.patch Description: 0001-v1-Prevent-invalidating-blocks-in-smgrextend-during-recovery.patch 0002-v2-Add-bool-param-in-smgrnblocks-for-cached-blocks.patch Description: 0002-v2-Add-bool-param-in-smgrnblocks-for-cached-blocks.patch 0003-v23-Optimize-DropRelFileNodeBuffers-during-recovery.patch Description: 0003-v23-Optimize-DropRelFileNodeBuffers-during-recovery.patch
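Putting those review items together, the decision logic being discussed comes down to roughly the following sketch. smgrnblocks() here is the patched variant with the isCached output parameter quoted above; the helper function and the name of the threshold constant are assumptions, and the actual patch does this inline in DropRelFileNodeBuffers.

    /*
     * Sketch: decide whether DropRelFileNodeBuffers can invalidate buffers
     * with targeted buffer-hash lookups instead of scanning all of shared
     * buffers.  Give up if any fork's size is not known to be accurate, or
     * if too many blocks would have to be invalidated.
     */
    static bool
    can_use_targeted_invalidation(SMgrRelation reln,
                                  ForkNumber *forkNum, int nforks,
                                  BlockNumber *firstDelBlock,
                                  BlockNumber *nForkBlocks)
    {
        BlockNumber nBlocksToInvalidate = 0;
        int         i;

        for (i = 0; i < nforks; i++)
        {
            bool        isCached;

            nForkBlocks[i] = smgrnblocks(reln, forkNum[i], &isCached);

            /* Block count cannot be trusted: give up the optimization. */
            if (!isCached)
                return false;

            nBlocksToInvalidate += (nForkBlocks[i] - firstDelBlock[i]);
        }

        /* BUF_DROP_FULL_SCAN_THRESHOLD is an assumed name for the cutoff. */
        return nBlocksToInvalidate < BUF_DROP_FULL_SCAN_THRESHOLD;
    }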
RE: [Patch] Optimize dropping of relation buffers using dlist
Hi, > Attached are the updated patches. Sorry, there was an error in the 3rd patch, so attached is a rebased one. Regards, Kirk Jamison Attachments: 0001-v1-Prevent-invalidating-blocks-in-smgrextend-during-recovery.patch, 0002-v2-Add-bool-param-in-smgrnblocks-for-cached-blocks.patch, 0003-v23-Optimize-DropRelFileNodeBuffers-during-recovery.patch
Re: Asynchronous Append on postgres_fdw nodes.
On 10/5/20 11:35 AM, Etsuro Fujita wrote: Hi, I found a small problem. If we have a mix of async and sync subplans when we catch an assertion on a busy connection. Just for example: PLAN Nested Loop (cost=100.00..174316.95 rows=975 width=8) (actual time=5.191..9.262 rows=9 loops=1) Join Filter: (frgn.a = l.a) Rows Removed by Join Filter: 8991 -> Append (cost=0.00..257.20 rows=11890 width=4) (actual time=0.419..2.773 rows=1000 loops=1) Async subplans: 4 -> Async Foreign Scan on f_1 l_2 (cost=100.00..197.75 rows=2925 width=4) (actual time=0.381..0.585 rows=211 loops=1) -> Async Foreign Scan on f_2 l_3 (cost=100.00..197.75 rows=2925 width=4) (actual time=0.005..0.206 rows=195 loops=1) -> Async Foreign Scan on f_3 l_4 (cost=100.00..197.75 rows=2925 width=4) (actual time=0.003..0.282 rows=187 loops=1) -> Async Foreign Scan on f_4 l_5 (cost=100.00..197.75 rows=2925 width=4) (actual time=0.003..0.316 rows=217 loops=1) -> Seq Scan on l_0 l_1 (cost=0.00..2.90 rows=190 width=4) (actual time=0.017..0.057 rows=190 loops=1) -> Materialize (cost=100.00..170.94 rows=975 width=4) (actual time=0.001..0.002 rows=9 loops=1000) -> Foreign Scan on frgn (cost=100.00..166.06 rows=975 width=4) (actual time=0.766..0.768 rows=9 loops=1) Reproduction script 'test1.sql' see in attachment. Here I force the problem reproduction with setting enable_hashjoin and enable_mergejoin to off. 'asyncmix.patch' contains my solution to this problem. -- regards, Andrey Lepikhov Postgres Professional test1.sql Description: application/sql diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c index 14824368cc..613d406982 100644 --- a/contrib/postgres_fdw/postgres_fdw.c +++ b/contrib/postgres_fdw/postgres_fdw.c @@ -455,7 +455,7 @@ static bool ec_member_matches_foreign(PlannerInfo *root, RelOptInfo *rel, void *arg); static void create_cursor(ForeignScanState *node); static void request_more_data(ForeignScanState *node); -static void fetch_received_data(ForeignScanState *node); +static void fetch_received_data(ForeignScanState *node, bool vacateconn); static void vacate_connection(PgFdwState *fdwconn, bool clear_queue); static void close_cursor(PGconn *conn, unsigned int cursor_number); static PgFdwModifyState *create_foreign_modify(EState *estate, @@ -1706,15 +1706,19 @@ postgresIterateForeignScan(ForeignScanState *node) { /* * finish the running query before sending the next command for - * this node + * this node. + * When the plan contains both asynchronous subplans and non-async + * subplans backend could request more data in async mode and want to + * get data in sync mode by the same connection. Here it must wait + * for async data before request another. */ - if (!fsstate->s.commonstate->busy) -vacate_connection((PgFdwState *)fsstate, false); + if (fsstate->s.commonstate->busy) +vacate_connection(&fsstate->s, false); request_more_data(node); /* Fetch the result immediately. */ - fetch_received_data(node); + fetch_received_data(node, false); } else if (!fsstate->s.commonstate->busy) { @@ -1749,7 +1753,7 @@ postgresIterateForeignScan(ForeignScanState *node) /* fetch the leader's data and enqueue it for the next request */ if (available) { -fetch_received_data(leader); +fetch_received_data(leader, false); add_async_waiter(leader); } } @@ -3729,7 +3733,7 @@ request_more_data(ForeignScanState *node) * Fetches received data and automatically send requests of the next waiter. 
*/ static void -fetch_received_data(ForeignScanState *node) +fetch_received_data(ForeignScanState *node, bool vacateconn) { PgFdwScanState *fsstate = GetPgFdwScanState(node); PGresult *volatile res = NULL; @@ -3817,7 +3821,8 @@ fetch_received_data(ForeignScanState *node) waiter = move_to_next_waiter(node); /* send the next request if any */ - if (waiter) + if (waiter && (!vacateconn || + GetPgFdwScanState(node)->s.conn != GetPgFdwScanState(waiter)->s.conn)) request_more_data(waiter); MemoryContextSwitchTo(oldcontext); @@ -3843,7 +3848,7 @@ vacate_connection(PgFdwState *fdwstate, bool clear_queue) * query */ leader = commonstate->leader; - fetch_received_data(leader); + fetch_received_data(leader, true); /* let the first waiter be the next leader of this connection */ move_to_next_waiter(leader);
Wired if-statement in gen_partprune_steps_internal
Hi: I found the following code in gen_partprune_steps_internal, where the inner if-statement looks to be always true since list_length(result) > 1. I added an Assert(step_ids != NIL) and all the test cases passed. If the if-statement is always true, shall we remove it to avoid confusion?

gen_partprune_steps_internal(GeneratePruningStepsContext *context,

    if (list_length(result) > 1)
    {
        List       *step_ids = NIL;

        foreach(lc, result)
        {
            PartitionPruneStep *step = lfirst(lc);

            step_ids = lappend_int(step_ids, step->step_id);
        }

        Assert(step_ids != NIL);
        if (step_ids != NIL)    // This should always be true.
        {
            PartitionPruneStep *step;

            step = gen_prune_step_combine(context, step_ids,
                                          PARTPRUNE_COMBINE_INTERSECT);
            result = lappend(result, step);
        }
    }

-- Best Regards Andy Fan
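If the inner check really is redundant, the simplification being asked about would amount to the following sketch, based on the snippet above with the always-true test dropped and the Assert kept:

    if (list_length(result) > 1)
    {
        List       *step_ids = NIL;
        PartitionPruneStep *step;

        foreach(lc, result)
        {
            PartitionPruneStep *s = lfirst(lc);

            step_ids = lappend_int(step_ids, s->step_id);
        }

        /* A list with more than one step yields at least one step_id. */
        Assert(step_ids != NIL);

        step = gen_prune_step_combine(context, step_ids,
                                      PARTPRUNE_COMBINE_INTERSECT);
        result = lappend(result, step);
    }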
Re: [Patch] ALTER SYSTEM READ ONLY
On Wed, Oct 7, 2020 at 11:19 PM Robert Haas wrote: > > On Wed, Sep 16, 2020 at 3:33 PM Robert Haas wrote: > > I don't mind a loop, but that one looks broken. We have to clear the > > bit before we call the function that processes that type of barrier. > > Otherwise, if we succeed in absorbing the barrier but a new instance > > of the same barrier arrives meanwhile, we'll fail to realize that we > > need to absorb the new one. > > Here's a new version of the patch for allowing errors in > barrier-handling functions and/or rejection of barriers by those > functions. I think this responds to all of the previous review > comments from Andres. Also, here is an 0002 which is a handy bit of > test code that I wrote. It's not for commit, but it is useful for > finding bugs. > > In addition to improving 0001 based on the review comments, I also > tried to write a better commit message for it, but it might still be > possible to do better there. It's a bit hard to explain the idea in > the abstract. For ALTER SYSTEM READ ONLY, the idea is that a process > with an XID -- and possibly a bunch of sub-XIDs, and possibly while > idle-in-transaction -- can elect to FATAL rather than absorbing the > barrier. I suspect for other barrier types we might have certain > (hopefully short) stretches of code where a barrier of a particular > type can't be absorbed because we're in the middle of doing something > that relies on the previous value of whatever state is protected by > the barrier. Holding off interrupts in those stretches of code would > prevent the barrier from being absorbed, but would also prevent query > cancel, backend termination, and absorption of other barrier types, so > it seems possible that just allowing the barrier-absorption function > for a barrier of that type to just refuse the barrier until after the > backend exits the critical section of code will work out better. > > Just for kicks, I tried running 'make installcheck-parallel' while > emitting placeholder barriers every 0.05 s after altering the > barrier-absorption function to always return false, just to see how > ugly that was. In round figures, it made it take 24 s vs. 21 s, so > it's actually not that bad. However, it all depends on how many times > you hit CHECK_FOR_INTERRUPTS() how quickly, so it's easy to imagine > that the effect might be very non-uniform. That is, if you can get the > code to be running a tight loop that does little real work but does > CHECK_FOR_INTERRUPTS() while refusing to absorb outstanding type of > barrier, it will probably suck. Therefore, I'm inclined to think that > the fairly strong cautionary logic in the patch is reasonable, but > perhaps it can be better worded somehow. Thoughts welcome. > > I have not rebased the remainder of the patch series over these two. > That I'll do. On a quick look at the latest 0001 patch, the following hunk to reset leftover flags seems to be unnecessary: + /* + * If some barrier types were not successfully absorbed, we will have + * to try again later. + */ + if (!success) + { + ResetProcSignalBarrierBits(flags); + return; + } When the ProcessBarrierPlaceholder() function returns false without an error, that barrier flag gets reset within the while loop. The case when it has an error, the rest of the flags get reset in the catch block. Correct me if I am missing something here. Regards, Amul
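As background for why that ordering matters, the pattern Robert describes (clear the pending bit before running the absorb function, and put it back only if absorption is refused) can be modelled with the much-simplified sketch below. absorb_barrier() is a stand-in for the per-type handlers, and the real code works on ProcSignal state with pg_atomic_* operations rather than a plain pointer.

    extern bool absorb_barrier(int type);   /* stand-in for per-type handlers */

    /*
     * Simplified model, not the actual ProcSignalBarrier code.  Clearing the
     * bit before calling the handler means that a new instance of the same
     * barrier arriving while we absorb this one simply sets the bit again
     * and is not lost; re-setting the bit on refusal makes the next
     * CHECK_FOR_INTERRUPTS() try again.
     */
    static void
    absorb_pending_barriers(uint32 *pending)
    {
        uint32      flags = *pending;   /* snapshot of the work to do now */

        while (flags != 0)
        {
            int         type = pg_rightmost_one_pos32(flags);
            uint32      bit = ((uint32) 1) << type;

            flags &= ~bit;
            *pending &= ~bit;       /* clear before processing */

            if (!absorb_barrier(type))
                *pending |= bit;    /* refused: retry later */
        }
    }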
Re: [PATCH] Keeps tracking the uniqueness with UniqueKey
> On Thu, Oct 08, 2020 at 09:34:51AM +0800, Andy Fan wrote: > > > Other than that I wanted to ask what are the plans to proceed with this > > patch? It's been a while since the question was raised in which format > > to keep unique key expressions, and as far as I can see no detailed > > suggestions or patch changes were proposed as a follow up. Obviously I > > would love to see the first two preparation patches committed to avoid > > dependencies between patches, and want to suggest an incremental > > approach with simple format for start (what we have right now) with the > > idea how to extend it in the future to cover more cases. > > > > I think the hardest part of this series is commit 2, it probably needs > lots of > dedicated time to review which would be the hardest part for the reviewers. > I don't have a good suggestion, however. Sure, and I would review the patch as well. But as far as I understand the main issue is "how to store uniquekey expressions", and as long as it is not decided, no additional review will move the patch forward I guess.
Re: [HACKERS] Runtime Partition Pruning
On Wed, Oct 7, 2020 at 7:00 PM Andy Fan wrote: > > > > On Wed, Oct 7, 2020 at 5:05 PM Andy Fan wrote: >> >> >> >> On Sun, Oct 4, 2020 at 3:10 PM Andy Fan wrote: Now, in my experience, the current system for custom plans vs. generic plans doesn't approach the problem in this way at all, and in my experience that results in some pretty terrible behavior. It will do things like form a custom plan every time because the estimated cost of the custom plan is lower than the estimated cost of the generic plan even though the two plans are structurally identical; only the estimates differ. It will waste gobs of CPU cycles by replanning a primary key lookup 5 times just on the off chance that a lookup on the primary key index isn't the best option. But this patch isn't going to fix any of that. The best we can probably do is try to adjust the costing for Append paths in some way that reflects the costs and benefits of pruning. I'm tentatively in favor of trying to do something modest in that area, but I don't have a detailed proposal. >>> >>> I just realized this issue recently and reported it at [1], then Amit >>> pointed >>> me to this issue being discussed here, so I would like to continue this >>> topic >>> here. >>> >>> I think we can split the issue into 2 issues. One is the partition prune >>> in initial >>> partition prune, which maybe happen in custom plan case only and caused >>> the above issue. The other one happens in the "Run-Time" partition prune, >>> I admit that is an important issue to resolve as well, but looks harder. >>> So I >>> think we can fix the first one at first. >>> >>> ... When we count for the cost of a >>> generic plan, we can reduce the cost based on such information. >> >> >> This way doesn't work since after the initial partition prune, not only the >> cost of the Append node should be reduced, the cost of other plans should >> be reduced as well [1] >> >> However I think if we can use partition prune information from a custom plan >> at the cost_append_path stage, it looks the issue can be fixed. If so, the >> idea >> is similar to David's idea in [2], however Robert didn't agree with this[2]. >> Can anyone elaborate this objection? for a partkey > $1 or BETWEEN cases, >> some real results from the past are probably better than some hard-coded >> assumptions IMO. > > > I can understand Robert's idea now, he intended to resolve both the > "initial-partition-prune" case and "runtime partition prune" case while I > always think > about the former case (Amit reminded me about that at the beginning, but I > just > wake up right now..) > > With the Robert's idea, I think we can do some hack at create_append_path, > we can figure out the pruneinfo (it was done at create_append_plan now) at > that > stage, and then did a fix_append_path with such pruneinfo. We need to fix not > only the cost of AppendPath, but also rows of AppendPath, For example: > SELECT .. FROM t1, t2, p where t1.a = p.partkey and t1.b = t2.b, if the > plan is: > Nest Loop >Nest Loop > t1 > Append (p) >t2 > > Then the rows of Append (p) will be very important. > > For this idea, how to estimate how many partitions/rows can be pruned is a > key > part. Robert said "I was thinking, rather, that if we know for example that > we've doing pruning on partition_column = $1, then we know that only > one partition will match. That's probably a common case. If we've > got partition_column > $1, we could assume that, say, 75% of the > partitions would match. 
partition_column BETWEEN $1 and $2 is > probably a bit more selective, so maybe we assume 50% of the > partitions would match.". I think we can't say the 75% or 50% is a good > number, but the keypoint may be "partition_column = $1" is a common > case. And for the others case, we probably don't make it worse. > > I think we need to do something here, any thoughts? Personally I'm more > like this idea now. Yes, I think we have to do something on those lines. But I am wondering why our stats machinery is failing to do that already. For equality I think it's straight forward. But even for other operators we should use the statistics from the partitioned table to estimate the selectivity and then adjust number of scanned partitions = (number of partitions * fraction of rows scanned). That might be better than 50% or 75%. -- Best Wishes, Ashutosh Bapat
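A rough sketch of that idea, for concreteness (purely illustrative: which quals to feed it and where it would hook into the Append costing are exactly the open questions):

    /*
     * Sketch: estimate how many of nparts partitions survive run-time
     * pruning by treating the fraction of parent rows selected by the
     * pruning quals as the fraction of partitions scanned.
     */
    static double
    estimate_scanned_partitions(PlannerInfo *root, RelOptInfo *parentrel,
                                List *prunequals, int nparts)
    {
        Selectivity sel = clauselist_selectivity(root, prunequals,
                                                 parentrel->relid,
                                                 JOIN_INNER, NULL);

        /* Scan at least one partition and at most all of them. */
        return Min((double) nparts, Max(1.0, sel * (double) nparts));
    }

For partkey = $1 this naturally tends toward a single partition, and for range or BETWEEN quals it follows whatever the parent's statistics say rather than a fixed 75% or 50% guess.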
Re: Asynchronous Append on postgres_fdw nodes.
Hi, I want to suggest one more improvement. Currently the is_async_capable_path() routine allow only ForeignPath nodes as async capable path. But in some cases we can allow SubqueryScanPath as async capable too. For example: SELECT * FROM ((SELECT * FROM foreign_1) UNION ALL (SELECT a FROM foreign_2)) AS b; is async capable, but: SELECT * FROM ((SELECT * FROM foreign_1 LIMIT 10) UNION ALL (SELECT a FROM foreign_2 LIMIT 10)) AS b; doesn't async capable. The patch in attachment tries to improve this situation. -- regards, Andrey Lepikhov Postgres Professional >From fa73b84e8c456c48ef4788304d2ed14f31365aac Mon Sep 17 00:00:00 2001 From: Andrey Lepikhov Date: Thu, 8 Oct 2020 15:46:41 +0500 Subject: [PATCH] 2 --- contrib/postgres_fdw/expected/postgres_fdw.out | 3 ++- src/backend/optimizer/path/allpaths.c | 4 src/backend/optimizer/plan/createplan.c| 2 +- 3 files changed, 7 insertions(+), 2 deletions(-) diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out index eca44a4f40..e5972574b6 100644 --- a/contrib/postgres_fdw/expected/postgres_fdw.out +++ b/contrib/postgres_fdw/expected/postgres_fdw.out @@ -2082,6 +2082,7 @@ SELECT t1c1, avg(t1c1 + t2c1) FROM (SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 Output: t1.c1, t2.c1 Group Key: t1.c1, t2.c1 -> Append + Async subplans: 2 -> Foreign Scan Output: t1.c1, t2.c1 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2) @@ -2090,7 +2091,7 @@ SELECT t1c1, avg(t1c1 + t2c1) FROM (SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 Output: t1_1.c1, t2_1.c1 Relations: (public.ft1 t1_1) INNER JOIN (public.ft2 t2_1) Remote SQL: SELECT r1."C 1", r2."C 1" FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1" -(20 rows) +(21 rows) SELECT t1c1, avg(t1c1 + t2c1) FROM (SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) UNION SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) AS t (t1c1, t2c1) GROUP BY t1c1 ORDER BY t1c1 OFFSET 100 LIMIT 10; t1c1 | avg diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c index 17e9a7a897..5822ba83e0 100644 --- a/src/backend/optimizer/path/allpaths.c +++ b/src/backend/optimizer/path/allpaths.c @@ -3991,6 +3991,10 @@ is_async_capable_path(Path *path) fdwroutine->IsForeignPathAsyncCapable((ForeignPath *) path)) return true; } + break; + case T_SubqueryScanPath: + if (is_async_capable_path(((SubqueryScanPath *) path)->subpath)) +return true; default: break; } diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c index 3ae46ed6f1..efb1b0cb4e 100644 --- a/src/backend/optimizer/plan/createplan.c +++ b/src/backend/optimizer/plan/createplan.c @@ -1231,7 +1231,7 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags) * Classify as async-capable or not. If we have decided to run the * children in parallel, we cannot any one of them run asynchronously. * Planner thinks that all subnodes are executed in order if this - * append is orderd. No subpaths cannot be run asynchronously in that + * append is ordered. No subpaths cannot be run asynchronously in that * case. */ if (pathkeys == NIL && -- 2.17.1
Re: speed up unicode normalization quick check
On Thu, Oct 08, 2020 at 04:52:18AM -0400, John Naylor wrote: > Looks fine overall, but one minor nit: I'm curious why you made a separate > section in the pgindent exclusions. The style in that file seems to be one > comment per category. Both parts indeed use PerfectHash.pm, but are generated by different scripts so that looked better as separate items. -- Michael signature.asc Description: PGP signature
Re: partition routing layering in nodeModifyTable.c
On Wed, Oct 7, 2020 at 9:07 PM Heikki Linnakangas wrote: > On 07/10/2020 12:50, Amit Langote wrote: > > On Tue, Oct 6, 2020 at 12:45 AM Heikki Linnakangas wrote: > >> It would be better to set it in make_modifytable(), just > >> after calling PlanDirectModify(). > > > > Actually, that's how it was done in earlier iterations but I think I > > decided to move that into the FDW's functions due to some concern of > > one of the other patches that depended on this patch. Maybe it makes > > sense to bring that back into make_modifytable() and worry about the > > other patch later. > > On second thoughts, I take back my earlier comment. Setting it in > make_modifytable() relies on the assumption that the subplan is a single > ForeignScan node, on the target relation. The documentation for > PlanDirectModify says: > > > To execute the direct modification on the remote server, this > > function must rewrite the target subplan with a ForeignScan plan node > > that executes the direct modification on the remote server. >> > So I guess that assumption is safe. But I'd like to have some wiggle > room here. Wouldn't it be OK to have a Result node on top of the > ForeignScan, for example? If it really must be a simple ForeignScan > node, the PlanDirectModify API seems pretty strange. > > I'm not entirely sure what I would like to do with this now. I could > live with either version, but I'm not totally happy with either. (I like > your suggestion below) Assuming you mean the idea of using RT index to access ResultRelInfos in es_result_relations, we would still need to store the index in the ForeignScan node, so the question of whether to do it in make_modifytable() or in PlanDirectModify() must still be answered. > Looking at this block in postgresBeginDirectModify: > > > /* > >* Identify which user to do the remote access as. This should match > > what > >* ExecCheckRTEPerms() does. > >*/ > > Assert(fsplan->resultRelIndex >= 0); > > dmstate->resultRelIndex = fsplan->resultRelIndex; > > rtindex = list_nth_int(resultRelations, fsplan->resultRelIndex); > > rte = exec_rt_fetch(rtindex, estate); > > userid = rte->checkAsUser ? rte->checkAsUser : GetUserId(); > > That's a complicated way of finding out the target table's RTI. We > should probably store the result RTI in the ForeignScan in the first place. > > >> Another idea is to merge "resultRelIndex" and a "range table index" into > >> one value. Range table entries that are updated would have a > >> ResultRelInfo, others would not. I'm not sure if that would end up being > >> cleaner or messier than what we have now, but might be worth trying. > > > > I have thought about something like this before. An idea I had is to > > make es_result_relations array indexable by plain RT indexes, then we > > don't need to maintain separate indexes that we do today for result > > relations. > > That sounds like a good idea. es_result_relations is currently an array > of ResultRelInfos, so that would leave a lot of unfilled structs in the > array. But in on of your other threads, you proposed turning > es_result_relations into an array of pointers anyway > (https://www.postgresql.org/message-id/CA+HiwqE4k1Q2TLmCAvekw+8_NXepbnfUOamOeX=kphrdtfs...@mail.gmail.com). Okay, I am reorganizing the patches around that idea and will post an update soon. -- Amit Langote EDB: http://www.enterprisedb.com
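To make the es_result_relations idea concrete, lookups could then become as simple as the sketch below, assuming the array turns into per-range-table-entry pointers with NULL for non-result relations; the helper name is illustrative only.

#include "postgres.h"
#include "nodes/execnodes.h"

/*
 * Sketch: fetch the ResultRelInfo for a range table index, relying on
 * es_result_relations being indexable by (rti - 1) and containing NULL
 * entries for relations that are not result relations.
 */
static ResultRelInfo *
lookup_result_rel(EState *estate, Index rti)
{
    Assert(rti > 0 && rti <= estate->es_range_table_size);
    return estate->es_result_relations[rti - 1];
}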
Re: Transactions involving multiple postgres foreign servers, take 2
On Thu, 8 Oct 2020 at 18:05, tsunakawa.ta...@fujitsu.com wrote: > > Sorry to be late to respond. (My PC is behaving strangely after upgrading > Win10 2004) > > From: Masahiko Sawada > > After more thoughts on Tsunakawa-san’s idea it seems to need the > > following conditions: > > > > * At least postgres_fdw is viable to implement these APIs while > > guaranteeing not to happen any error. > > * A certain number of FDWs (or majority of FDWs) can do that in a > > similar way to postgres_fdw by using the guideline and probably > > postgres_fdw as a reference. > > > > These are necessary for FDW implementors to implement APIs while > > following the guideline and for the core to trust them. > > > > As far as postgres_fdw goes, what we need to do when committing a > > foreign transaction resolution is to get a connection from the > > connection cache or create and connect if not found, construct a SQL > > query (COMMIT/ROLLBACK PREPARED with identifier) using a fixed-size > > buffer, send the query, and get the result. The possible place to > > raise an error is limited. In case of failures such as connection > > error FDW can return false to the core along with a flag indicating to > > ask the core retry. Then the core will retry to resolve foreign > > transactions after some sleep. OTOH if FDW sized up that there is no > > hope of resolving the foreign transaction, it also could return false > > to the core along with another flag indicating to remove the entry and > > not to retry. Also, the transaction resolution by FDW needs to be > > cancellable (interruptible) but cannot use CHECK_FOR_INTERRUPTS(). > > > > Probably, as Tsunakawa-san also suggested, it’s not impossible to > > implement these APIs in postgres_fdw while guaranteeing not to happen > > any error, although not sure the code complexity. So I think the first > > condition may be true but not sure about the second assumption, > > particularly about the interruptible part. > > Yeah, I expect the commit of the second phase should not be difficult for the > FDW developer. > > As for the cancellation during commit retry, I don't think we necessarily > have to make the TM responsible for retrying the commits. Many DBMSs have > their own timeout functionality such as connection timeout, socket timeout, > and statement timeout. > Users can set those parameters in the foreign server options based on how > long the end user can wait. That is, TM calls FDW's commit routine just once. What about temporary network failures? I think there are users who don't want to give up resolving foreign transactions failed due to a temporary network failure. Or even they might want to wait for transaction completion until they send a cancel request. If we want to call the commit routine only once and therefore want FDW to retry connecting the foreign server within the call, it means we require all FDW implementors to write a retry loop code that is interruptible and ensures not to raise an error, which increases difficulty. Also, what if the user sets the statement timeout to 60 sec and they want to cancel the waits after 5 sec by pressing ctl-C? You mentioned that client libraries of other DBMSs don't have asynchronous execution functionality. If the SQL execution function is not interruptible, the user will end up waiting for 60 sec, which seems not good. > If the TM makes efforts to retry commits, the duration would be from a few > seconds to 30 seconds. Then, we can hold back the cancellation during that > period. 
> > > > I thought we could support both ideas to get their pros; supporting > > Tsunakawa-san's idea and then my idea if necessary, and FDW can choose > > whether to ask the resolver process to perform 2nd phase of 2PC or > > not. But it's not a good idea in terms of complexity. > > I don't feel the need for leaving the commit to the resolver during normal > operation. I meant it's for FDWs that cannot guarantee not to happen error during resolution. > seems like if failed to resolve, the backend would return an > > acknowledgment of COMMIT to the client and the resolver process > > resolves foreign prepared transactions in the background. So we can > > ensure that the distributed transaction is completed at the time when > > the client got an acknowledgment of COMMIT if 2nd phase of 2PC is > > successfully completed in the first attempts. OTOH, if it failed for > > whatever reason, there is no such guarantee. From an optimistic > > perspective, i.g., the failures are unlikely to happen, it will work > > well but IMO it’s not uncommon to fail to resolve foreign transactions > > due to network issue, especially in an unreliable network environment > > for example geo-distributed database. So I think it will end up > > requiring the client to check if preceding distributed transactions > > are completed or not in order to see the results of these > > transactions. > > That issue exists with any method, doesn't it? Yes, but if we don’t ret
Re: Resetting spilled txn statistics in pg_stat_replication
On Thu, Oct 8, 2020 at 7:46 AM Masahiko Sawada wrote: > > On Wed, 7 Oct 2020 at 17:52, Amit Kapila wrote: > > > > > I think after we are done with this the next > > step would be to finish the streaming stats work [1]. We probably need > > to review and add the test case in that patch. If nobody else shows up > > I will pick it up and complete it. > > +1 > I can review that patch. > I have rebased the stream stats patch and made minor modifications. I haven't done a detailed review but one thing that I think is not correct is: @@ -3496,10 +3499,18 @@ ReorderBufferStreamTXN(ReorderBuffer *rb, ReorderBufferTXN *txn) txn->snapshot_now = NULL; } + + rb->streamCount += 1; + rb->streamBytes += txn->total_size; + + /* Don't consider already streamed transaction. */ + rb->streamTxns += (rbtxn_is_streamed(txn)) ? 0 : 1; + /* Process and send the changes to output plugin. */ ReorderBufferProcessTXN(rb, txn, InvalidXLogRecPtr, snapshot_now, command_id, true); I think we should update the stream stats after ReorderBufferProcessTXN rather than before because any error in ReorderBufferProcessTXN can lead to an unnecessary update of stats. But OTOH, the txn flags, and other data can be changed after ReorderBufferProcessTXN so we need to save them in a temporary variable before calling the function. -- With Regards, Amit Kapila. v2-0001-Track-streaming-stats.patch Description: Binary data
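Concretely, the reordering being suggested would look roughly like the sketch below inside ReorderBufferStreamTXN() (a sketch of the proposal, not the final patch): the values that processing may change are captured first, and the counters are bumped only after it returns.

    bool    txn_is_streamed = rbtxn_is_streamed(txn);
    Size    txn_total_size = txn->total_size;

    /* Process and send the changes to output plugin. */
    ReorderBufferProcessTXN(rb, txn, InvalidXLogRecPtr, snapshot_now,
                            command_id, true);

    /*
     * Update stream statistics only after the transaction has been
     * processed, so that an error thrown above does not inflate them.
     */
    rb->streamCount += 1;
    rb->streamBytes += txn_total_size;
    /* Don't count a transaction that had already been streamed before. */
    rb->streamTxns += txn_is_streamed ? 0 : 1;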
Re: extensible options syntax for replication parser?
On Wed, Jun 24, 2020 at 11:51 AM Robert Haas wrote: > Thanks for the review. v2 attached, hopefully fixing the compilation > issue you mentioned. Tushar Ahuja reported to me off-list that my basebackup refactoring patch set was changing whether or not the following message appeared: NOTICE: WAL archiving is not enabled; you must ensure that all required WAL segments are copied through other means to complete the backup That patch set includes this patch, and the reason for the behavior difference turned out to be that I had gotten an if-test that is part of this patch backwards. Here is v3, fixing that. It is a little disappointing that this mistake didn't cause any existing regression tests to fail. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company v3-0001-Flexible-options-for-BASE_BACKUP-and-CREATE_REPLI.patch Description: Binary data
Re: Wrong example in the bloom documentation
On Thu, Oct 8, 2020 at 06:34:32AM +, Daniel Westermann (DWE) wrote: > Hi, > > as this does not get any attention on the docs-list, once again here. > Any thoughts on this? I was hoping someone more experienced with this would comment, but seeing none, I will apply it in a day or two to all supported versions? Have you tested this output back to 9.5? -- Bruce Momjian https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
Re: [PATCH] ecpg: fix progname memory leak
On Thu, Oct 8, 2020 at 10:35:41AM +0900, Michael Paquier wrote: > On Wed, Oct 07, 2020 at 02:31:54PM +0300, Maksim Kita wrote: > > Fix progname memory leak in ecpg client. > > Issue taken from todo list https://wiki.postgresql.org/wiki/Todo. > > FWIW, I don't see much point in doing that. For one, we have a > more-or-less established rule that progname remains set until the > application leaves, and there are much more places where we leak > memory like that. As one example, just see the various places where > we use pg_strdup for option parsing. At the end, it does not really > matter as these are applications running for a short amount of time. Agreed, but what does the TODO item mean then? Fix small memory leaks in ecpg Memory leaks in a short running application like ecpg are not really a problem, but make debugging more complicated Should it be removed? -- Bruce Momjian https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
Re: Probably typo in multixact.c
On Thu, Oct 8, 2020 at 10:26:39AM +0900, Michael Paquier wrote: > On Thu, Oct 08, 2020 at 01:15:35AM +, Hou, Zhijie wrote: > > Hi > > > > In multixact.c I found some comments like the following: > > > > * Similar to AtEOX_MultiXact but for COMMIT PREPARED > > * Discard the local MultiXactId cache like in AtEOX_MultiXact > > > > Since there's no function called "AtEOX_MultiXact" in the code, > > I think the "AtEOX_MultiXact" may be a typo. > > > > AtEOXact_MultiXact seems to be the right function here. > > Yes, that looks like a simple typo to me as well. > AtEOXact_MultiXact() shares portions of the logics in > PostPrepare_MultiXact and multixact_twophase_postcommit. FYI, this patch was applied. -- Bruce Momjian https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
Re: [PATCH] ecpg: fix progname memory leak
On Thu, 2020-10-08 at 12:27 -0400, Bruce Momjian wrote: > On Thu, Oct 8, 2020 at 10:35:41AM +0900, Michael Paquier wrote: > > On Wed, Oct 07, 2020 at 02:31:54PM +0300, Maksim Kita wrote: > > > Fix progname memory leak in ecpg client. > > > Issue taken from todo list https://wiki.postgresql.org/wiki/Todo. > > > > FWIW, I don't see much point in doing that. For one, we have a > > more-or-less established rule that progname remains set until the > > application leaves, and there are much more places where we leak > > memory like that. As one example, just see the various places > > where > > we use pg_strdup for option parsing. At the end, it does not > > really > > matter as these are applications running for a short amount of > > time. > > Agreed, but what does the TODO item mean then? > > Fix small memory leaks in ecpg > Memory leaks in a short running application like ecpg are > not really > a problem, but make debugging more complicated > > Should it be removed? I'd say yes, let's remove it. Actually I wasn't even aware it's on there. While I agree that it makes debugging of memory handling in ecpg more difficult, I don't see much of a point in it. Michael -- Michael Meskes Michael at Fam-Meskes dot De Michael at Meskes dot (De|Com|Net|Org) Meskes at (Debian|Postgresql) dot Org
Re: [PATCH] ecpg: fix progname memory leak
On Thu, Oct 8, 2020 at 06:58:14PM +0200, Michael Meskes wrote: > On Thu, 2020-10-08 at 12:27 -0400, Bruce Momjian wrote: > > On Thu, Oct 8, 2020 at 10:35:41AM +0900, Michael Paquier wrote: > > > On Wed, Oct 07, 2020 at 02:31:54PM +0300, Maksim Kita wrote: > > > > Fix progname memory leak in ecpg client. > > > > Issue taken from todo list https://wiki.postgresql.org/wiki/Todo. > > > > > > FWIW, I don't see much point in doing that. For one, we have a > > > more-or-less established rule that progname remains set until the > > > application leaves, and there are much more places where we leak > > > memory like that. As one example, just see the various places > > > where > > > we use pg_strdup for option parsing. At the end, it does not > > > really > > > matter as these are applications running for a short amount of > > > time. > > > > Agreed, but what does the TODO item mean then? > > > > Fix small memory leaks in ecpg > > Memory leaks in a short running application like ecpg are > > not really > > a problem, but make debugging more complicated > > > > Should it be removed? > > I'd say yes, let's remove it. Actually I wasn't even aware it's on > there. While I agree that it makes debugging of memory handling in ecpg > more difficult, I don't see much of a point in it. OK, TODO removed. -- Bruce Momjian https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
Re: [PATCH] ecpg: fix progname memory leak
On Wed, Oct 7, 2020 at 6:35 PM Michael Paquier wrote: > On Wed, Oct 07, 2020 at 02:31:54PM +0300, Maksim Kita wrote: > > Fix progname memory leak in ecpg client. > > Issue taken from todo list https://wiki.postgresql.org/wiki/Todo. > > FWIW, I don't see much point in doing that. > I hope that everyone takes 10 seconds and realizes that this appears to be this person's first submitted patch. I would think a little more respect for the attempt to patch a minor issue would be afforded to such a person. It seems technically sound, and they are not trying to change the world with their first attempt. Maybe a reminder that the TODO list is not always spectacular and that a double check with the list before doing something might be in order (in fact, adding that to the top of the TODO list might be a great option here as well). It's not going to win a Turing award - but I thought this project was a little friendlier than what I've seen in this thread towards a first-time contributor. John
Re: [PATCH] ecpg: fix progname memory leak
On Thu, Oct 8, 2020 at 10:13:53AM -0700, John W Higgins wrote: > > On Wed, Oct 7, 2020 at 6:35 PM Michael Paquier wrote: > > On Wed, Oct 07, 2020 at 02:31:54PM +0300, Maksim Kita wrote: > > Fix progname memory leak in ecpg client. > > Issue taken from todo list https://wiki.postgresql.org/wiki/Todo. > > FWIW, I don't see much point in doing that. > > > I hope that everyone takes 10 seconds and realizes that this appears to be > this > person's first submitted patch. I would think a little more respect for the > attempt to patch a minor issue would be afforded to such a person. Seems > technically sound and they are not trying to change the world with their first > attempt. > > Maybe a reminder that the TODO list is not always spectacular and that a > double > check with the list before doing something might be in order (in fact adding > that to the top of the TODO list might be a great option here as well). > > It's not going to win a Turing award - but I thought this project was a little > more friendly then what I've seen in this thread towards a first time > contributor. You mean like the warning at the top of the TODO list? https://wiki.postgresql.org/wiki/Todo#Development_Process WARNING for Developers: Unfortunately this list does not contain all the information necessary for someone to start coding a feature. Some of these items might have become unnecessary since they were added --- others might be desirable but the implementation might be unclear. When selecting items listed below, be prepared to first discuss the value of the feature. Do not assume that you can select one, code it and then expect it to be committed. Always discuss design on Hackers list before starting to code. The flow should be: Desirability -> Design -> Implement -> Test -> Review -> Commit -- Bruce Momjian https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
Re: [PATCH] ecpg: fix progname memory leak
Michael Meskes writes: > On Thu, 2020-10-08 at 12:27 -0400, Bruce Momjian wrote: >> Agreed, but what does the TODO item mean then? >> >> Fix small memory leaks in ecpg >> Memory leaks in a short running application like ecpg are not really >> a problem, but make debugging more complicated >> >> Should it be removed? > I'd say yes, let's remove it. Actually I wasn't even aware it's on > there. While I agree that it makes debugging of memory handling in ecpg > more difficult, I don't see much of a point in it. Agreed. In theory there might be some point in removing leaks that happen per-statement, so as to avoid unreasonable memory bloat when processing an extremely long input file. In practice nobody has complained about that, and if somebody did I'd probably question the sanity of putting so much code into one file. (The C compiler would likely bloat even more while processing the output...) Moreover, given the way the ecpg grammar works, it'd be really painful to avoid all such leaks, and it might well introduce bugs not fix them. In any case, if this TODO item is going to lead to ideas as dubious as "let's free progname before exiting", it's not helpful. regards, tom lane
Re: Wrong example in the bloom documentation
Hi Bruce, >On Thu, Oct 8, 2020 at 06:34:32AM +, Daniel Westermann (DWE) wrote: >> Hi, >> >> as this does not get any attention on the docs-list, once again here. >> Any thoughts on this? >I was hoping someone more experienced with this would comment, but >seeing none, I will apply it in a day or two to all supported versions? >Have you tested this output back to 9.5? I hoped that as well. No, I tested down to 9.6 because the change happened in 10. Regards Daniel
Re: Wrong example in the bloom documentation
On Thu, Oct 8, 2020 at 06:12:47PM +, Daniel Westermann (DWE) wrote: > Hi Bruce, > > >On Thu, Oct 8, 2020 at 06:34:32AM +, Daniel Westermann (DWE) wrote: > >> Hi, > >> > >> as this does not get any attention on the docs-list, once again here. > >> Any thoughts on this? > > >I was hoping someone more experienced with this would comment, but > >seeing none, I will apply it in a day or two to all supported versions? > >Have you tested this output back to 9.5? > > I hoped that as well. No, I tested down to 9.6 because the change happened in > 10. OK, so the patch should be applied only to PG 10 and later? -- Bruce Momjian https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
Assertion failure with LEFT JOINs among >500 relations
Hi, I hit an assertion failure. When asserts disabled, it works fine even with more tables (>5000). Steps to reproduce: CREATE TABLE users_table (user_id int, time timestamp, value_1 int, value_2 int, value_3 float, value_4 bigint); 250 relations work fine, see the query (too long to copy & paste here): https://gist.github.com/onderkalaci/2b40a18d989da389ee4fb631e1ad7c0e#file-steps_to_assert_pg-sql-L41 -- when # relations >500, we hit the assertion (too long to copy & paste here): See the query: https://gist.github.com/onderkalaci/2b40a18d989da389ee4fb631e1ad7c0e#file-steps_to_assert_pg-sql-L45 And, the backtrace: (lldb) bt * thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT * frame #0: 0x7fff639fa2c2 libsystem_kernel.dylib`__pthread_kill + 10 frame #1: 0x7fff63ab5bf1 libsystem_pthread.dylib`pthread_kill + 284 frame #2: 0x7fff639646a6 libsystem_c.dylib`abort + 127 frame #3: 0x000102180a02 postgres`ExceptionalCondition(conditionName=, errorType=, fileName=, lineNumber=) at assert.c:67:2 frame #4: 0x000101ece9b2 postgres`initial_cost_mergejoin(root=0x7ff0, workspace=0x7ffeedf5b528, jointype=JOIN_INNER, mergeclauses=, outer_path=0x00012ebf12d0, inner_path=0x4093d800, outersortkeys=0x, innersortkeys=0x00012ebf68e8, extra=0x7ffeedf5b6f8) at costsize.c:3043:2 frame #5: 0x000101eda01b postgres`try_mergejoin_path(root=0x000104a12618, joinrel=0x00012ebeede0, outer_path=0x00012ebf12d0, inner_path=0x0001283d00e8, pathkeys=0x00012ebf67e0, mergeclauses=0x00012ebf6890, outersortkeys=0x, innersortkeys=0x00012ebf68e8, jointype=JOIN_LEFT, extra=0x7ffeedf5b6f8, is_partial=) at joinpath.c:615:2 frame #6: 0x000101ed9426 postgres`sort_inner_and_outer(root=0x000104a12618, joinrel=0x00012ebeede0, outerrel=, innerrel=, jointype=JOIN_LEFT, extra=0x7ffeedf5b6f8) at joinpath.c:1038:3 frame #7: 0x000101ed8f7a postgres`add_paths_to_joinrel(root=0x000104a12618, joinrel=0x00012ebeede0, outerrel=0x00012ebe7b48, innerrel=0x000127f146e0, jointype=, sjinfo=, restrictlist=0x00012ebf42b0) at joinpath.c:269:3 frame #8: 0x000101edbdc6 postgres`populate_joinrel_with_paths(root=0x000104a12618, rel1=0x00012ebe7b48, rel2=0x000127f146e0, joinrel=0x00012ebeede0, sjinfo=0x00012809edc8, restrictlist=0x00012ebf42b0) at joinrels.c:824:4 frame #9: 0x000101edb57a postgres`make_join_rel(root=0x000104a12618, rel1=0x00012ebe7b48, rel2=0x000127f146e0) at joinrels.c:760:2 frame #10: 0x000101edb1ec postgres`make_rels_by_clause_joins(root=0x000104a12618, old_rel=0x00012ebe7b48, other_rels_list=, other_rels=) at joinrels.c:312:11 frame #11: 0x000101edada3 postgres`join_search_one_level(root=0x000104a12618, level=2) at joinrels.c:123:4 frame #12: 0x000101ec7feb postgres`standard_join_search(root=0x000104a12618, levels_needed=8, initial_rels=0x00012ebf4078) at allpaths.c:3097:3 frame #13: 0x000101ec6b38 postgres`make_rel_from_joinlist(root=0x000104a12618, joinlist=0x0001280a5618) at allpaths.c:2993:14 frame #14: 0x000101ec6b38 postgres`make_rel_from_joinlist(root=0x000104a12618, joinlist=0x0001280ab320) at allpaths.c:2993:14 frame #15: 0x000101ec6b38 postgres`make_rel_from_joinlist(root=0x000104a12618, joinlist=0x0001280b1028) at allpaths.c:2993:14 frame #16: 0x000101ec6b38 postgres`make_rel_from_joinlist(root=0x000104a12618, joinlist=0x0001280b6d30) at allpaths.c:2993:14 frame #17: 0x000101ec6b38 postgres`make_rel_from_joinlist(root=0x000104a12618, joinlist=0x0001280bca38) at allpaths.c:2993:14 frame #18: 0x000101ec6b38 postgres`make_rel_from_joinlist(root=0x000104a12618, joinlist=0x0001280c2740) at allpaths.c:2993:14 frame 
#19: 0x000101ec6b38 postgres`make_rel_from_joinlist(root=0x000104a12618, joinlist=0x0001280c8448) at allpaths.c:2993:14 frame #20: 0x000101ec6b38 postgres`make_rel_from_joinlist(root=0x000104a12618, joinlist=0x0001280ce150) at allpaths.c:2993:14 frame #21: 0x000101ec6b38 postgres`make_rel_from_joinlist(root=0x000104a12618, joinlist=0x0001280d3e58) at allpaths.c:2993:14 frame #22: 0x000101ec6b38 postgres`make_rel_from_joinlist(root=0x000104a12618, joinlist=0x0001280d9b60) at allpaths.c:2993:14 frame #23: 0x000101ec6b38 postgres`make_rel_from_joinlist(root=0x000104a12618, joinlist=0x0001280df868) at allpaths.c:2993:14 frame #24: 0x000101ec6b38 postgres`make_rel_from_joinlist(root=0x000104a12618, joinlist=0x0001280e5570) at allpaths.c:2993:14 frame #25: 0x000101ec6b38 postgres`make_rel_from_joinlist(root=0x000104a12618, joinlist=0x0001280eb278) at allpaths.c:2993:14
Expansion of our checks for connection-loss errors
Over in the thread at [1], we've tentatively determined that the reason buildfarm member lorikeet is currently failing is that its network stack returns ECONNABORTED for (some?) connection failures, whereas our code is only expecting ECONNRESET. Fujii Masao therefore proposes that we treat ECONNABORTED the same as ECONNRESET. I think this is a good idea, but after a bit of research I feel it does not go far enough. I find these POSIX-standard errnos that also seem likely candidates to be returned for a hard loss of connection: ECONNABORTED EHOSTUNREACH ENETDOWN ENETUNREACH All of these have been in POSIX since SUSv2, so it seems unlikely that we need to #ifdef any of them. (It is in any case pretty silly that we have #ifdefs around a very small minority of our references to ECONNRESET :-(.) There are some other related errnos, such as ECONNREFUSED, that don't seem like they'd be returned for a failure of a pre-existing connection, so we don't need to include them in such tests. Accordingly, I propose the attached patch (an expansion of Fujii-san's) that causes us to test for all five errnos anyplace we had been checking for ECONNRESET. I felt that this was getting to the point where we'd better centralize the knowledge of what to check, so the patch does that, via an inline function and an admittedly hacky macro. I also upgraded some places such as strerror.c to have full support for these symbols. All of the machines I have (even as far back as HPUX 10.20) also define ENETRESET and EHOSTDOWN. However, those symbols do not appear in SUSv2. ENETRESET was added at some later point, but EHOSTDOWN is still not in POSIX. For the moment I've left these second-tier symbols out of the patch, but there's a case for adding them. I'm not sure whether there'd be any point in trying to #ifdef them. BTW, I took out the conditional defines of some of these errnos in libpq's win32.h; AFAICS that's been dead code ever since we added #define's for them to win32_port.h. Am I missing something? This seems like a bug fix to me, so I'm inclined to back-patch. regards, tom lane [1] https://www.postgresql.org/message-id/flat/E1kPc9v-0005L4-2l%40gemulon.postgresql.org diff --git a/src/backend/port/win32/socket.c b/src/backend/port/win32/socket.c index 6fbd1ed6fb..0f28c07ed1 100644 --- a/src/backend/port/win32/socket.c +++ b/src/backend/port/win32/socket.c @@ -123,10 +123,14 @@ TranslateSocketError(void) case WSAEHOSTUNREACH: case WSAEHOSTDOWN: case WSAHOST_NOT_FOUND: + errno = EHOSTUNREACH; + break; case WSAENETDOWN: - case WSAENETUNREACH: case WSAENETRESET: - errno = EHOSTUNREACH; + errno = ENETDOWN; + break; + case WSAENETUNREACH: + errno = ENETUNREACH; break; case WSAENOTCONN: case WSAESHUTDOWN: diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c index d0b368530e..8937f223a8 100644 --- a/src/backend/utils/error/elog.c +++ b/src/backend/utils/error/elog.c @@ -712,9 +712,7 @@ errcode_for_socket_access(void) { /* Loss of connection */ case EPIPE: -#ifdef ECONNRESET - case ECONNRESET: -#endif + case ALL_CONNECTION_LOSS_ERRNOS: edata->sqlerrcode = ERRCODE_CONNECTION_FAILURE; break; diff --git a/src/bin/pg_dump/parallel.c b/src/bin/pg_dump/parallel.c index f0587f41e4..b8ab939179 100644 --- a/src/bin/pg_dump/parallel.c +++ b/src/bin/pg_dump/parallel.c @@ -1825,7 +1825,7 @@ piperead(int s, char *buf, int len) { int ret = recv(s, buf, len, 0); - if (ret < 0 && WSAGetLastError() == WSAECONNRESET) + if (ret < 0 && errno_is_connection_loss(WSAGetLastError())) { /* EOF on the pipe! 
*/ ret = 0; diff --git a/src/include/port.h b/src/include/port.h index 84bf2c363f..ffa21e782a 100644 --- a/src/include/port.h +++ b/src/include/port.h @@ -99,6 +99,30 @@ extern void pgfnames_cleanup(char **filenames); ) #endif +/* Test for all errnos that report loss of an established TCP connection */ +static inline bool +errno_is_connection_loss(int e) +{ + if (e == ECONNRESET || + e == ECONNABORTED || + e == EHOSTUNREACH || + e == ENETDOWN || + e == ENETUNREACH) + return true; + return false; +} + +/* + * To test for connection-loss errnos in a switch statement, write + * "case ALL_CONNECTION_LOSS_ERRNOS:". + */ +#define ALL_CONNECTION_LOSS_ERRNOS \ + ECONNRESET: \ + case ECONNABORTED: \ + case EHOSTUNREACH: \ + case ENETDOWN: \ + case ENETUNREACH + /* Portable locale initialization (in exec.c) */ extern void set_pglocale_pgservice(const char *argv0, const char *app); diff --git a/src/include/port/win32_port.h b/src/include/port/win32_port.h index 8b6576b23d..7ce8c87e8d 100644 --- a/src/include/port/win32_port.h +++ b/src/include/port/win32_port.h @@ -351,6 +351,10 @@ extern int pgwin32_safestat(const char *path, struct stat *buf); #define EADDRNOTAVAIL WSAEADDRNOTAVAIL #undef EHOSTUNREACH #define EHOSTUNREACH WSAEHOSTUNREACH +#undef ENETDOWN +#define ENETDOWN WSAENETDO
Re: Wrong example in the bloom documentation
"Daniel Westermann (DWE)" writes: >> I was hoping someone more experienced with this would comment, but >> seeing none, I will apply it in a day or two to all supported versions? >> Have you tested this output back to 9.5? > I hoped that as well. No, I tested down to 9.6 because the change happened in > 10. The patch assumes that parallel query is enabled, which is not true by default before v10, so it should certainly not be applied before v10 (at least not without significant revisions). regards, tom lane
Re: Wrong example in the bloom documentation
On Thu, Oct 8, 2020 at 03:43:32PM -0400, Tom Lane wrote: > "Daniel Westermann (DWE)" writes: > >> I was hoping someone more experienced with this would comment, but > >> seeing none, I will apply it in a day or two to all supported versions? > >> Have you tested this output back to 9.5? > > > I hoped that as well. No, I tested down to 9.6 because the change happened > > in 10. > > The patch assumes that parallel query is enabled, which is not true by > default before v10, so it should certainly not be applied before v10 > (at least not without significant revisions). I think we should just go for simple application cases for this. -- Bruce Momjian https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
Re: Minor documentation error regarding streaming replication protocol
On Sun, Sep 13, 2020 at 10:25:01PM +0200, Brar Piening wrote: > While implementing streaming replication client functionality for Npgsql > I stumbled upon a minor documentation error at > https://www.postgresql.org/docs/current/protocol-replication.html > > The "content" return value for the TIMELINE_HISTORYcommand is advertised > as bytea while it is in fact raw ASCII bytes. > > How did I find out? > Since the value I get doesn't start with a "\x" and contains ascii text, > although I've bytea_outputset to hex, I first thought that the streaming > replication protocol simply doesn't honor bytea_output, but then I > realized that I also get unencoded tabs and newlines which wouldn't be > possible if the value woud be passed through byteaout. > > This is certainly a minor problem since the timeline history file only > contains generated strings that are ASCII-only, so just using the > unencoded bytes is actually easier than decoding bytea. > OTOH it did cost me a few hours (writing a bytea decoder and figuring > out why it doesn't work by looking at varlena.c and proving the docs > wrong) so I want to point this out here since it is possibly an > unintended behavior or at least a documentation error. > Also I'm wary of taking dependency on an undocumented implementation > detail that could possibly change at any point. I have looked at this. It seems SendTimeLineHistory() is sending raw bytes from the history file, with no encoding conversion, and ReceiveXlogStream() is receiving it, again assuming it is just plain text. I am not sure we really have an SQL data type where we do this. BYTEA doesn't do encoding conversion, but does backslash procesing, and TEXT does encoding conversion. I suppose we either have to document this as BYTEA with no backslash processing, or TEXT with no encoding conversion --- I think I prefer the later. Attached is a patch to update the docs, and the data type passed by SendTimeLineHistory(). Does this look safe to everyone? -- Bruce Momjian https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml index f5e3318106..3cc30a480b 100644 --- a/doc/src/sgml/protocol.sgml +++ b/doc/src/sgml/protocol.sgml @@ -1877,11 +1877,11 @@ The commands accepted in replication mode are: - content (bytea) + content (text) - Contents of the timeline history file. + Contents of the timeline history file; no encoding conversion is performed. diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c index 7c9d1b67df..7db8975065 100644 --- a/src/backend/replication/walsender.c +++ b/src/backend/replication/walsender.c @@ -496,7 +496,7 @@ SendTimeLineHistory(TimeLineHistoryCmd *cmd) pq_sendstring(&buf, "content"); /* col name */ pq_sendint32(&buf, 0); /* table oid */ pq_sendint16(&buf, 0); /* attnum */ - pq_sendint32(&buf, BYTEAOID); /* type oid */ + pq_sendint32(&buf, TEXTOID); /* type oid */ pq_sendint16(&buf, -1); /* typlen */ pq_sendint32(&buf, 0); /* typmod */ pq_sendint16(&buf, 0); /* format code */
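For what it's worth, a client that follows the patched documentation can simply take the column value as-is; a minimal sketch (error handling omitted, assuming a connection opened with replication=true or replication=database) might look like this:

#include <stdio.h>
#include <string.h>
#include "libpq-fe.h"

/*
 * Sketch: fetch the history file for a timeline over a replication
 * connection and return a copy of the "content" column verbatim, i.e.
 * without any bytea unescaping or encoding conversion.
 */
static char *
fetch_timeline_history(PGconn *conn, int tli)
{
    char        query[64];
    PGresult   *res;
    char       *content;

    snprintf(query, sizeof(query), "TIMELINE_HISTORY %d", tli);
    res = PQexec(conn, query);
    content = strdup(PQgetvalue(res, 0, 1));    /* raw file contents */
    PQclear(res);
    return content;
}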
Re: [PATCH] ecpg: fix progname memory leak
On Thu, Oct 8, 2020 at 10:13:53AM -0700, John W Higgins wrote: >It's not going to win a Turing award - but I thought this project was a >little friendlier than what I've seen in this thread towards a >first-time contributor. Instead, it is unfriendly. It takes a lot of motivation to "try" to submit a patch. Good luck, Maksim Kita. Thanks for the support, John. regards, Ranier Vilela
Re: Parallel INSERT (INTO ... SELECT ...)
On Tue, Oct 6, 2020 at 10:38 PM Greg Nancarrow wrote: +if (estate->es_plannedstmt->commandType == CMD_INSERT) ... +if ((XactReadOnly || (IsInParallelMode() && queryDesc->plannedstmt->commandType != CMD_INSERT)) && ... +isParallelInsertLeader = nodeModifyTableState->operation == CMD_INSERT; ... One thing I noticed is that you have logic, variable names and assertions all over the tree that assume that we can only do parallel *inserts*. I agree 100% with your plan to make Parallel Insert work first, it is an excellent goal and if we get it in it'll be a headline feature of PG14 (along with COPY etc). That said, I wonder if it would make sense to use more general naming (isParallelModifyLeader?), be more liberal where you really mean "is it DML", and find a way to centralise the logic about which DML commands types are currently allowed (ie insert only for now) for assertions and error checks etc, so that in future we don't have to go around and change all these places and rename things again and again. While contemplating that, I couldn't resist taking a swing at the main (?) show stopper for Parallel Update and Parallel Delete, judging by various clues left in code comments by Robert: combo command IDs created by other processes. Here's a rapid prototype to make that work (though perhaps not as efficiently as we'd want, not sure). With that in place, I wonder what else we'd need to extend your patch to cover all three operations... it can't be much! Of course I don't want to derail your work on Parallel Insert, I'm just providing some motivation for my comments on the (IMHO) shortsightedness of some of the coding. PS Why not use git format-patch to create patches? From ad2b5e07a09603b09859dfcbde6addd51096cbb0 Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Fri, 9 Oct 2020 00:27:07 +1300 Subject: [PATCH] Coordinate combo command IDs with parallel workers. Previously, we would serialize the leader's combo command IDs and restore a read-only copy of them in worker processes, not allowing updates. Instead, migrate them into shared memory, in preparation for parallel update/delete queries where new combo command IDs might need to be created in any process and visible to others. XXX This design causes every backend to maintain its own deduplication hash table, and requires a shared lock to look up any combocid. Both policies could be reconsidered, basically a memory size vs locking tradeoff. Need some experience/profiling of real work to see how much any of this really matters. 
Discussion: https://postgr.es/m/CAJcOf-cXnB5cnMKqWEp2E2z7Mvcd04iLVmV%3DqpFJrR3AcrTS3g%40mail.gmail.com --- src/backend/access/common/session.c| 28 +- src/backend/access/heap/heapam.c | 10 - src/backend/access/transam/README.parallel | 6 - src/backend/access/transam/parallel.c | 14 - src/backend/storage/lmgr/lwlock.c | 2 + src/backend/utils/time/combocid.c | 290 - src/include/access/session.h | 20 +- src/include/storage/lwlock.h | 1 + src/include/utils/combocid.h | 8 +- 9 files changed, 275 insertions(+), 104 deletions(-) diff --git a/src/backend/access/common/session.c b/src/backend/access/common/session.c index 0ec61d48a2..7e1bffb680 100644 --- a/src/backend/access/common/session.c +++ b/src/backend/access/common/session.c @@ -23,6 +23,7 @@ #include "access/session.h" #include "storage/lwlock.h" #include "storage/shm_toc.h" +#include "utils/combocid.h" #include "utils/memutils.h" #include "utils/typcache.h" @@ -43,6 +44,7 @@ */ #define SESSION_KEY_DSA UINT64CONST(0x0001) #define SESSION_KEY_RECORD_TYPMOD_REGISTRY UINT64CONST(0x0002) +#define SESSION_KEY_FIXED UINT64CONST(0x0003) /* This backend's current session. */ Session*CurrentSession = NULL; @@ -74,8 +76,10 @@ GetSessionDsmHandle(void) dsm_segment *seg; size_t typmod_registry_size; size_t size; + void *fixed_space; void *dsa_space; void *typmod_registry_space; + SessionFixed *fixed; dsa_area *dsa; MemoryContext old_context; @@ -91,6 +95,10 @@ GetSessionDsmHandle(void) old_context = MemoryContextSwitchTo(TopMemoryContext); shm_toc_initialize_estimator(&estimator); + /* Estimate size for the fixed-sized per-session state. */ + shm_toc_estimate_keys(&estimator, 1); + shm_toc_estimate_chunk(&estimator, sizeof(SessionFixed)); + /* Estimate space for the per-session DSA area. */ shm_toc_estimate_keys(&estimator, 1); shm_toc_estimate_chunk(&estimator, SESSION_DSA_SIZE); @@ -113,6 +121,14 @@ GetSessionDsmHandle(void) dsm_segment_address(seg), size); + /* Create the simple fixed-sized session state. */ + fixed_space = shm_toc_allocate(toc, sizeof(SessionFixed)); + fixed = (SessionFixed *) fixed_space; + memset(fixed, 0, sizeof(*fixed)); + LWLockInitialize(&fixed->shared_combocid_lock, LWTRANCHE_SHARED_COMBO
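On the point about centralising which DML command types are allowed under parallelism, one way to avoid scattering CMD_INSERT comparisons would be a single helper along the lines of the sketch below; the name is made up, and today it would allow only INSERT, but UPDATE/DELETE could later be added in one place.

#include "postgres.h"
#include "nodes/nodes.h"

/*
 * Sketch: single point of truth for which DML command types may use
 * parallelism.  Currently only INSERT qualifies.
 */
static inline bool
parallel_modify_allowed(CmdType commandType)
{
    return commandType == CMD_INSERT;
}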
Re: [HACKERS] Custom compression methods
On Thu, Oct 08, 2020 at 02:38:27PM +0530, Dilip Kumar wrote: On Wed, Oct 7, 2020 at 5:00 PM Dilip Kumar wrote: On Wed, Oct 7, 2020 at 10:26 AM Dilip Kumar wrote: > > On Tue, Oct 6, 2020 at 10:21 PM Tomas Vondra > wrote: > > > > On Tue, Oct 06, 2020 at 11:00:55AM +0530, Dilip Kumar wrote: > > >On Mon, Oct 5, 2020 at 9:34 PM Tomas Vondra > > > wrote: > > >> > > >> On Mon, Oct 05, 2020 at 07:57:41PM +0530, Dilip Kumar wrote: > > >> >On Mon, Oct 5, 2020 at 5:53 PM Tomas Vondra > > >> > wrote: > > >> >> > > >> >> On Mon, Oct 05, 2020 at 11:17:28AM +0530, Dilip Kumar wrote: > > >> >> >On Mon, Oct 5, 2020 at 3:37 AM Tomas Vondra > > >> >> > wrote: > > >> >> > > > >> >> >Thanks, Tomas for your feedback. > > >> >> > > > >> >> >> 9) attcompression ... > > >> >> >> > > >> >> >> The main issue I see is what the patch does with attcompression. Instead > > >> >> >> of just using it to store a the compression method, it's also used to > > >> >> >> store the preserved compression methods. And using NameData to store > > >> >> >> this seems wrong too - if we really want to store this info, the correct > > >> >> >> way is either using text[] or inventing charvector or similar. > > >> >> > > > >> >> >The reason for using the NameData is the get it in the fixed part of > > >> >> >the data structure. > > >> >> > > > >> >> > > >> >> Why do we need that? It's possible to have varlena fields with direct > > >> >> access (see pg_index.indkey for example). > > >> > > > >> >I see. While making it NameData I was thinking whether we have an > > >> >option to direct access the varlena. Thanks for pointing me there. I > > >> >will change this. > > >> > > > >> > Adding NameData just to make > > >> >> it fixed-length means we're always adding 64B even if we just need a > > >> >> single byte, which means ~30% overhead for the FormData_pg_attribute. > > >> >> That seems a bit unnecessary, and might be an issue with many attributes > > >> >> (e.g. with many temp tables, etc.). > > >> > > > >> >You are right. Even I did not like to keep 64B for this, so I will change it. > > >> > > > >> >> > > >> >> >> But to me this seems very much like a misuse of attcompression to track > > >> >> >> dependencies on compression methods, necessary because we don't have a > > >> >> >> separate catalog listing compression methods. If we had that, I think we > > >> >> >> could simply add dependencies between attributes and that catalog. > > >> >> > > > >> >> >Basically, up to this patch, we are having only built-in compression > > >> >> >methods and those can not be dropped so we don't need any dependency > > >> >> >at all. We just want to know what is the current compression method > > >> >> >and what is the preserve compression methods supported for this > > >> >> >attribute. Maybe we can do it better instead of using the NameData > > >> >> >but I don't think it makes sense to add a separate catalog? > > >> >> > > > >> >> > > >> >> Sure, I understand what the goal was - all I'm saying is that it looks > > >> >> very much like a workaround needed because we don't have the catalog. > > >> >> > > >> >> I don't quite understand how could we support custom compression methods > > >> >> without listing them in some sort of catalog? > > >> > > > >> >Yeah for supporting custom compression we need some catalog. > > >> > > > >> >> >> Moreover, having the catalog would allow adding compression methods > > >> >> >> (from extensions etc) instead of just having a list of hard-coded > > >> >> >> compression methods. 
Which seems like a strange limitation, considering > > >> >> >> this thread is called "custom compression methods". > > >> >> > > > >> >> >I think I forgot to mention while submitting the previous patch that > > >> >> >the next patch I am planning to submit is, Support creating the custom > > >> >> >compression methods wherein we can use pg_am catalog to insert the new > > >> >> >compression method. And for dependency handling, we can create an > > >> >> >attribute dependency on the pg_am row. Basically, we will create the > > >> >> >attribute dependency on the current compression method AM as well as > > >> >> >on the preserved compression methods AM. As part of this, we will > > >> >> >add two build-in AMs for zlib and pglz, and the attcompression field > > >> >> >will be converted to the oid_vector (first OID will be of the current > > >> >> >compression method, followed by the preserved compression method's > > >> >> >oids). > > >> >> > > > >> >> > > >> >> Hmmm, ok. Not sure pg_am is the right place - compression methods don't > > >> >> quite match what I though AMs are about, but maybe it's just my fault. > > >> >> > > >> >> FWIW it seems a bit strange to first do the attcompression magic and > > >> >> then add the catalog later - I think we should start with the catalog > > >> >> right away. The advantage is that if we end up committing only some of > > >> >> the patches in this cycle, we already have all the infrastructure etc
Re: POC: postgres_fdw insert batching
On Thu, Oct 08, 2020 at 02:40:10AM +, tsunakawa.ta...@fujitsu.com wrote: Hello Tomas san, Thank you for picking up this. I'm interested in this topic, too. (As an aside, we'd like to submit a bulk insert patch for ECPG in the near future.) As others referred, Andrey-san's fast COPY to foreign partitions is also promising. But I think your bulk INSERT is a separate feature and offers COPY cannot do -- data transformation during loading with INSERT SELECT and CREATE TABLE AS SELECT. Is there anything that makes you worry and stops development? Could I give it a try to implement this (I'm not sure I can, sorry. I'm worried if we can change the executor's call chain easily.) It's primarily a matter of having too much other stuff on my plate, thus not having time to work on this feature. I was not too worried about any particular issue, but I wanted some feedback before spending more time on extending the API. I'm not sure when I'll have time to work on this again, so if you are interested and willing to work on it, please go ahead. I'll gladly do reviews and help you with it. 1) Extend the FDW API? Yes, I think, because FDWs for other DBMSs will benefit from this. (But it's questionable whether we want users to transfer data in Postgres database to other DBMSs...) I think transferring data to other databases is fine - interoperability is a big advantage for users, I don't see it as something threatening the PostgreSQL project. I doubt this would make it more likely for users to migrate from PostgreSQL - there are many ways to do that already. MySQL and SQL Server has the same bulk insert syntax as Postgres, i.e., INSERT INTO table VALUES(record1), (record2), ... Oracle doesn't have this syntax, but it can use CTE as follows: INSERT INTO table WITH t AS ( SELECT record1 FROM DUAL UNION ALL SELECT record2 FROM DUAL UNION ALL ... ) SELECT * FROM t; And many DBMSs should have CTAS, INSERT SELECT, and INSERT SELECT record1 UNION ALL SELECT record2 ... True. In some cases INSERT may be replaced by COPY, but it has various other features too. The API would simply be: TupleTableSlot ** ExecForeignMultiInsert(EState *estate, ResultRelInfo *rinfo, TupleTableSlot **slot, TupleTableSlot **planSlot, int numSlots); +1, seems quite reasonable 2) What about the insert results? I'm wondering if we can report success or failure of each inserted row, because the remote INSERT will fail entirely. Other FDWs may be able to do it, so the API can be like above. Yeah. I think handling complete failure should not be very difficult, but there are cases that worry me more. For example, what if there's a before trigger (on the remote db) that "skips" inserting some of the rows by returning NULL? For the same reason, support for RETURNING clause will vary from DBMS to DBMS. Yeah. I wonder if the FDW needs to indicate which features are supported by the ExecForeignMultiInsert, e.g. by adding a function that decides whether batch insert is supported (it might also do that internally by calling ExecForeignInsert, of course). 3) What about the other DML operations (DELETE/UPDATE)? I don't think they are necessary for the time being. If we want them, they will be implemented using the libpq batch/pipelining as Andres-san said. I agree. 3) Should we do batching for COPY insteads? I'm thinking of issuing INSERT with multiple records as your patch does, because: * When the user executed INSERT statements, it would look strange to the user if the remote SQL is displayed as COPY. * COPY doesn't invoke rules unlike INSERT. 
(I don't think rules are a feature that users care about, though.) Also, I'm a bit concerned that there might be, or will be, other differences between INSERT and COPY. I agree. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
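To make the batching idea a bit more concrete, an FDW could buffer up to numSlots rows and emit a single remote INSERT with one parameter list per row, along the lines of the sketch below (an illustration of the approach, not postgres_fdw's actual deparser).

#include "postgres.h"
#include "lib/stringinfo.h"

/*
 * Sketch: append " VALUES ($1, $2), ($3, $4), ..." to a remote INSERT
 * statement, one parenthesised list per buffered slot.
 */
static void
append_batch_values(StringInfo sql, int numSlots, int numCols)
{
    int     pindex = 1;

    appendStringInfoString(sql, " VALUES ");
    for (int i = 0; i < numSlots; i++)
    {
        if (i > 0)
            appendStringInfoString(sql, ", ");
        appendStringInfoChar(sql, '(');
        for (int j = 0; j < numCols; j++)
            appendStringInfo(sql, "%s$%d", (j > 0) ? ", " : "", pindex++);
        appendStringInfoChar(sql, ')');
    }
}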
Re: speed up unicode normalization quick check
On Thu, Oct 8, 2020 at 8:29 AM Michael Paquier wrote: > On Thu, Oct 08, 2020 at 04:52:18AM -0400, John Naylor wrote: > > Looks fine overall, but one minor nit: I'm curious why you made a > separate > > section in the pgindent exclusions. The style in that file seems to be > one > > comment per category. > > Both parts indeed use PerfectHash.pm, but are generated by different > scripts so that looked better as separate items. Okay, thanks. -- John Naylor EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Assertion failure with LEFT JOINs among >500 relations
On Fri, 9 Oct 2020 at 08:16, Onder Kalaci wrote: > I hit an assertion failure. When asserts disabled, it works fine even with > more tables (>5000). > > Steps to reproduce: > CREATE TABLE users_table (user_id int, time timestamp, value_1 int, value_2 > int, value_3 float, value_4 bigint); > 250 relations work fine, see the query (too long to copy & paste here): > https://gist.github.com/onderkalaci/2b40a18d989da389ee4fb631e1ad7c0e#file-steps_to_assert_pg-sql-L41 I had a quick look at this and I can recreate it using the following (using psql) select 'explain select count(*) from users_table ' || string_Agg('LEFT JOIN users_table u'|| x::text || ' USING (user_id)',' ') from generate_Series(1,379)x; \gexec That triggers the assert due to the Assert(outer_skip_rows <= outer_rows); failing in initial_cost_mergejoin(). The reason it fails is that outer_path_rows has become infinity due to calc_joinrel_size_estimate continually multiplying in the join selectivity of 0.05 (due to our 200 default num distinct from lack of any stats) which after a number of iterations causes the number to become very large. Instead of running 379 joins from above, try with 378 and you get: Aggregate (cost=NaN..NaN rows=1 width=8) -> Nested Loop Left Join (cost=33329.16..NaN rows=Infinity width=0) Join Filter: (users_table.user_id = u378.user_id) -> Merge Left Join (cost=33329.16.. width=4) Merge Cond: (users_table.user_id = u377.user_id) -> Merge Left Join (cost=33240.99.. width=4) Changing the code in initial_cost_mergejoin() to add: if (outer_path_rows <= 0 || isnan(outer_path_rows)) outer_path_rows = 1; +else if (isinf(outer_path_rows)) +outer_path_rows = DBL_MAX; does seem to fix the problem, but that's certainly not the right fix. Perhaps the right fix is to modify clamp_row_est() with: @@ -193,7 +194,9 @@ clamp_row_est(double nrows) * better and to avoid possible divide-by-zero when interpolating costs. * Make it an integer, too. */ - if (nrows <= 1.0) + if (isinf(nrows)) + nrows = rint(DBL_MAX); + else if (nrows <= 1.0) nrows = 1.0; else nrows = rint(nrows); but the row estimates are getting pretty insane well before then. DBL_MAX is 226 orders of magnitude more than the estimated number of atoms in the observable universe, so it seems pretty unreasonable that someone might figure out a way to store that many tuples on a disk any time soon. Perhaps DBL_MAX is way to big a number to clamp at. I'm just not sure what we should reduce it to so that it is reasonable. David
Re: Assertion failure with LEFT JOINs among >500 relations
David Rowley writes: > The reason it fails is that outer_path_rows has become infinity due to > calc_joinrel_size_estimate continually multiplying in the join > selectivity of 0.05 (due to our 200 default num distinct from lack of > any stats) which after a number of iterations causes the number to > become very large. 0.005, but yeah. We're estimating that each additional join inflates the output size by about 6x (1270 * 0.005), and after a few hundred of those, it'll overflow. > Perhaps the right fix is to modify clamp_row_est() with: I thought of that too, but as you say, if the rowcount has overflowed a double then we've got way worse problems. It'd make more sense to try to keep the count to a saner value in the first place. In the end, (a) this is an Assert, so not a problem for production systems, and (b) it's going to take you longer than you want to wait to join 500+ tables, anyhow, unless maybe they're empty. I'm kind of disinclined to do anything in the way of a band-aid fix. If somebody has an idea for a different way of estimating the join size with no stats, we could talk about that. I notice though that the only way a plan of this sort isn't going to blow up at execution is if the join multiplication factor is at most 1, ie the join key is unique. But guess what, we already know what to do in that case. Adding a unique or pkey constraint to users_table.user_id causes the plan to collapse entirely (if they're left joins) or at least still produce a small rowcount estimate (if plain joins). regards, tom lane
Re: Assertion failure with LEFT JOINs among >500 relations
On Fri, 9 Oct 2020 at 12:16, Tom Lane wrote: > > > Perhaps the right fix is to modify clamp_row_est() with: > > I thought of that too, but as you say, if the rowcount has overflowed a > double then we've got way worse problems. It'd make more sense to try > to keep the count to a saner value in the first place. I wonder if there was something more logical we could do to maintain sane estimates too, but someone could surely still cause it to blow up by writing a long series of clause-less joins. We can't really get away from the fact that we must estimate those as inner_rows * outer_rows. I admit it's annoying to add cycles to clamp_row_est() for such insane cases. David
Re: Minor documentation error regarding streaming replication protocol
On Thu, Oct 08, 2020 at 04:23:06PM -0400, Bruce Momjian wrote: > I have looked at this. It seems SendTimeLineHistory() is sending raw > bytes from the history file, with no encoding conversion, and > ReceiveXlogStream() is receiving it, again assuming it is just plain > text. I am not sure we really have an SQL data type where we do this. > BYTEA doesn't do encoding conversion, but does backslash procesing, and > TEXT does encoding conversion. > > I suppose we either have to document this as BYTEA with no backslash > processing, or TEXT with no encoding conversion --- I think I prefer the > later. As StartupXLOG() shows, the timeline history file can include, as the reason, the recovery target name, which may not consist of only ASCII characters since that's the value the user specified in pg_create_restore_point, so bytea is correct, no? -- Michael signature.asc Description: PGP signature
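For reference, what travels in that column is just the history file's own lines, i.e. one tab-separated record of parent timeline ID, switch point, and reason per line, for example (values here are made up for illustration):

    1	0/16B8F50	no recovery target specified
    2	0/3000158	at restore point "before_upgrade"

so the reason text can be exactly what the user passed to pg_create_restore_point, which is also why unescaped tabs and newlines reach the client.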
Re: Assertion failure with LEFT JOINs among >500 relations
David Rowley writes: > I admit it's annoying to add cycles to clamp_row_est() for such insane cases. I poked at this a bit more closely, and noted that the actual problem is that when we do this: outer_skip_rows = rint(outer_path_rows * outerstartsel); we have outer_path_rows = inf, outerstartsel = 0, and of course inf times zero is NaN. So we end up asserting "NaN <= Inf", not "Inf <= Inf" (which wouldn't have caused a problem). If we did want to do something here, I'd consider something like if (isnan(outer_skip_rows)) outer_skip_rows = 0; if (isnan(inner_skip_rows)) inner_skip_rows = 0; (We shouldn't need that for outer_rows/inner_rows, since the endsel values can't be 0.) Messing with clamp_row_est would be a much more indirect way of fixing it, as well as having more widespread effects. In the end though, I'm still not terribly excited about this. regards, tom lane
Re: [HACKERS] logical decoding of two-phase transactions
On Thu, Oct 8, 2020 at 5:25 PM Amit Kapila wrote: > > COMMENT > > Line 177 > > +logicalrep_read_prepare(StringInfo in, LogicalRepPrepareData * > > prepare_data) > > > > prepare_data->prepare_type = flags; > > This code may be OK but it does seem a bit of an abuse of the flags. > > > > e.g. Are they flags or are they really enum values? > > e.g. And if they are effectively enums (it appears they are) then it > > seemed inconsistent that |= was used when they were previously > > assigned. > > > > ; > > I don't understand this point. As far as I can see at the time of > write (logicalrep_write_prepare()), the patch has used |=, and at the > time of reading (logicalrep_read_prepare()) it has used assignment > which seems correct from the code perspective. Do you have a better > proposal? OK. I will explain my thinking when I wrote that review comment. I agree all is "correct" from a code perspective. But IMO using bit arithmetic implies that different combinations are also possible, whereas in the current code they are not. So the code is kind of having a bet each way - sometimes treating "flags" as bit flags and sometimes as enums. e.g. If these flags are not really bit flags at all then the logicalrep_write_prepare() code might just as well be written as below: BEFORE if (rbtxn_commit_prepared(txn)) flags |= LOGICALREP_IS_COMMIT_PREPARED; else if (rbtxn_rollback_prepared(txn)) flags |= LOGICALREP_IS_ROLLBACK_PREPARED; else flags |= LOGICALREP_IS_PREPARE; /* Make sure exactly one of the expected flags is set. */ if (!PrepareFlagsAreValid(flags)) elog(ERROR, "unrecognized flags %u in prepare message", flags); AFTER if (rbtxn_commit_prepared(txn)) flags = LOGICALREP_IS_COMMIT_PREPARED; else if (rbtxn_rollback_prepared(txn)) flags = LOGICALREP_IS_ROLLBACK_PREPARED; else flags = LOGICALREP_IS_PREPARE; ~ OTOH, if you really do want to anticipate having future flag bit combinations then maybe the PrepareFlagsAreValid() macro ought to be tweaked accordingly, and the logicalrep_read_prepare() code should maybe look more like below: BEFORE /* set the action (reuse the constants used for the flags) */ prepare_data->prepare_type = flags; AFTER /* set the action (reuse the constants used for the flags) */ prepare_data->prepare_type = flags & LOGICALREP_IS_COMMIT_PREPARED ? LOGICALREP_IS_COMMIT_PREPARED : flags & LOGICALREP_IS_ROLLBACK_PREPARED ? LOGICALREP_IS_ROLLBACK_PREPARED : LOGICALREP_IS_PREPARE; Kind Regards. Peter Smith Fujitsu Australia
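To make the "exactly one of" intent concrete, here is a standalone sketch of the kind of validation macro mentioned above; the constant values and the macro body are illustrative and not taken from the patch:

#include <stdio.h>

/* illustrative values only; the patch defines its own constants */
#define LOGICALREP_IS_PREPARE           0x01
#define LOGICALREP_IS_COMMIT_PREPARED   0x02
#define LOGICALREP_IS_ROLLBACK_PREPARED 0x04

/* true iff exactly one of the expected flags is set */
#define PrepareFlagsAreValid(flags) \
    ((flags) == LOGICALREP_IS_PREPARE || \
     (flags) == LOGICALREP_IS_COMMIT_PREPARED || \
     (flags) == LOGICALREP_IS_ROLLBACK_PREPARED)

int
main(void)
{
    unsigned int flags = LOGICALREP_IS_COMMIT_PREPARED;

    if (!PrepareFlagsAreValid(flags))
        printf("unrecognized flags %u in prepare message\n", flags);
    else
        printf("flags %u identify exactly one prepare action\n", flags);
    return 0;
}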
Re: [PATCH] ecpg: fix progname memory leak
On Thu, Oct 08, 2020 at 01:17:25PM -0400, Tom Lane wrote: > Agreed. In theory there might be some point in removing leaks that happen > per-statement, so as to avoid unreasonable memory bloat when processing an > extremely long input file. In practice nobody has complained about that, > and if somebody did I'd probably question the sanity of putting so much > code into one file. (The C compiler would likely bloat even more while > processing the output...) Moreover, given the way the ecpg grammar works, > it'd be really painful to avoid all such leaks, and it might well > introduce bugs not fix them. I got to wondering about that, and I don't quite see how to make all the memory handling clean without having at least the concept of a per-statement memory context to make cleanup easy once a statement is done. That's a lot of infrastructure for a case nobody has really complained about, indeed. > In any case, if this TODO item is going to lead to ideas as dubious > as "let's free progname before exiting", it's not helpful. +1. Agreed to just remove it. -- Michael
Re: Assertion failure with LEFT JOINs among >500 relations
On Fri, 9 Oct 2020 at 12:59, Tom Lane wrote: > If we did want to do something here, I'd consider something like > > if (isnan(outer_skip_rows)) > outer_skip_rows = 0; > if (isnan(inner_skip_rows)) > inner_skip_rows = 0; Are you worried that, with that fix, the costs above the join that triggers it would still come out as NaN? It appears that's the case, and cost comparisons of paths with NaN costs are not going to do anything along the lines of sane. I guess whether or not that matters depends on whether we expect any real queries to hit this, or whether we just want to stop the Assert failure. ... 500 joins. I'm willing to listen to an explanation of the use case, but in the absence of that explanation, I'd be leaning towards "you're doing it wrong". If that turns out to be true, then perhaps your proposed fix is okay. David
RE: [Patch] Optimize dropping of relation buffers using dlist
From: Jamison, Kirk/ジャミソン カーク > > (6) > > + bufHdr->tag.blockNum >= > > firstDelBlock[j]) > > + InvalidateBuffer(bufHdr); /* > > releases spinlock */ > > > > The right side of >= should be cur_block. > > Fixed. >= should be =, shouldn't it? Please measure and let us see just the recovery performance again because the critical part of the patch was modified. If the performance is good as the previous one, and there's no review interaction with others in progress, I'll mark the patch as ready for committer in a few days. Regards Takayuki Tsunakawa
Remove some unnecessary if-condition
Hi I found some likely unnecessary if-conditions in the code. 1. Some checks in else branches seem unnecessary. In (/src/backend/replication/logical/reorderbuffer.c) ① @@ -4068,7 +4068,7 @@ ReorderBufferToastAppendChunk(ReorderBuffer *rb, ReorderBufferTXN *txn, > bool found; > if (!found) > { >... > } > else if (found && chunk_seq != ent->last_chunk_seq + 1) >... The check of "found" in the else if branch seems unnecessary. ② (/src/backend/utils/init/postinit.c) @@ -924,11 +924,8 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username, > bool bootstrap = IsBootstrapProcessingMode(); > if (bootstrap) > { >... > } > else if(...) > {...} > else > { >if (!bootstrap) >{ >... >} > } The check of "bootstrap" in the else branch seems unnecessary. 2. In (/src/interfaces/ecpg/compatlib/informix.c) @@ -944,7 +944,7 @@ rupshift(char *str) > for (len--; str[len] && str[len] == ' '; len--); The first "str[len]" test seems unnecessary, since "str[len] == ' '" already covers it (the comparison is false at the terminating NUL). Do you think we should remove these if-conditions as a code cleanup? Best regards, houzj 0001-Remove-some-unnecessary-path.patch Description: 0001-Remove-some-unnecessary-path.patch
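On the third point, a tiny standalone check (not the ecpg code itself) of why the extra truth test is redundant: at the terminating NUL, str[len] == ' ' is already false, so the loop stops in the same place either way.

#include <stdio.h>
#include <string.h>

int
main(void)
{
    char    str[] = "abc   ";
    int     len = (int) strlen(str);

    /* skip trailing blanks; also stops at the NUL terminator */
    for (len--; str[len] == ' '; len--)
        ;

    printf("last non-blank character: %c at index %d\n", str[len], len);
    return 0;
}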
Re: Expansion of our checks for connection-loss errors
At Thu, 08 Oct 2020 15:15:54 -0400, Tom Lane wrote in > Over in the thread at [1], we've tentatively determined that the > reason buildfarm member lorikeet is currently failing is that its > network stack returns ECONNABORTED for (some?) connection failures, > whereas our code is only expecting ECONNRESET. Fujii Masao therefore > proposes that we treat ECONNABORTED the same as ECONNRESET. I think > this is a good idea, but after a bit of research I feel it does not > go far enough. I find these POSIX-standard errnos that also seem > likely candidates to be returned for a hard loss of connection: > > ECONNABORTED > EHOSTUNREACH > ENETDOWN > ENETUNREACH > > All of these have been in POSIX since SUSv2, so it seems unlikely > that we need to #ifdef any of them. (It is in any case pretty silly > that we have #ifdefs around a very small minority of our references > to ECONNRESET :-(.) > > There are some other related errnos, such as ECONNREFUSED, that > don't seem like they'd be returned for a failure of a pre-existing > connection, so we don't need to include them in such tests. > > Accordingly, I propose the attached patch (an expansion of > Fujii-san's) that causes us to test for all five errnos anyplace > we had been checking for ECONNRESET. I felt that this was getting to > the point where we'd better centralize the knowledge of what to check, > so the patch does that, via an inline function and an admittedly hacky > macro. I also upgraded some places such as strerror.c to have full > support for these symbols. > > All of the machines I have (even as far back as HPUX 10.20) also > define ENETRESET and EHOSTDOWN. However, those symbols do not appear > in SUSv2. ENETRESET was added at some later point, but EHOSTDOWN is > still not in POSIX. For the moment I've left these second-tier > symbols out of the patch, but there's a case for adding them. I'm > not sure whether there'd be any point in trying to #ifdef them. > > BTW, I took out the conditional defines of some of these errnos in > libpq's win32.h; AFAICS that's been dead code ever since we added > #define's for them to win32_port.h. Am I missing something? > > This seems like a bug fix to me, so I'm inclined to back-patch. > > regards, tom lane > > [1] > https://www.postgresql.org/message-id/flat/E1kPc9v-0005L4-2l%40gemulon.postgresql.org +1 for the direction. In terms of connection errors, connect(2) and bind(2) can return EADDRNOTAVAIL. bind(2) and listen(2) can return EADDRINUSE. FWIW I recetnly saw pgbench getting EADDRNOTAVAIL. (They have mapping from respective WSA errors in TranslateSocketError()) I'm not sure how we should treat EMFILE/ENFILE/ENOBUFS/ENOMEM from accept(2). (select(2) can return ENOMEM.) I'd make errno_is_connection_loss use ALL_CONNECTION_LOSS_ERRNOS to avoid duplication definition of the errno list. - if (ret < 0 && WSAGetLastError() == WSAECONNRESET) + if (ret < 0 && errno_is_connection_loss(WSAGetLastError())) Don't we need to use TranslateSocketError() before? + /* We might get ECONNRESET etc here if using TCP and backend died */ + if (errno_is_connection_loss(SOCK_ERRNO)) Perhaps I'm confused but SOCK_ERROR doesn't seem portable between Windows and Linux. = /* * These macros are needed to let error-handling code be portable between * Unix and Windows. 
(ugh) */ #ifdef WIN32 #define SOCK_ERRNO (WSAGetLastError()) #define SOCK_STRERROR winsock_strerror #define SOCK_ERRNO_SET(e) WSASetLastError(e) #else #define SOCK_ERRNO errno #define SOCK_STRERROR strerror_r #define SOCK_ERRNO_SET(e) (errno = (e)) #endif = AFAICS SOCK_ERRNO is intended to be used idiomatically as: > SOCK_STRERROR(SOCK_ERRNO, ...) The WSAE values from WSAGetLastError() and E values in errno are not compatible and needs translation by TranslateSocketError()? regards. -- Kyotaro Horiguchi NTT Open Source Software Center
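For reference, a minimal self-contained sketch of the centralized check being discussed; the function name follows the thread, but the exact shape in the eventual patch may differ:

#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* the five POSIX errnos proposed above as "hard loss of connection" */
static inline bool
errno_is_connection_loss(int e)
{
    switch (e)
    {
        case ECONNRESET:
        case ECONNABORTED:
        case EHOSTUNREACH:
        case ENETDOWN:
        case ENETUNREACH:
            return true;
        default:
            return false;
    }
}

int
main(void)
{
    printf("%d\n", errno_is_connection_loss(ECONNRESET));   /* 1 */
    printf("%d\n", errno_is_connection_loss(ECONNREFUSED)); /* 0: connection-time error */
    return 0;
}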
Re: Expansion of our checks for connection-loss errors
Kyotaro Horiguchi writes: > At Thu, 08 Oct 2020 15:15:54 -0400, Tom Lane wrote in >> Accordingly, I propose the attached patch (an expansion of >> Fujii-san's) that causes us to test for all five errnos anyplace >> we had been checking for ECONNRESET. > +1 for the direction. > In terms of connection errors, connect(2) and bind(2) can return > EADDRNOTAVAIL. bind(2) and listen(2) can return EADDRINUSE. FWIW I > recetnly saw pgbench getting EADDRNOTAVAIL. (They have mapping from > respective WSA errors in TranslateSocketError()) I do not think we have any issues with connection-time errors; or at least, if we do, the spots being touched here certainly shouldn't need to worry about them. These places are dealing with already-established connections. > I'd make errno_is_connection_loss use ALL_CONNECTION_LOSS_ERRNOS to > avoid duplication definition of the errno list. Hmm, might be worth doing, but I'm not sure. I am worried about whether compilers will generate equally good code that way. > - if (ret < 0 && WSAGetLastError() == WSAECONNRESET) > + if (ret < 0 && errno_is_connection_loss(WSAGetLastError())) > Don't we need to use TranslateSocketError() before? Oh, I missed that. But: > Perhaps I'm confused but SOCK_ERROR doesn't seem portable between > Windows and Linux. In that case, nothing would have worked on Windows for the last ten years, so you're mistaken. I think the actual explanation why this works, and why that test in parallel.c probably still works even with my mistake, is that win32_port.h makes sure that our values of ECONNRESET etc match WSAECONNRESET etc. IOW, we'd not actually need TranslateSocketError at all, except that it maps some not-similarly-named error codes for conditions that don't exist in Unix into ones that do. We probably do want TranslateSocketError in this parallel.c test so that anything that it maps to one of the errno_is_connection_loss codes will be recognized; but the basic cases would work anyway, unless I misunderstand this stuff entirely. regards, tom lane
Re: Assertion failure with LEFT JOINs among >500 relations
David Rowley writes: > Are you worried about the costs above the join that triggers that > coming out as NaN with that fix? It appears that's the case. [ pokes at that... ] Yeah, it looks like nestloop cost estimation also has some issues with inf-times-zero producing NaN; it's just not asserting about it. I notice there are some other ad-hoc isnan() checks scattered about costsize.c, too. Maybe we should indeed consider fixing clamp_row_estimate to get rid of inf (and nan too, I suppose) so that we'd not need those. I don't recall the exact cases that made us introduce those checks, but they were for cases a lot more easily reachable than this one, I believe. regards, tom lane
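A self-contained sketch of the kind of clamping being floated here (this is not the actual clamp_row_est(); the finite cap is an assumption picked purely for illustration):

#include <math.h>
#include <stdio.h>

#define MAXIMUM_ROWCOUNT 1e100  /* illustrative finite cap */

static double
clamp_row_est_sketch(double nrows)
{
    /* map NaN and anything above the cap (including +Inf) to the cap */
    if (isnan(nrows) || nrows > MAXIMUM_ROWCOUNT)
        nrows = MAXIMUM_ROWCOUNT;
    else if (nrows <= 1.0)
        nrows = 1.0;
    else
        nrows = rint(nrows);
    return nrows;
}

int
main(void)
{
    printf("%g\n", clamp_row_est_sketch(INFINITY));       /* 1e+100 */
    printf("%g\n", clamp_row_est_sketch(0.0 * INFINITY)); /* NaN -> 1e+100 */
    printf("%g\n", clamp_row_est_sketch(1234.4));         /* 1234 */
    return 0;
}

With estimates kept finite like this, the inf-times-zero path that produced the NaN above could not arise in the first place.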
Re: [Patch] Optimize dropping of relation buffers using dlist
At Fri, 9 Oct 2020 00:41:24 +, "tsunakawa.ta...@fujitsu.com" wrote in > From: Jamison, Kirk/ジャミソン カーク > > > (6) > > > + bufHdr->tag.blockNum >= > > > firstDelBlock[j]) > > > + InvalidateBuffer(bufHdr); /* > > > releases spinlock */ > > > > > > The right side of >= should be cur_block. > > > > Fixed. > > >= should be =, shouldn't it? > > Please measure and let us see just the recovery performance again because the > critical part of the patch was modified. If the performance is good as the > previous one, and there's no review interaction with others in progress, I'll > mark the patch as ready for committer in a few days. The performance is expected to be kept since smgrnblocks() is called in a non-hot code path and actually it is called at most four times per buffer drop in this patch. But it's better to make sure. I have some comments on the latest patch. @@ -445,6 +445,7 @@ BlockNumber visibilitymap_prepare_truncate(Relation rel, BlockNumber nheapblocks) { BlockNumber newnblocks; + bool cached; All the variables added by 0002 are useless because none of the caller sites are interested in the value. smgrnblocks should accept NULL as isCached. (I agree with Tsunakawa-san that the camel-case name is not common there.) + nForkBlocks[i] = smgrnblocks(smgr_reln, forkNum[i], &isCached); + + if (!isCached) "is cached" is not the property that the code is interested in. No other callers to smgrnblocks are interested in that property. The need for caching is purely internal to smgrnblocks(). On the other hand, we are going to utilize the property of "accuracy" that is a byproduct of reducing fseek calls, and, again, we are not interested in how it is achieved. So I suggest that the name should be "accurate" or something else that does not suggest the mechanism used under the hood. + if (nTotalBlocks != InvalidBlockNumber && + nBlocksToInvalidate < BUF_DROP_FULL_SCAN_THRESHOLD) I don't think nTotalBlocks is useful. What we need here is only the total blocks for every fork (nForkBlocks[]) and the total number of buffers to be invalidated for all forks (nBlocksToInvalidate). > > > The right side of >= should be cur_block. > > > > Fixed. > > >= should be =, shouldn't it? It's just out of paranoia. What we are going to invalidate is blocks whose blockNum >= curBlock. Although actually there's no chance of any other process having replaced the buffer with another page (with a lower blockid) of the same relation after BufTableLookup(), that condition makes sure not to leave behind any blocks that should be invalidated. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: [Patch] Optimize dropping of relation buffers using dlist
Oops! Sorry for the mistake. At Fri, 09 Oct 2020 11:12:01 +0900 (JST), Kyotaro Horiguchi wrote in > At Fri, 9 Oct 2020 00:41:24 +, "tsunakawa.ta...@fujitsu.com" > wrote in > > From: Jamison, Kirk/ジャミソン カーク > > > > (6) > > > > + bufHdr->tag.blockNum >= > > > > firstDelBlock[j]) > > > > + InvalidateBuffer(bufHdr); > > > > /* > > > > releases spinlock */ > > > > > > > > The right side of >= should be cur_block. > > > > > > Fixed. > > > > >= should be =, shouldn't it? > > > > Please measure and let us see just the recovery performance again because > > the critical part of the patch was modified. If the performance is good as > > the previous one, and there's no review interaction with others in > > progress, I'll mark the patch as ready for committer in a few days. > > The performance is expected to be kept since smgrnblocks() is called > in a non-hot code path and actually it is called at most four times > per a buffer drop in this patch. But it's better making it sure. > > I have some comments on the latest patch. > > @@ -445,6 +445,7 @@ BlockNumber > visibilitymap_prepare_truncate(Relation rel, BlockNumber nheapblocks) > { > BlockNumber newnblocks; > + boolcached; > > All the added variables added by 0002 is useless because all the > caller sites are not interested in the value. smgrnblocks should > accept NULL as isCached. (I'm agree with Tsunakawa-san that the > camel-case name is not common there.) > > + nForkBlocks[i] = smgrnblocks(smgr_reln, forkNum[i], &isCached); > + > + if (!isCached) > > "is cached" is not the property that code is interested in. No other callers > to smgrnblocks are interested in that property. The need for caching is > purely internal of smgrnblocks(). > > On the other hand, we are going to utilize the property of "accuracy" > that is a biproduct of reducing fseek calls, and, again, not > interested in how it is achieved. > > So I suggest that the name should be "accurite" or something that is > not suggest the mechanism used under the hood. > > + if (nTotalBlocks != InvalidBlockNumber && > + nBlocksToInvalidate < BUF_DROP_FULL_SCAN_THRESHOLD) > > I don't think nTotalBlocks is useful. What we need here is only total > blocks for every forks (nForkBlocks[]) and the total number of buffers > to be invalidated for all forks (nBlocksToInvalidate). > > > > > > The right side of >= should be cur_block. > > > > > > Fixed. > > > > >= should be =, shouldn't it? > > It's just from a paranoia. What we are going to invalidate is blocks > blockNum of which >= curBlock. Although actually there's no chance of Sorry. What we are going to invalidate is blocks that are blocNum >= firstDelBlock[i]. So what I wanted to suggest was the condition should be + if (RelFileNodeEquals(bufHdr->tag.rnode, rnode.node) && + bufHdr->tag.forkNum == forkNum[j] && + bufHdr->tag.blockNum >= firstDelBlock[j]) > any other processes having replaced the buffer with another page (with > lower blockid) of the same relation after BugTableLookup(), that > condition makes it sure not to leave blocks to be invalidated left > alone. And I forgot to mention the patch names. I think many of us name the patches using -v option of git-format-patch, and assign the version to a patch-set thus the version number of all files that are posted at once is same. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
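To illustrate the decision being discussed, here is a simplified, self-contained sketch; the threshold name and the idea of an "accuracy" flag come from this thread, but the types and structure are stand-ins, not the patch:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* simplified stand-ins for illustration only */
typedef uint32_t BlockNumber;
#define InvalidBlockNumber ((BlockNumber) 0xFFFFFFFF)
#define BUF_DROP_FULL_SCAN_THRESHOLD 512    /* illustrative threshold */

/*
 * Use the optimized per-block invalidation only if every fork's size is
 * known accurately and the total number of blocks to drop is small enough;
 * otherwise fall back to scanning the whole buffer pool.
 */
static bool
can_use_targeted_invalidation(const BlockNumber *nForkBlocks,
                              const bool *accurate,
                              const BlockNumber *firstDelBlock,
                              int nforks,
                              BlockNumber *nBlocksToInvalidate)
{
    BlockNumber total = 0;

    for (int i = 0; i < nforks; i++)
    {
        if (!accurate[i] || nForkBlocks[i] == InvalidBlockNumber)
            return false;
        total += nForkBlocks[i] - firstDelBlock[i];
    }
    *nBlocksToInvalidate = total;
    return total < BUF_DROP_FULL_SCAN_THRESHOLD;
}

int
main(void)
{
    BlockNumber nForkBlocks[] = {100, 10, 2};
    bool        accurate[] = {true, true, true};
    BlockNumber firstDelBlock[] = {90, 0, 0};
    BlockNumber n;

    if (can_use_targeted_invalidation(nForkBlocks, accurate, firstDelBlock, 3, &n))
        printf("targeted invalidation of %u blocks\n", (unsigned) n);
    else
        printf("full scan of shared buffers\n");
    return 0;
}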
RE: Transactions involving multiple postgres foreign servers, take 2
From: Masahiko Sawada > What about temporary network failures? I think there are users who > don't want to give up resolving foreign transactions failed due to a > temporary network failure. Or even they might want to wait for > transaction completion until they send a cancel request. If we want to > call the commit routine only once and therefore want FDW to retry > connecting the foreign server within the call, it means we require all > FDW implementors to write a retry loop code that is interruptible and > ensures not to raise an error, which increases difficulty. > > Yes, but if we don’t retry to resolve foreign transactions at all on > an unreliable network environment, the user might end up requiring > every transaction to check the status of foreign transactions of the > previous distributed transaction before starts. If we allow to do > retry, I guess we ease that somewhat. OK. As I said, I'm not against trying to cope with temporary network failure. I just don't think it's mandatory. If the network failure is really temporary and thus recovers soon, then the resolver will be able to commit the transaction soon, too. Then, we can have a commit retry timeout or retry count like the following WebLogic manual says. (I couldn't quickly find the English manual, so below is in Japanese. I quoted some text that got through machine translation, which appears a bit strange.) https://docs.oracle.com/cd/E92951_01/wls/WLJTA/trxcon.htm -- Abandon timeout Specifies the maximum time (in seconds) that the transaction manager attempts to complete the second phase of a two-phase commit transaction. In the second phase of a two-phase commit transaction, the transaction manager attempts to complete the transaction until all resource managers indicate that the transaction is complete. After the abort transaction timer expires, no attempt is made to resolve the transaction. If the transaction enters a ready state before it is destroyed, the transaction manager rolls back the transaction and releases the held lock on behalf of the destroyed transaction. -- > Also, what if the user sets the statement timeout to 60 sec and they > want to cancel the waits after 5 sec by pressing ctl-C? You mentioned > that client libraries of other DBMSs don't have asynchronous execution > functionality. If the SQL execution function is not interruptible, the > user will end up waiting for 60 sec, which seems not good. FDW functions can be uninterruptible in general, aren't they? We experienced that odbc_fdw didn't allow cancellation of SQL execution. Regards Takayuki Tsunakawa
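To make the bounded-retry idea (retry count / abandon timeout) concrete, here is a minimal standalone sketch; the callback, the limits, and the sleep interval are all stand-ins for illustration, not part of any proposed patch:

#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

#define COMMIT_RETRY_MAX    5   /* illustrative retry count */
#define COMMIT_RETRY_SLEEP  1   /* seconds between attempts */

/* stand-in for the FDW's commit-prepared call */
static bool
commit_prepared_on_foreign_server(void)
{
    return false;               /* pretend the network is down */
}

int
main(void)
{
    for (int attempt = 1; attempt <= COMMIT_RETRY_MAX; attempt++)
    {
        if (commit_prepared_on_foreign_server())
        {
            printf("resolved on attempt %d\n", attempt);
            return 0;
        }
        /* a real resolver would also check for interrupts/cancel here */
        sleep(COMMIT_RETRY_SLEEP);
    }
    printf("giving up for now; leave the transaction for later resolution\n");
    return 1;
}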
Re: Parallel INSERT (INTO ... SELECT ...)
On Fri, Oct 9, 2020 at 8:41 AM Thomas Munro wrote: > One thing I noticed is that you have logic, variable names and > assertions all over the tree that assume that we can only do parallel > *inserts*. I agree 100% with your plan to make Parallel Insert work > first, it is an excellent goal and if we get it in it'll be a headline > feature of PG14 (along with COPY etc). That said, I wonder if it > would make sense to use more general naming (isParallelModifyLeader?), > be more liberal where you really mean "is it DML", and find a way to > centralise the logic about which DML commands types are currently > allowed (ie insert only for now) for assertions and error checks etc, > so that in future we don't have to go around and change all these > places and rename things again and again. > Fair points. I agree, it would make more sense to generalise the naming and centralise the DML-command-type checks, rather than everything being insert-specific. It was getting a bit ugly. I'll work on that. > While contemplating that, I couldn't resist taking a swing at the main > (?) show stopper for Parallel Update and Parallel Delete, judging by > various clues left in code comments by Robert: combo command IDs > created by other processes. Here's a rapid prototype to make that > work (though perhaps not as efficiently as we'd want, not sure). With > that in place, I wonder what else we'd need to extend your patch to > cover all three operations... it can't be much! Of course I don't > want to derail your work on Parallel Insert, I'm just providing some > motivation for my comments on the (IMHO) shortsightedness of some of > the coding. > Thanks for your prototype code for coordination of combo command IDs with the workers. It does give me the incentive to look beyond that issue and see whether parallel Update and parallel Delete are indeed possible. I'll be sure to give it a go! > PS Why not use git format-patch to create patches? Guess I was being a bit lazy - will use git format-patch in future. Regards, Greg Nancarrow Fujitsu Australia
Re: Expansion of our checks for connection-loss errors
At Thu, 08 Oct 2020 21:41:55 -0400, Tom Lane wrote in > Kyotaro Horiguchi writes: > > At Thu, 08 Oct 2020 15:15:54 -0400, Tom Lane wrote in > >> Accordingly, I propose the attached patch (an expansion of > >> Fujii-san's) that causes us to test for all five errnos anyplace > >> we had been checking for ECONNRESET. > > > +1 for the direction. > > > In terms of connection errors, connect(2) and bind(2) can return > > EADDRNOTAVAIL. bind(2) and listen(2) can return EADDRINUSE. FWIW I > > recetnly saw pgbench getting EADDRNOTAVAIL. (They have mapping from > > respective WSA errors in TranslateSocketError()) > > I do not think we have any issues with connection-time errors; > or at least, if we do, the spots being touched here certainly > shouldn't need to worry about them. These places are dealing > with already-established connections. errcode_for_socket_access() is called for connect, bind and lesten but I understand we don't consider the case since we don't have an actual issue related to the functions. > > I'd make errno_is_connection_loss use ALL_CONNECTION_LOSS_ERRNOS to > > avoid duplication definition of the errno list. > > Hmm, might be worth doing, but I'm not sure. I am worried about > whether compilers will generate equally good code that way. The two are placed side-by-side so either will do for me. > > - if (ret < 0 && WSAGetLastError() == WSAECONNRESET) > > + if (ret < 0 && errno_is_connection_loss(WSAGetLastError())) > > > Don't we need to use TranslateSocketError() before? > > Oh, I missed that. But: > > > Perhaps I'm confused but SOCK_ERROR doesn't seem portable between > > Windows and Linux. > > In that case, nothing would have worked on Windows for the last > ten years, so you're mistaken. I think the actual explanation > why this works, and why that test in parallel.c probably still > works even with my mistake, is that win32_port.h makes sure that > our values of ECONNRESET etc match WSAECONNRESET etc. Mm. Sure. > IOW, we'd not actually need TranslateSocketError at all, except > that it maps some not-similarly-named error codes for conditions > that don't exist in Unix into ones that do. We probably do want > TranslateSocketError in this parallel.c test so that anything that > it maps to one of the errno_is_connection_loss codes will be > recognized; but the basic cases would work anyway, unless I > misunderstand this stuff entirely. Yeah, that seems to work. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
RE: POC: postgres_fdw insert batching
From: Tomas Vondra > I'm not sure when I'll have time to work on this again, so if you are > interested and willing to work on it, please go ahead. I'll gladly do > reviews and help you with it. Thank you very much. > I think transferring data to other databases is fine - interoperability > is a big advantage for users, I don't see it as something threatening > the PostgreSQL project. I doubt this would make it more likely for users > to migrate from PostgreSQL - there are many ways to do that already. Definitely true. Users may want to use INSERT SELECT to do some data transformation in their OLTP database and load it into a non-Postgres data warehouse. > Yeah. I think handling complete failure should not be very difficult, > but there are cases that worry me more. For example, what if there's a > before trigger (on the remote db) that "skips" inserting some of the > rows by returning NULL? > Yeah. I wonder if the FDW needs to indicate which features are supported > by the ExecForeignMultiInsert, e.g. by adding a function that decides > whether batch insert is supported (it might also do that internally by > calling ExecForeignInsert, of course). Thanks for your advice. I'll try to address them. Regards Takayuki Tsunakawa
Re: Logical replication CPU-bound with TRUNCATE/DROP/CREATE many tables
On Thu, Oct 8, 2020 at 2:34 PM Simon Riggs wrote: > > On Thu, 8 Oct 2020 at 09:47, Dilip Kumar wrote: > > > > This script will wait 10 seconds after INSERT exits > > > before executing TRUNCATE, please wait for it to run. > > Has this been tested with anything other than the one test case? > > It would be good to know how the patch handles a transaction that > contains many aborted subtransactions that contain invals. > Are you thinking from the angle of performance or functionality? I don't see how this patch can impact either of those. Basically, it will not execute any extra invalidations beyond what it executes without the patch for aborted subtransactions. Can you please explain your concern in a bit more detail? Having said that, I think it would be a good idea to test the scenario you mentioned to ensure that we have not broken anything unknowingly. -- With Regards, Amit Kapila.
Re: Fix typos in reorderbuffer.c
On Thu, Oct 8, 2020 at 2:40 PM Masahiko Sawada wrote: > > On Thu, 8 Oct 2020 at 17:37, Amit Kapila wrote: > > > > > So, I feel the > > comments should be accordingly updated. > > +1 for this change. > Thanks, I have pushed this and along with it pushed a typo-fix in logical.c. -- With Regards, Amit Kapila.
Re: Minor documentation error regarding streaming replication protocol
On Fri, Oct 9, 2020 at 08:52:50AM +0900, Michael Paquier wrote: > On Thu, Oct 08, 2020 at 04:23:06PM -0400, Bruce Momjian wrote: > > I have looked at this. It seems SendTimeLineHistory() is sending raw > > bytes from the history file, with no encoding conversion, and > > ReceiveXlogStream() is receiving it, again assuming it is just plain > > text. I am not sure we really have an SQL data type where we do this. > > BYTEA doesn't do encoding conversion, but does backslash processing, and > > TEXT does encoding conversion. > > > > I suppose we either have to document this as BYTEA with no backslash > > processing, or TEXT with no encoding conversion --- I think I prefer the > > latter. > > As StartupXLOG() shows, the timeline history file can include, as the > reason, the recovery target name, which may not consist of only ASCII > characters since that's the value the user specified in pg_create_restore_point, > so bytea is correct, no? Good point. The reporter was assuming the data would come to the client in the bytea output format selected by the GUC, e.g. \x..., but that doesn't happen either. As I said before, it is more like raw bytes, but we don't have an SQL data type for that. -- Bruce Momjian https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
RE: extension patch of CREATE OR REPLACE TRIGGER
Hello > > * A lesser point is that I think you're overcomplicating the code by > > applying heap_modify_tuple. You might as well just build the new > > tuple normally in all cases, and then apply either CatalogTupleInsert or > CatalogTupleUpdate. > > > > * Also, the search for an existing trigger tuple is being done the hard way. > > You're using an index on (tgrelid, tgname), so you could include the > > name in the index key and expect that there's at most one match. > While waiting for a new reply, I'll be doing those 2 refactorings. I'm done with those refactorings now. Please have a look at the changes in the latest patch. > > * I don't think that you've fully thought through the implications of > > replacing a trigger for a table that the current transaction has > > already modified. Is it really sufficient, or even useful, to do > > this: > > > > +/* > > + * If this trigger has pending events, throw an error. > > + */ > > +if (trigger_deferrable && > > + AfterTriggerPendingOnRel(RelationGetRelid(rel))) > > > > As an example, if we change a BEFORE trigger to an AFTER trigger, > > that's not going to affect the fact that we *already* fired that > > trigger. Maybe this is okay and we just need to document it, but I'm not > convinced. > > > > * BTW, I don't think a trigger necessarily has to be deferrable in > > order to have pending AFTER events. The existing use of > > AfterTriggerPendingOnRel certainly doesn't assume that. But really, I > > think we probably ought to be applying CheckTableNotInUse which'd > > include that test. (Another way in which there's fuzzy thinking here > > is that AfterTriggerPendingOnRel isn't specific to *this* > > trigger.) > Hmm, actually, when I just put a call to CheckTableNotInUse() in > CreateTrigger(), it throws an error like "cannot CREATE OR REPLACE TRIGGER > because it is being used by active queries in this session". > This causes a break of the protection for the internal cases above and a > contradiction with already-passing test cases. > I thought of adding a condition of in_partition==false to the call to > CheckTableNotInUse(). > But this doesn't work in a corner case where a child trigger generated > internally is > pending and we don't want to allow the replacement of this kind of trigger. > Do you have any good idea to achieve both points at the same time? Still, on this point, I'm waiting for a comment! Regards, Takamichi Osumi CREATE_OR_REPLACE_TRIGGER_v13.patch Description: CREATE_OR_REPLACE_TRIGGER_v13.patch
Re: [HACKERS] logical decoding of two-phase transactions
On Fri, Oct 9, 2020 at 5:45 AM Peter Smith wrote: > > On Thu, Oct 8, 2020 at 5:25 PM Amit Kapila wrote: > > > COMMENT > > > Line 177 > > > +logicalrep_read_prepare(StringInfo in, LogicalRepPrepareData * > > > prepare_data) > > > > > > prepare_data->prepare_type = flags; > > > This code may be OK but it does seem a bit of an abuse of the flags. > > > > > > e.g. Are they flags or are the really enum values? > > > e.g. And if they are effectively enums (it appears they are) then > > > seemed inconsistent that |= was used when they were previously > > > assigned. > > > > > > ; > > > > I don't understand this point. As far as I can see at the time of > > write (logicalrep_write_prepare()), the patch has used |=, and at the > > time of reading (logicalrep_read_prepare()) it has used assignment > > which seems correct from the code perspective. Do you have a better > > proposal? > > OK. I will explain my thinking when I wrote that review comment. > > I agree all is "correct" from a code perspective. > > But IMO using bit arithmetic implies that different combinations are > also possible, whereas in current code they are not. > So code is kind of having a bet each-way - sometimes treating "flags" > as bit flags and sometimes as enums. > > e.g. If these flags are not really bit flags at all then the > logicalrep_write_prepare() code might just as well be written as > below: > > BEFORE > if (rbtxn_commit_prepared(txn)) > flags |= LOGICALREP_IS_COMMIT_PREPARED; > else if (rbtxn_rollback_prepared(txn)) > flags |= LOGICALREP_IS_ROLLBACK_PREPARED; > else > flags |= LOGICALREP_IS_PREPARE; > > /* Make sure exactly one of the expected flags is set. */ > if (!PrepareFlagsAreValid(flags)) > elog(ERROR, "unrecognized flags %u in prepare message", flags); > > > AFTER > if (rbtxn_commit_prepared(txn)) > flags = LOGICALREP_IS_COMMIT_PREPARED; > else if (rbtxn_rollback_prepared(txn)) > flags = LOGICALREP_IS_ROLLBACK_PREPARED; > else > flags = LOGICALREP_IS_PREPARE; > > ~ > > OTOH, if you really do want to anticipate having future flag bit > combinations > I don't anticipate more combinations rather I am not yet sure whether we want to distinguish these operations with flags or have separate messages for each of these operations. I think for now we can go with your proposal above. -- With Regards, Amit Kapila.
Re: Assertion failure with LEFT JOINs among >500 relations
On Fri, 9 Oct 2020 at 15:06, Tom Lane wrote: > I notice there are some other ad-hoc isnan() checks scattered > about costsize.c, too. Maybe we should indeed consider fixing > clamp_row_estimate to get rid of inf (and nan too, I suppose) > so that we'd not need those. I don't recall the exact cases > that made us introduce those checks, but they were for cases > a lot more easily reachable than this one, I believe. Is there actually a case where nrows could be NaN? If not, then it seems like a wasted check. Wouldn't it take one of the input rels having an Inf row estimate (which won't happen after changing clamp_row_estimate()), or the selectivity estimate being NaN? David
Re: Logical replication CPU-bound with TRUNCATE/DROP/CREATE many tables
On Fri, Oct 9, 2020 at 8:40 AM Amit Kapila wrote: > > On Thu, Oct 8, 2020 at 2:34 PM Simon Riggs wrote: > > > > On Thu, 8 Oct 2020 at 09:47, Dilip Kumar wrote: > > > > > > This script will wait 10 seconds after INSERT exits > > > > before executing TRUNCATE, please wait for it to run. > > > > Has this been tested with anything other than the one test case? > > > > It would be good to know how the patch handles a transaction that > > contains many aborted subtransactions that contain invals. > > > > Are you thinking from the angle of performance or functionality? I > don't see how this patch can impact either of those. Basically, it > will not execute any extra invalidations then it is executing without > the patch for aborted subtransactions. Can you please explain in a bit > more detail about your fear? > > Having said that, I think it would be a good idea to test the scenario > you mentioned to ensure that we have not broken anything unknowingly. Yeah, even I feel that nothing should impact in this area because on abort we are anyway executing all the invalidations and we will continue to do so with the patch as well. I will test this scenario to ensure nothing is broken. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: Logical replication CPU-bound with TRUNCATE/DROP/CREATE many tables
On Fri, Oct 9, 2020 at 10:06 AM Dilip Kumar wrote: > > On Fri, Oct 9, 2020 at 8:40 AM Amit Kapila wrote: > > > > On Thu, Oct 8, 2020 at 2:34 PM Simon Riggs wrote: > > > > > > On Thu, 8 Oct 2020 at 09:47, Dilip Kumar wrote: > > > > > > > > This script will wait 10 seconds after INSERT exits > > > > > before executing TRUNCATE, please wait for it to run. > > > > > > Has this been tested with anything other than the one test case? > > > > > > It would be good to know how the patch handles a transaction that > > > contains many aborted subtransactions that contain invals. > > > > > > > Are you thinking from the angle of performance or functionality? I > > don't see how this patch can impact either of those. Basically, it > > will not execute any extra invalidations then it is executing without > > the patch for aborted subtransactions. Can you please explain in a bit > > more detail about your fear? > > > > Having said that, I think it would be a good idea to test the scenario > > you mentioned to ensure that we have not broken anything unknowingly. > > Yeah, even I feel that nothing should impact in this area because on > abort we are anyway executing all the invalidations and we will > continue to do so with the patch as well. > True, but I think we execute the invalidations only for the streaming case, otherwise, neither we need to execute invalidations nor we are doing it for abort case. > I will test this scenario > to ensure nothing is broken. > If I have not missed anything then probably you need to prepare a scenario where we need to do streaming. I am fine with the testing of the non-streaming case as well if you want to ensure that we have not broken that case by starting to execute invalidations. -- With Regards, Amit Kapila.
Re: Parallel copy
On Thu, Oct 8, 2020 at 12:14 AM vignesh C wrote: > > On Mon, Sep 28, 2020 at 12:19 PM Amit Kapila wrote: > > > > > > I am convinced by the reason given by Kyotaro-San in that another > > thread [1] and performance data shown by Peter that this can't be an > > independent improvement and rather in some cases it can do harm. Now, > > if you need it for a parallel-copy path then we can change it > > specifically to the parallel-copy code path but I don't understand > > your reason completely. > > > > Whenever we need data to be populated, we will get a new data block & > pass it to CopyGetData to populate the data. In case of file copy, the > server will completely fill the data block. We expect the data to be > filled completely. If data is available it will completely load the > complete data block in case of file copy. There is no scenario where > even if data is present a partial data block will be returned except > for EOF or no data available. But in case of STDIN data copy, even > though there is 8K data available in data block & 8K data available in > STDIN, CopyGetData will return as soon as libpq buffer data is more > than the minread. We will pass new data block every time to load data. > Every time we pass an 8K data block but CopyGetData loads a few bytes > in the new data block & returns. I wanted to keep the same data > population logic for both file copy & STDIN copy i.e copy full 8K data > blocks & then the populated data can be required. There is an > alternative solution I can have some special handling in case of STDIN > wherein the existing data block can be passed with the index from > where the data should be copied. Thoughts? > What you are proposing as an alternative solution, isn't that what we are doing without the patch? IIUC, you require this because of your corresponding changes to handle COPY_NEW_FE in CopyReadLine(), is that right? If so, what is the difficulty in making it behave similar to the non-parallel case? -- With Regards, Amit Kapila.
Re: Logical replication CPU-bound with TRUNCATE/DROP/CREATE many tables
On Fri, Oct 9, 2020 at 10:29 AM Amit Kapila wrote: > > On Fri, Oct 9, 2020 at 10:06 AM Dilip Kumar wrote: > > > > On Fri, Oct 9, 2020 at 8:40 AM Amit Kapila wrote: > > > > > > On Thu, Oct 8, 2020 at 2:34 PM Simon Riggs wrote: > > > > > > > > On Thu, 8 Oct 2020 at 09:47, Dilip Kumar wrote: > > > > > > > > > > This script will wait 10 seconds after INSERT exits > > > > > > before executing TRUNCATE, please wait for it to run. > > > > > > > > Has this been tested with anything other than the one test case? > > > > > > > > It would be good to know how the patch handles a transaction that > > > > contains many aborted subtransactions that contain invals. > > > > > > > > > > Are you thinking from the angle of performance or functionality? I > > > don't see how this patch can impact either of those. Basically, it > > > will not execute any extra invalidations then it is executing without > > > the patch for aborted subtransactions. Can you please explain in a bit > > > more detail about your fear? > > > > > > Having said that, I think it would be a good idea to test the scenario > > > you mentioned to ensure that we have not broken anything unknowingly. > > > > Yeah, even I feel that nothing should impact in this area because on > > abort we are anyway executing all the invalidations and we will > > continue to do so with the patch as well. > > > > True, but I think we execute the invalidations only for the streaming > case, otherwise, neither we need to execute invalidations nor we are > doing it for abort case. Right > > I will test this scenario > > to ensure nothing is broken. > > > > If I have not missed anything then probably you need to prepare a > scenario where we need to do streaming. Yes the test should be like BEGIN Savepoint 1 - DDL - large operation rollback to s1; -DDL large operation rollback to s1; ... So ideally every time it tries to stream the subtransaction there should be concurrent abort detected and it will execute all the invalidations. > I am fine with the testing of > the non-streaming case as well if you want to ensure that we have not > broken that case by starting to execute invalidations. Okay -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: Parallel copy
On Thu, Oct 8, 2020 at 12:14 AM vignesh C wrote: > > On Mon, Sep 28, 2020 at 12:19 PM Amit Kapila wrote: > > > > + */ > > > > +typedef struct ParallelCopyLineBoundary > > > > > > > > Are we doing all this state management to avoid using locks while > > > > processing lines? If so, I think we can use either spinlock or LWLock > > > > to keep the main patch simple and then provide a later patch to make > > > > it lock-less. This will allow us to first focus on the main design of > > > > the patch rather than trying to make this datastructure processing > > > > lock-less in the best possible way. > > > > > > > > > > The steps will be more or less same if we use spinlock too. step 1, step > > > 3 & step 4 will be common we have to use lock & unlock instead of step 2 > > > & step 5. I feel we can retain the current implementation. > > > > > > > I'll study this in detail and let you know my opinion on the same but > > in the meantime, I don't follow one part of this comment: "If they > > don't follow this order the worker might process wrong line_size and > > leader might populate the information which worker has not yet > > processed or in the process of processing." > > > > Do you want to say that leader might overwrite some information which > > worker hasn't read yet? If so, it is not clear from the comment. > > Another minor point about this comment: > > > > Here leader and worker must follow these steps to avoid any corruption > or hang issue. Changed it to: > * The leader & worker process access the shared line information by following > * the below steps to avoid any data corruption or hang: > Actually, I wanted more on the lines why such corruption or hang can happen? It might help reviewers to understand why you have followed such a sequence. > > > > How did you ensure that this is fixed? Have you tested it, if so > > please share the test? I see a basic problem with your fix. > > > > + /* Report WAL/buffer usage during parallel execution */ > > + bufferusage = shm_toc_lookup(toc, PARALLEL_COPY_BUFFER_USAGE, false); > > + walusage = shm_toc_lookup(toc, PARALLEL_COPY_WAL_USAGE, false); > > + InstrEndParallelQuery(&bufferusage[ParallelWorkerNumber], > > + &walusage[ParallelWorkerNumber]); > > > > You need to call InstrStartParallelQuery() before the actual operation > > starts, without that stats won't be accurate? Also, after calling > > WaitForParallelWorkersToFinish(), you need to accumulate the stats > > collected from workers which neither you have done nor is possible > > with the current code in your patch because you haven't made any > > provision to capture them in BeginParallelCopy. > > > > I suggest you look into lazy_parallel_vacuum_indexes() and > > begin_parallel_vacuum() to understand how the buffer/wal usage stats > > are accumulated. Also, please test this functionality using > > pg_stat_statements. > > > > Made changes accordingly. 
> I have verified it using:
> postgres=# select * from pg_stat_statements where query like '%copy%';
>  userid              | 10
>  dbid                | 13743
>  queryid             | -6947756673093447609
>  query               | copy hw from '/home/vignesh/postgres/postgres/inst/bin/hw_175000.csv' with(format csv, delimiter ',')
>  plans               | 0
>  total_plan_time     | 0
>  min_plan_time       | 0
>  max_plan_time       | 0
>  mean_plan_time      | 0
>  stddev_plan_time    | 0
>  calls               | 1
>  total_exec_time     | 265.195105
>  min_exec_time       | 265.195105
>  max_exec_time       | 265.195105
>  mean_exec_time      | 265.195105
>  stddev_exec_time    | 0
>  rows                | 175000
>  shared_blks_hit     | 1916
>  shared_blks_read    | 0
>  shared_blks_dirtied | 946
>  shared_blks_written | 946
>  local_blks_hit      | 0
>  local_blks_read     | 0
>  local_blks_dirtied  | 0
>  local_blks_written  | 0
>  temp_blks_read      | 0
>  temp_blks_written   | 0
>  blk_read_time       | 0
>  blk_write_time      | 0
>  wal_records         | 1116
>  wal_fpi             | 0
>  wal_bytes           | 3587203
>
>  userid              | 10
>  dbid                | 13743
>  queryid             | 8570215596364326047
>  query               | copy hw from '/home/vignesh/postgres/postgres/inst/bin/hw_175000.csv' with(format csv, delimiter ',', paral
Re: Wrong example in the bloom documentation
Hi Bruce, Tom, On Thu, Oct 8, 2020 at 03:43:32PM -0400, Tom Lane wrote: >> "Daniel Westermann (DWE)" writes: >> >> I was hoping someone more experienced with this would comment, but >> >> seeing none, I will apply it in a day or two to all supported versions? >> >> Have you tested this output back to 9.5? >> >> > I hoped that as well. No, I tested down to 9.6 because the change happened >> > in 10. >> >> The patch assumes that parallel query is enabled, which is not true by >> default before v10, so it should certainly not be applied before v10 >> (at least not without significant revisions). Yes, the behavior change was in 10. Before 10 the example is fine; I would not apply the patch to any prior version, otherwise the whole example would need to be rewritten. Regards Daniel
Re: Transactions involving multiple postgres foreign servers, take 2
At Fri, 9 Oct 2020 02:33:37 +, "tsunakawa.ta...@fujitsu.com" wrote in > From: Masahiko Sawada > > What about temporary network failures? I think there are users who > > don't want to give up resolving foreign transactions failed due to a > > temporary network failure. Or even they might want to wait for > > transaction completion until they send a cancel request. If we want to > > call the commit routine only once and therefore want FDW to retry > > connecting the foreign server within the call, it means we require all > > FDW implementors to write a retry loop code that is interruptible and > > ensures not to raise an error, which increases difficulty. > > > > Yes, but if we don’t retry to resolve foreign transactions at all on > > an unreliable network environment, the user might end up requiring > > every transaction to check the status of foreign transactions of the > > previous distributed transaction before starts. If we allow to do > > retry, I guess we ease that somewhat. > > OK. As I said, I'm not against trying to cope with temporary network > failure. I just don't think it's mandatory. If the network failure is > really temporary and thus recovers soon, then the resolver will be able to > commit the transaction soon, too. I should missing something, though... I don't understand why we hate ERRORs from fdw-2pc-commit routine so much. I think remote-commits should be performed before local commit passes the point-of-no-return and the v26-0002 actually places AtEOXact_FdwXact() before the critical section. (FWIW, I think remote commits should be performed by backends, not by another process, because backends should wait for all remote-commits to end anyway and it is simpler. If we want to multiple remote-commits in parallel, we could do that by adding some async-waiting interface.) > Then, we can have a commit retry timeout or retry count like the following > WebLogic manual says. (I couldn't quickly find the English manual, so below > is in Japanese. I quoted some text that got through machine translation, > which appears a bit strange.) > > https://docs.oracle.com/cd/E92951_01/wls/WLJTA/trxcon.htm > -- > Abandon timeout > Specifies the maximum time (in seconds) that the transaction manager attempts > to complete the second phase of a two-phase commit transaction. > > In the second phase of a two-phase commit transaction, the transaction > manager attempts to complete the transaction until all resource managers > indicate that the transaction is complete. After the abort transaction timer > expires, no attempt is made to resolve the transaction. If the transaction > enters a ready state before it is destroyed, the transaction manager rolls > back the transaction and releases the held lock on behalf of the destroyed > transaction. > -- That's not a retry timeout but a timeout for total time of all 2nd-phase-commits. But I think it would be sufficient. Even if an fdw could retry 2pc-commit, it's a matter of that fdw and the core has nothing to do with. > > Also, what if the user sets the statement timeout to 60 sec and they > > want to cancel the waits after 5 sec by pressing ctl-C? You mentioned > > that client libraries of other DBMSs don't have asynchronous execution > > functionality. If the SQL execution function is not interruptible, the > > user will end up waiting for 60 sec, which seems not good. I think fdw-2pc-commit can be interruptible safely as far as we run the remote commits before entring critical section of local commit. 
> FDW functions can be uninterruptible in general, aren't they? We experienced > that odbc_fdw didn't allow cancellation of SQL execution. At least postgres_fdw is interruptible while waiting on the remote. create view lt as select 1 as slp from (select pg_sleep(10)) t; create foreign table ft(slp int) server sv1 options (table_name 'lt'); select * from ft; ^CCancel request sent ERROR: canceling statement due to user request regards. -- Kyotaro Horiguchi NTT Open Source Software Center
RE: [PoC] Non-volatile WAL buffer
Hi Takashi, There are some differences between our HW/SW configuration and test steps. I attached postgresql.conf I used for your reference. I would like to try postgresql.conf and steps you provided in the later days to see if I can find cause. I also ran pgbench and postgres server on the same server but on different NUMA node, and ensure server process and PMEM on the same NUMA node. I used similar steps are yours from step 1 to 9. But some difference in later steps, major of them are: In step 10), I created a database and table for test by: #create database: psql -c "create database insert_bench;" #create table: psql -d insert_bench -c "create table test(crt_time timestamp, info text default '75feba6d5ca9ff65d09af35a67fe962a4e3fa5ef279f94df6696bee65f4529a4bbb03ae56c3b5b86c22b447fc48da894740ed1a9d518a9646b3a751a57acaca1142ccfc945b1082b40043e3f83f8b7605b5a55fcd7eb8fc1d0475c7fe465477da47d96957849327731ae76322f440d167725d2e2bbb60313150a4f69d9a8c9e86f9d79a742e7a35bf159f670e54413fb89ff81b8e5e8ab215c3ddfd00bb6aeb4');" in step 15), I did not use pg_prewarm, but just ran pg_bench for 180 seconds to warm up. In step 16), I ran pgbench using command: pgbench -M prepared -n -r -P 10 -f ./test.sql -T 600 -c _ -j _ insert_bench. (test.sql can be found in attachment) For HW/SW conf, the major differences are: CPU: I used Xeon 8268 (24c@2.9Ghz, HT enabled) OS Distro: CentOS 8.2.2004 Kernel: 4.18.0-193.6.3.el8_2.x86_64 GCC: 8.3.1 Best regards Gang -Original Message- From: Takashi Menjo Sent: Tuesday, October 6, 2020 4:49 PM To: Deng, Gang Cc: pgsql-hack...@postgresql.org; 'Takashi Menjo' Subject: RE: [PoC] Non-volatile WAL buffer Hi Gang, I have tried to but yet cannot reproduce performance degrade you reported when inserting 328-byte records. So I think the condition of you and me would be different, such as steps to reproduce, postgresql.conf, installation setup, and so on. My results and condition are as follows. May I have your condition in more detail? Note that I refer to your "Storage over App Direct" as my "Original (PMEM)" and "NVWAL patch" to "Non-volatile WAL buffer." Best regards, Takashi # Results See the attached figure. In short, Non-volatile WAL buffer got better performance than Original (PMEM). # Steps Note that I ran postgres server and pgbench in a single-machine system but separated two NUMA nodes. PMEM and PCI SSD for the server process are on the server-side NUMA node. 01) Create a PMEM namespace (sudo ndctl create-namespace -f -t pmem -m fsdax -M dev -e namespace0.0) 02) Make an ext4 filesystem for PMEM then mount it with DAX option (sudo mkfs.ext4 -q -F /dev/pmem0 ; sudo mount -o dax /dev/pmem0 /mnt/pmem0) 03) Make another ext4 filesystem for PCIe SSD then mount it (sudo mkfs.ext4 -q -F /dev/nvme0n1 ; sudo mount /dev/nvme0n1 /mnt/nvme0n1) 04) Make /mnt/pmem0/pg_wal directory for WAL 05) Make /mnt/nvme0n1/pgdata directory for PGDATA 06) Run initdb (initdb --locale=C --encoding=UTF8 -X /mnt/pmem0/pg_wal ...) 
- Also give -P /mnt/pmem0/pg_wal/nvwal -Q 81920 in the case of Non-volatile WAL buffer 07) Edit postgresql.conf as the attached one - Please remove nvwal_* lines in the case of Original (PMEM) 08) Start postgres server process on NUMA node 0 (numactl -N 0 -m 0 -- pg_ctl -l pg.log start) 09) Create a database (createdb --locale=C --encoding=UTF8) 10) Initialize pgbench tables with s=50 (pgbench -i -s 50) 11) Change # characters of "filler" column of "pgbench_history" table to 300 (ALTER TABLE pgbench_history ALTER filler TYPE character(300);) - This would make the row size of the table 328 bytes 12) Stop the postgres server process (pg_ctl -l pg.log -m smart stop) 13) Remount the PMEM and the PCIe SSD 14) Start postgres server process on NUMA node 0 again (numactl -N 0 -m 0 -- pg_ctl -l pg.log start) 15) Run pg_prewarm for all the four pgbench_* tables 16) Run pgbench on NUMA node 1 for 30 minutes (numactl -N 1 -m 1 -- pgbench -r -M prepared -T 1800 -c __ -j __) - It executes the default tpcb-like transactions I repeated all the steps three times for each (c,j) then got the median "tps = __ (including connections establishing)" of the three as throughput and the "latency average = __ ms " of that time as average latency. # Environment variables export PGHOST=/tmp export PGPORT=5432 export PGDATABASE="$USER" export PGUSER="$USER" export PGDATA=/mnt/nvme0n1/pgdata # Setup - System: HPE ProLiant DL380 Gen10 - CPU: Intel Xeon Gold 6240M x2 sockets (18 cores per socket; HT disabled by BIOS) - DRAM: DDR4 2933MHz 192GiB/socket x2 sockets (32 GiB per channel x 6 channels per socket) - Optane PMem: Apache Pass, AppDirect Mode, DDR4 2666MHz 1.5TiB/socket x2 sockets (256 GiB per channel x 6 channels per socket; interleaving enabled) - PCIe SSD: DC P4800X Series SSDPED1K750GA - Distro: Ubuntu 20.04.1 - C compiler: gcc 9.3.0 - libc: glibc 2.31 - Linux kernel: 5.7 (vanilla) - Filesystem: ext4 (DAX enabled when using Optane PMem) -
Re: Parallel copy
On Thu, Oct 8, 2020 at 8:43 AM Greg Nancarrow wrote:
>
> On Thu, Oct 8, 2020 at 5:44 AM vignesh C wrote:
>
> > Attached v6 patch with the fixes.
> >
>
> Hi Vignesh,
>
> I noticed a couple of issues when scanning the code in the following patch:
>
> v6-0003-Allow-copy-from-command-to-process-data-from-file.patch
>
> In the following code, it will put a junk uint16 value into *destptr
> (and thus may well cause a crash) on a Big Endian architecture
> (Solaris Sparc, s390x, etc.):
> You're storing a (uint16) string length in a uint32 and then pulling
> out the lower two bytes of the uint32 and copying them into the
> location pointed to by destptr.
>
> static void
> +CopyStringToSharedMemory(CopyState cstate, char *srcPtr, char *destptr,
> +                          uint32 *copiedsize)
> +{
> +     uint32 len = srcPtr ? strlen(srcPtr) + 1 : 0;
> +
> +     memcpy(destptr, (uint16 *) &len, sizeof(uint16));
> +     *copiedsize += sizeof(uint16);
> +     if (len)
> +     {
> +         memcpy(destptr + sizeof(uint16), srcPtr, len);
> +         *copiedsize += len;
> +     }
> +}
>
> I suggest you change the code to:
>
> uint16 len = srcPtr ? (uint16)strlen(srcPtr) + 1 : 0;
> memcpy(destptr, &len, sizeof(uint16));
>
> [I assume string length here can't ever exceed (65535 - 1), right?]
>

Your suggestion makes sense to me if the assumption about the string
length is correct. If we can't ensure that, then we probably need to use
a four-byte uint32 to store the length instead. (A minimal sketch along
those lines follows after this message.)

> Looking a bit deeper into this, I'm wondering if in fact your
> EstimateStringSize() and EstimateNodeSize() functions should be using
> BUFFERALIGN() for EACH stored string/node (rather than just calling
> shm_toc_estimate_chunk() once at the end, after the length of packed
> strings and nodes has been estimated), to ensure alignment of the start
> of each string/node. Other Postgres code appears to be aligning each
> stored chunk using shm_toc_estimate_chunk(). See the definition of
> that macro and its current usages.
>

I am not sure whether that is required for correctness. AFAIU, we
store/estimate multiple parameters in the same way at other places; see
EstimateParamListSpace and SerializeParamList. Do you have something
else in mind?

While looking at the latest code, I observed the below issue in patch
v6-0003-Allow-copy-from-command-to-process-data-from-file:

+   /* Estimate the size for shared information for PARALLEL_COPY_KEY_CSTATE */
+   est_cstateshared = MAXALIGN(sizeof(SerializedParallelCopyState));
+   shm_toc_estimate_chunk(&pcxt->estimator, est_cstateshared);
+   shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+   strsize = EstimateCstateSize(pcxt, cstate, attnamelist, &whereClauseStr,
+                                &rangeTableStr, &attnameListStr,
+                                &notnullListStr, &nullListStr,
+                                &convertListStr);

Here, do we need to separately estimate the size of
SerializedParallelCopyState when it is also done in EstimateCstateSize?

--
With Regards,
Amit Kapila.
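For reference, here is a minimal sketch of the endianness-safe helper along the lines Greg suggests above. It assumes the serialized string (including its NUL terminator) always fits in a uint16, and that the surrounding patch's declarations (CopyState, uint16, uint32, PG_UINT16_MAX) are in scope. It is illustrative only, not the committed code:

static void
CopyStringToSharedMemory(CopyState cstate, char *srcPtr, char *destptr,
                         uint32 *copiedsize)
{
    /*
     * Assumes strlen(srcPtr) + 1 never exceeds PG_UINT16_MAX; cstate is
     * kept only to mirror the patch's signature and is unused here.
     */
    uint16      len = srcPtr ? (uint16) (strlen(srcPtr) + 1) : 0;

    /* Store the length as a genuine uint16 so both byte orders read it back correctly. */
    memcpy(destptr, &len, sizeof(uint16));
    *copiedsize += sizeof(uint16);
    if (len)
    {
        /* Copy the NUL-terminated string right after the length word. */
        memcpy(destptr + sizeof(uint16), srcPtr, len);
        *copiedsize += len;
    }
}

If the length cannot be bounded by uint16, the same sketch works with a uint32 length word instead, at the cost of two extra bytes per stored string.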
Re: range_agg
On Thu, Sep 24, 2020 at 2:05 AM Paul A Jungwirth <p...@illuminatedcomputing.com> wrote:

> On Sun, Aug 16, 2020 at 12:55 PM Paul A Jungwirth wrote:
> > This is rebased on the current master, including some changes to doc
> > tables and pg_upgrade handling of type oids.
>
> Here is a rebased version of this patch, including a bunch of cleanup
> from Alvaro. (Thanks Alvaro!)
>

I tested this patch and it looks good; I have no objections:

1. there are no new warnings
2. make check-world passed
3. the documentation builds without problems
4. the documentation is sufficient, and so are the regression tests
5. there was no objection to this feature in the discussion, and I think it is an interesting and useful feature - a good addition to arrays

Regards

Pavel

> Paul
>