On Thu, 7 Dec 2023 at 20:06, Anthonin Bonnefoy <anthonin.bonne...@datadoghq.com> wrote:
>
> Hi,
>
> Thanks for the review!
>
> > ```
> > +-- Worker can take some additional time to end and report their spans
> > +SELECT pg_sleep(0.2);
> > + pg_sleep
> > +----------
> > +
> > +(1 row)
> > ```
> >
> > Pretty sure this will fail on buildfarm from time to time. Maybe this
> > is a case when it is worth using TAP tests? Especially considering the
> > support of pg_tracing.notify_channel.
>
> The sleep was unnecessary: the leader waits for all workers to finish
> and report, so all workers should have added their spans to the shared
> buffer before the leader. I removed the sleep from the test and haven't
> seen a failure on this test since then.
>
> I did add a TAP test for the notify part since it seems to be the only
> way to test it.
>
> > ```
> > + Only superusers can change this setting.
> > + This parameter can only be set at server start.
> > ```
> >
> > I find this confusing. If the parameter can only be set at server
> > start doesn't it mean that whoever has access to postgresql.conf can
> > change it?
>
> The 'only superusers...' part was incorrect. This setting was needed to
> compute the size of the shmem request, so modifying it would require a
> server restart. For this parameter, though, I've realised that I could
> use the global max_parallel_workers instead, removing the need for a
> max_traced_parallel_workers parameter.
>
> > Perhaps TABLE command should be added to the list, since it's
> > basically a syntax sugar for SELECT:
>
> Added.
>
> > Obviously pg_tracing.drop_on_full_buffer = on will not work properly
> > with pg_tracing.notify_threshold because there is a race condition
> > possible. By the time one receives and processes the notification the
> > buffer can be already cleaned.
>
> That's definitely a possibility. Lowering the notification threshold
> reduces the chance of the buffer being cleared while the notification is
> handled, but the race can still happen.
>
> For my use cases, spans should be sent as soon as possible and old spans
> are less valuable than new spans, but other usages may want the
> opposite, thus the parameters to control the behaviour.
>
> > This makes me think that 1) pg_tracing.drop_on_full_buffer = on could
> > be not the best default value
>
> I don't have a strong opinion on the default behaviour; I will switch to
> keeping the buffer on full.
>
> > 2) As a user I would like a support of ring buffer mode. Even if
> > implementing this mode is out of scope of this particular patch, I
> > would argue that instead of drop_on_full_buffer boolean parameter we
> > need something like:
> >
> > pg_tracing.buffer_mode = keep_on_full | drop_on_full | ring_buffer |
> > perhaps_other_future_modes
>
> A ring buffer was also my first approach, but it adds some complexity,
> so I've switched to the full drop + truncate query file for this first
> iteration. A ring buffer would definitely be a better experience; I've
> switched the parameter to an enum with only keep_on_full and
> drop_on_full for the time being.
>
> > Do you think it possible to document how much shared memory does the
> > extension consume depending on the parameters? Alternatively maybe
> > it's worth providing a SQL function that returns this value? As a user
> > I would like to know whether 10 MB, 500 MB or 1.5 GB of shared memory
> > are consumed.
>
> Good point. I've updated the doc to give some approximations, along with
> a query to detail the memory usage.
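>
> A query along these lines could be used (just a sketch: the 'pg_tracing'
> allocation name is an assumption, and pg_shmem_allocations requires
> PostgreSQL 13 or later):
>
> ```
> -- Approximate shared memory held by the extension's named allocations
> SELECT name, pg_size_pretty(allocated_size) AS shmem_used
> FROM pg_shmem_allocations
> WHERE name ILIKE '%pg_tracing%';
> ```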
>
> > I would argue that ns ( 1 / 1_000_000_000 sec! ) precision is
> > redundant and only complicates the interface. This is way below the
> > noise floor formed from network delays, process switching by the OS,
> > noisy neighbours under the same supervisor, time slowdown by ntpd,
> > etc. I don't recall ever needing spans with better than 1 us
> > precision.
> >
> > On top of that if you have `span_start_ns` you should probably rename
> > `duration` to `duration_ns` for consistency. I believe it would be
> > more convenient to have `duration` in microseconds. Nothing prevents
> > us from using fractions of microseconds (e.g. 12.345 us) if necessary.
>
> I was definitely on the fence on this. I initially tried to match the
> precision provided by query instrumentation, but that introduced a lot
> of complexity. Also, while the trace standard specifies that start and
> end should be in nanoseconds, it turns out that the nanosecond precision
> is generally ignored.
>
> I've switched to a start+end in TimestampTz, dropping the ns precision.
> Tests had to be rewritten to have a stable order for spans that can have
> the same start/end.
>
> I've also removed some spans: ExecutorStart, ExecutorEnd, PostParse.
> Those spans generally had a sub-microsecond duration and didn't bring
> much value except for debugging pg_tracing itself. To avoid the
> confusion of having spans with a 0us duration, removing them altogether
> seemed better. ExecutorFinish also has a sub-microsecond duration unless
> it runs a nested query (triggers), so I'm only emitting it in those
> cases.
>
> > Negative span ids, for sure, are possible, but I don't recall seeing
> > many negative ids in my life. I suggest not to surprise the user
> > unless this is really necessary.
>
> That one is going to be difficult. Trace ids and span ids are 8 bytes,
> so I'm exporting them as bigint. PostgreSQL doesn't support unsigned
> bigint, thus the negative span ids. An alternative would be to use a
> bytea, but bigints are easier to read and interact with (a small
> example is sketched below).
>
> The latest patch contains the changes around dropping the nanosecond
> precision, test updates to make them more stable, and simplifications
> around span management.
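>
> As mentioned above, a quick sketch of that mapping (the literal is a
> made-up value, not one emitted by pg_tracing): the signed bigint still
> converts back to the unsigned hex form that trace collectors expect.
>
> ```
> -- A signed bigint span id padded to the usual 16-character hex form
> SELECT lpad(to_hex(-6917529027641081856::bigint), 16, '0') AS span_id_hex;
> -- => a000000000000000
> ```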
One of the tests has failed in CFBOT [1] with:

-select count(span_id) from pg_tracing_consume_spans group by span_id;
- count
--------
-     1
-     1
-     1
-     1
-     1
-(5 rows)
-
--- Cleanup
-SET pg_tracing.sample_rate = 0.0;
+server closed the connection unexpectedly
+	This probably means the server terminated abnormally
+	before or while processing the request.
+connection to server was lost

More details are available at [2]

[1] - https://cirrus-ci.com/task/5876369167482880
[2] - https://api.cirrus-ci.com/v1/artifact/task/5876369167482880/testrun/build/testrun/pg_tracing/regress/regression.diffs