On 02/13/2017 11:50 AM, François Beaulieu wrote: > >> On Feb 13, 2017, at 1:56 PM, Adrian Klaver <adrian.kla...@aklaver.com> wrote: >> >> On 02/13/2017 09:04 AM, François Beaulieu wrote: >>> >>>> On Feb 13, 2017, at 11:45 AM, Adrian Klaver <adrian.kla...@aklaver.com> >>>> wrote: >>>> >> | >>>>> >>>>>> 3) Are the first row and the second row in the same partition? >>>>> >>>>> Doubtful, the problem occurs several times a day and we only have one >>>>> partition a day. Let me check with the above example. What would be the >>>>> best way to determine which child a row is in, and the relative position >>>>> in the child table? >>>> >>>> As to position, maybe ctid though it has caveats: >>> >>> The three rows in my example return a ctid of (742,17), (742,18) and >>> (742,19) respectively, in their child table. So, probably not at a >>> partition boundary. >>> >>>>> Also; my worker in written in perl and uses DBD::Pg. I haven’t been able >>>>> to 100% eliminate the module itself as the cause of the bug. Any >>>>> suggestions on how I might go about doing that efficiently? >>>> >>>> What does the worker do? >>> >>> Sorry, that's my employer’s classified IP. :-) >>> Does it matter? >> >> Only that it makes it harder to give any suggestions on eliminating it as a >> source of error if it is a black box. I don't think, at this point, it is >> necessary to see the actual source. If it is possible a high level synopsis >> of what it does might be sufficient. > > Suffice it to say the worker uses a read-only connection to the database to > receive these notices and to query the table for the matching rows in certain > circumstances. It never modifies the database in any way; it only uses this > information to act upon a completely different subsystem. I loosely based it > on a snippet of code from this very mailing list: > > https://www.postgresql.org/message-id/20050104031937.ga80...@winnie.fuhr.org > >>>> Could it be the module is not dealing with time zones correctly? Though >>>> thinking about this that would seem to manifest a problem only around the >>>> 7th day boundary. So put this down to thinking aloud. >>> >>> No, the partitioning scheme seems to be respecting the timezone properly, >>> and my issue is happening every few hours in the middle of the day and >>> we’re in UTC+5, so not near the end of the day in UTC. Besides, I believe >>> timestamp without timezone assumes the local timezone of the server, which >>> is set to UTC anyway. >>> >>> Has the schema eliminated your original theory regarding the delaying of >>> the generation of the _id? I don’t think that would normally be an issue >>> that occurs sporadically and the _id seems to be part of the INSERT, which >>> would indicate that, as it should, it’s done generating before my trigger >>> is called. >> >> I don't see anything that would explain a delay. Still the fact remains that >> in most cases the notify captures the _id, but in some cases it does not. >> Going back to your OP I realized I missed that the NEW.userfield was also >> not coming through. So that seems to confirm that pg_notify() is firing >> before it gets access to NEW.*. Having said that I have no idea why? >> >> The only thing I can think to do is(untested): >> >> CREATE OR REPLACE FUNCTION notify_trigger() RETURNS trigger AS $$ >> DECLARE >> _newid integer; >> BEGIN >> SELECT NEW._id INTO _newid; >> IF _newid IS NULL OR NOT FOUND THEN >> RAISE NOTICE 'NEW._id is NULL/NOT FOUND'; >> pg_sleep(0.1); --Or whatever interval you want. >> END IF; >> PERFORM pg_notify('watchers', TG_TABLE_NAME || ',' || NEW._id|| ',' || >> NEW.userfield); >> RETURN new; >> END; >> $$ LANGUAGE plpgsql; >> >> Not really a solution but it might help determine whether it is a timing >> issue. Also this is probably something that should be done on a test server >> to be safe. > > Thanks for the tip, I’ll try that in my lab. I want to try to replicate the > issue more consistently first, so that my tests after the change will be more > conclusive. > > In the meantime, if anyone has any other suggestions, please don’t hesitate.
The only thing I can come up with is in your test lab once you replicate the issue crank up the logging level: https://www.postgresql.org/docs/9.4/static/runtime-config-logging.html#RUNTIME-CONFIG-SEVERITY-LEVELS log_min_messages to see if more detail sheds any light. > > > Thanks, > -=François Beaulieu > SBK Telecom > -- Adrian Klaver adrian.kla...@aklaver.com -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general