On 02/13/2017 11:50 AM, François Beaulieu wrote:
> 
>> On Feb 13, 2017, at 1:56 PM, Adrian Klaver <adrian.kla...@aklaver.com> wrote:
>>
>> On 02/13/2017 09:04 AM, François Beaulieu wrote:
>>>
>>>> On Feb 13, 2017, at 11:45 AM, Adrian Klaver <adrian.kla...@aklaver.com> 
>>>> wrote:
>>>>
>>                                                                     |
>>>>>
>>>>>> 3) Are the first row and the second row in the same partition?
>>>>>
>>>>> Doubtful, the problem occurs several times a day and we only have one 
>>>>> partition a day. Let me check with the above example. What would be the 
>>>>> best way to determine which child a row is in, and the relative position 
>>>>> in the child table?
>>>>
>>>> As to position, maybe ctid though it has caveats:
>>>
>>> The three rows in my example return a ctid of (742,17), (742,18) and 
>>> (742,19) respectively, in their child table. So, probably not at a 
>>> partition boundary.
>>>
>>>>> Also; my worker in written in perl and uses DBD::Pg. I haven’t been able 
>>>>> to 100% eliminate the module itself as the cause of the bug. Any 
>>>>> suggestions on how I might go about doing that efficiently?
>>>>
>>>> What does the worker do?
>>>
>>> Sorry, that's my employer’s classified IP. :-)
>>> Does it matter?
>>
>> Only that it makes it harder to give any suggestions on eliminating it as a 
>> source of error if it is a black box.  I don't think, at this point, it is 
>> necessary to see the actual source. If it is possible a high level synopsis 
>> of what it does might be sufficient.
> 
> Suffice it to say the worker uses a read-only connection to the database to 
> receive these notices and to query the table for the matching rows in certain 
> circumstances. It never modifies the database in any way; it only uses this 
> information to act upon a completely different subsystem. I loosely based it 
> on a snippet of code from this very mailing list:
> 
> https://www.postgresql.org/message-id/20050104031937.ga80...@winnie.fuhr.org
> 
>>>> Could it be the module is not dealing with time zones correctly? Though 
>>>> thinking about this that would seem to manifest a problem only around the 
>>>> 7th day boundary. So put this down to thinking aloud.
>>>
>>> No, the partitioning scheme seems to be respecting the timezone properly, 
>>> and my issue is happening every few hours in the middle of the day and 
>>> we’re in UTC+5, so not near the end of the day in UTC. Besides, I believe 
>>> timestamp without timezone assumes the local timezone of the server, which 
>>> is set to UTC anyway.
>>>
>>> Has the schema eliminated your original theory regarding the delaying of 
>>> the generation of the _id? I don’t think that would normally be an issue 
>>> that occurs sporadically and the _id seems to be part of the INSERT, which 
>>> would indicate that, as it should, it’s done generating before my trigger 
>>> is called.
>>
>> I don't see anything that would explain a delay. Still the fact remains that 
>> in most cases the notify captures the _id, but in some cases it does not. 
>> Going back to your OP I realized I missed that the NEW.userfield was also 
>> not coming through. So that seems to confirm that pg_notify() is firing 
>> before it gets access to NEW.*. Having said that I have no idea why?
>>
>> The only thing I can think to do is(untested):
>>
>> CREATE OR REPLACE FUNCTION notify_trigger() RETURNS trigger AS $$
>> DECLARE
>>      _newid integer;
>> BEGIN
>>  SELECT NEW._id INTO _newid;
>>  IF _newid IS NULL OR NOT FOUND THEN
>>      RAISE NOTICE 'NEW._id is NULL/NOT FOUND';
>>      pg_sleep(0.1); --Or whatever interval you want.
>>  END IF;
>>  PERFORM pg_notify('watchers', TG_TABLE_NAME || ',' || NEW._id|| ',' || 
>> NEW.userfield);
>>  RETURN new;
>> END;
>> $$ LANGUAGE plpgsql;
>>
>> Not really a solution but it might help determine whether it is a timing 
>> issue. Also this is probably something that should be done on a test server 
>> to be safe.
> 
> Thanks for the tip, I’ll try that in my lab. I want to try to replicate the 
> issue more consistently first, so that my tests after the change will be more 
> conclusive. 
> 
> In the meantime, if anyone has any other suggestions, please don’t hesitate.

The only thing I can come up with is in your test lab once you 
replicate the issue crank up the logging level:

https://www.postgresql.org/docs/9.4/static/runtime-config-logging.html#RUNTIME-CONFIG-SEVERITY-LEVELS

log_min_messages

to see if more detail sheds any light.

> 
> 
> Thanks,
> -=François Beaulieu
> SBK Telecom
> 


-- 
Adrian Klaver
adrian.kla...@aklaver.com


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Reply via email to