Hi George,

It's definitely not the code or our app's startup process, as far as I'm 
aware, because the issue occurs before our Docker image is even pulled or 
any of our code is executed. The logs should show that clearly.

The health check change I applied as per your suggestion did not involve 
any code change whatsoever, and yet the VM now starts normally. The only 
difference is that I now see the new health check requests in the logs 
during the startup process.
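
To make the timing concrete, here is a minimal Python sketch of how the 
liveness_check values I added (initial_delay_sec: 300, check_interval_sec: 5, 
failure_threshold: 3) might play out while an instance boots. It assumes, 
from my reading of the app.yaml documentation, that probe responses are 
ignored for initial_delay_sec after the instance starts and that 
failure_threshold counts consecutive failed probes. The simulate function 
and the ready_at_sec parameter are hypothetical, this is not how App Engine 
itself is implemented, and it is not a confirmed explanation of the incident.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LivenessConfig:
    # Defaults mirror the liveness_check block in our app.yaml.
    # success_threshold is deliberately not modelled: this sketch only
    # tracks the healthy -> unhealthy transition.
    initial_delay_sec: int = 300
    check_interval_sec: int = 5
    failure_threshold: int = 3


def simulate(cfg: LivenessConfig, ready_at_sec: int,
             horizon_sec: int = 900) -> Optional[int]:
    """Return the first time (seconds after instance start) at which the
    instance would be marked unhealthy, or None if that never happens
    within horizon_sec.

    Hypothetical model: a probe succeeds iff the app has finished booting
    (t >= ready_at_sec); probes before initial_delay_sec are ignored.
    """
    consecutive_failures = 0
    t = cfg.initial_delay_sec  # first probe whose result counts
    while t <= horizon_sec:
        if t >= ready_at_sec:
            consecutive_failures = 0  # a success resets the failure streak
        else:
            consecutive_failures += 1
            if consecutive_failures >= cfg.failure_threshold:
                return t
        t += cfg.check_interval_sec
    return None


if __name__ == "__main__":
    cfg = LivenessConfig()
    print(simulate(cfg, ready_at_sec=120))  # None: booted inside the delay
    print(simulate(cfg, ready_at_sec=400))  # 310: probes at 300, 305, 310 fail
    # With no initial delay, a 60-second boot is flagged after 10 seconds.
    print(simulate(LivenessConfig(initial_delay_sec=0), ready_at_sec=60))  # 10
```

Under this model, a short (or zero) initial delay combined with a slow boot 
gets an instance flagged within seconds, whereas a 300-second delay leaves 
room for a normal startup.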

I'll continue testing and reply with any additional findings.

I'd be grateful if you could shed some more light on the incident in 
question as I still don't have an answer as to what happened.



On Wednesday, February 28, 2018 at 9:40:49 PM UTC, George (Cloud Platform 
Support) wrote:
>
> Hi Karl, 
>
> If your app starts up without issues, the health checks would not matter. 
> There are issues with your app's startup process that you need to 
> address. For coding-related issues, you may be better served posting to 
> Stack Overflow or similar forums, where you can get expert help from 
> active programmers. 
>
> On 28 February 2018 at 16:23, Karl Tinawi <[email protected]> wrote:
>
>> Hi George,
>>
>> I was able to do a test deployment tonight by defining the new health 
>> checks as you recommended.
>>
>> Before I continue: my test for this was based on performing a deployment, 
>> as we see exactly the same behaviour there (the VM starting up and 
>> crashing) as we did during the incident on Monday.
>>
>> The good news is that this has seemingly resolved our deployment issue 
>> completely: deployments are once again successful in a reasonable amount 
>> of time, rather than timing out and failing as described above. So at 
>> this point I'm semi-confident that it has also resolved the issue we 
>> experienced on Monday, when the VM was restarted and couldn't start up 
>> again. This one is difficult to prove from my side at the moment.
>>
>> I replaced the legacy health_check block in our .yaml file with the 
>> following:
>>
>> liveness_check:
>>     path: "/_ah/health"
>>     initial_delay_sec: 300
>>     check_interval_sec: 5
>>     timeout_sec: 5
>>     failure_threshold: 3
>>     success_threshold: 1
>>
>> readiness_check:
>>     path: "/login"
>>     app_start_timeout_sec: 300
>>     check_interval_sec: 30
>>     timeout_sec: 5
>>     failure_threshold: 3
>>     success_threshold: 1
>>
>>
>> The most obvious question I have at this point is: why? Why would this 
>> resolve the issue? I can only guess that it is related to the new-style 
>> health/liveness checks being enabled by default, even though we had not 
>> executed:
>>
>> gcloud beta app update --split-health-checks --project [YOUR_PROJECT_ID]
>>
>>
>> or provided the liveness_check/readiness_check blocks in our yaml file? 
>> I've only just learnt about these updated health checks here 
>> <https://cloud.google.com/appengine/docs/flexible/php/configuring-your-app-with-app-yaml#configuring_supervisord_in_the_php_runtime>, 
>> as it's not something we keep up to date with once we have a desired 
>> configuration, so I'm concerned that there was a backwards-compatibility 
>> issue here.
>>
>> I'm performing a couple more deployments to satisfy myself that this is 
>> not a fluke.
>>
>> As a side question: since activating the new health checks, I see these 
>> entries in our logs:
>>
>>
>> <https://lh3.googleusercontent.com/-yUuumRDccTA/WpcdBRBtY7I/AAAAAAAAABI/I_2N-EwYdWIzv6RJKaLcnxRJdyVoQs7IACLcBGAs/s1600/readiness_check.png>
>>
>>
>> <https://lh3.googleusercontent.com/-1q52sNXnYFI/WpcdFCILrbI/AAAAAAAAABM/NZrQfaazq0cwAMCYbvujUfE1l9J6iZynwCLcBGAs/s1600/liveness_check.png>
>>
>> These don't seem to obey the configuration I defined (see the snippets 
>> above), most notably the path and the interval.
>>
>> I'd like to learn if I'm doing anything wrong here or if there is an 
>> explanation.
>>
>> Many thanks again and looking forward to hearing from you.
>>
>> Karl
>>
>>
>>
>> On Wednesday, February 28, 2018 at 8:12:09 PM UTC, Karl Tinawi wrote:
>>>
>>> Hi George,
>>>
>>> Yes that's correct - it's happened once outside of deployments.
>>>
>>> To answer your questions sir:
>>>
>>>    - We require a custom PHP installation in order to use modules that 
>>>    are missing from Google's offering. I haven't checked the latest list 
>>>    of extensions, but we may be able to move back to the standard PHP 
>>>    image, so I'll definitely check this.
>>>    - Scaling is another challenge we're looking at, and we're certainly 
>>>    aware that we need to move to automatic scaling for contingency, etc.
>>>    - I'll test configuring the readiness check and report back if we 
>>>    notice any difference in behaviour.
>>>
>>> Were the logs helpful? I'd be grateful if you could shed some light on 
>>> the investigation on your end. This is the first time we've noticed an 
>>> issue such as this during the maintenance process, which should be 
>>> innocuous and invisible to us.
>>>
>>> At this point I'm unsure whether the issues we face during deployments, 
>>> which continue to occur daily, are related to the incident that affected 
>>> our running app. It's worth noting that the behaviour of the VM is 
>>> identical (the same abrupt restarts as it tries to boot). I may try a 
>>> test deployment using another image to see if that helps.
>>>
>>>
>>> Many thanks again,
>>>
>>> Karl
>>>
>>>
>>> On Wednesday, February 28, 2018 at 12:52:06 AM UTC, George (Cloud 
>>> Platform Support) wrote:
>>>>
>>>> Hello Karl, 
>>>>
>>>> You seem to indicate that the outage was a one-time event, and that 
>>>> there has been no other similar occurrence as yet. If so, to prevent 
>>>> similar unwanted events in future, you may configure your app's health 
>>>> checks in detail. For reference, the "Configuring your App with 
>>>> app.yaml" page should prove of great help. In your app.yaml, you can 
>>>> specify either a liveness check (choosing appropriate parameter values): 
>>>>
>>>> liveness_check:
>>>>   path: "/liveness_check"
>>>>   check_interval_sec: 30
>>>>   timeout_sec: 4
>>>>   failure_threshold: 2
>>>>   success_threshold: 2 
>>>>
>>>> or a readiness check: 
>>>>
>>>> readiness_check:
>>>>   path: "/readiness_check"
>>>>   check_interval_sec: 5
>>>>   timeout_sec: 4
>>>>   failure_threshold: 2
>>>>   success_threshold: 2
>>>>   app_start_timeout_sec: 300
>>>>
>>>> It is worth noting that the usual way of specifying PHP for your 
>>>> app is: 
>>>>
>>>> runtime: php  # Required: the name of the App Engine language runtime
>>>>               # used by this application. To specify PHP, use php.
>>>> env: flex
>>>>
>>>> Your app, by contrast, uses runtime: custom.
>>>>
>>>> You may also switch from manual scaling with only one instance to 
>>>> automatic scaling. If this makes a difference in your app's behavior, 
>>>> the information would help us with debugging. 
>>>>
>>>>
>>>> -- 
>> You received this message because you are subscribed to a topic in the 
>> Google Groups "Google App Engine" group.
>> To unsubscribe from this topic, visit 
>> https://groups.google.com/d/topic/google-appengine/5mKOn6RzRFY/unsubscribe
>> .
>> To unsubscribe from this group and all its topics, send an email to 
>> [email protected].
>> To post to this group, send email to [email protected].
>> Visit this group at https://groups.google.com/group/google-appengine.
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/google-appengine/fe551f25-65ef-4eef-a1fd-7e53e44f0f06%40googlegroups.com.
>>
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>
