Yes, in the /_ah/queue/deferred requests, we are almost certainly doing datastore/memcache work and very likely doing urlfetch work (I'm referring to this probabilistically, because we use deferred for many things).
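For context, deferred works by pickling the target callable plus its arguments and POSTing that payload to /_ah/queue/deferred, where a generic handler unpickles and invokes it; that's why the datastore/memcache/urlfetch work shows up under that one URL. A rough pure-Python sketch of the mechanism (simplified and illustrative, not the actual SDK source):

```python
import pickle

def defer_payload(func, *args, **kwargs):
    """Serialize a callable and its arguments, roughly as deferred does
    before enqueueing a POST to /_ah/queue/deferred."""
    return pickle.dumps((func, args, kwargs))

def run_deferred(payload):
    """What the /_ah/queue/deferred handler does with the task body:
    unpickle and invoke. Any datastore/memcache/urlfetch calls happen
    inside func, which is why those APIs appear in these requests."""
    func, args, kwargs = pickle.loads(payload)
    return func(*args, **kwargs)

# Hypothetical task function standing in for our real deferred work.
def add(a, b):
    return a + b

payload = defer_payload(add, 2, 3)
print(run_deferred(payload))  # 5
```

So a 30-second DeadlineExceededError on /_ah/queue/deferred can come from any of the many functions we defer, not one specific handler.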
BTW, appid steprep is Python.

j

On Sep 23, 2:21 pm, "Ikai Lan (Google)" <[email protected]> wrote:
> Jason, are you calling URLFetch/datastore/Memcache in that action?
>
> So far there are no reports of Java applications being impacted. We usually
> have a bit more flexibility when it comes to tearing down and spinning up
> Python instances, so I'm wondering if this is related.
>
> --
> Ikai Lan
> Developer Programs Engineer, Google App Engine
> Blogger: http://googleappengine.blogspot.com
> Reddit: http://www.reddit.com/r/appengine
> Twitter: http://twitter.com/app_engine
>
> On Thu, Sep 23, 2010 at 4:17 PM, Jason C <[email protected]> wrote:
>> When I refer to bursts of errors, this is an example of what I mean.
>> This is a query for any 500-series errors against appid: steprep. Note
>> how close together the log entries are (i.e., the burstiness). These
>> requests operate very quickly when App Engine is performing well.
>>
>> 09-23 01:07PM 42.459 /_ah/queue/deferred 500 29768ms
>> 09-23 01:07PM 42.084 /_ah/queue/deferred 500 29485ms
>> 09-23 01:07PM 38.017 /_ah/queue/deferred 500 29523ms
>> 09-23 01:07PM 33.636 /_ah/queue/deferred 500 29417ms
>> 09-23 01:07PM 32.176 /_ah/queue/deferred 500 29787ms
>> 09-23 01:07PM 30.812 /_ah/queue/deferred 500 29451ms
>> 09-23 01:07PM 25.477 /_ah/queue/deferred 500 29626ms
>> 09-23 01:07PM 24.605 /_ah/queue/deferred 500 30025ms
>> 09-23 01:07PM 14.422 /mgmt/search/start/ 500 40086ms
>> 09-23 01:07PM 24.204 /_ah/queue/deferred 500 29768ms
>> 09-23 01:07PM 20.746 /_ah/queue/deferred 500 29968ms
>> 09-23 01:07PM 30.383 /exec-reports/queue/increment-all-time-data-counter 500 10003ms
>> 09-23 01:05PM 47.180 /_ah/queue/deferred 500 30070ms
>> 09-23 01:05PM 42.770 /mgmt/search/start/ 500 30053ms
>> 09-23 01:04PM 55.080 /_ah/queue/deferred 500 29404ms
>> 09-23 01:04PM 49.054 /_ah/queue/deferred 500 29589ms
>>
>> Anything around the 30-s mark is a DeadlineExceededError; anything
>> around the 10-s mark is the "Request was aborted after waiting too
>> long to attempt to service your request....".
>>
>> These bursts are _NOT_ rare; I can find them at will, including
>> similar bursts of the 10-s style:
>>
>> 09-23 12:52PM 07.445 /_ah/queue/deferred 500 10006ms
>> 09-23 12:52PM 06.510 /_ah/queue/deferred 500 10016ms
>> 09-23 12:52PM 06.303 /_ah/queue/deferred 500 10003ms
>> 09-23 12:52PM 05.902 /_ah/queue/deferred 500 10017ms
>> 09-23 12:52PM 05.641 /_ah/queue/deferred 500 10017ms
>> 09-23 12:52PM 04.872 /_ah/queue/deferred 500 10002ms
>> 09-23 12:52PM 04.211 /_ah/queue/deferred 500 10030ms
>> 09-23 12:52PM 03.759 /_ah/queue/deferred 500 10029ms
>> 09-23 12:52PM 03.652 /_ah/queue/deferred 500 10021ms
>>
>> I am running out of ideas of what I can do at my end to address this
>> issue. Does anyone from Google have suggestions? Is this an
>> infrastructure / autoscaling problem, or is it a problem at my end?
>>
>> On Sep 23, 11:06 am, Jason C <[email protected]> wrote:
>>> We get lots of bursts of loading failures. Most recently: 9.45a to
>>> 9.47a log time (2010-09-23). Appid: steprep
>>>
>>> We also get lots of ongoing 10-second timeouts. Using the logs, it
>>> seems to me that our user-facing requests are below the 1000ms
>>> threshold, so I don't know why we get so many 10-second timeouts
>>> (i.e., the "Request was aborted after waiting too long to attempt to
>>> service your request. This may happen sporadically when the App Engine
>>> serving cluster is under unexpectedly high or uneven load. If you see
>>> this message frequently, please contact the App Engine team." error).
>>>
>>> Our QPS varies throughout the day from 1 to around 12, though we have
>>> jobs that introduce short spikes of 50-60.
>>>
>>> j
>>>
>>> On Sep 23, 10:39 am, Eli Jones <[email protected]> wrote:
>>>> My app is extremely lightweight.
>>>> The average CPU usage for a cold start falls in these ranges:
>>>>
>>>> CPU_MS: 1030 - 1230
>>>> API_CPU_MS: 408
>>>>
>>>> The rough outline of the code that runs is:
>>>>
>>>> 1. Pull credentials from memcache and decrypt.
>>>> 2. Connect to gdata service, and get some data.
>>>> 3. Create 4 new entities (of same Model type) containing this fetched
>>>>    data and append to a list.
>>>> 4. Stick those entities in memcache as protobufs.
>>>> 5. Create a new entity (for a different Model) and append to the list
>>>>    used in Step 3.
>>>> 6. db.put(list) to the datastore. (Or keep trying until success.)
>>>> 7. If necessary, add a task to run at the next 5-minute mark.
>>>>
>>>> So, as long as memcache is working and the encrypted credentials are
>>>> there, there are not any Gets from the datastore. There is just the
>>>> Put that is putting 4 entities of one model type and 1 entity of
>>>> another model type all at once.
>>>>
>>>> timezone from pytz is used for correct timezone calculations at the
>>>> start of the day, to make sure that the initial task starts at the
>>>> correct time. (At this point I could move the "from pytz import
>>>> timezone" down to just the function that calculates the initial start
>>>> delay, but I avoid making changes to this code unless it's really
>>>> necessary; timezone was used in more than one place in the past, so
>>>> it was imported at the top of the file.)
>>>>
>>>> On Thu, Sep 23, 2010 at 12:20 PM, Ikai Lan (Google) <[email protected]> wrote:
>>>>> It sounds like these are impacting Python developers - I haven't
>>>>> heard any Java app developers chime in yet.
>>>>>
>>>>> What's the ballpark of the QPS you guys are doing? <1qps, 10qps,
>>>>> 100qps, etc?
>>>>>
>>>>> --
>>>>> Ikai Lan
>>>>> Developer Programs Engineer, Google App Engine
>>>>> Blogger: http://googleappengine.blogspot.com
>>>>> Reddit: http://www.reddit.com/r/appengine
>>>>> Twitter: http://twitter.com/app_engine
>>>>>
>>>>> On Thu, Sep 23, 2010 at 12:15 PM, Eli Jones <[email protected]> wrote:
>>>>>> Also, a cold start for this task can take as little as 960 ms (where
>>>>>> it shows the log detail: "This request caused a new process to be
>>>>>> started for your application...").
>>>>>>
>>>>>> And, the task is doing a URLFetch, so I'm sure that is taking up a
>>>>>> chunk of the running time.
>>>>>>
>>>>>> One of my imports is pytz (no django or anything like that though),
>>>>>> so that could also be of interest.
>>>>>>
>>>>>> On Thu, Sep 23, 2010 at 12:10 PM, Eli Jones <[email protected]> wrote:
>>>>>>> My app is in Python. The only thing it does between 6:30 AM PST and
>>>>>>> 1:00 PM PST is run a single chained task. The task just runs once
>>>>>>> every 5 minutes. (There are no other backend processes or users.)
>>>>>>>
>>>>>>> Yesterday at 11:00 AM PST and 11:02 AM PST, the task experienced
>>>>>>> the 10-second, 500 error (with 0 CPU_MS used), then the task
>>>>>>> successfully retried at 11:02:55 AM PST.
>>>>>>>
>>>>>>> There was another one (not mentioned in your list of times for
>>>>>>> yesterday) at 7:45 AM PST. It failed after 10 seconds with a 500
>>>>>>> error and 0 CPU_MS used. The task successfully retried at 7:47 AM
>>>>>>> PST. (For some reason, the tasks in my app wait 120 seconds for the
>>>>>>> first retry.)
>>>>>>>
>>>>>>> For some reason, the log doesn't contain the message that usually
>>>>>>> goes with this error (there is no detailed log message):
>>>>>>>
>>>>>>> "Request was aborted after waiting too long to attempt to service
>>>>>>> your request.
>>>>>>> This may happen sporadically when the App Engine serving cluster is
>>>>>>> under unexpectedly high or uneven load. If you see this message
>>>>>>> frequently, please contact the App Engine team."
>>>>>>>
>>>>>>> But, it has the exact same fingerprint. (It just seems that the
>>>>>>> logs are trimming the detailed messages faster than usual.)
>>>>>>>
>>>>>>> This chained task runs 79 times each day (this would be 82 requests
>>>>>>> counting these 3 errors), so these errors accounted for 3.66% of
>>>>>>> the requests during the day yesterday.
>>>>>>>
>>>>>>> On Thu, Sep 23, 2010 at 11:19 AM, Ikai Lan (Google) <[email protected]> wrote:
>>>>>>>> Hey guys,
>>>>>>>>
>>>>>>>> We've been tracking latency spikes and intermittent batches of
>>>>>>>> aborted requests over the last 24-hour period. There were at least
>>>>>>>> 3 periods of spikes yesterday:
>>>>>>>>
>>>>>>>> ~1100 Pacific (moderate, brief period of high)
>>>>>>>> ~1300 Pacific (small)
>>>>>>>> ~1530 Pacific (small)
>>>>>>>>
>>>>>>>> Community reports seem to indicate that these are related to
>>>>>>>> loading requests. If you are being affected, can you report your
>>>>>>>> environment, language and whether these times correlate with your
>>>>>>>> experiences in this thread? Are there periods that this particular
>>>>>>>> report did not cover? Please provide measurements, times and
>>>>>>>> application IDs.
>>>>>>>>
>>>>>>>> --
>>>>>>>> Ikai Lan
>>>>>>>> Developer Programs Engineer, Google App Engine
>>>>>>>> Blogger: http://googleappengine.blogspot.com
>>>>>>>> Reddit: http://www.reddit.com/r/appengine
>>>>>>>> Twitter: http://twitter.com/app_engine
--
You received this message because you are subscribed to the Google Groups "Google App Engine" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to [email protected].
For more options, visit this group at http://groups.google.com/group/google-appengine?hl=en.
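The "run at the next 5-minute mark" scheduling Eli describes upthread comes down to plain datetime arithmetic. A hedged sketch of one way to compute the target time (his actual code isn't shown in the thread, and the pytz timezone handling he mentions is omitted here):

```python
from datetime import datetime, timedelta

def next_five_minute_mark(now):
    """Return the next wall-clock time whose minute is a multiple of 5,
    with seconds and microseconds zeroed. A chained task would use this
    as the ETA when re-enqueueing itself."""
    floored = now.replace(second=0, microsecond=0)
    minutes_past = now.minute % 5
    # Round up to the next mark; if we're exactly on a mark, schedule
    # the following one so the chain always advances.
    if minutes_past or now.second or now.microsecond:
        advance = 5 - minutes_past
    else:
        advance = 5
    return floored + timedelta(minutes=advance)

now = datetime(2010, 9, 23, 13, 7, 42)
print(next_five_minute_mark(now))  # 2010-09-23 13:10:00
```

The returned datetime (or the delay until it) would then be passed as the task countdown/ETA when the chained task re-enqueues itself.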
