Greg, thank you for your answers. It would be great if you could clarify a few more things.
1) How do you define an "instance available to serve a request" in a concurrent environment? I suppose this means an instance that is currently serving fewer than X requests. What is that X? Will it be a fixed number, or based on current CPU load, memory usage, etc.? Please give us some details on this.

2) The new pricing calls for additional controls over request-serving priority. Here's an example: I might want user requests to have a maximum latency of 50 ms, but I don't mind task-queue requests having a latency of up to 5000 ms or even more. Moreover, if user requests and task-queue requests are competing for instances (even if just for a second), it should be possible to make the user requests go first. I hope you see that this is something that only matters under the new pricing. Has the GAE team put any thought into this, and how feasible do you think it would be to add such controls? It would help a lot.

3) I don't think the documentation or the SLA says anything about how users' instances are packed onto machines -- are instances guaranteed their share of memory even when they aren't using it?

4) How many instances run per core on a machine? If there are many, application latency can increase simply because the OS scheduler has to juggle all those instances, through no fault of the application author.

Thank you,
Sergey
--
http://self.maluke.com/

--
You received this message because you are subscribed to the Google Groups "Google App Engine" group.
To view this discussion on the web visit https://groups.google.com/d/msg/google-appengine/-/yySRUxpQg4gJ.
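P.S. To make question (2) concrete, here is a minimal sketch of the kind of dispatcher behavior I am asking for. Everything here is hypothetical -- the request classes, priorities, and latency budgets are made up for illustration and are not an existing GAE API:

```python
import heapq
import itertools

# Hypothetical request classes: (priority, latency budget in ms).
# Lower priority number wins; the budget is the latency I would be
# willing to tolerate for that class of request.
REQUEST_CLASSES = {
    "user": (0, 50),         # user-facing: serve first, 50 ms budget
    "taskqueue": (1, 5000),  # background: can wait up to 5000 ms
}

class Dispatcher:
    """Hands idle instances the highest-priority pending request first."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # FIFO tie-break within a class

    def enqueue(self, request_class, request):
        priority, budget_ms = REQUEST_CLASSES[request_class]
        heapq.heappush(self._heap,
                       (priority, next(self._counter), budget_ms, request))

    def next_request(self):
        """Return the next request an idle instance should serve, or None."""
        if not self._heap:
            return None
        _, _, _, request = heapq.heappop(self._heap)
        return request

d = Dispatcher()
d.enqueue("taskqueue", "task-1")
d.enqueue("user", "req-1")
d.enqueue("taskqueue", "task-2")
d.enqueue("user", "req-2")
# User requests go first even though the tasks arrived earlier:
print([d.next_request() for _ in range(4)])
# -> ['req-1', 'req-2', 'task-1', 'task-2']
```

The point is just that when both classes compete for the same instances, the user-facing class should always win, with task-queue work filling in behind it.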
