Greg, thank you for your answers.

It would be great if you could clarify a few more things.

1) How do you define "instance available to serve a request" in a concurrent 
environment? I assume it means an instance that is currently serving fewer 
than X concurrent requests. What is that X? Is it a fixed number, or is it 
based on current CPU load, memory usage, etc.? Please give us some details 
on this.
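To show what I mean, here is a minimal sketch of the availability test I'm imagining. Everything here is my guess, not GAE's actual scheduler: `MAX_CONCURRENT` stands in for the unknown X, and the real logic may weigh CPU load or memory instead of a fixed count.

```python
# Hypothetical sketch of an "instance available to serve a request" test.
# MAX_CONCURRENT is the assumed fixed threshold (the "X" above); the real
# scheduler may instead look at CPU load, memory usage, or latency.
MAX_CONCURRENT = 8

class Instance:
    def __init__(self):
        self.active_requests = 0

    def is_available(self):
        # Available = currently serving fewer than X concurrent requests.
        return self.active_requests < MAX_CONCURRENT

def pick_instance(instances):
    # Return the first available instance, or None, which would mean
    # spinning up a new instance (and billing for it under the new pricing).
    for inst in instances:
        if inst.is_available():
            return inst
    return None
```

Under that model, the pricing question reduces to: what sets `MAX_CONCURRENT`, and can the app influence it?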

2) The new pricing calls for additional controls over request-serving 
priority. Here's an example: I might want user requests to have a maximum 
latency of 50 ms, but I don't mind task queue requests having latency up to 
5000 ms or even more. Moreover, if requests from users and from the task 
queue are competing for instances (even if just for a second), it should be 
possible to make the user requests go first. This is something that only 
matters under the new pricing. Has the GAE team put any thought into this, 
and how feasible do you think it would be to add such controls? It would 
help a lot.
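To make the ask concrete, here is a sketch of the scheduling policy I have in mind: a priority queue keyed on request class first and latency deadline second. The class names and budgets are taken from my example above; none of this is an existing GAE API.

```python
import heapq
import itertools

# Hypothetical per-class latency budgets (milliseconds) from the example
# above: user requests must be served fast, task queue requests may wait.
LATENCY_BUDGET_MS = {"user": 50, "taskqueue": 5000}

# Lower number = served first, so a pending user request always beats a
# pending task-queue request when they compete for an instance.
PRIORITY = {"user": 0, "taskqueue": 1}

_counter = itertools.count()  # tie-breaker: FIFO within a class

def enqueue(pending, kind, arrival_ms):
    # Queue a request of the given class, tagged with its latency deadline.
    deadline_ms = arrival_ms + LATENCY_BUDGET_MS[kind]
    heapq.heappush(pending, (PRIORITY[kind], deadline_ms, next(_counter), kind))

def next_request(pending):
    # Hand the highest-priority (then earliest-deadline) request to a
    # free instance.
    return heapq.heappop(pending)[3]
```

The point of the sketch: even though the task-queue request arrived first, a later user request jumps ahead of it, which is exactly the control that would let us keep user-facing latency low without paying for instances just to drain the queue quickly.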

3) I don't think the documentation or the SLA says anything about the way 
users' instances are packed onto machines -- are instances guaranteed their 
share of memory even when they aren't using it?

4) How many instances are there per core on a machine? If there are many, 
application latency can increase simply because the OS scheduler has to 
juggle all those instances, through no fault of the application author.

Thank you,
Sergey

--
http://self.maluke.com/

-- 
You received this message because you are subscribed to the Google Groups 
"Google App Engine" group.
To view this discussion on the web visit 
https://groups.google.com/d/msg/google-appengine/-/yySRUxpQg4gJ.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/google-appengine?hl=en.
