Hi All,

I was alerted to this problem recently, and since it affects developers I want to bring it up.

It is a design principle in CloudStack that we do not make agent calls within database transactions. The reason is that when you make a call to an external system, there is no guarantee on how long the call takes or even whether it returns. When a call takes a long time, several bad things can happen:

- The MySQL DB connection held open by the DB transaction goes idle. Eventually a MySQL timeout hits, the connection is severed, and the transaction is rolled back. By default this timeout is 45 seconds, but it can be changed via a parameter in my.cnf. The problem is that the agent call can complete just fine while the DB transaction rolls back and its changes are undone.
- The rows locked in that transaction before the remote agent call can hold up other foreign key checks on that table. MySQL runs foreign key checks in transactions to make sure the data modification and the checks are done atomically. Therefore, these checks must wait for other transactions to complete. Hence, an agent call that takes some time can severely slow down the system, particularly at scale.
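To make the principle concrete, here is a minimal Java sketch of keeping the agent call outside any transaction. It is not CloudStack code: the class AgentCallSketch, the IN_TXN flag, and the inTransaction and sendToAgent helpers are all hypothetical stand-ins, and the "transaction" is only a thread-local flag rather than a real MySQL connection. The idea is to commit a short transaction first, call the agent with nothing held open, and then record the outcome in a second short transaction.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not CloudStack code: the "transaction" is a thread-local flag.
class AgentCallSketch {
    // True while the current thread is inside a (simulated) DB transaction.
    static final ThreadLocal<Boolean> IN_TXN = ThreadLocal.withInitial(() -> false);
    // Records the order of events so the flow can be inspected.
    static final List<String> LOG = new ArrayList<>();

    // Runs work inside a simulated transaction; "commit" is logged only on success.
    static void inTransaction(Runnable work) {
        IN_TXN.set(true);
        try {
            work.run();
            LOG.add("commit");
        } finally {
            IN_TXN.set(false);
        }
    }

    // Simulated agent call that refuses to run while a transaction is open.
    static String sendToAgent(String command) {
        if (IN_TXN.get()) {
            throw new IllegalStateException("agent call inside a DB transaction: " + command);
        }
        LOG.add("agent:" + command);
        return "ok";
    }

    // Short transaction, then the agent call, then a second short transaction.
    static String startVm() {
        inTransaction(() -> LOG.add("state=Starting"));
        String answer = sendToAgent("start-vm");
        inTransaction(() -> LOG.add("ok".equals(answer) ? "state=Running" : "state=Stopped"));
        return answer;
    }

    // The anti-pattern: calling the agent while the transaction is still open.
    static boolean agentCallInsideTxnIsRejected() {
        try {
            inTransaction(() -> sendToAgent("start-vm"));
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        startVm();
        System.out.println(LOG);
        System.out.println("rejected inside txn: " + agentCallInsideTxnIsRejected());
    }
}
```

In this sketch the agent call always sits between two committed transactions, so no connection or row lock is held while waiting on the agent. The guard throws an exception rather than relying on a Java assert, so it still fires when the JVM runs without -ea.
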
We have two solutions to this:

- Drive agent interactions with states. There are many examples of this in VM, Volume, etc.
- When the above cannot be done, acquire a lock in the lock table via a DAO method call. These locks do not hold a DB transaction open and therefore do not run into this problem. However, you are responsible for releasing them. It used to be that if you forgot to release a lock, the @DB annotation automatically released it once it went out of scope and asserted to alert the developer. However, the @DB annotation has been removed in the Spring work, so I'm not sure if that is still done.

This is a tough problem to solve because:

1. It usually works just fine during functional testing. The problem surfaces during scale testing, often in unexpected places due to the foreign key check problem.
2. It is difficult for developers to know whether a method they are calling within a transaction ends up in an agent call.

There is an assert in AgentManager to ensure that there are no DB transactions open before making an agent call. Apparently, since the conversion to Maven, no one actually runs with asserts on any more. Because of that, this design principle has been lost in CloudStack and we're finding more and more calls being made in DB transactions.

To counter that, I decided to add a global parameter that turns the assert into an actual exception. All developers are advised to set this global parameter, check.txn.before.sending.agent.commands, during their own testing to make sure their code does not make agent calls in transactions.

--Alex