thanks Ding,

I saw your update. Did your run a cloud with this code; i.e. did you
monkey-test it?

On Thu, Apr 3, 2014 at 5:26 PM, Ding Yuan <y...@ece.utoronto.ca> wrote:
> Oops, sorry I didn't publish my diff. I just published it on review board.
> Thanks for the comment Daan! Please let me know if I further need to improve 
> it.
>
> Ding
>
> On Apr 3, 2014, at 9:52 AM, Daan Hoogland <daan.hoogl...@gmail.com> wrote:
>
>> Ding,
>>
>> I think we can dare to do so in master as it will not see release for
>> a while. We'll just have to be aware of the locations and be on alert
>> for any stacktraces that will pass this list. I would not like to do
>> this on the 4.4 branch even when it is an improvement of code quality
>> as such. It might do things or prevent things from happening that we
>> need done.
>>
>> I don't see a new version of the diff in the review request. Did you
>> 'Update' -> 'Upload Diff'?
>>
>> regards,
>> Daan
>>
>> On Thu, Apr 3, 2014 at 12:34 AM, Ding Yuan <y...@ece.utoronto.ca> wrote:
>>> Uploaded a new patch to 19917. Changed the verbosity to debug, and addressed
>>> Daan's comment on providing more distinctive text messages.
>>>
>>> Sorry that I haven't split them into smaller patches.
>>>
>>> Note in a few cases the original code was like:
>>>         try {
>>>            pstmt = txn.prepareAutoCloseStatement(sql);
>>>            String gmtCutTime =
>>> DateUtil.getDateDisplayString(TimeZone.getTimeZone("GMT"), cutTime);
>>>            pstmt.setString(1, gmtCutTime);
>>>            pstmt.setString(2, gmtCutTime);
>>>
>>>            ResultSet rs = pstmt.executeQuery();
>>>            while (rs.next()) {
>>>                RunningHostCountInfo info = new RunningHostCountInfo();
>>>                info.setDcId(rs.getLong(1));
>>>                info.setHostType(rs.getString(2));
>>>                info.setCount(rs.getInt(3));
>>>
>>>                l.add(info);
>>>                       }
>>>                 } catch (SQLException e) {
>>>                 } catch (Throwable e) {
>>>                 }
>>>
>>> The try block only throws SQLException as checked exception, and this code
>>> would also swallow any unchecked exceptions. I removed the catch (Throwable)
>>> in these cases to avoid potentially swallowing any unexpected runtime
>>> exceptions. Please let me know if this is not desirable so I can further
>>> update.
>>>
>>> Thanks,
>>> Ding
>>>
>>> On Apr 2, 2014, at 5:17 PM, Ding Yuan <y...@eecg.toronto.edu> wrote:
>>>
>>> Thanks all for the quick comments!
>>> If i understand the discussion correctly, I will just change all the added
>>> log printing statements to debug verbosity. I will upload a new patch for
>>> that shortly.
>>>
>>> Now a bit back story: the reason we are doing this is that we recently did
>>> an analysis on many bugs collected from JIRA to understand why today's cloud
>>> system fails. And we found that almost all of the cluster-wide failures are
>>> caused by buggy exception handling, which would often turn a component
>>> failure into a cluster-wide one. One of the common bug pattern is ignoring
>>> some exceptions -- allowing them to propagate and finally turn into 
>>> disaster.
>>> Therefore we built a simple static checking tool just to check some simple
>>> rules for exception handling, such as if an exception is ignored.
>>> Admittedly, it would be much harder to reason about the potential
>>> consequences caused for ignoring a certain exception, that's why without
>>> much more domain knowledge I can only recommend to 1) avoid over-catching an
>>> exception, especially when the handling logic will swallow it, and 2) log
>>> them, as what this patch does.
>>>
>>> Nevertheless, it seems the four cases I mentioned in JIRA-6242 are
>>> particularly suspicious. It might be worthwhile to double check their
>>> correctness if you have time. I am reposting them below.
>>>
>>> Thanks!
>>> Ding
>>>
>>> =========================
>>> Case 1:
>>> Line: 375, File: "com/cloud/network/ovs/OvsTunnelManagerImpl.java"
>>>
>>> 326: try {
>>> 327: String myIp = getGreEndpointIP(dest.getHost(), nw);
>>>  .. .. ..
>>> 373: } catch (Exception e) {
>>> 374: // I really thing we should do a better handling of these exceptions
>>> 375: s_logger.warn("Ovs Tunnel network created tunnel failed", e);
>>> 376: }
>>>
>>> Comment seems to suggest for a better handling.
>>> =========================
>>> =========================
>>> Case 2:
>>> Line: 2295, File: "com/cloud/resource/ResourceManagerImpl.java"
>>>
>>> 2284: try {
>>> 2285: /*
>>> 2286: * FIXME: this is a buggy logic, check with alex. Shouldn't
>>> 2287: * return if propagation return non null
>>> 2288: */
>>> 2289: Boolean result = _clusterMgr.propagateResourceEvent(h.getId(),
>>> ResourceState.Event.UpdatePassword);
>>> 2290: if (result != null) {
>>> 2291: return result;
>>> 2292: }
>>> 2293:
>>> 2294: doUpdateHostPassword(h.getId());
>>> 2295: } catch (AgentUnavailableException e) {
>>> 2296: }
>>>
>>> Seem from the comment the logic should be fixed.
>>> A similar code snipeet is at:
>>>  Line: 2276, File: "com/cloud/resource/ResourceManagerImpl.java"
>>> =========================
>>>
>>> =========================
>>> Case 3:
>>>
>>>  Line: 184, File:
>>> "org/apache/cloudstack/api/command/user/autoscale/CreateAutoScaleVmGroupCmd.java"
>>>
>>> 174: try
>>> 175: {
>>> 176: success = _autoScaleService.configureAutoScaleVmGroup(this);
>>> 177: if (success) {
>>> 178: vmGroup = _entityMgr.findById(AutoScaleVmGroup.class, getEntityId());
>>> 179: AutoScaleVmGroupResponse responseObject =
>>> _responseGenerator.createAutoScaleVmGroupResponse(vmGroup);
>>> 180: setResponseObject(responseObject);
>>> 181: responseObject.setResponseName(getCommandName());
>>> 182: }
>>> 183: } catch (Exception ex) {
>>> 184: // TODO what will happen if Resource Layer fails in a step inbetween
>>> 185: s_logger.warn("Failed to create autoscale vm group", ex);
>>> 186: }
>>>
>>> The comment, "TODO", seems to suggest for better handling.
>>> =========================
>>> =========================
>>> Case 4:
>>>
>>>  Line: 222, File: "com/cloud/api/ApiDispatcher.java"
>>> This snippet is moved to ParamProcessWorker.java in the trunk.
>>>
>>> 203: try {
>>> 204: setFieldValue(field, cmd, paramObj, parameterAnnotation);
>>>     .. ..
>>> 220: } catch (CloudRuntimeException cloudEx) {
>>> 221: s_logger.error("CloudRuntimeException", cloudEx);
>>> 222: // FIXME: Better error message? This only happens if the API command is
>>> not executable, which typically
>>> 223: //means
>>> 224: // there was
>>> 225: // and IllegalAccessException setting one of the parameters.
>>> 226: throw new ServerApiException(ApiErrorCode.INTERNAL_ERROR, "Internal
>>> error executing API command " + cmd.getCommandName().substring(0,
>>> cmd.getCommandName().length() - 8));
>>> 227: }
>>>
>>> The "FIXME" comment seems to suggest for getter error message.
>>> =========================
>>> On Apr 2, 2014, at 4:37 PM, Daan Hoogland <daan.hoogl...@gmail.com> wrote:
>>>
>>> Ding Yuan,
>>>
>>> Any objections to that?
>>>
>>> On Wed, Apr 2, 2014 at 10:32 PM, Alena Prokharchyk
>>> <alena.prokharc...@citrix.com> wrote:
>>>
>>>
>>>
>>> On 4/2/14, 1:27 PM, "Daan Hoogland" <daan.hoogl...@gmail.com> wrote:
>>>
>>> I think we agree indeed. Doesn't mean we should start this discuss
>>> thread or write a arch guideline on the wiki somewhere. Maybe Ding
>>> Yuan wants to do a preliminary version of it?
>>>
>>>
>>> Wiki guide would be useful indeed.
>>>
>>>
>>>
>>> In the meantime I don't think that it hurts for the present patch to
>>> do everything in debug and decide about higher levels needed later.
>>>
>>>
>>> Agree.
>>>
>>>
>>>
>>> regards,
>>>
>>> On Wed, Apr 2, 2014 at 10:11 PM, Alena Prokharchyk
>>> <alena.prokharc...@citrix.com> wrote:
>>>
>>> Daan,
>>>
>>> Correct me if I¹m wrong, but all of the logging added by Ding, fall
>>> under
>>> "to go with it or to indicate passing a certain code path². I¹ve just
>>> noticed that some of them were added with DEBUG, and some with WARN
>>> level,
>>> and wanted to correct that.
>>>
>>> So we should:
>>>
>>> 1) For sure: never print them out in WARN as there is nothing admin
>>> should
>>> do in this case, because the code just handles them by ignoring.
>>> 2) Figure out what would be the correct level to log them with: INFO or
>>> DEBUG
>>>
>>> From ³Logging best practices² articles, I can see that people use INFO
>>> as
>>> a ³storyline² of normal application behavior, and DEBUG for sort of
>>> information that helps you to track down the failure cases scenarios.
>>> To
>>> me, stuff added by Ding, falls under second category. But I might be
>>> wrong
>>> as I don¹t recall on the spot any discussions happening on the debug
>>> topic, from the mailing list.
>>>
>>> -Alena.
>>>
>>>
>>> On 4/2/14, 12:57 PM, "Daan Hoogland" <daan.hoogl...@gmail.com> wrote:
>>>
>>> Alena,
>>>
>>> What I read in your comment is a description of INFO vs WARN. Debug
>>> would be only for outputting stacktraces to go with it or to indicate
>>> passing a certain code path.
>>>
>>> Agree?
>>>
>>> On Wed, Apr 2, 2014 at 8:31 PM, Alena Prokharchyk
>>> <alena.prokharc...@citrix.com> wrote:
>>>
>>>
>>> -----------------------------------------------------------
>>> This is an automatically generated e-mail. To reply, visit:
>>> https://reviews.apache.org/r/19917/#review39324
>>> -----------------------------------------------------------
>>>
>>>
>>> Is there a reason why logs for some exceptions are being logged in
>>> DEBUG mode, and some in WARN? From my point of view, if the code only
>>> catches it and doesn't error out, it should be logged in DEBUG. Lots of
>>> Admins are seeking for WARN statements in the log, and they might be
>>> confused seeing WARN w/o further failure or retry.
>>>
>>> - Alena Prokharchyk
>>>
>>>
>>> On April 2, 2014, 1:55 p.m., Ding Yuan wrote:
>>>
>>>
>>> -----------------------------------------------------------
>>> This is an automatically generated e-mail. To reply, visit:
>>> https://reviews.apache.org/r/19917/
>>> -----------------------------------------------------------
>>>
>>> (Updated April 2, 2014, 1:55 p.m.)
>>>
>>>
>>> Review request for cloudstack.
>>>
>>>
>>> Repository: cloudstack-git
>>>
>>>
>>> Description
>>> -------
>>>
>>> This is the patch for JIRA-6242. See
>>> https://issues.apache.org/jira/browse/CLOUDSTACK-6242 for more
>>> details.
>>> Thanks!
>>>
>>>
>>> Diffs
>>> -----
>>>
>>>
>>> engine/orchestration/src/com/cloud/agent/manager/AgentManagerImpl.java
>>> 0d41bc1
>>>
>>> engine/orchestration/src/com/cloud/agent/manager/ClusteredAgentManager
>>> Im
>>> pl.java 01508a4
>>>
>>> engine/orchestration/src/com/cloud/vm/VirtualMachineManagerImpl.java
>>> 3e088db
>>>
>>> engine/orchestration/src/org/apache/cloudstack/engine/datacenter/entit
>>> y/
>>> api/db/dao/EngineDataCenterDaoImpl.java 4b6818e
>>> engine/schema/src/com/cloud/dc/dao/DataCenterDaoImpl.java ea5039f
>>> engine/schema/src/com/cloud/host/dao/HostDaoImpl.java 426c90d
>>> engine/schema/src/com/cloud/storage/dao/StoragePoolHostDaoImpl.java
>>> e42eaf4
>>> engine/schema/src/com/cloud/storage/dao/VMTemplateDaoImpl.java
>>> 34fdca5
>>> engine/schema/src/com/cloud/upgrade/dao/Upgrade2214to30.java
>>> 58dd916
>>> engine/schema/src/com/cloud/vm/dao/ConsoleProxyDaoImpl.java 5e9c2f0
>>> engine/schema/src/com/cloud/vm/dao/SecondaryStorageVmDaoImpl.java
>>> 1f382d6
>>>
>>> engine/storage/src/org/apache/cloudstack/storage/datastore/DataObjectM
>>> an
>>> agerImpl.java 6ed1274
>>>
>>> framework/ipc/src/org/apache/cloudstack/framework/serializer/OnwireCla
>>> ss
>>> Registry.java 83c8a42
>>>
>>> plugins/hypervisors/xen/src/com/cloud/hypervisor/xen/discoverer/XcpSer
>>> ve
>>> rDiscoverer.java 0ad6dc4
>>>
>>> plugins/hypervisors/xen/src/com/cloud/hypervisor/xen/resource/XenServe
>>> rC
>>> onnectionPool.java b779085
>>>
>>> plugins/hypervisors/xen/src/com/cloud/hypervisor/xen/resource/XenServe
>>> rS
>>> torageProcessor.java e512046
>>>
>>> plugins/storage/volume/solidfire/src/org/apache/cloudstack/storage/dat
>>> as
>>> tore/lifecycle/SolidFirePrimaryDataStoreLifeCycle.java af6a77a
>>> server/src/com/cloud/resource/ResourceManagerImpl.java f9a59ba
>>> server/src/com/cloud/server/ConfigurationServerImpl.java b8da4c8
>>>
>>> services/console-proxy/server/src/com/cloud/consoleproxy/ConsoleProxyT
>>> hu
>>> mbnailHandler.java 06f21d3
>>> utils/src/com/cloud/utils/net/NetUtils.java 6350986
>>>
>>> Diff: https://reviews.apache.org/r/19917/diff/
>>>
>>>
>>> Testing
>>> -------
>>>
>>>
>>> Thanks,
>>>
>>> Ding Yuan
>>>
>>>
>>>
>>>
>>>
>>>
>>> --
>>> Daan
>>>
>>>
>>>
>>>
>>>
>>> --
>>> Daan
>>>
>>>
>>>
>>>
>>>
>>> --
>>> Daan
>>>
>>>
>>>
>>
>>
>>
>> --
>> Daan
>>
>



-- 
Daan

Reply via email to