Oops, sorry I didn’t publish my diff. I just published it on review board. Thanks for the comment Daan! Please let me know if I further need to improve it.
Ding On Apr 3, 2014, at 9:52 AM, Daan Hoogland <daan.hoogl...@gmail.com> wrote: > Ding, > > I think we can dare to do so in master as it will not see release for > a while. We'll just have to be aware of the locations and be on alert > for any stacktraces that will pass this list. I would not like to do > this on the 4.4 branch even when it is an improvement of code quality > as such. It might do things or prevent things from happening that we > need done. > > I don't see a new version of the diff in the review request. Did you > 'Update' -> 'Upload Diff'? > > regards, > Daan > > On Thu, Apr 3, 2014 at 12:34 AM, Ding Yuan <y...@ece.utoronto.ca> wrote: >> Uploaded a new patch to 19917. Changed the verbosity to debug, and addressed >> Daan's comment on providing more distinctive text messages. >> >> Sorry that I haven't split them into smaller patches. >> >> Note in a few cases the original code was like: >> try { >> pstmt = txn.prepareAutoCloseStatement(sql); >> String gmtCutTime = >> DateUtil.getDateDisplayString(TimeZone.getTimeZone("GMT"), cutTime); >> pstmt.setString(1, gmtCutTime); >> pstmt.setString(2, gmtCutTime); >> >> ResultSet rs = pstmt.executeQuery(); >> while (rs.next()) { >> RunningHostCountInfo info = new RunningHostCountInfo(); >> info.setDcId(rs.getLong(1)); >> info.setHostType(rs.getString(2)); >> info.setCount(rs.getInt(3)); >> >> l.add(info); >> } >> } catch (SQLException e) { >> } catch (Throwable e) { >> } >> >> The try block only throws SQLException as checked exception, and this code >> would also swallow any unchecked exceptions. I removed the catch (Throwable) >> in these cases to avoid potentially swallowing any unexpected runtime >> exceptions. Please let me know if this is not desirable so I can further >> update. >> >> Thanks, >> Ding >> >> On Apr 2, 2014, at 5:17 PM, Ding Yuan <y...@eecg.toronto.edu> wrote: >> >> Thanks all for the quick comments! >> If i understand the discussion correctly, I will just change all the added >> log printing statements to debug verbosity. I will upload a new patch for >> that shortly. >> >> Now a bit back story: the reason we are doing this is that we recently did >> an analysis on many bugs collected from JIRA to understand why today's cloud >> system fails. And we found that almost all of the cluster-wide failures are >> caused by buggy exception handling, which would often turn a component >> failure into a cluster-wide one. One of the common bug pattern is ignoring >> some exceptions -- allowing them to propagate and finally turn into disaster. >> Therefore we built a simple static checking tool just to check some simple >> rules for exception handling, such as if an exception is ignored. >> Admittedly, it would be much harder to reason about the potential >> consequences caused for ignoring a certain exception, that's why without >> much more domain knowledge I can only recommend to 1) avoid over-catching an >> exception, especially when the handling logic will swallow it, and 2) log >> them, as what this patch does. >> >> Nevertheless, it seems the four cases I mentioned in JIRA-6242 are >> particularly suspicious. It might be worthwhile to double check their >> correctness if you have time. I am reposting them below. >> >> Thanks! >> Ding >> >> ========================= >> Case 1: >> Line: 375, File: "com/cloud/network/ovs/OvsTunnelManagerImpl.java" >> >> 326: try { >> 327: String myIp = getGreEndpointIP(dest.getHost(), nw); >> .. .. .. >> 373: } catch (Exception e) { >> 374: // I really thing we should do a better handling of these exceptions >> 375: s_logger.warn("Ovs Tunnel network created tunnel failed", e); >> 376: } >> >> Comment seems to suggest for a better handling. >> ========================= >> ========================= >> Case 2: >> Line: 2295, File: "com/cloud/resource/ResourceManagerImpl.java" >> >> 2284: try { >> 2285: /* >> 2286: * FIXME: this is a buggy logic, check with alex. Shouldn't >> 2287: * return if propagation return non null >> 2288: */ >> 2289: Boolean result = _clusterMgr.propagateResourceEvent(h.getId(), >> ResourceState.Event.UpdatePassword); >> 2290: if (result != null) { >> 2291: return result; >> 2292: } >> 2293: >> 2294: doUpdateHostPassword(h.getId()); >> 2295: } catch (AgentUnavailableException e) { >> 2296: } >> >> Seem from the comment the logic should be fixed. >> A similar code snipeet is at: >> Line: 2276, File: "com/cloud/resource/ResourceManagerImpl.java" >> ========================= >> >> ========================= >> Case 3: >> >> Line: 184, File: >> "org/apache/cloudstack/api/command/user/autoscale/CreateAutoScaleVmGroupCmd.java" >> >> 174: try >> 175: { >> 176: success = _autoScaleService.configureAutoScaleVmGroup(this); >> 177: if (success) { >> 178: vmGroup = _entityMgr.findById(AutoScaleVmGroup.class, getEntityId()); >> 179: AutoScaleVmGroupResponse responseObject = >> _responseGenerator.createAutoScaleVmGroupResponse(vmGroup); >> 180: setResponseObject(responseObject); >> 181: responseObject.setResponseName(getCommandName()); >> 182: } >> 183: } catch (Exception ex) { >> 184: // TODO what will happen if Resource Layer fails in a step inbetween >> 185: s_logger.warn("Failed to create autoscale vm group", ex); >> 186: } >> >> The comment, "TODO", seems to suggest for better handling. >> ========================= >> ========================= >> Case 4: >> >> Line: 222, File: "com/cloud/api/ApiDispatcher.java" >> This snippet is moved to ParamProcessWorker.java in the trunk. >> >> 203: try { >> 204: setFieldValue(field, cmd, paramObj, parameterAnnotation); >> .. .. >> 220: } catch (CloudRuntimeException cloudEx) { >> 221: s_logger.error("CloudRuntimeException", cloudEx); >> 222: // FIXME: Better error message? This only happens if the API command is >> not executable, which typically >> 223: //means >> 224: // there was >> 225: // and IllegalAccessException setting one of the parameters. >> 226: throw new ServerApiException(ApiErrorCode.INTERNAL_ERROR, "Internal >> error executing API command " + cmd.getCommandName().substring(0, >> cmd.getCommandName().length() - 8)); >> 227: } >> >> The "FIXME" comment seems to suggest for getter error message. >> ========================= >> On Apr 2, 2014, at 4:37 PM, Daan Hoogland <daan.hoogl...@gmail.com> wrote: >> >> Ding Yuan, >> >> Any objections to that? >> >> On Wed, Apr 2, 2014 at 10:32 PM, Alena Prokharchyk >> <alena.prokharc...@citrix.com> wrote: >> >> >> >> On 4/2/14, 1:27 PM, "Daan Hoogland" <daan.hoogl...@gmail.com> wrote: >> >> I think we agree indeed. Doesn't mean we should start this discuss >> thread or write a arch guideline on the wiki somewhere. Maybe Ding >> Yuan wants to do a preliminary version of it? >> >> >> Wiki guide would be useful indeed. >> >> >> >> In the meantime I don't think that it hurts for the present patch to >> do everything in debug and decide about higher levels needed later. >> >> >> Agree. >> >> >> >> regards, >> >> On Wed, Apr 2, 2014 at 10:11 PM, Alena Prokharchyk >> <alena.prokharc...@citrix.com> wrote: >> >> Daan, >> >> Correct me if I¹m wrong, but all of the logging added by Ding, fall >> under >> "to go with it or to indicate passing a certain code path². I¹ve just >> noticed that some of them were added with DEBUG, and some with WARN >> level, >> and wanted to correct that. >> >> So we should: >> >> 1) For sure: never print them out in WARN as there is nothing admin >> should >> do in this case, because the code just handles them by ignoring. >> 2) Figure out what would be the correct level to log them with: INFO or >> DEBUG >> >> From ³Logging best practices² articles, I can see that people use INFO >> as >> a ³storyline² of normal application behavior, and DEBUG for sort of >> information that helps you to track down the failure cases scenarios. >> To >> me, stuff added by Ding, falls under second category. But I might be >> wrong >> as I don¹t recall on the spot any discussions happening on the debug >> topic, from the mailing list. >> >> -Alena. >> >> >> On 4/2/14, 12:57 PM, "Daan Hoogland" <daan.hoogl...@gmail.com> wrote: >> >> Alena, >> >> What I read in your comment is a description of INFO vs WARN. Debug >> would be only for outputting stacktraces to go with it or to indicate >> passing a certain code path. >> >> Agree? >> >> On Wed, Apr 2, 2014 at 8:31 PM, Alena Prokharchyk >> <alena.prokharc...@citrix.com> wrote: >> >> >> ----------------------------------------------------------- >> This is an automatically generated e-mail. To reply, visit: >> https://reviews.apache.org/r/19917/#review39324 >> ----------------------------------------------------------- >> >> >> Is there a reason why logs for some exceptions are being logged in >> DEBUG mode, and some in WARN? From my point of view, if the code only >> catches it and doesn't error out, it should be logged in DEBUG. Lots of >> Admins are seeking for WARN statements in the log, and they might be >> confused seeing WARN w/o further failure or retry. >> >> - Alena Prokharchyk >> >> >> On April 2, 2014, 1:55 p.m., Ding Yuan wrote: >> >> >> ----------------------------------------------------------- >> This is an automatically generated e-mail. To reply, visit: >> https://reviews.apache.org/r/19917/ >> ----------------------------------------------------------- >> >> (Updated April 2, 2014, 1:55 p.m.) >> >> >> Review request for cloudstack. >> >> >> Repository: cloudstack-git >> >> >> Description >> ------- >> >> This is the patch for JIRA-6242. See >> https://issues.apache.org/jira/browse/CLOUDSTACK-6242 for more >> details. >> Thanks! >> >> >> Diffs >> ----- >> >> >> engine/orchestration/src/com/cloud/agent/manager/AgentManagerImpl.java >> 0d41bc1 >> >> engine/orchestration/src/com/cloud/agent/manager/ClusteredAgentManager >> Im >> pl.java 01508a4 >> >> engine/orchestration/src/com/cloud/vm/VirtualMachineManagerImpl.java >> 3e088db >> >> engine/orchestration/src/org/apache/cloudstack/engine/datacenter/entit >> y/ >> api/db/dao/EngineDataCenterDaoImpl.java 4b6818e >> engine/schema/src/com/cloud/dc/dao/DataCenterDaoImpl.java ea5039f >> engine/schema/src/com/cloud/host/dao/HostDaoImpl.java 426c90d >> engine/schema/src/com/cloud/storage/dao/StoragePoolHostDaoImpl.java >> e42eaf4 >> engine/schema/src/com/cloud/storage/dao/VMTemplateDaoImpl.java >> 34fdca5 >> engine/schema/src/com/cloud/upgrade/dao/Upgrade2214to30.java >> 58dd916 >> engine/schema/src/com/cloud/vm/dao/ConsoleProxyDaoImpl.java 5e9c2f0 >> engine/schema/src/com/cloud/vm/dao/SecondaryStorageVmDaoImpl.java >> 1f382d6 >> >> engine/storage/src/org/apache/cloudstack/storage/datastore/DataObjectM >> an >> agerImpl.java 6ed1274 >> >> framework/ipc/src/org/apache/cloudstack/framework/serializer/OnwireCla >> ss >> Registry.java 83c8a42 >> >> plugins/hypervisors/xen/src/com/cloud/hypervisor/xen/discoverer/XcpSer >> ve >> rDiscoverer.java 0ad6dc4 >> >> plugins/hypervisors/xen/src/com/cloud/hypervisor/xen/resource/XenServe >> rC >> onnectionPool.java b779085 >> >> plugins/hypervisors/xen/src/com/cloud/hypervisor/xen/resource/XenServe >> rS >> torageProcessor.java e512046 >> >> plugins/storage/volume/solidfire/src/org/apache/cloudstack/storage/dat >> as >> tore/lifecycle/SolidFirePrimaryDataStoreLifeCycle.java af6a77a >> server/src/com/cloud/resource/ResourceManagerImpl.java f9a59ba >> server/src/com/cloud/server/ConfigurationServerImpl.java b8da4c8 >> >> services/console-proxy/server/src/com/cloud/consoleproxy/ConsoleProxyT >> hu >> mbnailHandler.java 06f21d3 >> utils/src/com/cloud/utils/net/NetUtils.java 6350986 >> >> Diff: https://reviews.apache.org/r/19917/diff/ >> >> >> Testing >> ------- >> >> >> Thanks, >> >> Ding Yuan >> >> >> >> >> >> >> -- >> Daan >> >> >> >> >> >> -- >> Daan >> >> >> >> >> >> -- >> Daan >> >> >> > > > > -- > Daan >