thanks Ding, I saw your update. Did your run a cloud with this code; i.e. did you monkey-test it?
On Thu, Apr 3, 2014 at 5:26 PM, Ding Yuan <y...@ece.utoronto.ca> wrote: > Oops, sorry I didn't publish my diff. I just published it on review board. > Thanks for the comment Daan! Please let me know if I further need to improve > it. > > Ding > > On Apr 3, 2014, at 9:52 AM, Daan Hoogland <daan.hoogl...@gmail.com> wrote: > >> Ding, >> >> I think we can dare to do so in master as it will not see release for >> a while. We'll just have to be aware of the locations and be on alert >> for any stacktraces that will pass this list. I would not like to do >> this on the 4.4 branch even when it is an improvement of code quality >> as such. It might do things or prevent things from happening that we >> need done. >> >> I don't see a new version of the diff in the review request. Did you >> 'Update' -> 'Upload Diff'? >> >> regards, >> Daan >> >> On Thu, Apr 3, 2014 at 12:34 AM, Ding Yuan <y...@ece.utoronto.ca> wrote: >>> Uploaded a new patch to 19917. Changed the verbosity to debug, and addressed >>> Daan's comment on providing more distinctive text messages. >>> >>> Sorry that I haven't split them into smaller patches. >>> >>> Note in a few cases the original code was like: >>> try { >>> pstmt = txn.prepareAutoCloseStatement(sql); >>> String gmtCutTime = >>> DateUtil.getDateDisplayString(TimeZone.getTimeZone("GMT"), cutTime); >>> pstmt.setString(1, gmtCutTime); >>> pstmt.setString(2, gmtCutTime); >>> >>> ResultSet rs = pstmt.executeQuery(); >>> while (rs.next()) { >>> RunningHostCountInfo info = new RunningHostCountInfo(); >>> info.setDcId(rs.getLong(1)); >>> info.setHostType(rs.getString(2)); >>> info.setCount(rs.getInt(3)); >>> >>> l.add(info); >>> } >>> } catch (SQLException e) { >>> } catch (Throwable e) { >>> } >>> >>> The try block only throws SQLException as checked exception, and this code >>> would also swallow any unchecked exceptions. I removed the catch (Throwable) >>> in these cases to avoid potentially swallowing any unexpected runtime >>> exceptions. Please let me know if this is not desirable so I can further >>> update. >>> >>> Thanks, >>> Ding >>> >>> On Apr 2, 2014, at 5:17 PM, Ding Yuan <y...@eecg.toronto.edu> wrote: >>> >>> Thanks all for the quick comments! >>> If i understand the discussion correctly, I will just change all the added >>> log printing statements to debug verbosity. I will upload a new patch for >>> that shortly. >>> >>> Now a bit back story: the reason we are doing this is that we recently did >>> an analysis on many bugs collected from JIRA to understand why today's cloud >>> system fails. And we found that almost all of the cluster-wide failures are >>> caused by buggy exception handling, which would often turn a component >>> failure into a cluster-wide one. One of the common bug pattern is ignoring >>> some exceptions -- allowing them to propagate and finally turn into >>> disaster. >>> Therefore we built a simple static checking tool just to check some simple >>> rules for exception handling, such as if an exception is ignored. >>> Admittedly, it would be much harder to reason about the potential >>> consequences caused for ignoring a certain exception, that's why without >>> much more domain knowledge I can only recommend to 1) avoid over-catching an >>> exception, especially when the handling logic will swallow it, and 2) log >>> them, as what this patch does. >>> >>> Nevertheless, it seems the four cases I mentioned in JIRA-6242 are >>> particularly suspicious. It might be worthwhile to double check their >>> correctness if you have time. I am reposting them below. >>> >>> Thanks! >>> Ding >>> >>> ========================= >>> Case 1: >>> Line: 375, File: "com/cloud/network/ovs/OvsTunnelManagerImpl.java" >>> >>> 326: try { >>> 327: String myIp = getGreEndpointIP(dest.getHost(), nw); >>> .. .. .. >>> 373: } catch (Exception e) { >>> 374: // I really thing we should do a better handling of these exceptions >>> 375: s_logger.warn("Ovs Tunnel network created tunnel failed", e); >>> 376: } >>> >>> Comment seems to suggest for a better handling. >>> ========================= >>> ========================= >>> Case 2: >>> Line: 2295, File: "com/cloud/resource/ResourceManagerImpl.java" >>> >>> 2284: try { >>> 2285: /* >>> 2286: * FIXME: this is a buggy logic, check with alex. Shouldn't >>> 2287: * return if propagation return non null >>> 2288: */ >>> 2289: Boolean result = _clusterMgr.propagateResourceEvent(h.getId(), >>> ResourceState.Event.UpdatePassword); >>> 2290: if (result != null) { >>> 2291: return result; >>> 2292: } >>> 2293: >>> 2294: doUpdateHostPassword(h.getId()); >>> 2295: } catch (AgentUnavailableException e) { >>> 2296: } >>> >>> Seem from the comment the logic should be fixed. >>> A similar code snipeet is at: >>> Line: 2276, File: "com/cloud/resource/ResourceManagerImpl.java" >>> ========================= >>> >>> ========================= >>> Case 3: >>> >>> Line: 184, File: >>> "org/apache/cloudstack/api/command/user/autoscale/CreateAutoScaleVmGroupCmd.java" >>> >>> 174: try >>> 175: { >>> 176: success = _autoScaleService.configureAutoScaleVmGroup(this); >>> 177: if (success) { >>> 178: vmGroup = _entityMgr.findById(AutoScaleVmGroup.class, getEntityId()); >>> 179: AutoScaleVmGroupResponse responseObject = >>> _responseGenerator.createAutoScaleVmGroupResponse(vmGroup); >>> 180: setResponseObject(responseObject); >>> 181: responseObject.setResponseName(getCommandName()); >>> 182: } >>> 183: } catch (Exception ex) { >>> 184: // TODO what will happen if Resource Layer fails in a step inbetween >>> 185: s_logger.warn("Failed to create autoscale vm group", ex); >>> 186: } >>> >>> The comment, "TODO", seems to suggest for better handling. >>> ========================= >>> ========================= >>> Case 4: >>> >>> Line: 222, File: "com/cloud/api/ApiDispatcher.java" >>> This snippet is moved to ParamProcessWorker.java in the trunk. >>> >>> 203: try { >>> 204: setFieldValue(field, cmd, paramObj, parameterAnnotation); >>> .. .. >>> 220: } catch (CloudRuntimeException cloudEx) { >>> 221: s_logger.error("CloudRuntimeException", cloudEx); >>> 222: // FIXME: Better error message? This only happens if the API command is >>> not executable, which typically >>> 223: //means >>> 224: // there was >>> 225: // and IllegalAccessException setting one of the parameters. >>> 226: throw new ServerApiException(ApiErrorCode.INTERNAL_ERROR, "Internal >>> error executing API command " + cmd.getCommandName().substring(0, >>> cmd.getCommandName().length() - 8)); >>> 227: } >>> >>> The "FIXME" comment seems to suggest for getter error message. >>> ========================= >>> On Apr 2, 2014, at 4:37 PM, Daan Hoogland <daan.hoogl...@gmail.com> wrote: >>> >>> Ding Yuan, >>> >>> Any objections to that? >>> >>> On Wed, Apr 2, 2014 at 10:32 PM, Alena Prokharchyk >>> <alena.prokharc...@citrix.com> wrote: >>> >>> >>> >>> On 4/2/14, 1:27 PM, "Daan Hoogland" <daan.hoogl...@gmail.com> wrote: >>> >>> I think we agree indeed. Doesn't mean we should start this discuss >>> thread or write a arch guideline on the wiki somewhere. Maybe Ding >>> Yuan wants to do a preliminary version of it? >>> >>> >>> Wiki guide would be useful indeed. >>> >>> >>> >>> In the meantime I don't think that it hurts for the present patch to >>> do everything in debug and decide about higher levels needed later. >>> >>> >>> Agree. >>> >>> >>> >>> regards, >>> >>> On Wed, Apr 2, 2014 at 10:11 PM, Alena Prokharchyk >>> <alena.prokharc...@citrix.com> wrote: >>> >>> Daan, >>> >>> Correct me if I¹m wrong, but all of the logging added by Ding, fall >>> under >>> "to go with it or to indicate passing a certain code path². I¹ve just >>> noticed that some of them were added with DEBUG, and some with WARN >>> level, >>> and wanted to correct that. >>> >>> So we should: >>> >>> 1) For sure: never print them out in WARN as there is nothing admin >>> should >>> do in this case, because the code just handles them by ignoring. >>> 2) Figure out what would be the correct level to log them with: INFO or >>> DEBUG >>> >>> From ³Logging best practices² articles, I can see that people use INFO >>> as >>> a ³storyline² of normal application behavior, and DEBUG for sort of >>> information that helps you to track down the failure cases scenarios. >>> To >>> me, stuff added by Ding, falls under second category. But I might be >>> wrong >>> as I don¹t recall on the spot any discussions happening on the debug >>> topic, from the mailing list. >>> >>> -Alena. >>> >>> >>> On 4/2/14, 12:57 PM, "Daan Hoogland" <daan.hoogl...@gmail.com> wrote: >>> >>> Alena, >>> >>> What I read in your comment is a description of INFO vs WARN. Debug >>> would be only for outputting stacktraces to go with it or to indicate >>> passing a certain code path. >>> >>> Agree? >>> >>> On Wed, Apr 2, 2014 at 8:31 PM, Alena Prokharchyk >>> <alena.prokharc...@citrix.com> wrote: >>> >>> >>> ----------------------------------------------------------- >>> This is an automatically generated e-mail. To reply, visit: >>> https://reviews.apache.org/r/19917/#review39324 >>> ----------------------------------------------------------- >>> >>> >>> Is there a reason why logs for some exceptions are being logged in >>> DEBUG mode, and some in WARN? From my point of view, if the code only >>> catches it and doesn't error out, it should be logged in DEBUG. Lots of >>> Admins are seeking for WARN statements in the log, and they might be >>> confused seeing WARN w/o further failure or retry. >>> >>> - Alena Prokharchyk >>> >>> >>> On April 2, 2014, 1:55 p.m., Ding Yuan wrote: >>> >>> >>> ----------------------------------------------------------- >>> This is an automatically generated e-mail. To reply, visit: >>> https://reviews.apache.org/r/19917/ >>> ----------------------------------------------------------- >>> >>> (Updated April 2, 2014, 1:55 p.m.) >>> >>> >>> Review request for cloudstack. >>> >>> >>> Repository: cloudstack-git >>> >>> >>> Description >>> ------- >>> >>> This is the patch for JIRA-6242. See >>> https://issues.apache.org/jira/browse/CLOUDSTACK-6242 for more >>> details. >>> Thanks! >>> >>> >>> Diffs >>> ----- >>> >>> >>> engine/orchestration/src/com/cloud/agent/manager/AgentManagerImpl.java >>> 0d41bc1 >>> >>> engine/orchestration/src/com/cloud/agent/manager/ClusteredAgentManager >>> Im >>> pl.java 01508a4 >>> >>> engine/orchestration/src/com/cloud/vm/VirtualMachineManagerImpl.java >>> 3e088db >>> >>> engine/orchestration/src/org/apache/cloudstack/engine/datacenter/entit >>> y/ >>> api/db/dao/EngineDataCenterDaoImpl.java 4b6818e >>> engine/schema/src/com/cloud/dc/dao/DataCenterDaoImpl.java ea5039f >>> engine/schema/src/com/cloud/host/dao/HostDaoImpl.java 426c90d >>> engine/schema/src/com/cloud/storage/dao/StoragePoolHostDaoImpl.java >>> e42eaf4 >>> engine/schema/src/com/cloud/storage/dao/VMTemplateDaoImpl.java >>> 34fdca5 >>> engine/schema/src/com/cloud/upgrade/dao/Upgrade2214to30.java >>> 58dd916 >>> engine/schema/src/com/cloud/vm/dao/ConsoleProxyDaoImpl.java 5e9c2f0 >>> engine/schema/src/com/cloud/vm/dao/SecondaryStorageVmDaoImpl.java >>> 1f382d6 >>> >>> engine/storage/src/org/apache/cloudstack/storage/datastore/DataObjectM >>> an >>> agerImpl.java 6ed1274 >>> >>> framework/ipc/src/org/apache/cloudstack/framework/serializer/OnwireCla >>> ss >>> Registry.java 83c8a42 >>> >>> plugins/hypervisors/xen/src/com/cloud/hypervisor/xen/discoverer/XcpSer >>> ve >>> rDiscoverer.java 0ad6dc4 >>> >>> plugins/hypervisors/xen/src/com/cloud/hypervisor/xen/resource/XenServe >>> rC >>> onnectionPool.java b779085 >>> >>> plugins/hypervisors/xen/src/com/cloud/hypervisor/xen/resource/XenServe >>> rS >>> torageProcessor.java e512046 >>> >>> plugins/storage/volume/solidfire/src/org/apache/cloudstack/storage/dat >>> as >>> tore/lifecycle/SolidFirePrimaryDataStoreLifeCycle.java af6a77a >>> server/src/com/cloud/resource/ResourceManagerImpl.java f9a59ba >>> server/src/com/cloud/server/ConfigurationServerImpl.java b8da4c8 >>> >>> services/console-proxy/server/src/com/cloud/consoleproxy/ConsoleProxyT >>> hu >>> mbnailHandler.java 06f21d3 >>> utils/src/com/cloud/utils/net/NetUtils.java 6350986 >>> >>> Diff: https://reviews.apache.org/r/19917/diff/ >>> >>> >>> Testing >>> ------- >>> >>> >>> Thanks, >>> >>> Ding Yuan >>> >>> >>> >>> >>> >>> >>> -- >>> Daan >>> >>> >>> >>> >>> >>> -- >>> Daan >>> >>> >>> >>> >>> >>> -- >>> Daan >>> >>> >>> >> >> >> >> -- >> Daan >> > -- Daan