H Ding, I didn't mean for you to test it in a certain way. I was just curious about what you did with the patch before you submitted it. Did you start a cloud instance with it and try to hit any of the log messages for instance. You explained the background of your effort and I am curious as to how it satisfied your objectives.
I will let the patch rest a few days to see if we get any more reactions and apply it from Denver next week. On Sat, Apr 5, 2014 at 2:03 AM, Ding Yuan <y...@ece.utoronto.ca> wrote: > HI Daan, > I am not sure exactly how to monkey-test cloudstack, what I did was to do > $ mvn test > which shows "BUILD SUCCESS". I also did this: > $ mvn clean install -P systemvm,developer > which also succeeded. > > Is that what you mean? If not, please let me know what to do and I will > further test it. > I will also work on assigning the proper reviewers now. > > Thanks, > Ding > > On Apr 3, 2014, at 12:04 PM, Daan Hoogland <daan.hoogl...@gmail.com> wrote: > >> thanks Ding, >> >> I saw your update. Did your run a cloud with this code; i.e. did you >> monkey-test it? >> >> On Thu, Apr 3, 2014 at 5:26 PM, Ding Yuan <y...@ece.utoronto.ca> wrote: >>> Oops, sorry I didn't publish my diff. I just published it on review board. >>> Thanks for the comment Daan! Please let me know if I further need to >>> improve it. >>> >>> Ding >>> >>> On Apr 3, 2014, at 9:52 AM, Daan Hoogland <daan.hoogl...@gmail.com> wrote: >>> >>>> Ding, >>>> >>>> I think we can dare to do so in master as it will not see release for >>>> a while. We'll just have to be aware of the locations and be on alert >>>> for any stacktraces that will pass this list. I would not like to do >>>> this on the 4.4 branch even when it is an improvement of code quality >>>> as such. It might do things or prevent things from happening that we >>>> need done. >>>> >>>> I don't see a new version of the diff in the review request. Did you >>>> 'Update' -> 'Upload Diff'? >>>> >>>> regards, >>>> Daan >>>> >>>> On Thu, Apr 3, 2014 at 12:34 AM, Ding Yuan <y...@ece.utoronto.ca> wrote: >>>>> Uploaded a new patch to 19917. Changed the verbosity to debug, and >>>>> addressed >>>>> Daan's comment on providing more distinctive text messages. >>>>> >>>>> Sorry that I haven't split them into smaller patches. >>>>> >>>>> Note in a few cases the original code was like: >>>>> try { >>>>> pstmt = txn.prepareAutoCloseStatement(sql); >>>>> String gmtCutTime = >>>>> DateUtil.getDateDisplayString(TimeZone.getTimeZone("GMT"), cutTime); >>>>> pstmt.setString(1, gmtCutTime); >>>>> pstmt.setString(2, gmtCutTime); >>>>> >>>>> ResultSet rs = pstmt.executeQuery(); >>>>> while (rs.next()) { >>>>> RunningHostCountInfo info = new RunningHostCountInfo(); >>>>> info.setDcId(rs.getLong(1)); >>>>> info.setHostType(rs.getString(2)); >>>>> info.setCount(rs.getInt(3)); >>>>> >>>>> l.add(info); >>>>> } >>>>> } catch (SQLException e) { >>>>> } catch (Throwable e) { >>>>> } >>>>> >>>>> The try block only throws SQLException as checked exception, and this code >>>>> would also swallow any unchecked exceptions. I removed the catch >>>>> (Throwable) >>>>> in these cases to avoid potentially swallowing any unexpected runtime >>>>> exceptions. Please let me know if this is not desirable so I can further >>>>> update. >>>>> >>>>> Thanks, >>>>> Ding >>>>> >>>>> On Apr 2, 2014, at 5:17 PM, Ding Yuan <y...@eecg.toronto.edu> wrote: >>>>> >>>>> Thanks all for the quick comments! >>>>> If i understand the discussion correctly, I will just change all the added >>>>> log printing statements to debug verbosity. I will upload a new patch for >>>>> that shortly. >>>>> >>>>> Now a bit back story: the reason we are doing this is that we recently did >>>>> an analysis on many bugs collected from JIRA to understand why today's >>>>> cloud >>>>> system fails. And we found that almost all of the cluster-wide failures >>>>> are >>>>> caused by buggy exception handling, which would often turn a component >>>>> failure into a cluster-wide one. One of the common bug pattern is ignoring >>>>> some exceptions -- allowing them to propagate and finally turn into >>>>> disaster. >>>>> Therefore we built a simple static checking tool just to check some simple >>>>> rules for exception handling, such as if an exception is ignored. >>>>> Admittedly, it would be much harder to reason about the potential >>>>> consequences caused for ignoring a certain exception, that's why without >>>>> much more domain knowledge I can only recommend to 1) avoid over-catching >>>>> an >>>>> exception, especially when the handling logic will swallow it, and 2) log >>>>> them, as what this patch does. >>>>> >>>>> Nevertheless, it seems the four cases I mentioned in JIRA-6242 are >>>>> particularly suspicious. It might be worthwhile to double check their >>>>> correctness if you have time. I am reposting them below. >>>>> >>>>> Thanks! >>>>> Ding >>>>> >>>>> ========================= >>>>> Case 1: >>>>> Line: 375, File: "com/cloud/network/ovs/OvsTunnelManagerImpl.java" >>>>> >>>>> 326: try { >>>>> 327: String myIp = getGreEndpointIP(dest.getHost(), nw); >>>>> .. .. .. >>>>> 373: } catch (Exception e) { >>>>> 374: // I really thing we should do a better handling of these exceptions >>>>> 375: s_logger.warn("Ovs Tunnel network created tunnel failed", e); >>>>> 376: } >>>>> >>>>> Comment seems to suggest for a better handling. >>>>> ========================= >>>>> ========================= >>>>> Case 2: >>>>> Line: 2295, File: "com/cloud/resource/ResourceManagerImpl.java" >>>>> >>>>> 2284: try { >>>>> 2285: /* >>>>> 2286: * FIXME: this is a buggy logic, check with alex. Shouldn't >>>>> 2287: * return if propagation return non null >>>>> 2288: */ >>>>> 2289: Boolean result = _clusterMgr.propagateResourceEvent(h.getId(), >>>>> ResourceState.Event.UpdatePassword); >>>>> 2290: if (result != null) { >>>>> 2291: return result; >>>>> 2292: } >>>>> 2293: >>>>> 2294: doUpdateHostPassword(h.getId()); >>>>> 2295: } catch (AgentUnavailableException e) { >>>>> 2296: } >>>>> >>>>> Seem from the comment the logic should be fixed. >>>>> A similar code snipeet is at: >>>>> Line: 2276, File: "com/cloud/resource/ResourceManagerImpl.java" >>>>> ========================= >>>>> >>>>> ========================= >>>>> Case 3: >>>>> >>>>> Line: 184, File: >>>>> "org/apache/cloudstack/api/command/user/autoscale/CreateAutoScaleVmGroupCmd.java" >>>>> >>>>> 174: try >>>>> 175: { >>>>> 176: success = _autoScaleService.configureAutoScaleVmGroup(this); >>>>> 177: if (success) { >>>>> 178: vmGroup = _entityMgr.findById(AutoScaleVmGroup.class, getEntityId()); >>>>> 179: AutoScaleVmGroupResponse responseObject = >>>>> _responseGenerator.createAutoScaleVmGroupResponse(vmGroup); >>>>> 180: setResponseObject(responseObject); >>>>> 181: responseObject.setResponseName(getCommandName()); >>>>> 182: } >>>>> 183: } catch (Exception ex) { >>>>> 184: // TODO what will happen if Resource Layer fails in a step inbetween >>>>> 185: s_logger.warn("Failed to create autoscale vm group", ex); >>>>> 186: } >>>>> >>>>> The comment, "TODO", seems to suggest for better handling. >>>>> ========================= >>>>> ========================= >>>>> Case 4: >>>>> >>>>> Line: 222, File: "com/cloud/api/ApiDispatcher.java" >>>>> This snippet is moved to ParamProcessWorker.java in the trunk. >>>>> >>>>> 203: try { >>>>> 204: setFieldValue(field, cmd, paramObj, parameterAnnotation); >>>>> .. .. >>>>> 220: } catch (CloudRuntimeException cloudEx) { >>>>> 221: s_logger.error("CloudRuntimeException", cloudEx); >>>>> 222: // FIXME: Better error message? This only happens if the API command >>>>> is >>>>> not executable, which typically >>>>> 223: //means >>>>> 224: // there was >>>>> 225: // and IllegalAccessException setting one of the parameters. >>>>> 226: throw new ServerApiException(ApiErrorCode.INTERNAL_ERROR, "Internal >>>>> error executing API command " + cmd.getCommandName().substring(0, >>>>> cmd.getCommandName().length() - 8)); >>>>> 227: } >>>>> >>>>> The "FIXME" comment seems to suggest for getter error message. >>>>> ========================= >>>>> On Apr 2, 2014, at 4:37 PM, Daan Hoogland <daan.hoogl...@gmail.com> wrote: >>>>> >>>>> Ding Yuan, >>>>> >>>>> Any objections to that? >>>>> >>>>> On Wed, Apr 2, 2014 at 10:32 PM, Alena Prokharchyk >>>>> <alena.prokharc...@citrix.com> wrote: >>>>> >>>>> >>>>> >>>>> On 4/2/14, 1:27 PM, "Daan Hoogland" <daan.hoogl...@gmail.com> wrote: >>>>> >>>>> I think we agree indeed. Doesn't mean we should start this discuss >>>>> thread or write a arch guideline on the wiki somewhere. Maybe Ding >>>>> Yuan wants to do a preliminary version of it? >>>>> >>>>> >>>>> Wiki guide would be useful indeed. >>>>> >>>>> >>>>> >>>>> In the meantime I don't think that it hurts for the present patch to >>>>> do everything in debug and decide about higher levels needed later. >>>>> >>>>> >>>>> Agree. >>>>> >>>>> >>>>> >>>>> regards, >>>>> >>>>> On Wed, Apr 2, 2014 at 10:11 PM, Alena Prokharchyk >>>>> <alena.prokharc...@citrix.com> wrote: >>>>> >>>>> Daan, >>>>> >>>>> Correct me if I¹m wrong, but all of the logging added by Ding, fall >>>>> under >>>>> "to go with it or to indicate passing a certain code path². I¹ve just >>>>> noticed that some of them were added with DEBUG, and some with WARN >>>>> level, >>>>> and wanted to correct that. >>>>> >>>>> So we should: >>>>> >>>>> 1) For sure: never print them out in WARN as there is nothing admin >>>>> should >>>>> do in this case, because the code just handles them by ignoring. >>>>> 2) Figure out what would be the correct level to log them with: INFO or >>>>> DEBUG >>>>> >>>>> From ³Logging best practices² articles, I can see that people use INFO >>>>> as >>>>> a ³storyline² of normal application behavior, and DEBUG for sort of >>>>> information that helps you to track down the failure cases scenarios. >>>>> To >>>>> me, stuff added by Ding, falls under second category. But I might be >>>>> wrong >>>>> as I don¹t recall on the spot any discussions happening on the debug >>>>> topic, from the mailing list. >>>>> >>>>> -Alena. >>>>> >>>>> >>>>> On 4/2/14, 12:57 PM, "Daan Hoogland" <daan.hoogl...@gmail.com> wrote: >>>>> >>>>> Alena, >>>>> >>>>> What I read in your comment is a description of INFO vs WARN. Debug >>>>> would be only for outputting stacktraces to go with it or to indicate >>>>> passing a certain code path. >>>>> >>>>> Agree? >>>>> >>>>> On Wed, Apr 2, 2014 at 8:31 PM, Alena Prokharchyk >>>>> <alena.prokharc...@citrix.com> wrote: >>>>> >>>>> >>>>> ----------------------------------------------------------- >>>>> This is an automatically generated e-mail. To reply, visit: >>>>> https://reviews.apache.org/r/19917/#review39324 >>>>> ----------------------------------------------------------- >>>>> >>>>> >>>>> Is there a reason why logs for some exceptions are being logged in >>>>> DEBUG mode, and some in WARN? From my point of view, if the code only >>>>> catches it and doesn't error out, it should be logged in DEBUG. Lots of >>>>> Admins are seeking for WARN statements in the log, and they might be >>>>> confused seeing WARN w/o further failure or retry. >>>>> >>>>> - Alena Prokharchyk >>>>> >>>>> >>>>> On April 2, 2014, 1:55 p.m., Ding Yuan wrote: >>>>> >>>>> >>>>> ----------------------------------------------------------- >>>>> This is an automatically generated e-mail. To reply, visit: >>>>> https://reviews.apache.org/r/19917/ >>>>> ----------------------------------------------------------- >>>>> >>>>> (Updated April 2, 2014, 1:55 p.m.) >>>>> >>>>> >>>>> Review request for cloudstack. >>>>> >>>>> >>>>> Repository: cloudstack-git >>>>> >>>>> >>>>> Description >>>>> ------- >>>>> >>>>> This is the patch for JIRA-6242. See >>>>> https://issues.apache.org/jira/browse/CLOUDSTACK-6242 for more >>>>> details. >>>>> Thanks! >>>>> >>>>> >>>>> Diffs >>>>> ----- >>>>> >>>>> >>>>> engine/orchestration/src/com/cloud/agent/manager/AgentManagerImpl.java >>>>> 0d41bc1 >>>>> >>>>> engine/orchestration/src/com/cloud/agent/manager/ClusteredAgentManager >>>>> Im >>>>> pl.java 01508a4 >>>>> >>>>> engine/orchestration/src/com/cloud/vm/VirtualMachineManagerImpl.java >>>>> 3e088db >>>>> >>>>> engine/orchestration/src/org/apache/cloudstack/engine/datacenter/entit >>>>> y/ >>>>> api/db/dao/EngineDataCenterDaoImpl.java 4b6818e >>>>> engine/schema/src/com/cloud/dc/dao/DataCenterDaoImpl.java ea5039f >>>>> engine/schema/src/com/cloud/host/dao/HostDaoImpl.java 426c90d >>>>> engine/schema/src/com/cloud/storage/dao/StoragePoolHostDaoImpl.java >>>>> e42eaf4 >>>>> engine/schema/src/com/cloud/storage/dao/VMTemplateDaoImpl.java >>>>> 34fdca5 >>>>> engine/schema/src/com/cloud/upgrade/dao/Upgrade2214to30.java >>>>> 58dd916 >>>>> engine/schema/src/com/cloud/vm/dao/ConsoleProxyDaoImpl.java 5e9c2f0 >>>>> engine/schema/src/com/cloud/vm/dao/SecondaryStorageVmDaoImpl.java >>>>> 1f382d6 >>>>> >>>>> engine/storage/src/org/apache/cloudstack/storage/datastore/DataObjectM >>>>> an >>>>> agerImpl.java 6ed1274 >>>>> >>>>> framework/ipc/src/org/apache/cloudstack/framework/serializer/OnwireCla >>>>> ss >>>>> Registry.java 83c8a42 >>>>> >>>>> plugins/hypervisors/xen/src/com/cloud/hypervisor/xen/discoverer/XcpSer >>>>> ve >>>>> rDiscoverer.java 0ad6dc4 >>>>> >>>>> plugins/hypervisors/xen/src/com/cloud/hypervisor/xen/resource/XenServe >>>>> rC >>>>> onnectionPool.java b779085 >>>>> >>>>> plugins/hypervisors/xen/src/com/cloud/hypervisor/xen/resource/XenServe >>>>> rS >>>>> torageProcessor.java e512046 >>>>> >>>>> plugins/storage/volume/solidfire/src/org/apache/cloudstack/storage/dat >>>>> as >>>>> tore/lifecycle/SolidFirePrimaryDataStoreLifeCycle.java af6a77a >>>>> server/src/com/cloud/resource/ResourceManagerImpl.java f9a59ba >>>>> server/src/com/cloud/server/ConfigurationServerImpl.java b8da4c8 >>>>> >>>>> services/console-proxy/server/src/com/cloud/consoleproxy/ConsoleProxyT >>>>> hu >>>>> mbnailHandler.java 06f21d3 >>>>> utils/src/com/cloud/utils/net/NetUtils.java 6350986 >>>>> >>>>> Diff: https://reviews.apache.org/r/19917/diff/ >>>>> >>>>> >>>>> Testing >>>>> ------- >>>>> >>>>> >>>>> Thanks, >>>>> >>>>> Ding Yuan >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> -- >>>>> Daan >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> -- >>>>> Daan >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> -- >>>>> Daan >>>>> >>>>> >>>>> >>>> >>>> >>>> >>>> -- >>>> Daan >>>> >>> >> >> >> >> -- >> Daan >> > -- Daan