HI Daan, I am not sure exactly how to monkey-test cloudstack, what I did was to do $ mvn test which shows “BUILD SUCCESS”. I also did this: $ mvn clean install -P systemvm,developer which also succeeded.
Is that what you mean? If not, please let me know what to do and I will further test it. I will also work on assigning the proper reviewers now. Thanks, Ding On Apr 3, 2014, at 12:04 PM, Daan Hoogland <daan.hoogl...@gmail.com> wrote: > thanks Ding, > > I saw your update. Did your run a cloud with this code; i.e. did you > monkey-test it? > > On Thu, Apr 3, 2014 at 5:26 PM, Ding Yuan <y...@ece.utoronto.ca> wrote: >> Oops, sorry I didn't publish my diff. I just published it on review board. >> Thanks for the comment Daan! Please let me know if I further need to improve >> it. >> >> Ding >> >> On Apr 3, 2014, at 9:52 AM, Daan Hoogland <daan.hoogl...@gmail.com> wrote: >> >>> Ding, >>> >>> I think we can dare to do so in master as it will not see release for >>> a while. We'll just have to be aware of the locations and be on alert >>> for any stacktraces that will pass this list. I would not like to do >>> this on the 4.4 branch even when it is an improvement of code quality >>> as such. It might do things or prevent things from happening that we >>> need done. >>> >>> I don't see a new version of the diff in the review request. Did you >>> 'Update' -> 'Upload Diff'? >>> >>> regards, >>> Daan >>> >>> On Thu, Apr 3, 2014 at 12:34 AM, Ding Yuan <y...@ece.utoronto.ca> wrote: >>>> Uploaded a new patch to 19917. Changed the verbosity to debug, and >>>> addressed >>>> Daan's comment on providing more distinctive text messages. >>>> >>>> Sorry that I haven't split them into smaller patches. >>>> >>>> Note in a few cases the original code was like: >>>> try { >>>> pstmt = txn.prepareAutoCloseStatement(sql); >>>> String gmtCutTime = >>>> DateUtil.getDateDisplayString(TimeZone.getTimeZone("GMT"), cutTime); >>>> pstmt.setString(1, gmtCutTime); >>>> pstmt.setString(2, gmtCutTime); >>>> >>>> ResultSet rs = pstmt.executeQuery(); >>>> while (rs.next()) { >>>> RunningHostCountInfo info = new RunningHostCountInfo(); >>>> info.setDcId(rs.getLong(1)); >>>> info.setHostType(rs.getString(2)); >>>> info.setCount(rs.getInt(3)); >>>> >>>> l.add(info); >>>> } >>>> } catch (SQLException e) { >>>> } catch (Throwable e) { >>>> } >>>> >>>> The try block only throws SQLException as checked exception, and this code >>>> would also swallow any unchecked exceptions. I removed the catch >>>> (Throwable) >>>> in these cases to avoid potentially swallowing any unexpected runtime >>>> exceptions. Please let me know if this is not desirable so I can further >>>> update. >>>> >>>> Thanks, >>>> Ding >>>> >>>> On Apr 2, 2014, at 5:17 PM, Ding Yuan <y...@eecg.toronto.edu> wrote: >>>> >>>> Thanks all for the quick comments! >>>> If i understand the discussion correctly, I will just change all the added >>>> log printing statements to debug verbosity. I will upload a new patch for >>>> that shortly. >>>> >>>> Now a bit back story: the reason we are doing this is that we recently did >>>> an analysis on many bugs collected from JIRA to understand why today's >>>> cloud >>>> system fails. And we found that almost all of the cluster-wide failures are >>>> caused by buggy exception handling, which would often turn a component >>>> failure into a cluster-wide one. One of the common bug pattern is ignoring >>>> some exceptions -- allowing them to propagate and finally turn into >>>> disaster. >>>> Therefore we built a simple static checking tool just to check some simple >>>> rules for exception handling, such as if an exception is ignored. >>>> Admittedly, it would be much harder to reason about the potential >>>> consequences caused for ignoring a certain exception, that's why without >>>> much more domain knowledge I can only recommend to 1) avoid over-catching >>>> an >>>> exception, especially when the handling logic will swallow it, and 2) log >>>> them, as what this patch does. >>>> >>>> Nevertheless, it seems the four cases I mentioned in JIRA-6242 are >>>> particularly suspicious. It might be worthwhile to double check their >>>> correctness if you have time. I am reposting them below. >>>> >>>> Thanks! >>>> Ding >>>> >>>> ========================= >>>> Case 1: >>>> Line: 375, File: "com/cloud/network/ovs/OvsTunnelManagerImpl.java" >>>> >>>> 326: try { >>>> 327: String myIp = getGreEndpointIP(dest.getHost(), nw); >>>> .. .. .. >>>> 373: } catch (Exception e) { >>>> 374: // I really thing we should do a better handling of these exceptions >>>> 375: s_logger.warn("Ovs Tunnel network created tunnel failed", e); >>>> 376: } >>>> >>>> Comment seems to suggest for a better handling. >>>> ========================= >>>> ========================= >>>> Case 2: >>>> Line: 2295, File: "com/cloud/resource/ResourceManagerImpl.java" >>>> >>>> 2284: try { >>>> 2285: /* >>>> 2286: * FIXME: this is a buggy logic, check with alex. Shouldn't >>>> 2287: * return if propagation return non null >>>> 2288: */ >>>> 2289: Boolean result = _clusterMgr.propagateResourceEvent(h.getId(), >>>> ResourceState.Event.UpdatePassword); >>>> 2290: if (result != null) { >>>> 2291: return result; >>>> 2292: } >>>> 2293: >>>> 2294: doUpdateHostPassword(h.getId()); >>>> 2295: } catch (AgentUnavailableException e) { >>>> 2296: } >>>> >>>> Seem from the comment the logic should be fixed. >>>> A similar code snipeet is at: >>>> Line: 2276, File: "com/cloud/resource/ResourceManagerImpl.java" >>>> ========================= >>>> >>>> ========================= >>>> Case 3: >>>> >>>> Line: 184, File: >>>> "org/apache/cloudstack/api/command/user/autoscale/CreateAutoScaleVmGroupCmd.java" >>>> >>>> 174: try >>>> 175: { >>>> 176: success = _autoScaleService.configureAutoScaleVmGroup(this); >>>> 177: if (success) { >>>> 178: vmGroup = _entityMgr.findById(AutoScaleVmGroup.class, getEntityId()); >>>> 179: AutoScaleVmGroupResponse responseObject = >>>> _responseGenerator.createAutoScaleVmGroupResponse(vmGroup); >>>> 180: setResponseObject(responseObject); >>>> 181: responseObject.setResponseName(getCommandName()); >>>> 182: } >>>> 183: } catch (Exception ex) { >>>> 184: // TODO what will happen if Resource Layer fails in a step inbetween >>>> 185: s_logger.warn("Failed to create autoscale vm group", ex); >>>> 186: } >>>> >>>> The comment, "TODO", seems to suggest for better handling. >>>> ========================= >>>> ========================= >>>> Case 4: >>>> >>>> Line: 222, File: "com/cloud/api/ApiDispatcher.java" >>>> This snippet is moved to ParamProcessWorker.java in the trunk. >>>> >>>> 203: try { >>>> 204: setFieldValue(field, cmd, paramObj, parameterAnnotation); >>>> .. .. >>>> 220: } catch (CloudRuntimeException cloudEx) { >>>> 221: s_logger.error("CloudRuntimeException", cloudEx); >>>> 222: // FIXME: Better error message? This only happens if the API command >>>> is >>>> not executable, which typically >>>> 223: //means >>>> 224: // there was >>>> 225: // and IllegalAccessException setting one of the parameters. >>>> 226: throw new ServerApiException(ApiErrorCode.INTERNAL_ERROR, "Internal >>>> error executing API command " + cmd.getCommandName().substring(0, >>>> cmd.getCommandName().length() - 8)); >>>> 227: } >>>> >>>> The "FIXME" comment seems to suggest for getter error message. >>>> ========================= >>>> On Apr 2, 2014, at 4:37 PM, Daan Hoogland <daan.hoogl...@gmail.com> wrote: >>>> >>>> Ding Yuan, >>>> >>>> Any objections to that? >>>> >>>> On Wed, Apr 2, 2014 at 10:32 PM, Alena Prokharchyk >>>> <alena.prokharc...@citrix.com> wrote: >>>> >>>> >>>> >>>> On 4/2/14, 1:27 PM, "Daan Hoogland" <daan.hoogl...@gmail.com> wrote: >>>> >>>> I think we agree indeed. Doesn't mean we should start this discuss >>>> thread or write a arch guideline on the wiki somewhere. Maybe Ding >>>> Yuan wants to do a preliminary version of it? >>>> >>>> >>>> Wiki guide would be useful indeed. >>>> >>>> >>>> >>>> In the meantime I don't think that it hurts for the present patch to >>>> do everything in debug and decide about higher levels needed later. >>>> >>>> >>>> Agree. >>>> >>>> >>>> >>>> regards, >>>> >>>> On Wed, Apr 2, 2014 at 10:11 PM, Alena Prokharchyk >>>> <alena.prokharc...@citrix.com> wrote: >>>> >>>> Daan, >>>> >>>> Correct me if I¹m wrong, but all of the logging added by Ding, fall >>>> under >>>> "to go with it or to indicate passing a certain code path². I¹ve just >>>> noticed that some of them were added with DEBUG, and some with WARN >>>> level, >>>> and wanted to correct that. >>>> >>>> So we should: >>>> >>>> 1) For sure: never print them out in WARN as there is nothing admin >>>> should >>>> do in this case, because the code just handles them by ignoring. >>>> 2) Figure out what would be the correct level to log them with: INFO or >>>> DEBUG >>>> >>>> From ³Logging best practices² articles, I can see that people use INFO >>>> as >>>> a ³storyline² of normal application behavior, and DEBUG for sort of >>>> information that helps you to track down the failure cases scenarios. >>>> To >>>> me, stuff added by Ding, falls under second category. But I might be >>>> wrong >>>> as I don¹t recall on the spot any discussions happening on the debug >>>> topic, from the mailing list. >>>> >>>> -Alena. >>>> >>>> >>>> On 4/2/14, 12:57 PM, "Daan Hoogland" <daan.hoogl...@gmail.com> wrote: >>>> >>>> Alena, >>>> >>>> What I read in your comment is a description of INFO vs WARN. Debug >>>> would be only for outputting stacktraces to go with it or to indicate >>>> passing a certain code path. >>>> >>>> Agree? >>>> >>>> On Wed, Apr 2, 2014 at 8:31 PM, Alena Prokharchyk >>>> <alena.prokharc...@citrix.com> wrote: >>>> >>>> >>>> ----------------------------------------------------------- >>>> This is an automatically generated e-mail. To reply, visit: >>>> https://reviews.apache.org/r/19917/#review39324 >>>> ----------------------------------------------------------- >>>> >>>> >>>> Is there a reason why logs for some exceptions are being logged in >>>> DEBUG mode, and some in WARN? From my point of view, if the code only >>>> catches it and doesn't error out, it should be logged in DEBUG. Lots of >>>> Admins are seeking for WARN statements in the log, and they might be >>>> confused seeing WARN w/o further failure or retry. >>>> >>>> - Alena Prokharchyk >>>> >>>> >>>> On April 2, 2014, 1:55 p.m., Ding Yuan wrote: >>>> >>>> >>>> ----------------------------------------------------------- >>>> This is an automatically generated e-mail. To reply, visit: >>>> https://reviews.apache.org/r/19917/ >>>> ----------------------------------------------------------- >>>> >>>> (Updated April 2, 2014, 1:55 p.m.) >>>> >>>> >>>> Review request for cloudstack. >>>> >>>> >>>> Repository: cloudstack-git >>>> >>>> >>>> Description >>>> ------- >>>> >>>> This is the patch for JIRA-6242. See >>>> https://issues.apache.org/jira/browse/CLOUDSTACK-6242 for more >>>> details. >>>> Thanks! >>>> >>>> >>>> Diffs >>>> ----- >>>> >>>> >>>> engine/orchestration/src/com/cloud/agent/manager/AgentManagerImpl.java >>>> 0d41bc1 >>>> >>>> engine/orchestration/src/com/cloud/agent/manager/ClusteredAgentManager >>>> Im >>>> pl.java 01508a4 >>>> >>>> engine/orchestration/src/com/cloud/vm/VirtualMachineManagerImpl.java >>>> 3e088db >>>> >>>> engine/orchestration/src/org/apache/cloudstack/engine/datacenter/entit >>>> y/ >>>> api/db/dao/EngineDataCenterDaoImpl.java 4b6818e >>>> engine/schema/src/com/cloud/dc/dao/DataCenterDaoImpl.java ea5039f >>>> engine/schema/src/com/cloud/host/dao/HostDaoImpl.java 426c90d >>>> engine/schema/src/com/cloud/storage/dao/StoragePoolHostDaoImpl.java >>>> e42eaf4 >>>> engine/schema/src/com/cloud/storage/dao/VMTemplateDaoImpl.java >>>> 34fdca5 >>>> engine/schema/src/com/cloud/upgrade/dao/Upgrade2214to30.java >>>> 58dd916 >>>> engine/schema/src/com/cloud/vm/dao/ConsoleProxyDaoImpl.java 5e9c2f0 >>>> engine/schema/src/com/cloud/vm/dao/SecondaryStorageVmDaoImpl.java >>>> 1f382d6 >>>> >>>> engine/storage/src/org/apache/cloudstack/storage/datastore/DataObjectM >>>> an >>>> agerImpl.java 6ed1274 >>>> >>>> framework/ipc/src/org/apache/cloudstack/framework/serializer/OnwireCla >>>> ss >>>> Registry.java 83c8a42 >>>> >>>> plugins/hypervisors/xen/src/com/cloud/hypervisor/xen/discoverer/XcpSer >>>> ve >>>> rDiscoverer.java 0ad6dc4 >>>> >>>> plugins/hypervisors/xen/src/com/cloud/hypervisor/xen/resource/XenServe >>>> rC >>>> onnectionPool.java b779085 >>>> >>>> plugins/hypervisors/xen/src/com/cloud/hypervisor/xen/resource/XenServe >>>> rS >>>> torageProcessor.java e512046 >>>> >>>> plugins/storage/volume/solidfire/src/org/apache/cloudstack/storage/dat >>>> as >>>> tore/lifecycle/SolidFirePrimaryDataStoreLifeCycle.java af6a77a >>>> server/src/com/cloud/resource/ResourceManagerImpl.java f9a59ba >>>> server/src/com/cloud/server/ConfigurationServerImpl.java b8da4c8 >>>> >>>> services/console-proxy/server/src/com/cloud/consoleproxy/ConsoleProxyT >>>> hu >>>> mbnailHandler.java 06f21d3 >>>> utils/src/com/cloud/utils/net/NetUtils.java 6350986 >>>> >>>> Diff: https://reviews.apache.org/r/19917/diff/ >>>> >>>> >>>> Testing >>>> ------- >>>> >>>> >>>> Thanks, >>>> >>>> Ding Yuan >>>> >>>> >>>> >>>> >>>> >>>> >>>> -- >>>> Daan >>>> >>>> >>>> >>>> >>>> >>>> -- >>>> Daan >>>> >>>> >>>> >>>> >>>> >>>> -- >>>> Daan >>>> >>>> >>>> >>> >>> >>> >>> -- >>> Daan >>> >> > > > > -- > Daan >