> On Oct. 1, 2015, 1:30 a.m., anilkumar gingade wrote:
> > gemfire-core/src/main/java/com/gemstone/gemfire/distributed/internal/InternalDistributedSystem.java,
> >  line 1327
> > <https://reviews.apache.org/r/38912/diff/1/?file=1088150#file1088150line1327>
> >
> >     We are passing the same value for both arguments. Is this expected?

I will change this to (this.forcedDisconnect, preparingForReconnect, false)


- Bruce


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/38912/#review101184
-----------------------------------------------------------


On Sept. 30, 2015, 11:59 p.m., Bruce Schuchardt wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/38912/
> -----------------------------------------------------------
> 
> (Updated Sept. 30, 2015, 11:59 p.m.)
> 
> 
> Review request for geode, anilkumar gingade, Jason Huynh, Jianxia Chen, and 
> Lynn Gallinat.
> 
> 
> Repository: geode
> 
> 
> Description
> -------
> 
> Network failure handling was not properly shutting down TCPConduit, leaving 
> threads hanging trying to send messages.  The shutdown code was calling 
> Services.emergencyClose too soon, and the recursion back into 
> GMSMembershipManager shutdown code caused some problems, too.
> 
> GMSHealthMonitor was continually switching between two members to watch even 
> though it had already sent suspect messages about them and had received no 
> response.  I added a collection of IDs that are in this state and modified 
> setNextNeighbor to avoid reusing them.
> 
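The neighbor-selection change described above can be sketched roughly as follows. This is a simplified illustration, not the actual GMSHealthMonitor code: members are plain strings rather than member IDs, the view is a simple list, and the class and method names (NeighborChooser, nextNeighbor) are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the neighbor-selection logic; not Geode code.
class NeighborChooser {
    /**
     * Returns the next member after self in view order that is not
     * already suspected, or null if no such member exists.
     */
    static String nextNeighbor(List<String> view, String self, Set<String> suspected) {
        int start = view.indexOf(self);
        if (start < 0) {
            return null; // we are not in the view, so there is nothing to watch
        }
        // Walk the view as a ring, starting just after our own position.
        for (int i = 1; i < view.size(); i++) {
            String candidate = view.get((start + i) % view.size());
            if (!suspected.contains(candidate)) {
                return candidate;
            }
        }
        return null; // every other member is already suspected
    }

    public static void main(String[] args) {
        List<String> view = Arrays.asList("A", "B", "C", "D");
        Set<String> suspected = new HashSet<>(Arrays.asList("B", "C"));
        // B and C have outstanding suspect messages, so A should watch D.
        System.out.println(nextNeighbor(view, "A", suspected));
    }
}
```

Skipping members that already have an outstanding suspect message is what keeps the monitor from alternating between the same two unresponsive members.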
> GMSHealthMonitor was sending removeMember messages to the locators and a 
> random member, but for some reason this wasn't resolving a network partition 
> fast enough.  I've disabled that behavior for now, sending the messages to 
> all members.  This needs to be revisited because sending the message to all 
> members is not scalable.
> 
> GMSHealthMonitor had some issues with initiating removals when it was in the 
> process of shutting down.  I added some isStopping checks to fix this.
> 
> MembershipJUnitTest and StatRecorderJUnitTest were failing in gradle runs but 
> not under Eclipse because my Eclipse launch configuration wasn't set to 
> enable assertions.  After fixing that I found a number of problems with these 
> tests and fixed them.
> 
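For anyone hitting the same Eclipse-versus-gradle difference, a small check like the one below reports whether the JVM was started with assertions enabled (-ea). It is a generic Java idiom, not something taken from this patch, and the class name AssertionCheck is made up for the example.

```java
// Generic idiom for detecting whether assertions are enabled; not Geode code.
class AssertionCheck {
    /** Returns true when this class is running with assertions enabled. */
    static boolean assertionsEnabled() {
        boolean enabled = false;
        // The assignment inside the assert only executes when assertions are on.
        assert enabled = true;
        return enabled;
    }

    public static void main(String[] args) {
        System.out.println("assertions enabled: " + assertionsEnabled());
    }
}
```

Running it once from the IDE launch configuration and once from the build shows quickly whether the two environments differ.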
> Multicast tests are now implemented in GMSMembershipManager and 
> JGroupsMessenger.  This leverages the ping/pong messaging added for the 
> quorum checker.
> 
> GMSJoinLeave was too slow in sending out new views when there were process 
> failures.  I added code to inform the reply processor if there are queued 
> leave/remove requests so it wouldn't wait for these, and also added similar 
> checks in the removeHealthyMembers method (which performs checks on members 
> using the HealthMonitor).
> 
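The idea of not waiting on members that already have queued leave/remove requests can be illustrated with the sketch below. It is only a model of the idea, assuming string member names; the names ViewAckTracker and membersToWaitFor are hypothetical and do not correspond to the reply processor in GMSJoinLeave.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical model of "do not wait for members that are leaving"; not Geode code.
class ViewAckTracker {
    /**
     * Returns the recipients whose acknowledgement is still worth waiting for:
     * everyone the view was sent to, minus members with a queued leave or
     * remove request.
     */
    static Set<String> membersToWaitFor(Collection<String> recipients,
                                        Collection<String> pendingLeaveOrRemove) {
        Set<String> waiting = new LinkedHashSet<>(recipients);
        waiting.removeAll(pendingLeaveOrRemove);
        return waiting;
    }

    public static void main(String[] args) {
        Set<String> waiting = membersToWaitFor(
            Arrays.asList("A", "B", "C"),
            Arrays.asList("B"));
        // B has a queued removal, so only A and C remain.
        System.out.println(waiting);
    }
}
```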
> When there is a network partition GMSJoinLeave will now send a 
> NetworkPartitionMessage to other members to prod them along in figuring out 
> that they should shut down.
> 
> During a forced-disconnect there can be a lot of warning/fatal log messages. 
> If there are alert listeners in the system this can create a lot of network 
> traffic and extra work figuring out whether the receiver is even there or 
> not.  GMSMembershipManager now throws away outbound alerts when a 
> forced-disconnect is in progress.
> 
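A minimal sketch of discarding outbound alerts once a forced-disconnect has started might look like this. The AlertGate class and its methods are invented for illustration and alerts are plain strings; the real change lives in GMSMembershipManager and is not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical illustration of dropping alerts during shutdown; not Geode code.
class AlertGate {
    private final AtomicBoolean forcedDisconnectInProgress = new AtomicBoolean(false);
    private final List<String> sent = new ArrayList<>();

    /** Marks the start of a forced-disconnect; later alerts are discarded. */
    void beginForcedDisconnect() {
        forcedDisconnectInProgress.set(true);
    }

    /** Returns true if the alert was sent, false if it was thrown away. */
    boolean sendAlert(String alert) {
        if (forcedDisconnectInProgress.get()) {
            return false; // skip the network traffic and receiver checks
        }
        sent.add(alert);
        return true;
    }

    /** Alerts that were actually sent before the disconnect began. */
    List<String> sentAlerts() {
        return sent;
    }
}
```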
> Some of the forced-disconnect shutdown processing has been moved out of the 
> membership manager's DisconnectThread that was introduced with the quorum 
> checker in order to set the shutdown cause, etc, as quickly as possible.
> 
> I noticed a lot of TXState log messages at debug level with a Throwable stack 
> trace.  There was no comment saying why this was being done so I commented it 
> out.
> 
> JGroups logging level is now set to FATAL by default.  The default log level 
> was a problem during network partitions because each message send was causing 
> a dire warning to be logged.
> 
> I observed a number of threads being left behind when a locator failed to 
> start during auto-reconnect testing.  I added a unit test to LocatorJUnitTest 
> for this and fixed the leaks.
> 
> 
> Diffs
> -----
> 
>   gemfire-core/src/main/java/com/gemstone/gemfire/distributed/internal/InternalDistributedSystem.java c3929c007ea69b15759b5b8480a32e3294cd6d73 
>   gemfire-core/src/main/java/com/gemstone/gemfire/distributed/internal/InternalLocator.java 6ea54e2a124410fedb8156a3757b79ea3de52174 
>   gemfire-core/src/main/java/com/gemstone/gemfire/distributed/internal/membership/NetView.java 65fe913b8200e18249334d1e55acf7a67455c247 
>   gemfire-core/src/main/java/com/gemstone/gemfire/distributed/internal/membership/gms/Services.java acd2bedfa9583a37446712d08ef04671f291378a 
>   gemfire-core/src/main/java/com/gemstone/gemfire/distributed/internal/membership/gms/fd/GMSHealthMonitor.java f12628aeaa9a5874da8a09db846b4dc653978f99 
>   gemfire-core/src/main/java/com/gemstone/gemfire/distributed/internal/membership/gms/interfaces/Messenger.java b154403ce12ff87576c0f7ca01732b1377f9712b 
>   gemfire-core/src/main/java/com/gemstone/gemfire/distributed/internal/membership/gms/membership/GMSJoinLeave.java 7b6b97df54148985ed6154823eefcf7d3ca82c23 
>   gemfire-core/src/main/java/com/gemstone/gemfire/distributed/internal/membership/gms/messages/NetworkPartitionMessage.java PRE-CREATION 
>   gemfire-core/src/main/java/com/gemstone/gemfire/distributed/internal/membership/gms/messages/SuspectMembersMessage.java 117f440325ceab7131c4f5e153f32105a55b7b09 
>   gemfire-core/src/main/java/com/gemstone/gemfire/distributed/internal/membership/gms/messenger/JGroupsMessenger.java c1acb87cc184447dbd1879d2c4a569c7a8093dda 
>   gemfire-core/src/main/java/com/gemstone/gemfire/distributed/internal/membership/gms/messenger/StatRecorder.java 1fef0daec35ab999829f58fc44da03851a852b7f 
>   gemfire-core/src/main/java/com/gemstone/gemfire/distributed/internal/membership/gms/mgr/GMSMembershipManager.java 64dd1cd5de028b296f0fd6bf33e02ffbf672cf6e 
>   gemfire-core/src/main/java/com/gemstone/gemfire/internal/DSFIDFactory.java a743c8a9f2d227143f04081b11c4a42d9dcb61c2 
>   gemfire-core/src/main/java/com/gemstone/gemfire/internal/DataSerializableFixedID.java 39fdeef81856d5ff128ed6ea050d4afbc3a612f7 
>   gemfire-core/src/main/java/com/gemstone/gemfire/internal/cache/TXState.java 2672323cc89c8266df943de4dc444984c66ca3af 
>   gemfire-core/src/main/resources/com/gemstone/gemfire/internal/logging/log4j/log4j2-default.xml 8b1331ffda0ff7a3a1878ac491f9e394821f8ec1 
>   gemfire-core/src/test/java/com/gemstone/gemfire/distributed/LocatorDUnitTest.java afb4687d8d75b6f36f2c6900352c4d51b13b28c0 
>   gemfire-core/src/test/java/com/gemstone/gemfire/distributed/LocatorJUnitTest.java 5a09b5589c63a8ac9e9b4883925ef3627e2066a9 
>   gemfire-core/src/test/java/com/gemstone/gemfire/distributed/internal/membership/MembershipJUnitTest.java f7683f9d0c4a1ca1bfd451fd9d0b7fcdc37c10ad 
>   gemfire-core/src/test/java/com/gemstone/gemfire/distributed/internal/membership/gms/membership/GMSJoinLeaveJUnitTest.java 0af47a7904a85bd5c3efa98f1a398a43486d425f 
>   gemfire-core/src/test/java/com/gemstone/gemfire/distributed/internal/membership/gms/membership/StatRecorderJUnitTest.java fb502908b7c1bc7a32dfb367d1cdad56997305bb 
>   gemfire-core/src/test/java/com/gemstone/gemfire/internal/cache/partitioned/Bug43684DUnitTest.java 9722311b4a13f90c94dc63d9eef3091c77d81ad8 
> 
> Diff: https://reviews.apache.org/r/38912/diff/
> 
> 
> Testing
> -------
> 
> precheckin, 3-host network partition testing
> 
> 
> Thanks,
> 
> Bruce Schuchardt
> 
>
