kangkaisen opened a new issue #546: Colocate Join table balance bug
URL: https://github.com/apache/incubator-doris/issues/546

**Describe the bug**

Yesterday, I added 5 BEs to a production cluster. During colocate table balancing, the following error occurred:

```
2019-01-16 12:44:31,901 INFO 341 [ColocateTableBalancer.checkGroupTablets():221] AddedBackendIds [63256529, 63256540, 63256553] for colocate group 55245478
2019-01-16 12:44:31,901 INFO 341 [ColocateTableBalancer.handleBackendAdded():491] handleBackendAdded start
2019-01-16 12:44:31,901 INFO 341 [ColocateTableBalancer.handleBackendAdded():505] for colocate group 55245478, needMoveBucketSeqs : 5 , bucketSeqPerNewBackend: 1
2019-01-16 12:44:31,901 ERROR 341 [ColocateTableBalancer.handleBackendAdded():556] Index: 3, Size: 3
java.lang.IndexOutOfBoundsException: Index: 3, Size: 3
        at java.util.ArrayList.rangeCheck(ArrayList.java:653) ~[?:1.8.0_112]
        at java.util.ArrayList.get(ArrayList.java:429) ~[?:1.8.0_112]
        at org.apache.doris.clone.ColocateTableBalancer.handleBackendAdded(ColocateTableBalancer.java:530) [palo-fe.jar:?]
        at org.apache.doris.clone.ColocateTableBalancer.checkGroupTablets(ColocateTableBalancer.java:222) [palo-fe.jar:?]
        at org.apache.doris.clone.ColocateTableBalancer.runOneCycle(ColocateTableBalancer.java:80) [palo-fe.jar:?]
        at org.apache.doris.common.util.Daemon.run(Daemon.java:96) [palo-fe.jar:?]
```

This bug is obvious. After I fixed it, I found several hours later that all colocate groups were still balancing. I looked at the log and **found that the colocate meta had become wrong**!

```
2019-01-16 16:12:34,496 INFO 2682 [ColocateTableBalancer.checkBalancingGroups():89] colocate group: 55245478 backendsPerBucketSeq is [[6913774, 21833567, 63256487, 63256517, 63256529], [15310, 6913774, 63256487, 63256517, 63256540], [21833568, 23694, 63256487, 63256517, 63256553], [23693, 3820567, 63256487, 63256517, 63256529], [18711, 3820568, 63256540], [21833566, 10477683, 21833567], [18710, 18711, 23695], [23693, 10477683, 18710], [23695, 23694, 10469551], [3820567, 21820, 21833565], [23694, 10477683, 21833566], [10002, 21833567, 3820567], [10469551, 3820568, 21833566], [23694, 21833564, 21820], [10002, 10477683, 21833567], [18709, 10002, 23694], [10002, 23695, 21820], [18709, 23694, 15310], [10477683, 10469551, 3820568], [18711, 10469551, 18710], [21833564, 21833567, 18711], [3820567, 10469551, 6913774], [15310, 21833566, 23693], [15310, 21833567, 21833564], [23695, 15310, 18711], [6913774, 21833567, 23695], [21833567, 23694, 15310], [21833568, 21833565, 23694], [3820567, 21833566, 18711], [23693, 21833567, 21833568], [3820567, 23695, 18709], [18711, 21833568, 21833564]]
```

The replicationNum for the colocate group is 3, so each BucketSeq should have exactly 3 backends, yet the first four BucketSeqs above have 5. The reason is that I added the new BEs one by one with long intervals between them, and ColocateTableBalancer doesn't skip balancing when the colocate group is already balancing.
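
To make the broken invariant concrete, here is a minimal standalone sketch, not the actual Doris code. The class `ColocateMetaCheck` and the methods `findBadBucketSeqs` and `shouldStartBalance` are hypothetical names introduced only for illustration. The sample data is three entries copied from the log above. The guard shows one possible way to skip a new balance round while a group is still balancing, which may differ from the fix that is eventually merged.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of ColocateTableBalancer.
class ColocateMetaCheck {

    // Returns the indexes of the bucket sequences whose backend count
    // differs from the group's replicationNum.
    static List<Integer> findBadBucketSeqs(List<List<Long>> backendsPerBucketSeq,
                                           int replicationNum) {
        List<Integer> bad = new ArrayList<>();
        for (int i = 0; i < backendsPerBucketSeq.size(); i++) {
            if (backendsPerBucketSeq.get(i).size() != replicationNum) {
                bad.add(i);
            }
        }
        return bad;
    }

    // Assumed guard: only start handling newly added backends when the
    // group is not already in the middle of a balance.
    static boolean shouldStartBalance(boolean groupIsBalancing, boolean hasNewBackends) {
        return hasNewBackends && !groupIsBalancing;
    }

    public static void main(String[] args) {
        // Three bucket sequences excerpted from the log above.
        List<List<Long>> meta = Arrays.asList(
                Arrays.asList(6913774L, 21833567L, 63256487L, 63256517L, 63256529L),
                Arrays.asList(18711L, 3820568L, 63256540L),
                Arrays.asList(21833566L, 10477683L, 21833567L));

        // Only the first entry has 5 backends instead of 3.
        System.out.println("bad bucket seqs: " + findBadBucketSeqs(meta, 3));

        // A second BE arrives while the group is still balancing: skip.
        System.out.println("start balance: " + shouldStartBalance(true, true));
    }
}
```

Running a check like this against the full `backendsPerBucketSeq` list from the log would flag the first four bucket sequences, each of which holds 5 backends.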