Indeed, +1 to Gopal on that explanation! That was huge.

On Wed, Aug 17, 2016 at 12:58 AM, 明浩 冯 <qiuff...@hotmail.com> wrote:
> Hi Gopal,
>
> It works after I disabled the dfs.namenode.acls setting.
>
> As for the data loss, it doesn't affect me much currently, but I will track the issue in Kylin.
>
> Thank you very much for your detailed explanation and solution. You saved me!
>
> Best Regards,
>
> Minghao Feng
> ------------------------------
> *From:* Gopal Vijayaraghavan <go...@hortonworks.com> on behalf of Gopal Vijayaraghavan <gop...@apache.org>
> *Sent:* Wednesday, August 17, 2016 1:18:54 PM
> *To:* user@hive.apache.org
> *Subject:* Re: hive throws ConcurrentModificationException when executing insert overwrite table
>
> > Yes, Kylin generated the query. I'm using Kylin 1.5.3.
> > I will report a bug to Kylin about DISTRIBUTE BY RAND().
>
> This is what happens when a node that ran a Map task fails and the whole task is retried.
>
> Assume that the first attempt of Map task0 wrote value1 into reducer-99, because RAND() returned 99.
>
> Now the task succeeds and the reducers start, running reducer-0 successfully, which writes 0000_0.
>
> But before reducer-99 runs, the node that ran Map task0 crashes.
>
> So the engine re-runs Map task0 on another node. Except, because RAND() is completely random, it may return 0 for "value1" this time.
>
> The reducer-0 output from Map task0 now has "value1", except reducer-0 has already finished, so no task will ever read it or write it out. Meanwhile reducer-99 reads the re-run's output, which no longer routes "value1" to it.
>
> In short, the table output will not contain "value1", even though both the input and the shuffle outputs contain "value1".
>
> I would replace the DISTRIBUTE BY RAND() with SORT BY 0, for a random distribution without data loss.
>
> > But I'm still not sure how to fix the problem. I'm a beginner with Hive and Kylin. Can the problem be fixed just by changing the Hive or Kylin settings?
>
> If you're just experimenting with Kylin right now, I recommend just disabling the ACL settings in HDFS (this is not permissions btw, ACLs are permissions++).
>
> Set dfs.namenode.acls.enabled=false in core-site.xml and wherever else in your /etc/hadoop/conf it shows up, and you should be able to avoid the race condition.
>
> Cheers,
> Gopal
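The retry scenario Gopal describes can be reproduced with a small simulation. This is a hypothetical Python model, not Hive, Tez, or Kylin code: the function names, the 100-reducer count, the 10,000 rows, and the fixed seed are illustrative assumptions. It compares routing with fresh randomness on each map attempt (the DISTRIBUTE BY RAND() case) against routing that replays identically when the map task is retried.

```python
import random
from typing import Callable, Dict, List, Set


# Hypothetical model, not engine code: one map task's shuffle output is a
# mapping reducer_id -> records routed to that reducer.
def shuffle(records: List[str], num_reducers: int, rng: random.Random) -> Dict[int, List[str]]:
    """Route every record to a reducer chosen by rng, like DISTRIBUTE BY RAND()."""
    buckets: Dict[int, List[str]] = {r: [] for r in range(num_reducers)}
    for rec in records:
        buckets[rng.randrange(num_reducers)].append(rec)
    return buckets


def run_with_map_retry(
    records: List[str],
    num_reducers: int,
    make_rng: Callable[[int], random.Random],
    finished_before_crash: Set[int],
) -> List[str]:
    """Simulate the failure from the thread and return the rows that reach the table.

    1. Map attempt 1 shuffles the records.
    2. Reducers in finished_before_crash read attempt 1's output and commit.
    3. The map node crashes, so the map task is re-run (attempt 2).
    4. The remaining reducers read attempt 2's output.
    Committed reducers are never re-run, so a record that attempt 2 moves
    into one of them is never written out.
    """
    attempt1 = shuffle(records, num_reducers, make_rng(1))
    attempt2 = shuffle(records, num_reducers, make_rng(2))
    table: List[str] = []
    for r in range(num_reducers):
        source = attempt1 if r in finished_before_crash else attempt2
        table.extend(source[r])
    return table


if __name__ == "__main__":
    rows = [f"value{i}" for i in range(10_000)]

    # Fresh randomness per attempt: the retry routes rows differently.
    lossy = run_with_map_retry(rows, 100, lambda attempt: random.Random(attempt), {0})
    # Same seed on every attempt: the retry reproduces the original routing.
    stable = run_with_map_retry(rows, 100, lambda attempt: random.Random(42), {0})

    print("missing with per-attempt randomness:", len(set(rows) - set(lossy)))
    print("missing with replayable randomness:", len(set(rows) - set(stable)))
```

In this model the lossy run can also duplicate rows (a row routed to reducer-0 on the first attempt and elsewhere on the retry is written twice); the thread itself only discusses the loss. The property that matters is that a retried map attempt routes every row exactly as the first attempt did. The fixed seed is just one illustrative way to get that, not a description of how Hive implements SORT BY 0.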