[ https://issues.apache.org/jira/browse/HIVE-17095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16087771#comment-16087771 ]
Sushanth Sowmyan commented on HIVE-17095:
-----------------------------------------

[~daijy]/[~ashutoshc], could I bug you for a review while we wait for ptests?

> Long chain repl loads do not complete in a timely fashion
> ---------------------------------------------------------
>
>                 Key: HIVE-17095
>                 URL: https://issues.apache.org/jira/browse/HIVE-17095
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Planning, repl
>            Reporter: sapin amin
>            Assignee: Sushanth Sowmyan
>         Attachments: HIVE-17095.2.patch, HIVE-17095.patch
>
> Per performance testing done by [~sapinamin] (thus, I'm setting him as
> reporter), we were able to discover an important bug affecting replication.
> It has the potential to affect other large DAGs of Tasks that Hive generates
> as well, if those DAGs have multiple paths to child Task nodes.
> Basically, we find that incremental REPL LOAD does not finish in a timely
> fashion. The test, in this case, was to add 400 partitions and replicate
> them. Associated with each partition, there was an ADD PTN and an ALTER PTN.
> For each of the ADD PTN tasks, we'd generate a DDLTask, a CopyTask and a
> MoveTask. For each ALTER PTN, there'd be a single DDLTask. Order of
> execution is important, so dependency collection tasks are chained in
> between phases.
> Trying to root-cause this shows that it stalls seemingly forever at Driver
> instantiation time, and it almost looks like the thread doesn't proceed
> past that point.
> Looking at logs, it seems that, the way this is written, the traversal
> visits every task in the subtree of every node without checking for
> duplicates, and this is done simply to get the number of execution tasks!
> Thus, the task visitor will visit every subtree of every node, which is
> fine for graphs that look like open trees, but is horrible for us, since
> we have dependency collection tasks between each phase. Effectively, this
> is what's happening:
> We have a DAG, say, like this:
> 4 tasks in parallel -> DEP col -> 4 tasks in parallel -> DEP col -> ...
> This means that for each of the 4 root tasks, we will do a full traversal
> along every path (not just to every node) past the DEP col, and this
> happens recursively, which leads to exponential growth in the number of
> task visits as the length and breadth of the graph increase. In our case,
> we had about 800 tasks in the graph, roughly 2-3 wide, across about 200
> stages, with a dependency collection task before and after each stage.
> This meant that leaf nodes of this DAG could be reached along somewhere
> between 2^200 and 3^200 distinct paths, and thus, we'd visit them once per
> path. And all this simply to count the number of tasks to schedule; we
> would then revisit this function multiple more times, once per hook, once
> for the MapReduceCompiler and once for the TaskCompiler.
> We have not been sending such large DAGs to the Driver, so this has not
> yet been a problem, and there are upcoming changes that reduce the number
> of tasks replication generates (as part of addressing a memory issue), but
> we should still fix the way we do Task traversal so that a large DAG
> cannot cripple us.
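To make the blow-up concrete, here is a minimal, self-contained Java sketch. It is not the actual Hive Task/TaskGraphWalker code; the Node class, the method names, and the graph-building helper are hypothetical stand-ins for illustration. countNaive() mirrors the reported behavior: it recurses into every child subtree with no duplicate detection, so each node is visited once per distinct root-to-node path. countOnce() shows one obvious fix: tracking visited tasks in a Set makes the traversal linear in nodes plus edges.

{code:java}
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class TaskTraversalDemo {

  /** Hypothetical stand-in for a Hive Task node: a name plus child tasks. */
  static class Node {
    final String name;
    final List<Node> children = new ArrayList<>();
    Node(String name) { this.name = name; }
  }

  /**
   * Mirrors the reported behavior: recurse into every child subtree with
   * no duplicate detection. Each node is visited once per distinct path
   * from the root, so dependency-collection "diamonds" multiply the work.
   */
  static long countNaive(Node n) {
    long count = 1;
    for (Node c : n.children) {
      count += countNaive(c);
    }
    return count;
  }

  /**
   * The fix: remember visited nodes so each task is counted exactly once,
   * making the traversal O(nodes + edges) however many paths exist.
   * (Node does not override equals/hashCode, so the HashSet is effectively
   * an identity set here.)
   */
  static long countOnce(Node n, Set<Node> visited) {
    if (!visited.add(n)) {
      return 0; // already counted via another path
    }
    long count = 1;
    for (Node c : n.children) {
      count += countOnce(c, visited);
    }
    return count;
  }

  /**
   * Builds the shape from the description above: {@code width} tasks in
   * parallel, then a DEP col node, repeated {@code stages} times.
   */
  static Node buildChain(int stages, int width) {
    Node root = new Node("root");
    Node frontier = root;
    for (int s = 0; s < stages; s++) {
      Node depCol = new Node("depcol-" + s);
      for (int w = 0; w < width; w++) {
        Node task = new Node("task-" + s + "-" + w);
        frontier.children.add(task);
        task.children.add(depCol);
      }
      frontier = depCol;
    }
    return root;
  }

  public static void main(String[] args) {
    // 15 stages of width 3 is already enough to show the trend; the
    // reported case (200 stages) would need on the order of 3^200 visits.
    Node root = buildChain(15, 3);
    System.out.println("naive visit count: " + countNaive(root));
    System.out.println("actual task count: " + countOnce(root, new HashSet<>()));
  }
}
{code}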
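For scale: with 15 stages of width 3, countNaive() performs 3^16 - 2 = 43,046,719 visits just to count 61 tasks, while countOnce() touches each of the 61 tasks exactly once. At the 200 stages described above, the naive traversal would effectively never return, which is consistent with the observed stall at Driver instantiation.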