[ https://issues.apache.org/jira/browse/HIVE-2472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13146438#comment-13146438 ]
jirapos...@reviews.apache.org commented on HIVE-2472:
-----------------------------------------------------

bq. On 2011-11-07 22:24:59, Ning Zhang wrote:
bq. > trunk/ql/src/test/results/clientpositive/ctas.q.out, line 25
bq. > <https://reviews.apache.org/r/2583/diff/4/?file=56200#file56200line25>
bq. >
bq. > Here I think the plan should be: stage-3 (StatsTask) depends on stage-4 (DDLTask), which depends on stage-0 (MoveTask).
bq. >
bq. > Also can you change the .q file to add describe formatted <created_table> to verify that the stats are gathered for the newly created table after CTAS?

The change addressing this was made in SemanticAnalyzer, around line 7050.

- Robert


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2583/#review3089
-----------------------------------------------------------


On 2011-11-08 04:08:52, Robert Surówka wrote:
bq.
bq. -----------------------------------------------------------
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/2583/
bq. -----------------------------------------------------------
bq.
bq. (Updated 2011-11-08 04:08:52)
bq.
bq.
bq. Review request for Ning Zhang and Kevin Wilfong.
bq.
bq.
bq. Summary
bq. -------
bq.
bq. Explanation of how stats for CTAS were added (line numbers may be slightly off due to repository changes):
bq.
bq. Because CTAS contains an INSERT, the approach was to reuse as much of the existing INSERT code path as possible.
bq.
bq. There were two main issues: making sure that the FileSinkOperators gather stats, and making sure there is a StatsTask that then aggregates those stats and stores them in the Metastore.
bq.
bq. FileSinkOperator gathers stats if conf.isGatherStats (line 576) is true.
bq. It is set to true when a StatsTask is added in GenMRFileSink1 (line 126). That happens if isInsertTable is true, and isInsertTable is set at line 105. I didn't change the comment there, since the flag is still being set because of the INSERT OVERWRITE that is part of the CTAS. To make it true, the analyzer must record that the CTAS contains an insert into the table and add the TableSpec; this was done in SemanticAnalyzer (line 1051). For this, tableSpec() in BaseSemanticAnalyzer had to be changed to support TOK_CREATETABLE.
bq.
bq. The next issue was supplying StatsWork (part of StatsTask) with information about the table being created. To do that, the database name was added to CreateTableDesc, and it is set in SemanticAnalyzer (line 7878). This CreateTableDesc is then added to LoadFileDesc (only to carry the table info) in SemanticAnalyzer (line 4000), which in turn is added to StatsWork in GenMRFileSink1 (line 170). StatsTask later uses this StatsWork to get the table info.
bq.
bq. Another problem was that StatsTask would run before the create-table task. To remedy that, SemanticAnalyzer (line 7048) was changed so that, for CTAS, the StatsTask is moved to run after crtTblTask.
bq.
bq. Finally, support for LoadFileDesc (which is present for CTAS) was added to StatsTask. Importantly, line 306 was changed, because for CTAS the partitionList was empty instead of null. This last change took me around 3 hours to find, since it was the last place I looked when figuring out what was wrong.
bq.
bq. I noticed that "Cannot get table db1.db1.conflict_name" was added to database.q.out at line 1224. It wasn't present in the previous diff version, which contained exactly the same Java code, so I assume it is due to other work happening concurrently.
bq.
bq.
bq. This addresses bug HIVE-2472.
bq.     https://issues.apache.org/jira/browse/HIVE-2472
bq.
bq.
bq. Diffs
bq. -----
bq.
bq.   trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/StatsTask.java 1199067
bq.   trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRFileSink1.java 1199067
bq.   trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java 1199067
bq.   trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 1199067
bq.   trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/CreateTableDesc.java 1199067
bq.   trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/LoadFileDesc.java 1199067
bq.   trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/PlanUtils.java 1199067
bq.   trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/StatsWork.java 1199067
bq.   trunk/ql/src/test/queries/clientpositive/ctas.q 1199067
bq.   trunk/ql/src/test/results/clientpositive/ctas.q.out 1199067
bq.   trunk/ql/src/test/results/clientpositive/database.q.out 1199067
bq.   trunk/ql/src/test/results/clientpositive/merge3.q.out 1199067
bq.   trunk/ql/src/test/results/clientpositive/rcfile_createas1.q.out 1199067
bq.   trunk/ql/src/test/results/clientpositive/smb_mapjoin9.q.out 1199067
bq.
bq. Diff: https://reviews.apache.org/r/2583/diff
bq.
bq.
bq. Testing
bq. -------
bq.
bq. Ran the ant tests with the overwrite option; the resulting changes to the .q.out files are part of the diff.
bq.
bq.
bq. Thanks,
bq.
bq. Robert
bq.
bq.

> Metastore statistics are not being updated for CTAS queries.
> ------------------------------------------------------------
>
>                 Key: HIVE-2472
>                 URL: https://issues.apache.org/jira/browse/HIVE-2472
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Kevin Wilfong
>            Assignee: Robert Surówka
>         Attachments: HIVE-2472.1.patch.txt, HIVE-2472.2.patch, HIVE-2472.3.patch, HIVE-2472.4.patch
>
>
> We need to add a Statistics task at the end of a CTAS query in order to
> update the metastore statistics for the table being created.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
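This is a small standalone sketch of two points discussed in this review. The first is the stage ordering Ning Zhang asks for: MoveTask (stage-0), then DDLTask (stage-4), then StatsTask (stage-3). The second is the empty-versus-null partitionList that Robert describes in StatsTask.

This is not Hive code. SketchTask, buildCtasPlan, executionOrder and isUnpartitioned are hypothetical names. The review does not show the exact condition changed at line 306 of StatsTask, so treating an empty list like null is an assumption about what that fix amounts to.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Hive's real Task/StatsTask classes.
public class CtasStatsSketch {

    // Hypothetical minimal task node: each child runs only after its parent.
    static final class SketchTask {
        final String name;
        final List<SketchTask> children = new ArrayList<>();

        SketchTask(String name) {
            this.name = name;
        }

        void addDependentTask(SketchTask child) {
            children.add(child);
        }
    }

    // The chain requested in the review: move the data into place, create the
    // table in the metastore, and only then store stats for that table.
    static SketchTask buildCtasPlan() {
        SketchTask move = new SketchTask("Stage-0 MoveTask");
        SketchTask createTable = new SketchTask("Stage-4 DDLTask");
        SketchTask stats = new SketchTask("Stage-3 StatsTask");
        move.addDependentTask(createTable);
        createTable.addDependentTask(stats);
        return move;
    }

    // Pre-order walk. For a tree of dependencies such as this chain, every
    // task is listed before the tasks that depend on it.
    static List<String> executionOrder(SketchTask root) {
        List<String> order = new ArrayList<>();
        collect(root, order);
        return order;
    }

    private static void collect(SketchTask task, List<String> order) {
        order.add(task.name);
        for (SketchTask child : task.children) {
            collect(child, order);
        }
    }

    // Assumed shape of the StatsTask fix: an empty partition list (the CTAS
    // case) is treated the same as null, i.e. as an unpartitioned table.
    static boolean isUnpartitioned(List<String> partitionList) {
        return partitionList == null || partitionList.isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(executionOrder(buildCtasPlan()));
        System.out.println(isUnpartitioned(new ArrayList<>()));
    }
}
```

Running main prints the stages in the order Stage-0, Stage-4, Stage-3, followed by true for the empty list. A check written only as `partitionList == null` would return false for that same empty list. This is the kind of mismatch the review reports for CTAS.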