[ 
https://issues.apache.org/jira/browse/HIVE-2472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13146438#comment-13146438
 ] 

jirapos...@reviews.apache.org commented on HIVE-2472:
-----------------------------------------------------



bq.  On 2011-11-07 22:24:59, Ning Zhang wrote:
bq.  > trunk/ql/src/test/results/clientpositive/ctas.q.out, line 25
bq.  > <https://reviews.apache.org/r/2583/diff/4/?file=56200#file56200line25>
bq.  >
bq.  >     Here I think the plan should be: stage-3 (StatsTask) depends on 
stage-4 (DDLTask), which depends on stage-0 (MoveTask).
bq.  >     
bq.  >     Also, can you change the .q file to add "describe formatted 
<created_table>" to verify that the stats are gathered for the newly created 
table after CTAS?

The change addressing this was made in SemanticAnalyzer, around line 7050.


- Robert


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2583/#review3089
-----------------------------------------------------------


On 2011-11-08 04:08:52, Robert Surówka wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2583/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2011-11-08 04:08:52)
bq.  
bq.  
bq.  Review request for Ning Zhang and Kevin Wilfong.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  Explanation of how stats for CTAS were added (line numbers may be slightly 
off due to repository changes):
bq.  
bq.  
bq.  Because CTAS contains an INSERT, the approach was to reuse as much as 
possible of what already exists for INSERT.
bq.  
bq.  There were two main issues: making sure that the FileSinkOperators gather 
stats, and making sure that there is a StatsTask that then aggregates them and 
stores them in the Metastore.
bq.  
bq.  FileSinkOperator gathers stats if conf.isGatherStats (line 576) is true. 
It is set to true when the StatsTask is added in GenMRFileSink1 (126), which 
happens if isInsertTable is true; isInsertTable is set in line 105 (I didn't 
change the comment there, since it is still being set because of the INSERT 
OVERWRITE that is part of the CTAS). To make it true, one must mark the CTAS as 
containing an insert into the table and add the TableSpec, which was done in 
SemanticAnalyzer (1051) (BaseSemanticAnalyzer tableSpec() had to be changed to 
support TOK_CREATETABLE).
bq.  
bq.  The next issue was to supply StatsWork (part of StatsTask) with 
information about the table being created. To do that, the database name was 
added to CreateTableDesc, and it is set in SemanticAnalyzer (7878). This 
CreateTableDesc is then added to LoadFileDesc (just to carry the table info) in 
SemanticAnalyzer (4000), which in turn is added to StatsWork in GenMRFileSink1 
(170). This StatsWork is later used by StatsTask to get the table info.
bq.  
bq.  Another problem was that the StatsTask would be called before the 
CreateTableTask. To remedy that, a change was made in SemanticAnalyzer (7048) 
so that, for CTAS, the StatsTask is moved to run after the crtTblTask.
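
The reordering described above can be pictured with a small, self-contained Java sketch. It does not use Hive's classes: SketchTask, CtasStageOrderSketch, and all of their methods are hypothetical stand-ins, and the starting wiring (both the StatsTask and the create-table DDLTask hanging directly off the MoveTask) is an assumption made purely for illustration. The sketch only shows the idea of detaching the stats task from its current parents and re-attaching it after the create-table task.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a plan task; this is NOT Hive's Task class.
class SketchTask {
    final String name;
    final List<SketchTask> parents = new ArrayList<>();
    final List<SketchTask> children = new ArrayList<>();

    SketchTask(String name) {
        this.name = name;
    }

    void addChild(SketchTask child) {
        if (!children.contains(child)) {
            children.add(child);
            child.parents.add(this);
        }
    }

    void removeChild(SketchTask child) {
        if (children.remove(child)) {
            child.parents.remove(this);
        }
    }
}

class CtasStageOrderSketch {
    // Detach the stats task from every current parent, then make it
    // a child of the create-table task so it runs afterwards.
    static void moveStatsAfterCreateTable(SketchTask stats, SketchTask createTable) {
        for (SketchTask parent : new ArrayList<>(stats.parents)) {
            parent.removeChild(stats);
        }
        createTable.addChild(stats);
    }

    // Return task names in an order where each task follows all its parents.
    static List<String> runOrder(SketchTask root) {
        List<String> order = new ArrayList<>();
        List<SketchTask> ready = new ArrayList<>();
        ready.add(root);
        while (!ready.isEmpty()) {
            SketchTask task = ready.remove(0);
            if (order.contains(task.name)) {
                continue;
            }
            order.add(task.name);
            for (SketchTask child : task.children) {
                boolean parentsDone = true;
                for (SketchTask parent : child.parents) {
                    if (!order.contains(parent.name)) {
                        parentsDone = false;
                    }
                }
                if (parentsDone) {
                    ready.add(child);
                }
            }
        }
        return order;
    }

    // Build the assumed starting plan and optionally apply the reordering.
    static List<String> planOrder(boolean applyFix) {
        SketchTask move = new SketchTask("MoveTask");
        SketchTask createTable = new SketchTask("DDLTask");
        SketchTask stats = new SketchTask("StatsTask");
        move.addChild(stats);        // assumed: stats originally follows the move
        move.addChild(createTable);  // assumed: table creation also follows the move
        if (applyFix) {
            moveStatsAfterCreateTable(stats, createTable);
        }
        return runOrder(move);
    }

    public static void main(String[] args) {
        System.out.println("before: " + planOrder(false));
        System.out.println("after:  " + planOrder(true));
    }
}
```

Under the assumed starting wiring, the run order changes from MoveTask, StatsTask, DDLTask to MoveTask, DDLTask, StatsTask, which is the ordering requested in the review comment.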
bq.  
bq.  Finally, support for LoadFileDesc (which is present for CTAS) was added to 
StatsTask. Importantly, line 306 was changed, since for CTAS there was an empty 
partitionList instead of null (this last change took me around 3 hours to find, 
since it was the last place I looked when figuring out what was wrong).
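
To illustrate the null-versus-empty distinction mentioned above, here is a minimal Java sketch of one way such a check can be written. The class PartitionListCheckSketch and the helper hasNoPartitions are hypothetical names introduced for this example; the actual condition at line 306 of StatsTask is not shown in this message, so this is not a reproduction of it.

```java
import java.util.Collections;
import java.util.List;

class PartitionListCheckSketch {
    // Hypothetical helper: treats both null and an empty list as
    // "no partitions", so a caller that receives an empty list
    // takes the same path as one that receives null.
    static boolean hasNoPartitions(List<String> partitions) {
        return partitions == null || partitions.isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(hasNoPartitions(null));
        System.out.println(hasNoPartitions(Collections.<String>emptyList()));
        System.out.println(hasNoPartitions(Collections.singletonList("ds=2011-11-08")));
    }
}
```

A check that tests only for null would treat the empty list as a partitioned case, which is the kind of mismatch that is easy to overlook.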
bq.  
bq.  
bq.  I noticed that "Cannot get table db1.db1.conflict_name" was added to 
database.q.out in line 1224, but it wasn't present in the previous diff 
version, which contained exactly the same Java code, so I assume it is due to 
some other work happening concurrently.
bq.  
bq.  
bq.  This addresses bug HIVE-2472.
bq.      https://issues.apache.org/jira/browse/HIVE-2472
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/StatsTask.java 1199067
bq.    trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRFileSink1.java 1199067
bq.    trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java 1199067
bq.    trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 1199067
bq.    trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/CreateTableDesc.java 1199067
bq.    trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/LoadFileDesc.java 1199067
bq.    trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/PlanUtils.java 1199067
bq.    trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/StatsWork.java 1199067
bq.    trunk/ql/src/test/queries/clientpositive/ctas.q 1199067
bq.    trunk/ql/src/test/results/clientpositive/ctas.q.out 1199067
bq.    trunk/ql/src/test/results/clientpositive/database.q.out 1199067
bq.    trunk/ql/src/test/results/clientpositive/merge3.q.out 1199067
bq.    trunk/ql/src/test/results/clientpositive/rcfile_createas1.q.out 1199067
bq.    trunk/ql/src/test/results/clientpositive/smb_mapjoin9.q.out 1199067
bq.  
bq.  Diff: https://reviews.apache.org/r/2583/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  Ran the ant tests with the overwrite option; the changes to the .out files 
are part of the diff.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Robert
bq.  
bq.


                
> Metastore statistics are not being updated for CTAS queries.
> ------------------------------------------------------------
>
>                 Key: HIVE-2472
>                 URL: https://issues.apache.org/jira/browse/HIVE-2472
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Kevin Wilfong
>            Assignee: Robert Surówka
>         Attachments: HIVE-2472.1.patch.txt, HIVE-2472.2.patch, 
> HIVE-2472.3.patch, HIVE-2472.4.patch
>
>
> We need to add a Statistics task at the end of a CTAS query in order to 
> update the metastore statistics for the table being created.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

