[ 
https://issues.apache.org/jira/browse/HIVE-2472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13146069#comment-13146069
 ] 

jirapos...@reviews.apache.org commented on HIVE-2472:
-----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2583/
-----------------------------------------------------------

(Updated 2011-11-08 04:08:52.506156)


Review request for Ning Zhang and Kevin Wilfong.


Changes
-------

I implemented all the change requests (hopefully), with one exception that I 
already commented about.


Summary
-------

Explanation of how stats for CTAS were added (line numbers may be slightly off 
due to repository changes):


Because CTAS contains an INSERT, the approach was to reuse as much as possible 
of what already exists for INSERT.

There were 2 main issues: making sure that the FileSinkOperators gather stats, 
and making sure that there is a StatsTask that then aggregates them and stores 
them in the Metastore.

FileSinkOperator gathers stats if conf.isGatherStats (line 576) is true. The 
flag is set to true when the StatsTask is added in GenMRFileSink1 (126), which 
happens only if isInsertTable is true; that is set in 105 (I didn't change the 
comment, since it is still being set because of the INSERT OVERWRITE that is 
just a part of the CTAS). To make it true, the CTAS must be marked as 
containing an insert into the table and the TableSpec must be added, which was 
done in SemanticAnalyzer (1051) (BaseSemanticAnalyzer tableSpec() had to be 
changed to support TOK_CREATETABLE). 
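
The flag flow described above can be sketched roughly as follows. This is only 
an illustration, not the Hive code: SinkConf and planAfterFileSink are 
hypothetical stand-ins, and the task names are plain strings. It shows the one 
point that matters here: the stats flag on the sink is switched on only when 
the planner decides to add a stats task, and that happens only when the query 
is marked as inserting into a table.

```java
import java.util.ArrayList;
import java.util.List;

public class CtasStatsFlagSketch {

    /** Hypothetical stand-in for the file sink configuration. */
    static class SinkConf {
        private boolean gatherStats;

        boolean isGatherStats() {
            return gatherStats;
        }

        void setGatherStats(boolean gatherStats) {
            this.gatherStats = gatherStats;
        }
    }

    /**
     * Hypothetical planning step: returns the names of the tasks that follow
     * the file sink. A stats task is added, and the sink is told to gather
     * stats, only when the query inserts into a table.
     */
    static List<String> planAfterFileSink(boolean isInsertTable, SinkConf conf) {
        List<String> tasks = new ArrayList<>();
        tasks.add("MoveTask");
        if (isInsertTable) {
            tasks.add("StatsTask");
            conf.setGatherStats(true);
        }
        return tasks;
    }

    public static void main(String[] args) {
        SinkConf ctasConf = new SinkConf();
        List<String> tasks = planAfterFileSink(true, ctasConf);
        System.out.println(tasks + " gatherStats=" + ctasConf.isGatherStats());
    }
}
```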

The next issue was to supply StatsWork (part of StatsTask) with information 
about the table being created. To do that, the database name was added to 
CreateTableDesc, and it is set in SemanticAnalyzer (7878). This 
CreateTableDesc is then added to LoadFileDesc (just to carry the table info) 
in SemanticAnalyzer (4000), which in turn is added to StatsWork in 
GenMRFileSink1 (170). This StatsWork is later used by StatsTask to get the 
table info.
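
As a rough picture of how the table info travels, here is a minimal sketch. 
The class names (CreateTableInfo, LoadFileInfo, StatsWorkInfo) and the 
targetTable() helper are hypothetical and deliberately differ from the real 
Hive descriptors; the sketch only shows the nesting, not the actual fields of 
the patch.

```java
public class CtasDescriptorChainSketch {

    /** Hypothetical stand-in for the create-table descriptor. */
    static class CreateTableInfo {
        final String databaseName;
        final String tableName;

        CreateTableInfo(String databaseName, String tableName) {
            this.databaseName = databaseName;
            this.tableName = tableName;
        }
    }

    /** Hypothetical stand-in for the load-file descriptor. */
    static class LoadFileInfo {
        // Assumed to be null when the load is not part of a CTAS.
        final CreateTableInfo createTable;

        LoadFileInfo(CreateTableInfo createTable) {
            this.createTable = createTable;
        }
    }

    /** Hypothetical stand-in for the work object handed to the stats task. */
    static class StatsWorkInfo {
        final LoadFileInfo loadFile;

        StatsWorkInfo(LoadFileInfo loadFile) {
            this.loadFile = loadFile;
        }

        /** Returns "db.table" for a CTAS, or null if no table info is attached. */
        String targetTable() {
            if (loadFile == null || loadFile.createTable == null) {
                return null;
            }
            return loadFile.createTable.databaseName + "." + loadFile.createTable.tableName;
        }
    }

    public static void main(String[] args) {
        CreateTableInfo create = new CreateTableInfo("default", "ctas_target");
        StatsWorkInfo work = new StatsWorkInfo(new LoadFileInfo(create));
        System.out.println(work.targetTable());
    }
}
```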

Another problem was that StatsTask would be called before the CreateTableTask. 
To remedy that, a change was made in SemanticAnalyzer (7048) so that for CTAS 
the StatsTask is moved to run after the crtTblTask.
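
The reordering can be pictured with a simplified sketch. Hive links tasks 
through dependencies rather than keeping a flat list, so the list of task 
names and the moveStatsAfterCreate helper below are assumptions made purely 
for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CtasTaskOrderSketch {

    /**
     * Returns a copy of the execution order in which "StatsTask" runs directly
     * after "CreateTableTask". The order is returned unchanged if either task
     * is missing or the stats task already runs after the create-table task.
     */
    static List<String> moveStatsAfterCreate(List<String> order) {
        int create = order.indexOf("CreateTableTask");
        int stats = order.indexOf("StatsTask");
        List<String> result = new ArrayList<>(order);
        if (create < 0 || stats < 0 || stats > create) {
            return result;
        }
        // Removing the earlier stats entry shifts the create-table entry one
        // slot to the left, so inserting at the old index lands right after it.
        result.remove(stats);
        result.add(create, "StatsTask");
        return result;
    }

    public static void main(String[] args) {
        List<String> planned = Arrays.asList("MapRedTask", "StatsTask", "CreateTableTask");
        System.out.println(moveStatsAfterCreate(planned));
    }
}
```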

Finally, support for the LoadFileDesc (which is present for CTAS) was added to 
StatsTask. Importantly, line 306 was changed, since for CTAS the partitionList 
was empty rather than null (this last change took me around 3 hours to find, 
since it was the last place I looked when figuring out what was wrong).
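
The null-versus-empty pitfall is easy to reproduce in isolation. The sketch 
below does not reproduce line 306 (the actual condition is not quoted in this 
summary); the two helper methods are hypothetical and only contrast a check 
that treats null alone as "no partitions" with one that also accepts an empty 
list.

```java
import java.util.ArrayList;
import java.util.List;

public class PartitionListCheckSketch {

    /** Treats only a null list as "no partitions"; misses the empty-list case. */
    static boolean isTableLevelNullOnly(List<String> partitions) {
        return partitions == null;
    }

    /** Treats both null and an empty list as "no partitions". */
    static boolean isTableLevel(List<String> partitions) {
        return partitions == null || partitions.isEmpty();
    }

    public static void main(String[] args) {
        // Assumed CTAS case from the description above: empty list, not null.
        List<String> ctasPartitions = new ArrayList<>();
        System.out.println("null-only check:     " + isTableLevelNullOnly(ctasPartitions));
        System.out.println("null-or-empty check: " + isTableLevel(ctasPartitions));
    }
}
```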


I noticed that "Cannot get table db1.db1.conflict_name" was added to 
database.q.out at line 1224, but it wasn't present in the previous diff 
version, which contained exactly the same Java code, so I assume it is due to 
some other work happening concurrently.


This addresses bug HIVE-2472.
    https://issues.apache.org/jira/browse/HIVE-2472


Diffs (updated)
-----

  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/StatsTask.java 1199067 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRFileSink1.java 1199067 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java 1199067 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 1199067 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/CreateTableDesc.java 1199067 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/LoadFileDesc.java 1199067 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/PlanUtils.java 1199067 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/StatsWork.java 1199067 
  trunk/ql/src/test/queries/clientpositive/ctas.q 1199067 
  trunk/ql/src/test/results/clientpositive/ctas.q.out 1199067 
  trunk/ql/src/test/results/clientpositive/database.q.out 1199067 
  trunk/ql/src/test/results/clientpositive/merge3.q.out 1199067 
  trunk/ql/src/test/results/clientpositive/rcfile_createas1.q.out 1199067 
  trunk/ql/src/test/results/clientpositive/smb_mapjoin9.q.out 1199067 

Diff: https://reviews.apache.org/r/2583/diff


Testing
-------

Ran the ant tests with the overwrite option; changes to the out files are part 
of the diff.


Thanks,

Robert


                
> Metastore statistics are not being updated for CTAS queries.
> ------------------------------------------------------------
>
>                 Key: HIVE-2472
>                 URL: https://issues.apache.org/jira/browse/HIVE-2472
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Kevin Wilfong
>            Assignee: Robert Surówka
>         Attachments: HIVE-2472.1.patch.txt, HIVE-2472.2.patch, 
> HIVE-2472.3.patch, HIVE-2472.4.patch
>
>
> We need to add a Statistics task at the end of a CTAS query in order to 
> update the metastore statistics for the table being created.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

