[ https://issues.apache.org/jira/browse/HIVE-19943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16542841#comment-16542841 ]

Zoltan Haindrich commented on HIVE-19943:
-----------------------------------------

I'm not sure how this is supposed to be fixed. Exploring adding these as input format arguments is a dead end, because the actual reader is some kind of "LineReader" from Hadoop...
I feel that this "HiveRecordReader" should somehow be pushed under the LlapRecordReader... but that seems like a hard thing to do (and probably not the right move).
[~sershe] do you have any suggestions?

To reproduce: patch an existing test which, by mistake, only tested local mode (so it missed this issue all along), and run it with TestMiniLlapCliDriver:
{code}
diff --git ql/src/test/queries/clientpositive/file_with_header_footer.q ql/src/test/queries/clientpositive/file_with_header_footer.q
index 8913e54ad0..5dddcaba2a 100644
--- ql/src/test/queries/clientpositive/file_with_header_footer.q
+++ ql/src/test/queries/clientpositive/file_with_header_footer.q
@@ -11,6 +11,10 @@ CREATE EXTERNAL TABLE header_footer_table_1 (name string, message string, id int
 
 SELECT * FROM header_footer_table_1;
 
+explain
+SELECT count(distinct name) FROM header_footer_table_1;
+SELECT assert_true(count(distinct name)=11) FROM header_footer_table_1;
+
 SELECT * FROM header_footer_table_1 WHERE id < 50;
 
 CREATE EXTERNAL TABLE header_footer_table_2 (name string, message string, id int) PARTITIONED BY (year int, month int, day int) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' tblproperties ("skip.header.line.count"="1", "skip.footer.line.count"="2");
{code}

> Header values keep showing up in result sets
> --------------------------------------------
>
>                 Key: HIVE-19943
>                 URL: https://issues.apache.org/jira/browse/HIVE-19943
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 2.1.0
>         Environment: HDInsight Hive Interactive Query
> [Components|https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-component-versioning#hadoop-components-available-with-different-hdinsight-versions]
>            Reporter: Liam De Lee
>            Priority: Major
>
> We are using tblproperties ("skip.header.line.count"="1") when creating an external table.
> When we run a SELECT * on the table, we get the rows back as expected, without the header in the result set.
> However, when we run, for instance, a count(1), the header is included in the count (verified by running SELECT * on the table, pasting the output into Notepad, and counting the rows).
> Likewise, a SELECT DISTINCT(column) on the table returns the header as one of the distinct values.
> File structure:
> ||_TESTING_TYPE||
> |adf|
> |hyg|
> |abc|
>
> *Update: 26/06/2018*
> Create statement:
> {code:java}
> -----------------------------------
> --test_type--
> -----------------------------------
> CREATE EXTERNAL TABLE IF NOT EXISTS ext.test_type_in
> (
>   test_type string
> )
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '\073'
> STORED AS TEXTFILE
> LOCATION 'adl://{adlslocation}data/data2/test'
> tblproperties ("skip.header.line.count"="1")
> {code}
> Select statement:
> {code:java}
> select * from test_type_in;
> {code}
> Distinct statement:
> {code:java}
> select distinct test_type from test_type_in ORDER BY test_type;
> {code}
> I cannot show the exact statement because of an NDA, so I changed those values to "test".
>
> I can also tell you that this happens not just on our HDInsight cluster but also at another company we are working for. It does not matter what is in the data, either.
> So, for testing purposes:
> {code:java}
> test_type,abcg,gjeiza,aze,grriajj,gd,rrjri,vdju{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
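To make the reported symptom concrete, here is a minimal Java sketch. It applies header/footer skipping on top of a reader that only returns raw lines, and it shows how a header leaks into a distinct count when nothing skips it. This is not Hive's code and does not reproduce how HiveRecordReader or the LLAP I/O path actually work. The class names `HeaderFooterSkippingIterator` and `HeaderSkipDemo` are hypothetical. A plain `Iterator<String>` stands in for the Hadoop line reader. Header skipping assumes the reader starts at the beginning of the file (split offset 0). The sample values are a hypothetical file modeled on the report's file-structure table.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Set;

// Hypothetical wrapper (not a Hive class): applies skip.header.line.count /
// skip.footer.line.count semantics on top of a plain line iterator, which
// stands in for a line reader that does no skipping on its own.
final class HeaderFooterSkippingIterator implements Iterator<String> {
    private final Iterator<String> lines;
    private final int footerCount;
    // Lookahead buffer: a line is emitted only once footerCount newer lines
    // have been read, so the last footerCount lines are never returned.
    private final Deque<String> buffer = new ArrayDeque<>();

    HeaderFooterSkippingIterator(Iterator<String> lines, int headerCount, int footerCount) {
        this.lines = lines;
        this.footerCount = footerCount;
        // Assumes this reader starts at the beginning of the file (split offset 0);
        // a reader for a later split would have no header lines to drop.
        for (int i = 0; i < headerCount && lines.hasNext(); i++) {
            lines.next();
        }
    }

    @Override
    public boolean hasNext() {
        // Top up the buffer to footerCount + 1 lines, or until the input runs out.
        while (buffer.size() <= footerCount && lines.hasNext()) {
            buffer.addLast(lines.next());
        }
        return buffer.size() > footerCount;
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return buffer.removeFirst();
    }
}

final class HeaderSkipDemo {
    // Mimics SELECT count(DISTINCT col) over whatever rows a reader returns.
    static int countDistinct(Iterator<String> rows) {
        Set<String> seen = new HashSet<>();
        while (rows.hasNext()) {
            seen.add(rows.next());
        }
        return seen.size();
    }

    public static void main(String[] args) {
        // Hypothetical single-column file modeled on the report: header line first.
        List<String> file = List.of("TESTING_TYPE", "adf", "hyg", "abc");

        // Plain line reading: the header is counted as a value.
        System.out.println(countDistinct(file.iterator())); // prints 4

        // With skip.header.line.count=1 applied: only the data rows remain.
        System.out.println(countDistinct(new HeaderFooterSkippingIterator(file.iterator(), 1, 0))); // prints 3
    }
}
```

Buffering `footerCount` lines is one way to drop a footer without knowing the file length in advance. The sketch does not address how header and footer handling interacts with split boundaries when a file is read by several readers in parallel.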