Hi Mich, DDL as below.
Hi Prasanth,

Hive version as reported by Hortonworks is 1.2.1.2.3.

Thanks,
Marcin

CREATE TABLE `<tablename>`(
  `col1` string,
  `col2` bigint,
  `col3` string,
  `col4` string,
  `col4` string,
  `col5` bigint,
  `col6` string,
  `col7` string,
  `col8` string,
  `col9` string,
  `col10` boolean,
  `col11` boolean,
  `col12` string,
  `metadata` struct<file:string,hostname:string,level:string,line:bigint,logger:string,method:string,millis:bigint,pid:bigint,timestamp:string>,
  `col14` string,
  `col15` bigint,
  `col16` double,
  `col17` bigint)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
  'hdfs://reporting-handy/<path>'
TBLPROPERTIES (
  'COLUMN_STATS_ACCURATE'='true',
  'numFiles'='2800',
  'numRows'='297263',
  'rawDataSize'='454748401',
  'totalSize'='31310353',
  'transient_lastDdlTime'='1457437204')
Time taken: 1.062 seconds, Fetched: 34 row(s)

On Tue, Mar 8, 2016 at 4:29 AM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
> Hi
>
> Can you please provide the DDL for this table: "show create table <TABLE>"
>
> Dr Mich Talebzadeh
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> http://talebzadehmich.wordpress.com
>
> On 7 March 2016 at 23:25, Marcin Tustin <mtus...@handybook.com> wrote:
>> Hi All,
>>
>> Following on from our parquet vs orc discussion, today I observed
>> hive's alter table ... concatenate command remove rows from an ORC
>> formatted table.
>>
>> 1. Has anyone else observed this (fuller description below)? And
>> 2. How do parquet users handle the file fragmentation issue?
>>
>> Description of the problem:
>>
>> Today I ran a query to count rows by date. Relevant days below:
>>
>> 2016-02-28 16866
>> 2016-03-06 219
>> 2016-03-07 2863
>>
>> I then ran concatenation on that table.
>> Rerunning the same query resulted in:
>>
>> 2016-02-28 16866
>> 2016-03-06 219
>> 2016-03-07 1158
>>
>> Note the reduced count for 2016-03-07.
>>
>> I then ran concatenation a second time, and the query a third time:
>>
>> 2016-02-28 16344
>> 2016-03-06 219
>> 2016-03-07 1158
>>
>> Now the count for 2016-02-28 is reduced.
>>
>> This doesn't look like an elimination of duplicates occurring by design -
>> these didn't all happen on the first run of concatenation. It looks like
>> concatenation just kind of loses data.
>>
>> Want to work at Handy? Check out our culture deck and open roles
>> <http://www.handy.com/careers>
>> Latest news <http://www.handy.com/press> at Handy
>> Handy just raised $50m
>> <http://venturebeat.com/2015/11/02/on-demand-home-service-handy-raises-50m-in-round-led-by-fidelity/>
>> led by Fidelity
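For anyone trying to reproduce, the workflow described in the thread can be sketched in HiveQL roughly as follows. This is a sketch, not the exact queries run: the table name is the placeholder from the DDL above, and the date column used for grouping is an assumption on my part.

```sql
-- Count rows per day before compaction.
-- `<date_col>` is a placeholder; the actual column isn't named in the thread.
SELECT to_date(`<date_col>`) AS day, COUNT(*) AS cnt
FROM `<tablename>`
GROUP BY to_date(`<date_col>`);

-- Merge small ORC files -- the step reported to drop rows:
ALTER TABLE `<tablename>` CONCATENATE;

-- Re-run the same count and compare the two result sets day by day.
```

Note that the `numRows` figure in TBLPROPERTIES ('numRows'='297263') is a stored statistic and can go stale, so a full `SELECT COUNT(*)` scan (or `ANALYZE TABLE ... COMPUTE STATISTICS` first) is the safer baseline for this comparison.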
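Not Hive-specific, but the before/after comparison above is easy to mechanize once the two per-day counts have been captured. A minimal sketch (the function name is mine; the sample values are the counts quoted in the thread):

```python
def diff_counts(before, after):
    """Return {day: (before_count, after_count)} for days whose counts differ."""
    days = set(before) | set(after)
    return {d: (before.get(d, 0), after.get(d, 0))
            for d in days
            if before.get(d, 0) != after.get(d, 0)}

# Counts from the first and second runs reported in the thread:
before = {"2016-02-28": 16866, "2016-03-06": 219, "2016-03-07": 2863}
after  = {"2016-02-28": 16866, "2016-03-06": 219, "2016-03-07": 1158}

print(diff_counts(before, after))
# {'2016-03-07': (2863, 1158)} -> rows lost on 2016-03-07 after concatenation
```

Running the same diff after the second concatenation would flag 2016-02-28 as well, matching the third query's output.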