‘Never’ is a strong word.

1.       We’re talking about the metadata so –



a.       The data format is irrelevant



b.      The records number is small (scale of thousands)

I would have sacrificed 1 second of metadata processing for a better user 
experience



2.       Partitions values are being held in the metastore (at least with 
MySQL)  as strings

Dudu

From: Jörn Franke [mailto:jornfra...@gmail.com]
Sent: Sunday, June 05, 2016 11:38 AM
To: user@hive.apache.org
Subject: Re: Convert date in string format to timestamp in table definition

Never use string when you can use int - the performance will be much better - 
especially for tables in Orc / parquet format

On 04 Jun 2016, at 22:31, Igor Kravzov 
<igork.ine...@gmail.com<mailto:igork.ine...@gmail.com>> wrote:
Thanks Dudu.
So if I need actual date I will use view.
Regarding partition column:  I can create 2 external tables based on the same 
data with integer or string column partition and see which one is more 
convenient for our use.

On Sat, Jun 4, 2016 at 2:20 PM, Markovitz, Dudu 
<dmarkov...@paypal.com<mailto:dmarkov...@paypal.com>> wrote:
I’m not aware of an option to do what you request in the external table 
definition but you might want to that using a view.

P.s.
I seems to me that defining the partition column as a string would be more user 
friendly than integer, e.g. –

select * from threads_test where yyyymmdd like ‘2016%’ – year 2016;
select * from threads_test where yyyymmdd like ‘201603%’ –- March 2016;
select * from threads_test where yyyymmdd like ‘______01’ -- first of every 
month;





$ hdfs dfs -ls -R /tmp/threads_test
drwxr-xr-x   - cloudera supergroup          0 2016-06-04 10:45 
/tmp/threads_test/20160604
-rw-r--r--   1 cloudera supergroup        136 2016-06-04 10:45 
/tmp/threads_test/20160604/data.txt

$ hdfs dfs -cat /tmp/threads_test/20160604/data.txt
{"url":"www.blablabla.com<http://www.blablabla.com>","pageType":"pg1","addDate":"2016-05-17T02:10:44.527","postDate":"2016-05-16T02:08:55","postText":"YadaYada"}

----------------------------------------------------------------------------------------------------


hive> add jar /usr/lib/hive-hcatalog/share/hcatalog/hive-hcatalog-core.jar;

hive>
create external table threads_test
(
    url         string
   ,pagetype    string
   ,adddate     string
   ,postdate    string
   ,posttext    string
)
    partitioned by (yyyymmdd string)
    row format serde 'org.apache.hive.hcatalog.data.JsonSerDe'
    location '/tmp/threads_test'
;

hive> alter table threads_test add partition (yyyymmdd=20160604) location 
'/tmp/threads_test/20160604';

hive> select * from threads_test;

www.blablabla.com<http://www.blablabla.com>    pg1    2016-05-17T02:10:44.527   
 2016-05-16T02:08:55  YadaYada      20160604

hive>
create view threads_test_v
as
select      url
           ,pagetype
           ,cast (concat_ws(' ',substr (adddate ,1,10),substr (adddate ,12)) as 
timestamp)  as adddate
           ,cast (concat_ws(' ',substr (postdate,1,10),substr (postdate,12)) as 
timestamp)  as postdate
           ,posttext

from        threads_test
;

hive> select * from threads_test_v;

www.blablabla.com<http://www.blablabla.com>    pg1    2016-05-17 02:10:44.527   
 2016-05-16 02:08:55  YadaYada


From: Igor Kravzov 
[mailto:igork.ine...@gmail.com<mailto:igork.ine...@gmail.com>]
Sent: Saturday, June 04, 2016 8:13 PM
To: user@hive.apache.org<mailto:user@hive.apache.org>
Subject: Convert date in string format to timestamp in table definition

Hi,

I have 2 dates in Json file defined like this
"addDate": "2016-05-17T02:10:44.527",
  "postDate": "2016-05-16T02:08:55",

Right now I define external table based on this file like this:
CREATE external TABLE threads_test
(url string,
 pagetype string,
 adddate string,
 postdate string,
 posttext string)
partitioned by (yyyymmdd int)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
location 'my location';

is it possible to define these 2 dates as timestamp?
Do I need to change date format in the file? is it possible to specify date 
format in table definition?
Or I better off with string?

Thanks in advance.

Reply via email to