‘Never’ is a strong word.
1. We’re talking about the metadata so – a. The data format is irrelevant b. The records number is small (scale of thousands) I would have sacrificed 1 second of metadata processing for a better user experience 2. Partitions values are being held in the metastore (at least with MySQL) as strings Dudu From: Jörn Franke [mailto:jornfra...@gmail.com] Sent: Sunday, June 05, 2016 11:38 AM To: user@hive.apache.org Subject: Re: Convert date in string format to timestamp in table definition Never use string when you can use int - the performance will be much better - especially for tables in Orc / parquet format On 04 Jun 2016, at 22:31, Igor Kravzov <igork.ine...@gmail.com<mailto:igork.ine...@gmail.com>> wrote: Thanks Dudu. So if I need actual date I will use view. Regarding partition column: I can create 2 external tables based on the same data with integer or string column partition and see which one is more convenient for our use. On Sat, Jun 4, 2016 at 2:20 PM, Markovitz, Dudu <dmarkov...@paypal.com<mailto:dmarkov...@paypal.com>> wrote: I’m not aware of an option to do what you request in the external table definition but you might want to that using a view. P.s. I seems to me that defining the partition column as a string would be more user friendly than integer, e.g. – select * from threads_test where yyyymmdd like ‘2016%’ – year 2016; select * from threads_test where yyyymmdd like ‘201603%’ –- March 2016; select * from threads_test where yyyymmdd like ‘______01’ -- first of every month; $ hdfs dfs -ls -R /tmp/threads_test drwxr-xr-x - cloudera supergroup 0 2016-06-04 10:45 /tmp/threads_test/20160604 -rw-r--r-- 1 cloudera supergroup 136 2016-06-04 10:45 /tmp/threads_test/20160604/data.txt $ hdfs dfs -cat /tmp/threads_test/20160604/data.txt {"url":"www.blablabla.com<http://www.blablabla.com>","pageType":"pg1","addDate":"2016-05-17T02:10:44.527","postDate":"2016-05-16T02:08:55","postText":"YadaYada"} ---------------------------------------------------------------------------------------------------- hive> add jar /usr/lib/hive-hcatalog/share/hcatalog/hive-hcatalog-core.jar; hive> create external table threads_test ( url string ,pagetype string ,adddate string ,postdate string ,posttext string ) partitioned by (yyyymmdd string) row format serde 'org.apache.hive.hcatalog.data.JsonSerDe' location '/tmp/threads_test' ; hive> alter table threads_test add partition (yyyymmdd=20160604) location '/tmp/threads_test/20160604'; hive> select * from threads_test; www.blablabla.com<http://www.blablabla.com> pg1 2016-05-17T02:10:44.527 2016-05-16T02:08:55 YadaYada 20160604 hive> create view threads_test_v as select url ,pagetype ,cast (concat_ws(' ',substr (adddate ,1,10),substr (adddate ,12)) as timestamp) as adddate ,cast (concat_ws(' ',substr (postdate,1,10),substr (postdate,12)) as timestamp) as postdate ,posttext from threads_test ; hive> select * from threads_test_v; www.blablabla.com<http://www.blablabla.com> pg1 2016-05-17 02:10:44.527 2016-05-16 02:08:55 YadaYada From: Igor Kravzov [mailto:igork.ine...@gmail.com<mailto:igork.ine...@gmail.com>] Sent: Saturday, June 04, 2016 8:13 PM To: user@hive.apache.org<mailto:user@hive.apache.org> Subject: Convert date in string format to timestamp in table definition Hi, I have 2 dates in Json file defined like this "addDate": "2016-05-17T02:10:44.527", "postDate": "2016-05-16T02:08:55", Right now I define external table based on this file like this: CREATE external TABLE threads_test (url string, pagetype string, adddate string, postdate string, posttext string) partitioned by (yyyymmdd int) ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe' location 'my location'; is it possible to define these 2 dates as timestamp? Do I need to change date format in the file? is it possible to specify date format in table definition? Or I better off with string? Thanks in advance.