Re: Loading data containing newlines

2016-01-13 Thread Alexander Pivovarov
Time to use Spark and Spark SQL in addition to Hive? It's probably going to happen sooner or later anyway. I sent you a Spark solution yesterday. (You just need to write an unbzip2AndCsvToListOfArrays(file: String): List[Array[String]] function using BZip2CompressorInputStream and the Super CSV API.) you
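A minimal sketch of such a function, assuming Commons Compress and Super CSV are on the classpath; the charset and CSV preferences are illustrative choices, not anything specified in the thread:

    import java.io.{BufferedInputStream, FileInputStream, InputStreamReader}
    import org.apache.commons.compress.compressors.bzip2.BZip2CompressorInputStream
    import org.supercsv.io.CsvListReader
    import org.supercsv.prefs.CsvPreference
    import scala.collection.mutable.ListBuffer

    def unbzip2AndCsvToListOfArrays(file: String): List[Array[String]] = {
      // Decompress the .bz2 stream, then let Super CSV parse it; the CSV
      // reader copes with quoted fields that contain embedded newlines.
      val in = new BZip2CompressorInputStream(
        new BufferedInputStream(new FileInputStream(file)))
      val reader = new CsvListReader(
        new InputStreamReader(in, "UTF-8"), CsvPreference.STANDARD_PREFERENCE)
      val rows = ListBuffer[Array[String]]()
      try {
        var row = reader.read()            // returns null at end of stream
        while (row != null) {
          rows += row.toArray(Array.empty[String])
          row = reader.read()
        }
      } finally reader.close()
      rows.toList
    }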

RE: Loading data containing newlines

2016-01-13 Thread Gerber, Bryan W
1. hdfs dfs -copyFromLocal /incoming/files/*.bz2 hdfs://host.name/data/stg/table/
2. CREATE EXTERNAL TABLE stg_ (cols...) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde' STORED AS TEXTFILE LOCATION '/data/stg/table/'
3. CREATE TABLE tbl (cols...) STORED AS ORC
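Fleshed out with hypothetical table and column names (OpenCSVSerde treats every staging column as STRING), the three steps might look like:

    hdfs dfs -copyFromLocal /incoming/files/*.bz2 hdfs://host.name/data/stg/table/

    CREATE EXTERNAL TABLE stg_events (id STRING, ts STRING, payload STRING)
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
    STORED AS TEXTFILE
    LOCATION '/data/stg/table/';

    CREATE TABLE events (id STRING, ts STRING, payload STRING)
    STORED AS ORC;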

Re: Loading data containing newlines

2016-01-13 Thread Gopal Vijayaraghavan
> We are pushing the compressed text files into an HDFS directory for a Hive EXTERNAL table, then using an INSERT on the table using ORC storage. We are letting Hive handle the ORC file creation process.

Are the compressed text files small enough to process one by one? I did write something similar

RE: Loading data containing newlines

2016-01-13 Thread Mich Talebzadeh
Thanks Bryan. Just to clarify, do you use something like the below?
1. hdfs dfs -copyFromLocal /var/tmp/t.bcp hdfs://rhes564.hedat.net:9000/misc/t.bcp
2. CREATE EXTERNAL TABLE name (col1 INT, col2 string, .) COMMENT 'load from bcp file' ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS

RE: Loading data containing newlines

2016-01-13 Thread Gerber, Bryan W
We are pushing the compressed text files into an HDFS directory for a Hive EXTERNAL table, then using an INSERT on the table using ORC storage. We are letting Hive handle the ORC file creation process.
From: Mich Talebzadeh [mailto:m...@peridale.co.uk] Sent: Tuesday, January 12, 2016 4:41 PM To: user
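The hand-off from staged text to ORC is then a single statement; using the hypothetical names sketched earlier in the thread, Hive writes the ORC files itself during the INSERT:

    INSERT INTO TABLE events SELECT * FROM stg_events;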

Re: Passing parameters to Beeline

2016-01-13 Thread matshyeq
Try --hivevar name=value. It works for hive and should work for beeline too: https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients#HiveServer2Clients-Beeline–NewCommandLineShell
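For example, with a placeholder connection URL and variable name; inside the script the variable can be referenced as ${hivevar:table_name} (bare ${table_name} also resolves):

    beeline -u jdbc:hive2://localhost:10000 --hivevar table_name=emp -f emp.hql

    -- emp.hql
    CREATE EXTERNAL TABLE ${hivevar:table_name} (id INT, name STRING);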

FW: Group by and FROM_UNIXTIME function

2016-01-13 Thread PICQUENOT Samuel (i-BP - CGI)
Hello, Firstly, the FROM_UNIXTIME function's date pattern is case sensitive:
* FROM_UNIXTIME(1451308548, 'yyyy-MM') --> 2015-12
* FROM_UNIXTIME(1451308548, 'YYYY-MM') --> 2016-12 (because YYYY is SimpleDateFormat's week-year pattern, and 28 December 2015 falls in a week counted as week 1 of 2016) Consider the following que
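A quick way to see the difference; the epoch value is the one from the thread and corresponds to 28 December 2015 UTC:

    SELECT FROM_UNIXTIME(1451308548, 'yyyy-MM');  -- 2015-12 (calendar year)
    SELECT FROM_UNIXTIME(1451308548, 'YYYY-MM');  -- 2016-12 (week year)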

Re: Fastest way to get the row count

2016-01-13 Thread Devopam Mittra
hello Mahender, I use the beeline CLI mostly for such operations. My best bet is to parse the output for the "... rows selected" line and use it for logging the row count. The INFO lines are an overhead, but I can happily live with that to achieve my objective. Suggestions welcome, in case there is a cleaner w
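A sketch of that approach, with a placeholder connection URL and query; beeline ends each result set with an "N rows selected (...)" summary that can be captured:

    beeline -u jdbc:hive2://localhost:10000 --silent=true \
            -e "SELECT * FROM emp" 2>&1 |
      grep -o '[0-9][0-9]* rows selected' | awk '{print $1}'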

Passing parameters to Beeline

2016-01-13 Thread Trainee Bingo
Hi All, Hope all are enjoying working in Hive!!! I have one question regarding hive and beeline: I am passing parameters to the hive script using "-d". E.g.: hive -d table_name -f emp.hql
emp.hql: CREATE EXTERNAL TABLE ${table_name} ( ID STRI
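For reference, -d is short for --define and expects a key=value pair, so with illustrative names the invocation would be:

    hive -d table_name=emp -f emp.hql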

Re: Writing hive column headers in 'Insert overwrite query'

2016-01-13 Thread Elliot West
I created an issue in the Hive Jira related to this. You may wish to vote on it or watch it if you believe it to be relevant. https://issues.apache.org/jira/browse/HIVE-12860

On 13 January 2016 at 09:43, Elliot West wrote:
> Unfortunately there appears to be no nice way of doing this. I've se

Re: Writing hive column headers in 'Insert overwrite query'

2016-01-13 Thread Elliot West
Unfortunately there appears to be no nice way of doing this. I've seen others achieve a workaround by UNIONing with a table of the same schema, containing a single row of the header names, and then finally sorting by a synthesised rank column (see: http://stackoverflow.com/a/25214480/74772). I be
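A sketch of that workaround with hypothetical table and column names; the literal header row gets sort key 0 so it sorts to the top. The sort key is kept in the select list because older Hive versions cannot ORDER BY an unselected column, so it would need stripping downstream:

    SELECT sort_key, id, name
    FROM (
      SELECT 0 AS sort_key, 'id' AS id, 'name' AS name
      UNION ALL
      SELECT 1 AS sort_key, CAST(id AS STRING) AS id, name FROM emp
    ) t
    ORDER BY sort_key;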