Re: Google ProtocolBufferException exception when using ORC file format

2013-10-16 Thread Nitin Pawar
Hi Zhang, you cannot load a text file as an ORC file, because the load command does not transform your text file to ORC. To write an ORC file you will need to use the HCatalog APIs. What you can do is create a temp table and load the data there, then do an insert into table test select * from temptest On Thu, Oc
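The staging approach described in this reply could look roughly like the sketch below. The table name `temptest` comes from the reply and the `test` definition and file path come from the original question in this thread; the single-column schema of `temptest` and the use of TEXTFILE for it are assumptions made for illustration.

```sql
-- Staging table in plain text format; LOAD DATA only moves the file, it does not convert it.
CREATE TABLE temptest (f1 INT) STORED AS TEXTFILE;
LOAD DATA LOCAL INPATH '/home/athena/test.txt' INTO TABLE temptest;

-- Target table stored as ORC, as in the original question.
CREATE TABLE test (f1 INT) STORED AS ORC TBLPROPERTIES ("orc.compress"="NONE");

-- The INSERT runs a job that reads the text rows and writes them out as ORC.
INSERT INTO TABLE test SELECT * FROM temptest;

SELECT * FROM test;
```

The point of the extra step is that the conversion to ORC happens when Hive writes the rows during the insert, not when a file is loaded.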

Google ProtocolBufferException exception when using ORC file format

2013-10-16 Thread Zhang Xiaoyu
Hi all, I am simply using an ORC file to store the data and get the exception below. Any idea what's wrong with it? create table test (f1 int) stored as orc tblproperties ("orc.compress"="NONE"); load data local inpath '/home/athena/test.txt' into table test; select * from test; ===> Error: java.io.I

Re: No result display

2013-10-16 Thread kun yan
I just tested again. Now I can display the query result with where year=2013. When I do not specify a where clause, no data is displayed. I must specify the partition exactly in the where clause in order for data to show. 2013/10/17 kun yan > Thank you for your reply > My DDL a

Re: No result display

2013-10-16 Thread kun yan
Thank you for your reply. My DDL is as follows. First step: CREATE EXTERNAL TABLE IF NOT EXISTS data( ROWKEY STRING, STATION INT, MONTH INT, DAY INT, HOUR INT, MINUTE INT, ) PARTITIONED BY (YEAR INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' STORED AS TEXTFILE; Second step

FULL OUTER JOIN Two Small Tables More Efficiently in Hive?

2013-10-16 Thread Ji Zhang
Hi All, I have two tables. One has 2,000,000 rows (150M in 6 files), and the other has 5,000 rows (400K in 1 file). The join is (approximately) a full outer join, since the city_id field has only 100 distinct values: CREATE TABLE prop_total AS SELECT * FROM prop_1 a JOIN prop_2 b ON a.city_id = b

Re: No result display

2013-10-16 Thread Nitin Pawar
1) There is no data loaded in the table. 2) There is no data matching your where clause condition. 3) There is a mismatch in the where condition. What you can do is run select * from data limit 10; this will show you whether there is any data in the table. Another way to check the same thing is hadoop fs -ls on the table directory.
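The checks suggested in this reply can be run from the Hive prompt roughly as follows. The table name `data` and the `year` partition column come from the DDL posted in this thread; `SHOW PARTITIONS` is an extra check added here on the assumption that the table is partitioned as that DDL describes.

```sql
-- Is any data visible at all?
SELECT * FROM data LIMIT 10;

-- Which partitions does the metastore know about for this table?
SHOW PARTITIONS data;

-- Does one specific partition return rows?
SELECT COUNT(*) FROM data WHERE year = 2013;
```

If the partition list is empty, files sitting in the table directory are not visible to queries until the corresponding partitions are registered.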

No result display

2013-10-16 Thread kun yan
Hi all, when I run a Hive query, only the fields are displayed and no results are shown. What may be causing this? The SQL is as follows: select * from data where year=2013; Hive version is 0.10 -- In the Hadoop world, I am just a novice, explore the entire Hadoop ecosystem, I hope one day I can contribute their ow

about hive store query data

2013-10-16 Thread kun yan
Hi all, Hive version 0.10. This is a basic question: when I run a query from the Hive console, the result is output to the console. Is it first written to HDFS and then displayed on the console, or is it sent directly to the console? Thanks a lot -- In the Hadoop world, I am just a novice, explore the entir

Re:Re: histogram_numeric find the most frequent value

2013-10-16 Thread Richard
Good idea, I will try. Thanks. At 2013-10-16 19:12:30, "Ed Soniat" wrote: You could use modular math to transform the data into single-value representations of each range you intend to represent with your boundaries, using a subselect. On Wed, Oct 16, 2013 at 7:09 AM, Richard wrote: I wa

ORC Files: Does this get me anything?

2013-10-16 Thread John Omernik
So I am experimenting with ORC files, and I have a fast little table that has login events. Out of curiosity, I was wondering, based on what we all know about ORC files, whether the per-file indexing would get me anything if I did the below. Now, before people complain about small files, let's toss that

Where clause position

2013-10-16 Thread Xiu Guo
The following query does not work: SELECT T1.ACCOUNT_NUM ,T1.ACCOUNT_MODIFIER_NUM ,T1.DEPOSIT_TYPE_CD ,T1.DEPOSIT_TERM ,CASE WHEN T1.DEPOSIT_TYPE_CD='5021' THEN '9255' ELSE CASE WHEN T4.LEDGER_SUBJECT_ID_01= '' THEN '' ELSE COALESCE(T4.LEDGER_SUBJECT_ID_01,'') END END V_LE

Re: Where to get hive serde jars.

2013-10-16 Thread Sonal Goyal
You can get the SerDe from https://github.com/cloudera/cdh-twitter-example. I am not sure if there is a prebuilt version in any Cloudera repo, but you can check with the Cloudera team. On Oct 16, 2013, at 7:12 PM, Panshul Whisper wrote: > Hello, > > I am trying to imple

Hive Query Questions - is null in WHERE

2013-10-16 Thread Raj Hadoop
All, when a query like the one below is executed: select field1 from table1 where field1 is null; I am getting results that have empty values or nulls in field1. How does IS NULL work in Hive queries? Thanks, Raj
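A hedged note on what may be happening, assuming `table1` is a delimited text table using Hive's default text SerDe (the storage format is not stated in the message): by default the sequence \N in the file is read as NULL, an empty field in a STRING column is read as an empty string, and an empty field in a numeric column cannot be parsed and is returned as NULL. The queries below use the table and column names from the question and separate the two cases.

```sql
-- Rows where field1 is a true NULL (including unparseable values in non-string columns).
SELECT field1 FROM table1 WHERE field1 IS NULL;

-- Rows where field1 is an empty string; only meaningful if field1 is a STRING column.
SELECT field1 FROM table1 WHERE field1 = '';

-- Rows that are either NULL or empty.
SELECT field1 FROM table1 WHERE field1 IS NULL OR field1 = '';
```

Comparing the row counts of the first two queries shows whether the "empty" values in the output are really NULLs or empty strings.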

Where to get hive serde jars.

2013-10-16 Thread Panshul Whisper
Hello, I am trying to use a SerDe in Hive to read JSON files directly into my Hive tables. I am using Cloudera Hue to query the Hive server. Where can I get the Cloudera Hive SerDe jars from? Or am I missing something else? When I create a table with the following statement: crea

Re: histogram_numeric find the most frequent value

2013-10-16 Thread Ed Soniat
You could use modular math to transform the data into single-value representations of each range you intend to represent with your boundaries, using a subselect. On Wed, Oct 16, 2013 at 7:09 AM, Richard wrote: > I want to find the most frequent value of a column, I noticed > histogram_numeric,

histogram_numeric find the most frequent value

2013-10-16 Thread Richard
I want to find the most frequent value of a column. I noticed histogram_numeric, but I cannot specify the bin boundaries, so the result is not what I want. As an example, I want something like select gid, most_frequent(category) from mytable group by gid, where category is a column w
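One possible way to express the `most_frequent(category)` idea in plain HiveQL is to count each (gid, category) pair and keep the top-ranked pair per gid. This is only a sketch and is not the approach suggested elsewhere in the thread; it assumes a Hive version with windowing functions (0.11 or later) and uses the table and column names from the message above. The aliases `cnt`, `rn`, `counts`, and `ranked` are illustrative.

```sql
SELECT gid, category
FROM (
  SELECT gid, category,
         -- Rank categories within each gid by how often they occur.
         ROW_NUMBER() OVER (PARTITION BY gid ORDER BY cnt DESC) AS rn
  FROM (
    -- One row per (gid, category) with its occurrence count.
    SELECT gid, category, COUNT(*) AS cnt
    FROM mytable
    GROUP BY gid, category
  ) counts
) ranked
WHERE rn = 1;
```

When two categories tie for the highest count within a gid, `ROW_NUMBER()` picks one of them arbitrarily; adding `category` as a second sort key would make the choice deterministic.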

RE: Create table as select

2013-10-16 Thread Adline Dsilva
Really sorry guys, it seems the error occurred due to something else. Everything is getting created now :-) hive> create TABLE tmp_table1 AS select * from user_table limit 500; Total MapReduce jobs = 1 Launching Job 1 out of 1 Number of reduce tasks determined at compile time: 1 In order to change the

RE: Create table as select

2013-10-16 Thread Adline Dsilva
The LIMIT clause is causing the problem: hive> create TABLE tmp_table AS select * from user_table limit 500; FAILED: SemanticException 0:0 Error creating temporary folder on: hdfs://master:8020/user/hive/warehouse. Error encountered near token 'TOK_TMP_FILE' I'm trying out Hadoop with a BI tool,