Re: odd behavior of ntile

2016-09-26 Thread Rex X
Is this a bug in Hive? On Sun, Sep 25, 2016 at 11:29 PM, Rex X wrote: > Hi All, > > I run the following Hive query: > > create table table2 as > select > id, > ntile(6) over (partition by city order by price) as price_tile, > ntile(3) over (partition by city order by discount)

odd behavior of ntile

2016-09-25 Thread Rex X
Hi All, I run the following Hive query: create table table2 as select id, ntile(6) over (partition by city order by price) as price_tile, ntile(3) over (partition by city order by discount) as discount_tile, ntile(6) over (partition by city order by number) as number_tile from table1; Table1 contains 8 million
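
The snippet above is cut off before the odd behavior is described, so the following is only a sketch of how NTILE is generally defined in SQL, not a diagnosis. It assumes the usual rule: the ordered rows are cut into n consecutive buckets, and the first (row count mod n) buckets get one extra row. The function name `ntile` and the sample prices are made up for illustration. One consequence is that rows with equal sort keys can land in different tiles, and which of the tied rows goes where is not guaranteed by the engine.

```python
def ntile(values, n):
    """Return 1-based tile numbers for values, mimicking NTILE(n) OVER (ORDER BY value).

    Ties are broken here by original position (stable sort); a real engine
    makes no such promise.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    base, extra = divmod(len(values), n)
    tiles = [0] * len(values)
    pos = 0
    for tile in range(1, n + 1):
        # The first `extra` tiles hold one more row than the rest.
        size = base + (1 if tile <= extra else 0)
        for i in order[pos:pos + size]:
            tiles[i] = tile
        pos += size
    return tiles


# Hypothetical prices within one city: four rows share the value 10.
prices = [10, 10, 10, 10, 20, 30, 40]
print(ntile(prices, 3))  # [1, 1, 1, 2, 2, 3, 3]
```

In this example the fourth row priced 10 falls into tile 2 although it ties with the three rows in tile 1. Adding a unique column such as id to the ORDER BY is a common way to make the assignment repeatable.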

How to do such a join of Pairing in Hive?

2016-08-25 Thread Rex X
1. First we create a Hive table by loading the following CSV file:

$ cat data.csv
ID,City,Zip,Flag
1,A,95126,0
2,A,95126,1
3,A,95126,1
4,B,95124,0
5,B,95124,1
6,C,95124,0
7,C,95127,1
8,C,95127,0
9,C,95127,1

(a) where "ID" above is a

What's the advised way to do groupby 2 attributes from a table with 1000 columns?

2016-03-27 Thread Rex X
Given a table with 1000 columns: col1, col2, ..., col1000. The source table is about 1 PB. I only need to query 3 columns: select col1, col2, sum(col3) as col3 from myTable group by col1, col2. Is it advisable to run a subquery first and then feed its result to the group-by aggregation, so that

Re: How to work around non-executive /tmp with Hive in Parquet+Snappy compression?

2016-03-24 Thread Rex X
/mod_mbox/cassandra-user/201312.mbox/%3c52c1dec4.2080...@cj.com%3E I need to set the following option for Hive: JVM_OPTS="$JVM_OPTS -Dorg.xerial.snappy.tempdir=/path/that/allows/executables" Any tips on how? Regards, Rex On Thu, Mar 24, 2016 at 9:10 PM, Rex X wrote: > Nice! Problem solved!

Re: How to work around non-executive /tmp with Hive in Parquet+Snappy compression?

2016-03-24 Thread Rex X
t should be the property: > hive.exec.local.scratchdir > > BR > > Tale > > On Sat, Mar 19, 2016 at 8:46 PM, Rex X wrote: > >> The local /tmp is configured as non-executable by the admin. >> >> When we do a "select ...limit 10" query on Hive, it copies a file

How to do multiple output of Hive with Python?

2016-03-24 Thread Rex X
Given a query select category, value from someHiveTable; I want to write the rows of each category to a separate file named after that category. Any tips on how to do this?
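
One possible approach, sketched under assumptions: run the query with the Hive CLI, which prints rows as tab-separated text, and pipe that output through a small Python function that routes each row to a file named after its category. The function name `split_by_category`, the `.txt` suffix, and the output directory are all hypothetical, and the sketch assumes category values are safe to use as file names and that the number of distinct categories is small enough to keep one open file per category.

```python
import os


def split_by_category(lines, out_dir):
    """Write each 'category<TAB>value' line to out_dir/<category>.txt.

    Returns the sorted list of categories seen.
    """
    os.makedirs(out_dir, exist_ok=True)
    handles = {}
    try:
        for line in lines:
            line = line.rstrip("\n")
            if not line:
                continue  # skip blank lines
            category, value = line.split("\t", 1)
            if category not in handles:
                path = os.path.join(out_dir, category + ".txt")
                handles[category] = open(path, "w")
            handles[category].write(value + "\n")
    finally:
        # Close every file even if a malformed line raised an error.
        for fh in handles.values():
            fh.close()
    return sorted(handles)
```

To use it, save the function in a script that calls `split_by_category(sys.stdin, "out")` and feed it the query output, for example `hive -e "select category, value from someHiveTable" | python split.py` (the script name is an assumption).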

How to append one column to an existing array column in Hive?

2016-03-19 Thread Rex X
For example, to append columnA to an existing array-type column B: select string_column_A, array_column_B, *append(array_column_B, string_column_A) as AB* from onetable; I did not find any such append function in Hive. To be more accurate, I should say "set" instead of "ar

How to work around non-executive /tmp with Hive in Parquet+Snappy compression?

2016-03-19 Thread Rex X
The local /tmp is configured as non-executable by the admin. When we do a "select ...limit 10" query on Hive, it copies a file to /tmp and tries to execute it. But since /tmp is non-executable, I always get kicked out of the Hive shell with a binding error. What is the setting to change this /tmp

Re: how to create an array from two columns?

2016-03-13 Thread Rex X
> myHashSet.addAll(myArr); > > if (myHashSet != null) { > return new ArrayList<>(myHashSet); > } else { > return null; > } > > } > > @Override > public String getDisplayString(String[] input) {

Re: how to create an array from two columns?

2016-03-12 Thread Rex X
For the first question, is there any way to use a "set" instead of an "array" to dedupe all elements? "select array(1,1)" returns "[1,1]", not "[1]". On Sat, Mar 12, 2016 at 5:26 PM, Rex X wrote: > Thank you, Chandeep. Yes, my first pro

Re: how to create an array from two columns?

2016-03-12 Thread Rex X
"temp3"] > > > On Mar 13, 2016, at 12:33 AM, Rex X wrote: > > How to make the following work? > > 1. combine columns A and B to make one array as a new column AB. Both > column A and B are string types. > > select > string_columnA, > string_columnB,

Re: How to rename a hive table without changing location?

2016-03-12 Thread Rex X
On 13 March 2016 at 00:01, Rex X wrote: > >> Based on

how to create an array from two columns?

2016-03-12 Thread Rex X
How can I make the following work?

1. Combine columns A and B into one array as a new column AB. Both columns A and B are string types.
   select string_columnA, string_columnB, *array(string_columnA, string_columnB)* as AB from Table1;
2. Append columnA to an existing array-type column B:
   select

How to rename a hive table without changing location?

2016-03-12 Thread Rex X
Based on the Hive doc below: Rename Table *ALTER TABLE table_name RENAME TO new_table_name;* This statement lets you change the name of a table to a different name. *As of version 0.6, a rename on a managed table moves its HDFS location as well. (Older Hive versions just renamed the table in t