CDH 5.7 Hive crash when performing a join on a parquet external table

2016-05-04 Thread Nicholas Hakobian
Has anyone else encountered issues when using partitioned Parquet external tables in Hive on CDH 5.7 (Hive is running in MapReduce mode)? When I perform a simple query such as (I've removed any names/fields that I am not allowed to publicly share): select * from user_event left join names on n

Re: Hive parquet on EMR

2016-04-01 Thread Nicholas Hakobian
Make sure your column names in the struct exactly match the case in the table create statement. We just decided to make everything lowercase, but occasionally someone forgets and makes one of the characters upper case and Hive fails. There was a fix for this in Hive, but it only fixed querying w

Re: a newline in column data ruin Hive

2016-02-23 Thread Nicholas Hakobian
We just had this problem recently with our data. There are actually two things you have to worry about: the reader (which the suggestion above seems to solve) and the intermediate stages (if using MR). We didn't have the issue with the reader since we use Parquet and Avro to store our data, but we ha

Re: NPE from simple nested ANSI Join

2016-02-04 Thread Nicholas Hakobian
R JOIN customer c ON n.n_nationkey = c.c_nationkey) INNER JOIN orders o ON c.c_custkey = o.o_custkey; > On Thu, Feb 4, 2016 at 3:45 PM, Nicholas Hakobian wrote: >> I don't believe Hive supports that join format. It's expecting either a table name or

Re: NPE from simple nested ANSI Join

2016-02-04 Thread Nicholas Hakobian
I don't believe Hive supports that join format. It's expecting either a table name or a subquery. If it's a subquery, it usually requires it to have a table name alias so it can be referenced in an outer statement. -Nick Nicholas Szandor Hakobian Data Scientist Rally Health nicholas.hakob...@rallyh

Re: "Create external table" nulling data from source table

2016-01-28 Thread Nicholas Hakobian
Do you have any fields with embedded newline characters? If so, certain Hive output formats will parse the newline character as the end of row, and when importing, chances are the missing fields (now part of the next row) will be padded with nulls. This happens in Hive as well if you are using a Te
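
The effect described above can be reproduced outside Hive. The following is a minimal Python sketch, not Hive's actual reader: it assumes a line-delimited text layout with a \x01 field separator, and the helper names write_text_rows and read_text_rows are hypothetical. It shows how one embedded newline turns two logical rows into three physical rows, with the missing trailing fields padded with nulls.

```python
# Minimal simulation of a line-delimited text format (an assumption,
# not Hive's real serializer/reader). Helper names are hypothetical.

FIELD_SEP = "\x01"  # assumed field delimiter
ROW_SEP = "\n"      # rows end at a newline


def write_text_rows(rows):
    """Serialize rows naively, without escaping embedded newlines."""
    return ROW_SEP.join(FIELD_SEP.join(fields) for fields in rows)


def read_text_rows(blob, num_cols):
    """Split on the row separator, then pad short rows with None."""
    parsed = []
    for line in blob.split(ROW_SEP):
        fields = line.split(FIELD_SEP)
        # Missing trailing fields become None; extra fields are dropped.
        fields = fields[:num_cols] + [None] * (num_cols - len(fields))
        parsed.append(fields)
    return parsed


# Two logical rows; the second column of the first row contains a newline.
rows = [
    ["1", "line one\nline two", "ok"],
    ["2", "plain", "ok"],
]

parsed = read_text_rows(write_text_rows(rows), num_cols=3)
for row in parsed:
    print(row)
```

The first logical row comes back as two physical rows, each ending in a null, which matches the symptom in the thread title.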

Re: trying to figure out number of MR jobs from explain output

2015-12-11 Thread Nicholas Hakobian
You can't find out definitively because it is going to depend on the nature of the data being processed, especially when it comes to mapjoins. If the output of one stage is small enough for it to mapjoin, parts of a stage can be skipped as the whole dataset is on every node. I'm sure there are oth

Re: Help With Hive Windowing

2015-12-10 Thread Nicholas Hakobian
Last_value and lag are not aggregate functions. They are best thought of as streaming windowing functions. The last_value function picks the last value over your window and applies it to every row in your select statement. Similarly, lag over your window will lag that column by the specified number
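
To make the distinction from aggregates concrete, here is a small pure-Python sketch of the semantics described above. It is an illustration, not Hive's implementation: the function name window_lag_and_last and the sample data are hypothetical, and it assumes the window frame for last_value covers the whole partition. The point is that every input row produces one output row, unlike an aggregate, which would collapse each partition to a single row.

```python
from collections import defaultdict


def window_lag_and_last(rows, offset=1):
    """Emulate lag(page, offset) and last_value(page) over a window
    partitioned by user_id and ordered by ts (hypothetical columns).

    Assumes last_value sees the whole partition as its frame.
    """
    partitions = defaultdict(list)
    for user_id, ts, page in rows:
        partitions[user_id].append((ts, page))

    out = []
    for user_id in sorted(partitions):
        ordered = sorted(partitions[user_id])      # order by ts
        pages = [page for _, page in ordered]
        last_page = pages[-1]                      # same for every row
        for i, (ts, page) in enumerate(ordered):
            # lag: value from `offset` rows earlier, None if there is none
            prev_page = pages[i - offset] if i >= offset else None
            out.append((user_id, ts, page, prev_page, last_page))
    return out


events = [
    ("a", 1, "home"), ("a", 2, "search"), ("a", 3, "cart"),
    ("b", 1, "home"), ("b", 2, "help"),
]

result = window_lag_and_last(events)
for row in result:
    print(row)
```

Five rows go in and five rows come out: the last_value column repeats the partition's final page on each row, and the lag column shifts the page column down by one within each partition.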

Re: Re: Hiveserver2 client stdout

2015-10-19 Thread Nicholas Hakobian
If you want to retrieve the STDOUT logs from the HiveServer2 Thrift server, you can see how beeline does it here: https://github.com/apache/hive/blob/master/beeline/src/java/org/apache/hive/beeline/Commands.java#L958-L963 I used this as an example of how to pull the query logs for a recent projec

Re: Hive 1.2.1 installation troubleshooting - No known driver to handle "jdbc://hive2://:10000"

2015-10-08 Thread Nicholas Hakobian
The format of the jdbc connection string should be something like: beeline -u jdbc:hive2://localhost:1 Since you're connecting to localhost you can also try the embedded connection mode by starting beeline like: beeline -u jdbc:hive2:// The connection string format is documented here: https