Has anyone else encountered issues when using partitioned Parquet
external tables in Hive on CDH 5.7 (Hive is running in MapReduce mode)?
When I perform a simple query such as (I've removed any names/fields that I
am not allowed to publicly share):
select * from user_event left join names on n
Make sure the column names in the struct exactly match the case in the
table create statement. We just decided to make everything lowercase, but
occasionally someone forgets and makes one of the characters upper case and
Hive fails.
There was a fix for this in Hive, but it only fixed querying w
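
One way to keep to the all-lowercase convention described above is to normalize field names before the data is written. The following is only a minimal Python sketch, assuming records are available as plain dicts and lists; the function name `lowercase_keys` and the sample record are hypothetical and are not part of Hive or Parquet.

```python
def lowercase_keys(value):
    """Recursively lowercase every dict key in a nested record.

    Assumption: no two keys in the same dict differ only by case;
    if they do, the later one silently overwrites the earlier one.
    """
    if isinstance(value, dict):
        return {str(k).lower(): lowercase_keys(v) for k, v in value.items()}
    if isinstance(value, list):
        return [lowercase_keys(v) for v in value]
    return value


# Hypothetical record where someone used mixed-case field names.
record = {"userId": 7, "Event": {"eventType": "click", "Tags": [{"Name": "a"}]}}
normalized = lowercase_keys(record)
print(normalized)
```

Running a step like this in the write path means a stray upper-case character never reaches the files that Hive later reads.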
We just had this problem recently with our data. There are actually 2
things you have to worry about: the reader (which the suggestion above
seems to solve) and the intermediate stages (if using MR). We didn't
have the issue with the reader since we use Parquet and Avro to store
our data, but we ha
R JOIN customer c ON n.n_nationkey = c.c_nationkey) INNER
> JOIN orders o ON c.c_custkey = o.o_custkey;
>
>
>
>
> On Thu, Feb 4, 2016 at 3:45 PM, Nicholas Hakobian
> wrote:
>>
>> I don't believe Hive supports that join format. It's expecting either a
>> table name or
I don't believe Hive supports that join format. It's expecting either a
table name or a subquery. If it's a subquery, it usually requires it to
have a table name alias so it can be referenced in an outer statement.
-Nick
Nicholas Szandor Hakobian
Data Scientist
Rally Health
nicholas.hakob...@rallyh
Do you have any fields with embedded newline characters? If so,
certain Hive output formats will parse the newline character as the
end of row, and when importing, chances are the missing fields (now
part of the next row) will be padded with nulls. This happens in Hive
as well if you are using a Te
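
The effect described above can be reproduced with a simplified model of a newline-delimited text format. This is only a Python sketch, not Hive's actual reader or writer; the tab delimiter, the three-column layout, and the function names are assumptions made for illustration.

```python
DELIM = "\t"   # assumed field delimiter
NUM_COLS = 3   # assumed number of columns in the table


def write_rows(rows):
    """Serialize rows as delimited text, one row per line (no escaping)."""
    return "\n".join(DELIM.join(fields) for fields in rows)


def read_rows(text):
    """Parse line by line, padding missing fields with None (null)."""
    parsed = []
    for line in text.split("\n"):
        fields = line.split(DELIM)
        fields += [None] * (NUM_COLS - len(fields))
        parsed.append(fields[:NUM_COLS])
    return parsed


def strip_newlines(rows):
    """Replace embedded newlines with spaces before writing."""
    return [[f.replace("\n", " ") for f in fields] for fields in rows]


rows = [["1", "first line\nsecond line", "x"], ["2", "plain", "y"]]

broken = read_rows(write_rows(rows))                  # 3 rows, with nulls
fixed = read_rows(write_rows(strip_newlines(rows)))   # 2 rows, intact
print(broken)
print(fixed)
```

In the broken case the first record is split in two, and each half is padded with a null, which matches the symptom of unexpected nulls after import.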
You can't find out definitively because it is going to depend on the
nature of the data being processed, especially when it comes to
mapjoins. If the output of one stage is small enough for it to
mapjoin, parts of a stage can be skipped as the whole dataset is on
every node.
I'm sure there are oth
Last_value and lag are not aggregate functions. They are best thought
of as streaming windowing functions. The last_value function picks the
last value over your window and applies it to every row in your select
statement. Similarly, lag over your window will lag that column by the
specified number
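
To make the difference from an aggregate concrete, here is a plain-Python sketch of what the two functions compute over a single, already ordered window. It is not Hive code; the helper names and sample values are hypothetical, and `last_value` here models a frame that spans the whole window, as in the description above (a narrower frame would give different results).

```python
def lag(values, offset=1, default=None):
    """Return each row's value from `offset` rows earlier in the window.

    Assumes a non-negative offset; rows with no earlier row get `default`.
    """
    return [values[i - offset] if i - offset >= 0 else default
            for i in range(len(values))]


def last_value(values):
    """Return the window's last value repeated for every row."""
    return [values[-1]] * len(values) if values else []


# Hypothetical column values, already sorted by the window's ORDER BY.
amounts = [10, 20, 30]

lagged = lag(amounts)          # one output per input row
lasts = last_value(amounts)    # one output per input row
print(lagged)
print(lasts)
```

Both helpers return a list as long as the input, one value per row, whereas an aggregate such as a sum would collapse the window to a single value.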
If you want to retrieve the STDOUT logs from the HiveServer2 Thrift server,
you can see how beeline does it here:
https://github.com/apache/hive/blob/master/beeline/src/java/org/apache/hive/beeline/Commands.java#L958-L963
I used this as an example of how to pull the query logs for a recent
projec
The format of the jdbc connection string should be something like:
beeline -u jdbc:hive2://localhost:1
Since you're connecting to localhost you can also try the embedded
connection mode by starting beeline like:
beeline -u jdbc:hive2://
The connection string format is documented here:
https