Re: Problem querying deeply nested data with Parquet and ORC File Hive SerDes

2014-04-04 Thread mpeterson2
I figured out the problem. The JSON SerDe I wrote is not case sensitive, but the ORC and Parquet SerDes are case sensitive. So this works: select ClientCode, Encounter.Number from parquet_tbl; but this does not: select clientcode, encounter.Number from parquet_tbl; -Michael On Thu, Apr 3, 2014
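To illustrate the difference described above, here is a minimal Python sketch of case-sensitive versus case-insensitive field lookup. It is not how Hive or the Parquet and ORC SerDes resolve column names internally; the function names and the field list are hypothetical, chosen only to mirror the columns in the queries above.

```python
# Hypothetical illustration only: these helpers are NOT part of Hive or any SerDe.

def resolve_exact(schema_fields, requested):
    """Case-sensitive lookup: the requested name must match the schema exactly."""
    return requested if requested in schema_fields else None


def resolve_ignoring_case(schema_fields, requested):
    """Case-insensitive lookup: compare lowercased names, return the schema's spelling."""
    by_lower = {name.lower(): name for name in schema_fields}
    return by_lower.get(requested.lower())


# Assumed top-level field names, taken from the working query above.
fields = ["ClientCode", "Encounter"]

print(resolve_exact(fields, "ClientCode"))          # exact spelling is found
print(resolve_exact(fields, "clientcode"))          # lowercased spelling is not found
print(resolve_ignoring_case(fields, "clientcode"))  # found regardless of case
```

A lookup of the first kind rejects `clientcode` when the schema stores `ClientCode`, which matches the failure seen with the lowercased query.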

Problem querying deeply nested data with Parquet and ORC File Hive SerDes

2014-04-03 Thread mpeterson2
Hi, I'm new to using Parquet and ORC files and I'm hitting a problem querying nested data. Can those file formats be used to query deeply nested data? If so, why am I getting an error with the SerDes for both of them? Here's the background: I'm starting from a JSON data file like this:

When is the serialize method of a Hive SerDe invoked?

2013-12-09 Thread mpeterson2
When is the serialize method of a Hive SerDe invoked? I recently created a couple of Hive SerDes and wrote unit tests for the serialize and deserialize methods, and I've been able to test the deserialize method in a real Hive environment, but I can't figure out a scenario where serialize is called

Re: How to specify Hive auxiliary jar in HDFS, not local file system

2013-12-03 Thread mpeterson2
Thanks. I just got an Oozie Hive action set up to test on a single-node cluster, and putting "ADD JAR /path/to/hdfs/location" in the Hive script worked. Hopefully I won't hit any issues when I try it on a multi-node cluster. On Mon, Dec 2, 2013 at 5:37 PM, Adam Kawa wrote: > You can use ADD JAR

How to specify Hive auxiliary jar in HDFS, not local file system

2013-12-02 Thread mpeterson2
Is it possible to specify a Hive auxiliary jar (like a SerDe) that is in HDFS rather than the local filesystem? I am using a CsvSerDe I wrote, and when I specify it through Hive's hive.aux.jars.path with a local filesystem path it works: hive -hiveconf hive.aux.jars.path=file:///path/to/truven-hive-serdes

Re: Trouble with large table joins when using a CSV SerDE

2013-12-02 Thread mpeterson2
I found a solution: set hive.auto.convert.join.noconditionaltask.size=2500; The default is 100MB. If I drop it to 25MB it now works. I ran this with my full 11-table join on production data and it finished successfully and in a reasonable amount of time. -Michael On Mon, Dec 2, 2013 at

Trouble with large table joins when using a CSV SerDE

2013-12-02 Thread mpeterson2
Hi, I recently wrote a CSV SerDe for Hive. It works in most scenarios, but I've found one situation where Hive fails with it. I also tried the queries with the open-source CSV SerDe at https://github.com/ogrodnek and I see the same issue. So either we both wrote our SerDe incorrectly or there