Migrating to hive 8.1 on EMR

2012-06-18 Thread Ranjan Bagchi
Hi, I've built a datastore using Hive 7.1 backed by S3 using persistent metadata. Now that hive 8.1 is available, I'd like to migrate to the new version. However, I'm having trouble reading tables with the persistent schema. Looking in the logs, I'm getting stack traces like the following: 20

Lifecycle and Configuration of a hive UDF

2012-04-18 Thread Ranjan Bagchi
Hi, What's the lifecycle of a Hive UDF? If I call select MyUDF(field1,field2) from table; is MyUDF then instantiated once per mapper, and within each mapper is execute(field1, field2) called for each reducer? I hope this is the case, but I can't find anything about this in the documentation

Table design question: Arrays and Structs

2012-02-15 Thread Ranjan Bagchi
Hi, I'm capturing data of the form A (1:n) B, which is a fairly standard item-subitem pattern. In a standard DB, I'd have A and B tables with a foreign key from B to A. But since Hive is different -- there's no natural primary key in my data and joins seem much more expensive -- I'm consideri

Debugging partitions

2012-01-21 Thread Ranjan Bagchi
Hi, This is probably a newbie question, but is there any way to get Hive to log which files it goes through as it performs a query? I'm setting up a partitioned store (EMR, persistent metadata, storage all on S3) which *appears* to work, but I'd like to understand its behavior better, especially

Re: Help with a table located on s3n

2011-12-16 Thread Ranjan Bagchi
> e: mgro...@oanda.com > > "Best Trading Platform" - World Finance's Forex Awards 2009. > "The One to Watch" - Treasury Today's Adam Smith Awards 2009. > > > - Original Message - > From: "Ranjan Bagchi" > To: user@hive.ap

Re: Help with a table located on s3n

2011-12-16 Thread Ranjan Bagchi
es: columns _col0 columns.types bigint serialization.format 1 TotalFiles: 1 GatherStats: false MultiFileSpray: false Stage: Stage-0 Fetch Operator limit: -1 Time taken: 0.156 seconds On Dec 15, 2011, a

Help with a table located on s3n

2011-12-15 Thread Ranjan Bagchi
Hi, I'm experiencing the following: I've a file on s3 -- s3n://my.bucket/hive/ranjan_test. It's got fields (separated by \001) and records (separated by \n). I want it to be accessible on hive, the ddl is: CREATE EXTERNAL TABLE IF NOT EXISTS ranjan_test ( ip_address string, num_counted int )
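
The file layout described in this message (fields separated by \001, records separated by \n, with an ip_address string and a num_counted int) can be sanity-checked outside Hive. The following is only a minimal sketch in Python: the function name parse_records and the sample rows are hypothetical and not taken from the thread, and it assumes every record has exactly the two fields named in the DDL.

```python
# Sketch: parse text laid out as described in the message above.
# Assumption: each record holds exactly two fields (ip_address, num_counted).

FIELD_SEP = "\x01"   # the \001 (Ctrl-A) field separator
RECORD_SEP = "\n"    # one record per line


def parse_records(text):
    """Return a list of (ip_address, num_counted) tuples."""
    rows = []
    for line in text.split(RECORD_SEP):
        if not line:
            continue  # skip the empty piece after a trailing newline
        ip_address, num_counted = line.split(FIELD_SEP)
        rows.append((ip_address, int(num_counted)))
    return rows


# Hypothetical sample data, invented for illustration only.
sample = "10.0.0.1\x0142\n10.0.0.2\x017\n"
rows = parse_records(sample)
print(rows)
```

If a line fails to split into two fields or the count is not an integer, the sketch raises a ValueError, which points at a mismatch between the file and the declared columns.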