RE: sqoop, hive and lzo and cdh3u3 - not creating in index automatically

2012-06-18, Chalcy Raja
I did figure out how to compress the data from an uncompressed Hive table. I also created a table in SequenceFile format. Is there a way to know whether a Hive table (the HDFS file underneath) is in SequenceFile format? DESCRIBE EXTENDED on the table does not give the file format. Thanks, Chalcy
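One hedged way to check this without relying on table metadata is to look at the data itself. Every SequenceFile starts with the three ASCII bytes SEQ followed by a version byte, and that header is written uncompressed even when the records inside are Snappy- or LZO-compressed. The sketch below assumes you have copied one of the table's data files to local disk (for example with hadoop fs -get); the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class SeqFileCheck {
    // Every SequenceFile begins with the ASCII bytes 'S', 'E', 'Q', then a version byte.
    static final byte[] MAGIC = {'S', 'E', 'Q'};

    /** True if the given leading bytes of a file match the SequenceFile magic. */
    static boolean hasSequenceFileMagic(byte[] header) {
        if (header.length < MAGIC.length) {
            return false; // too short to be a SequenceFile
        }
        for (int i = 0; i < MAGIC.length; i++) {
            if (header[i] != MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: args[0] is a local copy of one data file from the table's directory.
        Path file = Paths.get(args[0]);
        byte[] header;
        try (InputStream in = Files.newInputStream(file)) {
            header = in.readNBytes(MAGIC.length); // Java 11+
        }
        System.out.println(file + (hasSequenceFileMagic(header)
                ? ": SequenceFile header found"
                : ": no SequenceFile header (likely plain text or another format)"));
    }
}
```

Run it against the copied file; files from a text-format table would normally fail the check, while a table stored as SequenceFile should pass regardless of the codec used inside.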

Map-Joining the same small table multiple times

2012-06-18, Mark Schramm
All, We have a small table that we join to several large tables, using the map-join technique, in separate Hive query scripts. As I understand it, the map-join does some preparatory work to get the small table into the distributed cache. These steps are (from my understanding) …
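For readers following the map-join question, here is a minimal, generic sketch of the idea behind a map-side (broadcast hash) join; it is not Hive's implementation. The small table is loaded once into an in-memory hash table keyed by the join key (per the post, this is the part Hive prepares and ships through the distributed cache), and each mapper then streams its split of the large table and probes that table, so no shuffle or reduce phase is needed. Rows are modeled as hypothetical {key, value} string pairs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MapSideJoinSketch {
    /** Build phase: index the small table by join key (rows are {key, value} pairs). */
    static Map<String, List<String>> buildHashTable(List<String[]> smallTable) {
        Map<String, List<String>> table = new HashMap<>();
        for (String[] row : smallTable) {
            table.computeIfAbsent(row[0], k -> new ArrayList<>()).add(row[1]);
        }
        return table;
    }

    /** Probe phase: stream one split of the large table and emit inner-join matches. */
    static List<String> probe(Map<String, List<String>> hashTable, List<String[]> largeSplit) {
        List<String> joined = new ArrayList<>();
        for (String[] row : largeSplit) {
            List<String> matches = hashTable.get(row[0]);
            if (matches == null) {
                continue; // no partner in the small table: dropped (inner join)
            }
            for (String smallValue : matches) {
                joined.add(row[0] + "\t" + row[1] + "\t" + smallValue);
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        List<String[]> small = List.of(new String[] {"1", "US"}, new String[] {"2", "NL"});
        List<String[]> largeSplit = List.of(
                new String[] {"1", "click"}, new String[] {"3", "view"}, new String[] {"2", "buy"});
        // In a real map-join every mapper loads the same prebuilt table; here one split is probed.
        Map<String, List<String>> hashTable = buildHashTable(small);
        probe(hashTable, largeSplit).forEach(System.out::println);
    }
}
```

Since the build step depends only on the small table, the same hash table could in principle serve several large-table joins; whether Hive can reuse it across separate scripts is the question this thread raises.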

Re: Migrating to hive 8.1 on EMR

2012-06-18, Carl Steinbach
Hi Ranjan, It looks like the NPE is being generated here: if (0 == getCols().size()) { throw new HiveException("at least one column must be specified for the table"); } This would seem to indicate that the table record that was fetched doesn't have any columns. Did you …

Migrating to hive 8.1 on EMR

2012-06-18, Ranjan Bagchi
Hi, I've built a datastore using Hive 0.7.1, backed by S3, with persistent metadata. Now that Hive 0.8.1 is available, I'd like to migrate to the new version. However, I'm having trouble reading tables with the persistent schema. Looking in the logs, I'm getting stack traces like the following: …

RE: sqoop, hive and lzo and cdh3u3 - not creating in index automatically

2012-06-18, Chalcy Raja
Snappy with SequenceFile works well for us. We'll have to decide which one suits our needs. Is there a way to convert existing HDFS data in text format to SequenceFiles? Thanks for all your input, Chalcy -Original Message- From: Chalcy Raja [mailto:chalcy.r...@careerbuilder …

RE: sqoop, hive and lzo and cdh3u3 - not creating in index automatically

2012-06-18, Chalcy Raja
It is there: I have io.compression.codecs in core-site.xml. There is no error or warning in the Sqoop-to-Hive import that indicates anything. The only reason we want to go to LZO is that Snappy is not splittable. Thanks, Chalcy -Original Message- From: Bejoy KS [mailto:bejoy...@ …

Hadoop Summit Hive Meetup presentations now available online

2012-06-18, Carl Steinbach
Hi, Slides from the talks presented at the Hadoop Summit Hive meetup are now available: https://cwiki.apache.org/confluence/display/Hive/Presentations I'd like to thank all of the speakers for making this such a great event, as well as the Hadoop Summit organizers for giving us a place …

Re: sqoop, hive and lzo and cdh3u3 - not creating in index automatically

2012-06-18, Bejoy KS
Hi Chalcy, Regarding LZO indexing not working: is the LZO codec class listed in the 'io.compression.codecs' property in core-site.xml? Snappy is not splittable on its own, but SequenceFiles are splittable, so when the two are used together Snappy gains the advantage of splittability. Regards, Bejoy KS
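To make the splittability point concrete, here is a simplified model (not the real SequenceFile format) of why a block-compressed container can be split even when the codec itself cannot. Each compressed block is preceded by a fixed 16-byte sync marker, so a reader handed an arbitrary byte range skips forward to the first marker at or after its start and reads every block whose marker begins before its end. GZIP from java.util.zip stands in for Snappy only because it ships with the JDK; the marker value, class, and method names are hypothetical. A bare compressed text stream has no such markers, which is why it must be read from the beginning (or, for LZO, needs a separate index).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class SplittableBlockFileSketch {
    // Hypothetical 16-byte marker; real SequenceFiles use a random per-file sync value.
    static final byte[] SYNC = "SYNC-MARKER-0123".getBytes(StandardCharsets.US_ASCII);

    static byte[] gzip(byte[] data) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    static byte[] gunzip(byte[] data) {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(data))) {
            return gz.readAllBytes(); // Java 9+
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Writes each block as: SYNC, 4-byte compressed length, compressed records joined by newlines. */
    static byte[] writeBlocks(List<List<String>> blocks) {
        ByteArrayOutputStream file = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(file)) {
            for (List<String> block : blocks) {
                byte[] packed = gzip(String.join("\n", block).getBytes(StandardCharsets.UTF_8));
                out.write(SYNC);
                out.writeInt(packed.length);
                out.write(packed);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return file.toByteArray();
    }

    static int indexOf(byte[] data, byte[] pattern, int from) {
        for (int i = Math.max(from, 0); i <= data.length - pattern.length; i++) {
            if (Arrays.equals(data, i, i + pattern.length, pattern, 0, pattern.length)) {
                return i;
            }
        }
        return -1;
    }

    /** Reads every block whose sync marker starts in [start, end); a block may extend past end. */
    static List<String> readSplit(byte[] file, int start, int end) {
        List<String> records = new ArrayList<>();
        int pos = indexOf(file, SYNC, start); // skip the partial block before the first marker
        while (pos >= 0 && pos < end) {
            int lenAt = pos + SYNC.length;
            int len = ByteBuffer.wrap(file, lenAt, 4).getInt();
            byte[] packed = Arrays.copyOfRange(file, lenAt + 4, lenAt + 4 + len);
            String text = new String(gunzip(packed), StandardCharsets.UTF_8);
            records.addAll(Arrays.asList(text.split("\n", -1)));
            pos = indexOf(file, SYNC, lenAt + 4 + len);
        }
        return records;
    }

    public static void main(String[] args) {
        byte[] file = writeBlocks(List.of(List.of("a", "b"), List.of("c", "d"), List.of("e")));
        int mid = file.length / 2; // an arbitrary split boundary, usually mid-block
        System.out.println(readSplit(file, 0, mid) + " | " + readSplit(file, mid, file.length));
    }
}
```

Wherever the boundary falls, each block is read by exactly one of the two splits, so together they return every record once.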

RE: sqoop, hive and lzo and cdh3u3 - not creating in index automatically

2012-06-18, Chalcy Raja
Hi Bejoy, The weird thing is that I did not get any errors. The Sqoop import does not go on to the second phase, where it creates the LZO index. We did deploy the native libraries, except the hadoop-lzo lib, which we copied over after building it on another machine. We did the same thing on the test machine as well. …

Re: sqoop, hive and lzo and cdh3u3 - not creating in index automatically

2012-06-18, Bejoy KS
Hi Chalcy, Did you notice any warnings related to the LZO codec in your MapReduce task logs or in the Sqoop logs? It could be because the LZO libs are not available on the TaskTracker nodes. These are native libs and are tied to the OS, so if you have done an OS upgrade you need to rebuild and deploy the …
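A quick way to test this suggestion on each TaskTracker node is to ask a JVM whether it can load the native library at all. The sketch below assumes the hadoop-lzo native library is named gplcompression (libgplcompression.so on Linux) and that you launch it with the same java.library.path the TaskTracker uses; both are assumptions to verify against your installation. It only checks that the library loads; it does not exercise the codec.

```java
public class NativeLibCheck {
    /** Tries to load a native library by its short name, e.g. "gplcompression" for libgplcompression.so. */
    static boolean canLoad(String libName) {
        try {
            System.loadLibrary(libName);
            return true;
        } catch (UnsatisfiedLinkError | SecurityException e) {
            return false; // not found on java.library.path, wrong architecture, or not permitted
        }
    }

    public static void main(String[] args) {
        // Assumption: hadoop-lzo's native library is named "gplcompression"; pass another name to override.
        String lib = args.length > 0 ? args[0] : "gplcompression";
        System.out.println("java.library.path = " + System.getProperty("java.library.path"));
        System.out.println(lib + (canLoad(lib) ? " loaded" : " could NOT be loaded on this node"));
    }
}
```

Running it on a node after an OS upgrade, before rebuilding hadoop-lzo, is one way to see whether the old native build still links.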

Re: sqoop, hive and lzo and cdh3u3 - not creating in index automatically

2012-06-18, Edward Capriolo
Have you considered switching to SequenceFiles with Snappy (or LZO) compression? IIRC the process of generating LZO files and then generating an index on top of them is cumbersome, whereas SequenceFiles are directly splittable. On Mon, Jun 18, 2012 at 9:16 AM, Chalcy Raja wrote: > I am posting …

sqoop, hive and lzo and cdh3u3 - not creating in index automatically

2012-06-18, Chalcy Raja
I am posting this here first and then maybe on the Sqoop user group as well. I am trying to use LZO compression. I tested on a standalone machine by installing CDH3u3 and doing a Sqoop-to-Hive import with LZO compression, and everything worked great. The data was sqooped into HDFS and the LZO index file got created and …

Re: Quering RDBMS table in a Hive query

2012-06-18, Ruslan Al-Fakikh
Bejoy, Again, I do understand those two steps, and I do understand that I have a lot of options for making them run in sequence, but from the very beginning my point was to avoid having two steps. I want to have a dataset in the Hive warehouse that I could query at any time with just a Hive query …

RE: Hive-0.8.1 PHP Thrift client broken?

2012-06-18, Ruben de Vries
Going to bump this one, since I hope to be able to contribute something (worth a bump :P) -Original Message- From: Ruben de Vries [mailto:ruben.devr...@hyves.nl] Sent: Friday, June 15, 2012 11:59 AM To: user@hive.apache.org Subject: Hive-0.8.1 PHP Thrift client broken? Hey Guys, I've been …

Re: Map side join

2012-06-18, Aniket Mokashi
Hive also has something called UNIQUEJOIN; maybe that's what you are looking for. I cannot find documentation to point you to, but you can do a JIRA search. It lets you join multiple sources on the same key, map-side (all sources must have the same key). ~Aniket On Wed, Jun 13, 2012 …

Re: TThreadPoolServer$Args - NoSuchMethodError

2012-06-18, Aniket Mokashi
Can you share the stack trace? On Tue, Jun 12, 2012 at 2:18 AM, Marcin Cylke wrote: > Hi, > I'm having problems running current releases of Apache Hive; I get an error: > java.lang.NoSuchMethodError: org.apache.thrift.server.TThreadPoolServer.<init>(Lorg/apache/thrift/server/TThreadPoolServe…

Re: Re:Custom UDF in Python?

2012-06-18, Aniket Mokashi
This would need changes to Hive. On Wed, Jun 13, 2012 at 8:34 AM, wrote: > Not sure… but maybe through Jython… > From: 王锋 [mailto:wfeng1...@163.com] > Sent: Wednesday, June 6, 2012 7:36 > To: user@hive.apache.org > Subject: Re:Custom UDF in Python? > udfs need extend …

Re: An array and a map in the same Hive table: Can Separator for Map KV pairs be different than Separator for Array elements?

2012-06-18, Aniket Mokashi
Hi Mark, COLLECTION ITEMS TERMINATED BY applies to both maps and arrays. In your case, you can play with Hive's nested complex data structures (so that you can introduce another separator) to deserialize your data, but that would require some experimentation (digging into the code). This would be non-trivial …
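A small standalone illustration of why one setting covers both types: with Hive's default delimiters for ROW FORMAT DELIMITED text tables (^A between fields, ^B between collection items, ^C between a map key and its value), the same ^B character separates array elements and map entries. This is a hypothetical parser written for illustration, not Hive's LazySimpleSerDe, and the row layout is made up.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

public class DelimitedRowSketch {
    // Hive's default delimiters for ROW FORMAT DELIMITED text tables.
    static final String FIELD = "\u0001";      // FIELDS TERMINATED BY (^A)
    static final String COLLECTION = "\u0002"; // COLLECTION ITEMS TERMINATED BY (^B): arrays AND maps
    static final String MAP_KEY = "\u0003";    // MAP KEYS TERMINATED BY (^C)

    static String[] splitFields(String line) {
        return line.split(Pattern.quote(FIELD), -1);
    }

    /** Array column: elements are separated by the collection-items delimiter. */
    static List<String> parseArray(String field) {
        return Arrays.asList(field.split(Pattern.quote(COLLECTION), -1));
    }

    /** Map column: entries use the SAME collection-items delimiter; key and value are split by MAP_KEY. */
    static Map<String, String> parseMap(String field) {
        Map<String, String> map = new LinkedHashMap<>();
        for (String entry : field.split(Pattern.quote(COLLECTION), -1)) {
            String[] kv = entry.split(Pattern.quote(MAP_KEY), 2);
            map.put(kv[0], kv.length > 1 ? kv[1] : null);
        }
        return map;
    }

    public static void main(String[] args) {
        // Hypothetical row with columns: id STRING, tags ARRAY<STRING>, props MAP<STRING,STRING>
        String line = "id42" + FIELD + "a" + COLLECTION + "b" + COLLECTION + "c"
                + FIELD + "k1" + MAP_KEY + "v1" + COLLECTION + "k2" + MAP_KEY + "v2";
        String[] fields = splitFields(line);
        // prints: id42 [a, b, c] {k1=v1, k2=v2}
        System.out.println(fields[0] + " " + parseArray(fields[1]) + " " + parseMap(fields[2]));
    }
}
```

Because array elements and map entries consume the same ^B character at a given nesting level, ROW FORMAT DELIMITED alone cannot give an array column and a map column different item separators, which is the limitation described above.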