Hive metastore schema question related to ColumnDescriptors

2013-03-12 Thread Hemanth Yamijala
Hi folks, In some recent work, I've been looking at some issues related to the Hive metastore schema. It seems more dev related, hence I'm not sure if the user list is ideal, but will ask anyway. Still, it would help me if someone can clarify a few questions as below: Firstly, I would like to kno

Re: Getting Slow Query Performance!

2013-03-12 Thread bejoy_ks
Hi, since you are on a pseudo-distributed/single-node environment, the Hadoop MapReduce parallelism is limited. You might have just a few map slots, and map tasks might be queued waiting for others to complete. In a larger cluster your job should be faster. As a side note, certain SQL que

Re: Getting Slow Query Performance!

2013-03-12 Thread bejoy_ks
Hi, since you are on a pseudo-distributed/single-node environment, the Hadoop MapReduce parallelism is limited. You might have just a few map slots, and map tasks might be queued waiting for others to complete. In a larger cluster your job should be faster. Certain SQL queries that utiliz

Querying nested Avro data stored in Flume events

2013-03-12 Thread Manuel Simoni
Hi! I'm planning to use Hive to query custom Avro logging records. I transfer data via Flume to HDFS and pick it up from there. The Flume event schema is {"type":"record","name":"Event","fields":[{"name":"headers","type":{"type":"map","values":"string"}},{"name":"body","type":"bytes"}]} which mean

RE: Getting Slow Query Performance!

2013-03-12 Thread Bennie Schut
Well, it's probably worth knowing that 30G is really hitting rock bottom when you talk about big data. Hadoop is linearly scalable, so going to 3 or 4 similar machines could probably get you below the MySQL time, but it's hardly a fair comparison. For setting it up I would suggest reading the Hadoop docs:

RE: Getting Slow Query Performance!

2013-03-12 Thread Gobinda Paul
Thanks for your reply. I am new to Hadoop and Hive. My goal is to process big data using Hadoop. This is my university project (Data Mining), and I need to show that Hadoop is better than MySQL for processing big data (30-100GB+). I know Hadoop does that. To do so, can you please suggest how

RE: Getting Slow Query Performance!

2013-03-12 Thread Bennie Schut
Generally a single Hadoop machine will perform worse than a single MySQL machine. People normally use Hadoop when they have so much data that it won't really fit on a single machine and would require specialized hardware (stuff like SANs) to run. 30GB of data really isn't that much and 2GB of RAM

Getting Slow Query Performance!

2013-03-12 Thread Gobinda Paul
I use Sqoop to import 30GB of data (two tables: employee (approx. 21GB) and salary (approx. 9GB)) into Hadoop (single node) via Hive. I run a sample query like SELECT EMPLOYEE.ID,EMPLOYEE.NAME,EMPLOYEE.DEPT,SALARY.AMOUNT FROM EMPLOYEE JOIN SALARY WHERE EMPLOYEE.ID=SALARY.EMPLOYEE_ID AND SALARY.AMOUN

Re: Error while table creation

2013-03-12 Thread Abhishek Gayakwad
A duplicate CD_ID in the CDS table was linked to an old table that was created before the Hive 0.7.0 to 0.9.0 migration; this might be causing the issue. I am still debugging it, but it looks like there was some problem in the upgrade. On Sun, Mar 10, 2013 at 8:47 PM, Dean Wampler < dean.wamp...@thinkbigan

Re: Avro Backed Hive tables

2013-03-12 Thread David Morel
On 7 Mar 2013, at 2:43, Murtaza Doctor wrote: Folks, Wanted to get some help or feedback from the community on this one: Hello, in that case it is advisable to start a new thread, and not 'reply-to' when you compose your email :-) Have a nice day David