Hi folks,
In some recent work, I've been looking at some issues related to the Hive
metastore schema. It seems more dev-related, so I'm not sure the user
list is ideal, but I'll ask anyway. It would help me if someone could
clarify a few questions, as below:
Firstly, I would like to kno
Hi
Since you are on a pseudo-distributed / single-node environment, the Hadoop
MapReduce parallelism is limited.
You may have only a few map slots, and map tasks may be queued
waiting for others to complete. On a larger cluster your job should be faster.
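As a rough illustration, the available parallelism can be nudged from the Hive CLI. This is only a sketch: the property names below are the classic Hadoop 1.x ones, the values are arbitrary examples, and on a single node they are still capped by the configured map/reduce slots:

```sql
-- Sketch: classic Hadoop 1.x properties, example values only.
-- On a pseudo-distributed node these remain bounded by the slot count.
SET mapred.map.tasks=4;       -- hint for the number of map tasks
SET mapred.reduce.tasks=2;    -- reduce tasks for the next query
SET hive.exec.parallel=true;  -- run independent query stages in parallel
```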
As a side note, certain SQL queries that utiliz
Hi!
I'm planning to use Hive to query custom Avro logging records. I
transfer data via Flume to HDFS and pick it up from there.
The Flume event schema is
{"type":"record","name":"Event","fields":[{"name":"headers","type":{"type":"map","values":"string"}},{"name":"body","type":"bytes"}]}
which mean
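For reference, a table over that Flume event schema could be declared with the Avro SerDe along these lines. This is a sketch, not a tested DDL: the table name and HDFS location are hypothetical, and on older Hive versions the SerDe and container formats must be spelled out explicitly as shown:

```sql
-- Sketch: expose the Flume Avro events to Hive via the Avro SerDe.
-- Table name and LOCATION are hypothetical placeholders.
CREATE EXTERNAL TABLE flume_events
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS
  INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION '/flume/events'
TBLPROPERTIES ('avro.schema.literal'=
  '{"type":"record","name":"Event","fields":[{"name":"headers","type":{"type":"map","values":"string"}},{"name":"body","type":"bytes"}]}');
```

The columns (headers as map<string,string>, body as binary) are then derived from the Avro schema rather than declared by hand.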
Well, it's probably worth knowing that 30 GB is really hitting rock bottom when
you talk about big data. Hadoop is linearly scalable, so going to 3 or 4
similar machines could probably get you below the MySQL time, but it's hardly
a fair comparison.
For setting it up I would suggest reading the Hadoop docs:
Thanks for your reply. I am new to Hadoop and Hive. My goal is to process big
data using Hadoop; this is my university project (Data Mining). I need to show
that Hadoop is better than MySQL for big data (30-100 GB+) processing, and I
know Hadoop does that. To do so, can you please suggest how
Generally a single Hadoop machine will perform worse than a single MySQL
machine. People normally use Hadoop when they have so much data that it won't
really fit on a single machine and would require specialized hardware (things
like SANs) to run.
30 GB of data really isn't that much, and 2 GB of RAM
I used Sqoop to import 30 GB of data (two tables: employee, approx. 21 GB, and
salary, approx. 9 GB) into Hadoop (single node) via Hive.
I ran a sample query like:
SELECT EMPLOYEE.ID, EMPLOYEE.NAME, EMPLOYEE.DEPT, SALARY.AMOUNT FROM EMPLOYEE
JOIN SALARY WHERE EMPLOYEE.ID=SALARY.EMPLOYEE_ID AND SALARY.AMOUN
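One thing worth checking in a query shaped like that: the join condition sits in the WHERE clause, so Hive may treat the JOIN as a cross product and only filter afterwards, which is painfully slow at this data size. A sketch of the same query with the condition moved into an ON clause (column names taken from the query above; the final AMOUNT filter is a hypothetical placeholder, since the original is truncated):

```sql
-- Equi-join with the condition in ON rather than WHERE, so Hive can
-- shuffle on the join key instead of building a cross product first.
SELECT EMPLOYEE.ID, EMPLOYEE.NAME, EMPLOYEE.DEPT, SALARY.AMOUNT
FROM EMPLOYEE
JOIN SALARY ON EMPLOYEE.ID = SALARY.EMPLOYEE_ID
WHERE SALARY.AMOUNT > 0;  -- placeholder for the truncated filter
```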
A duplicate CD_ID in the CDS table was linked to an old table that was created
before the Hive 0.7.0 to 0.9.0 migration, which might be causing this issue. I
am still debugging it; it looks like there was some problem in the
upgrade.
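To see whether a CD_ID is shared, one way is to query the metastore database directly; this is a sketch against the standard metastore tables (SDS carries a CD_ID foreign key into CDS), best run against a backup copy of the metastore:

```sql
-- Find CD_IDs referenced by more than one storage descriptor (SDS row),
-- which can indicate leftovers from a 0.7.0 -> 0.9.0 schema upgrade.
SELECT CD_ID, COUNT(*) AS refs
FROM SDS
GROUP BY CD_ID
HAVING COUNT(*) > 1;
```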
On Sun, Mar 10, 2013 at 8:47 PM, Dean Wampler <
dean.wamp...@thinkbigan
On 7 Mar 2013, at 2:43, Murtaza Doctor wrote:
Folks,
Wanted to get some help or feedback from the community on this one:
Hello,
in that case it is advisable to start a new thread, rather than 'reply-to',
when you compose your email :-)
Have a nice day
David