Hi Bejoy,
Thanks, I see... I was asking because I wanted to know how much total
storage space I would need on the cluster for the given data in the tables.
Are you saying that for 2 tables of 500 GB each (spread across the
cluster), there would be a need for intermediate storage of 25 GB? Or
Thanks to both of you for your replies.
If I decide to deploy my JAR on Amazon Elastic MapReduce, then:
1) The default block size is 64 MB, so in such a case I have to set it to 128
MB. Is that right?
2) Amazon EMR already has values for mapred.min.split.size
and mapred.max.split.size, and mapper and r
Try setting this value to your block
size. For a 128 MB block size (note the property is in bytes):
> set mapred.min.split.size=134217728;
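If it helps, a minimal Hive CLI sketch assuming a 128 MB block size
(128 * 1024 * 1024 = 134217728 bytes); pinning both min and max makes each
mapper read exactly one block:

hive> -- values are in bytes; 134217728 = 128 MB
hive> set mapred.min.split.size=134217728;
hive> set mapred.max.split.size=134217728;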
Sent from my iPhone
On May 7, 2012, at 10:11 PM, Bhavesh Shah wrote:
> Thanks, Nitin, for your reply.
>
> In short, my task is:
> 1) Initially I want to import the data from MS SQL Server into HDFS
> using SQOOP.
I am no expert on Sqoop, so I may be wrong, but importing 30 * 0.5M records
(table by table) is a huge operation. I would rather prefer to just dump and
import using the Hive CLI (Sqoop is a good choice too, but I don't know the
benchmarks).
If you are doing so many joins, then it's good to be on a Hadoop cluster
ins
Thanks, Nitin, for your reply.
In short, my task is:
1) Initially I want to import the data from MS SQL Server into HDFS using
SQOOP.
2) Through Hive I process the data and generate the result in one table.
3) The table containing that result is then exported from Hive back to MS
SQL Server.
Actu
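Purely as an illustration of steps 1 and 3, a shell sketch; the host,
database, table, and directory names are hypothetical placeholders, not
details from this thread:

# step 1: import one table from MS SQL Server into HDFS (all names made up)
$ sqoop import \
    --connect "jdbc:sqlserver://dbhost:1433;databaseName=mydb" \
    --username hive_user --password '***' \
    --table source_table --target-dir /user/bhavesh/source_table

# step 3: export the Hive result table's files back to MS SQL Server
$ sqoop export \
    --connect "jdbc:sqlserver://dbhost:1433;databaseName=mydb" \
    --username hive_user --password '***' \
    --table result_table --export-dir /user/hive/warehouse/result_table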
1) Check the jobtracker URL to see how many maps/reducers have been launched.
2) If you have a large dataset and want to execute it fast, set
mapred.min.split.size and mapred.max.split.size to an optimal value so
that more mappers will be launched and the job will finish sooner.
3) If you are doing joins, ther
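Point 3 is cut off in the archive; one common join optimization in this
direction is letting Hive convert a join against a small table into a
map-side join, e.g.:

hive> -- assumes one join side is small enough to be held in memory
hive> set hive.auto.convert.join=true;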
Hello all,
I have written Hive JDBC code and created a JAR of it. I am running that
JAR on a 10-node cluster.
The problem is that even though I am using the 10-node cluster, the
performance is the same as on a single node.
What can I do to improve the performance of Hive jobs? Is there any
configuration setting
Hi all,
I wanted to see if anyone has seen this error before:
Query returned non-zero code: 9, cause: FAILED: Execution Error, return code
-101 from org.apache.hadoop.hive.ql.exec.MapRedTask; nested exception is
java.sql.SQLException: Query returned non-zero code: 9, cause: FAILED:
Execution Err
Thanks Shashwat.
That did work. However, I find it very weird that it can find all the
other libs at their proper locations on the local filesystem but searches
for this particular one on HDFS. I'll try to dig deeper into the code to
see if I can find the cause of this behavior.
On Mon
Do one thing: create the same structure
/Users/testuser/hive-0.9.0/lib/hive-builtins-0.9.0.jar on the Hadoop file
system and then try. It will work.
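A minimal shell sketch of that workaround, assuming the jar sits at the
same path on the local filesystem:

# mirror the local path on HDFS (add -p to -mkdir on Hadoop 2.x)
$ hadoop fs -mkdir /Users/testuser/hive-0.9.0/lib
$ hadoop fs -put /Users/testuser/hive-0.9.0/lib/hive-builtins-0.9.0.jar \
    /Users/testuser/hive-0.9.0/lib/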
Shashwat Shriparv
On Mon, May 7, 2012 at 11:57 PM, kulkarni.swar...@gmail.com <
kulkarni.swar...@gmail.com> wrote:
> Thanks for the reply.
>
> Assuming you mean the permissions within HIVE_HOME, they all look
> OK to me.
Thanks for the reply.
Assuming you mean the permissions within HIVE_HOME, they all look
OK to me. Is there anywhere else you want me to check?
On Mon, May 7, 2012 at 11:16 AM, hadoop hive wrote:
> Check the permissions.
>
>
> On Mon, May 7, 2012 at 7:30 PM, kulkarni.swar...@gmail.com wrote:
Check the permissions.
On Mon, May 7, 2012 at 7:30 PM, kulkarni.swar...@gmail.com <
kulkarni.swar...@gmail.com> wrote:
> I created a very simple Hive table and then ran the following query,
> which should run an M/R job to return the results.
>
> hive> SELECT COUNT(*) FROM invites;
>
> But I am
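For what it's worth, a hedged shell sketch of the sort of permission check
suggested above; the scratch and warehouse paths are typical defaults, not
details confirmed in this thread:

# common culprits: Hive's scratch dir and the warehouse dir (paths assumed)
$ hadoop fs -ls /tmp/hive-$USER
$ hadoop fs -ls /user/hive/warehouse
$ hadoop fs -chmod -R 777 /tmp/hive-$USER   # loosen only while debugging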
Hi Roshan,
The following snippet summarizes the delimiters for your Hive table:
colelction.delim        \u0002   (the misspelling is Hive's own property key)
field.delim             \u0001
mapkey.delim            \u0003
serialization.format    \u0001
Your fields are
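Those values are Hive's defaults. For reference, a sketch of a table
declared with the same delimiters written out as octal escapes; the table
and column names are made up:

hive> CREATE TABLE example_tbl (id INT, tags ARRAY<STRING>,
    >                           props MAP<STRING, STRING>)
    > ROW FORMAT DELIMITED
    >   FIELDS TERMINATED BY '\001'
    >   COLLECTION ITEMS TERMINATED BY '\002'
    >   MAP KEYS TERMINATED BY '\003';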
Hi Safdar
A map-side join uses memory on the Hive client to build hash tables. The
rows never go through the key/value shuffle, as there is no reduce phase
involved in such jobs.
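Concretely, that is what the MAPJOIN hint triggers: a local task builds a
hash table from the small table before the mappers stream over the big one.
A sketch with made-up table names:

hive> -- assumes small_dim fits in the local task's memory
hive> SELECT /*+ MAPJOIN(s) */ f.id, s.label
    > FROM big_fact f JOIN small_dim s ON (f.dim_id = s.id);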
Regards
Bejoy KS
From: Ali Safdar Kureishy
To: user@hive.apache.org
Sent: Monday
Hi Ali
The 2 * 500 GB of data is actually processed by many tasks across
multiple nodes. With default settings, a task processes 64 MB of data.
So you don't need 25 GB of temp space on a node at all. A few gigs
of free space is more than enough for any MR task.
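A rough back-of-envelope check, assuming the default 64 MB split size:

2 tables * 500 GB  = 1000 GB of input
1000 GB / 64 MB    = ~16,000 map tasks in total
per-task footprint = one 64 MB split plus some local spill space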
Regards
Please ignore my question below. I made a mistake in my calculation. The
map side of a join does not perform a cross-product of the data; it just
emits the rows with the join key as the output key.
Thanks,
Safdar
On Mon, May 7, 2012 at 12:31 AM, Ali Safdar Kureishy <
safdar.kurei...@gmail.com> wrote:
Hi,
I'm setting up a Hadoop cluster and would like to understand how much disk
space I should expect to need for joins.
Let's assume that I have 2 tables, each of about 500 GB. Since the tables
are large, these will all be reduce-side joins. As far as I know about such
joins, the data generated