You can choose to partition by (country, date).
In this case you move the data into a date partition within your country
partition and avoid overwriting old data.
If you choose to go this way, one thing to check is that it does not result
in too many partitions.
A large number of partitions adds metastore overhead and slows down job setup.
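A sketch of such a load, assuming a hypothetical table page_views partitioned
by (country, dt):

INSERT OVERWRITE TABLE page_views PARTITION (country='US', dt='2010-12-31')
SELECT uid, url
FROM staging_page_views
WHERE country='US' AND dt='2010-12-31';

Each (country, dt) pair gets its own partition directory, so loading a new
date never touches older data.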
Does it get stuck before creating the Hadoop job or after the job has been
created?
If it is stuck before creating the Hadoop job, you can look at hive.log
(wherever you are directing it) to see what is taking a long time in setting
up the job.
If the Hadoop job has already started, you can look at the task logs in the
JobTracker UI to see which tasks are slow.
You can turn speculative execution on, which might help you with a few
slow-progressing tasks.
mapred.map.tasks.speculative.execution and
mapred.reduce.tasks.speculative.execution are the job conf options.
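For example, you could turn them on for a session from the Hive CLI:

SET mapred.map.tasks.speculative.execution=true;
SET mapred.reduce.tasks.speculative.execution=true;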
-Original Message-
From: bharath vissapragada [mailto:bharathvissaprag
>However, given the number of users that visit our website (hundreds of
>thousands of unique users every day), this would lead to a large number of
>partitions (and rather small file sizes, ranging from a couple of bytes to a
>couple of KB). From the documentation I've read online, it seems that so many
>small partitions are discouraged.
Hi
You could choose to have the second table (for user ids) partitioned by date
as well:
table_root/userid=ab/date=2010-12-31/
That way you can split your data set by both userid and date.
You can use dynamic partitions to transform the existing date-partitioned
table into a (userid, date)-partitioned one.
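A sketch of that transform (table and column names here are hypothetical, and
note that in a dynamic partition insert the partition columns must come last
in the select list; if 'date' is reserved in your Hive version, rename or
backtick-quote it):

SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

INSERT OVERWRITE TABLE events_by_user PARTITION (userid, date)
SELECT url, referrer, userid, date
FROM events_by_date;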
Hi
Is there a wiki page which contains a list of keywords in Hive?
Can we use 'time' or 'date' as column names?
Thanks
Vaibhav
You could also choose to look at Amazon ElasticMapReduce.
It allows you to provision an EC2 cluster of your choice preinstalled with Hive
and Hadoop.
https://cwiki.apache.org/confluence/display/Hive/HiveAmazonElasticMapReduce
Thanks
Vaibhav
-Original Message-
From: MIS [mailto:misapa...
You need to point to the exact jar file location and not just the directory
location.
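For example (the jar names and paths here are hypothetical):

export HIVE_AUX_JARS_PATH=/opt/hive/aux/my-serde.jar,/opt/hive/aux/my-udfs.jar

Note that it lists the jar files themselves, comma separated, not the
directory that contains them.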
Vaibhav
-Original Message-
From: Sam William [mailto:sa...@stumbleupon.com]
Sent: Monday, August 29, 2011 3:56 PM
To: user@hive.apache.org
Subject: HIVE_AUX_JARS_PATH
I assume you need to set HIVE_AUX_JARS_PATH for Hive to pick up these jars?
CombineFileInputFormat can be used to combine multiple files into one map task.
But CombineFileInputFormat does not attempt to combine compressed files.
In that case it defaults to HiveFileInputFormat, which creates at least one
map task per file.
7GB of data is not a lot for a 3-node cluster to process, and you should see
better parallelism once more splits are created.
If you actually have splittable files, you can set mapred.max.split.size
appropriately to create more splits.
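For example, to cap splits at 64MB (the value is just an illustration):

SET mapred.max.split.size=67108864;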
Thanks
Vaibhav
From: Daniel,Wu [mailto:hadoop...@163.com]
Sent: Tuesday, August 23, 2011 6:51 AM
To: hive
Subject: Why a sql only use one map task?
I run the following query and it uses only one map task.
Did you restart the hive server after modifying the hive-site.xml settings?
I think you need to restart the server to pick up the latest settings in the
config file.
Thanks
Vaibhav
From: Amit Sharma [mailto:amitsharma1...@gmail.com]
Sent: Monday, August 22, 2011 2:42 PM
To: user@hive.apache.org
This is a really curious case.
How many replicas of each block do you have?
Are you able to copy the data directly using HDFS client?
You could try the hadoop fs -copyToLocal command and see if it can copy the
data from HDFS correctly.
That would help you verify whether the issue really is at the HDFS layer.
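For example (the paths here are hypothetical):

hadoop fs -copyToLocal /user/hive/warehouse/test /tmp/test_copy

If this fails or returns corrupt data, the problem is below Hive.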
You could also specify a fully qualified HDFS path in the create table command.
It could look like this:
create external table test(key string )
row format delimited
fields terminated by '\000'
collection items terminated by ' '
location 'hdfs://new_master_host:port/table_path';
Then you can use the 'insert' statement to load data into the table.
If you want to insert data into a partitioned table without specifying the
partition value, you need to enable dynamic partitioning.
You can use the following switches:
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
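A dynamic partition insert could then look like this (hypothetical table and
column names; the partition column goes last in the select list):

INSERT OVERWRITE TABLE sales PARTITION (dt)
SELECT item, price, dt FROM sales_staging;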
Thanks
Vaibhav
From: Daniel,Wu [mailto:h
Are you using a custom scheduler?
I have seen issues with jobs having 0 mappers and 1 reducer with the Fair
Scheduler.
From: hadoop n00b [mailto:new2h...@gmail.com]
Sent: Thursday, August 11, 2011 9:32 AM
To: user@hive.apache.org
Subject: Reducer Issue in New Setup
Hello,
We have just set up Hive on
Sent: Wednesday, August 10, 2011 3:40 AM
To: user@hive.apache.org
Subject: Re: CDH3 U1 Hive Job-commit very slow
There are only 10186 partitions in the metastore (select count(1) from
PARTITIONS; in MySQL), so I think that is not the problem.
2011/8/10 Aggarwal, Vaibhav <vagg...@amazon.com>:
Do you have a lot of partitions in your table?
Time taken to process the partitions before submitting the job is proportional
to the number of partitions.
There is a patch I submitted recently as an attempt to alleviate this problem:
https://issues.apache.org/jira/browse/HIVE-2299
If that is not the cause, please share more details about your setup.
There are many potential benefits of using the Hive HBase handler; see the
sketch after this list.
1. The most obvious is the ability to run SQL-like queries on your data
instead of using the HBase client.
2. The ability to join the data with other data sources like HDFS or S3.
3. The ability to move data from your Hive tables into HBase and back.
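As a sketch, a minimal HBase-backed table could be declared like this (the
table, column family, and column names here are hypothetical):

CREATE TABLE hbase_users(key string, name string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,cf:name')
TBLPROPERTIES ('hbase.table.name' = 'users');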
If you are using CombineHiveInputFormat, it might be that all files are
being combined into one large split, and hence only 1 mapper gets created.
If that is the case, you can set the max split size in the hive-default.xml
config file to create more splits and hence more map tasks:
mapred.max.split.size
Could you please tell us which Hadoop and Hive versions you are using?
It looks like you might be using an older version of Hadoop (more
specifically, one which ships with an old version of jets3t).
Thanks
Vaibhav
From: Wouter de Bie [mailto:wou...@spotify.com]
Sent: Wednesday, July 06, 2011 9:07 AM
To: