Hi,
Along with the mapred.compress* properties, try setting
hive.exec.compress.output to true.
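For reference, a minimal sketch of the relevant session settings; the Gzip codec below is just an example, use whichever codec is configured on your cluster:
-- compress the final output files written by the Hive job
SET hive.exec.compress.output=true;
-- compress the underlying MapReduce job output as well (MR1-era property names)
SET mapred.output.compress=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;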
Regards
Bejoy KS
Sent from remote device, Please excuse typos
-Original Message-
From: ch huang
Date: Mon, 22 Jul 2013 13:41:01
To:
Reply-To: user@hive.apache.org
Subject: Re: how to let hi
Hi Rahul,
The same shortcuts Ctrl+A and Ctrl+E work in the hive shell for me (hive 0.9).
Regards
Bejoy KS
Sent from remote device, Please excuse typos
-Original Message-
From: rahul kavale
Date: Tue, 9 Jul 2013 11:00:49
To:
Reply-To: user@hive.apache.org
Subject: Hive CLI
Hey there,
I
Hi Maheedhar
As I understand it, you have a column with data of the form MM:SS in your input
data set.
AFAIK this is not the standard java.sql.Timestamp format, and it doesn't even
have a date part, so you may not be able to use the Timestamp data type here.
You can define it as a
Hi
Can you try including the ZooKeeper quorum and port in your hive configuration,
as shown below?
hive --auxpath .../hbase-handler.jar, .../hbase.jar, ...zookeeper.jar,
.../guava.jar -hiveconf hbase.zookeeper.quorum= -hiveconf hbase.zookeeper.property.clientPort=
Substitute the above command wi
Hi Jerome
Can you send the error log of the MapReduce task that failed? That should have
some pointers which can help you troubleshoot the issue.
Regards
Bejoy KS
Sent from remote device, Please excuse typos
-Original Message-
From: Jérôme Verdier
Date: Mon, 8 Jul 2013 11:25:34
To
Hi Stephen
In addition to join optimization, bucketing helps a lot with sampling as well.
It lets you choose the sample space (i.e. n buckets out of m).
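For example, assuming a hypothetical table created CLUSTERED BY (user_id) INTO 32 BUCKETS, you could sample a single bucket like this:
-- reads roughly 1/32 of the data by scanning one bucket
SELECT * FROM page_views TABLESAMPLE(BUCKET 1 OUT OF 32 ON user_id);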
Regards
Bejoy KS
Sent from remote device, Please excuse typos
-Original Message-
From: Stephen Boesch
Date: Sun, 16 Jun 2013 11:20:49
Adding my two cents:
If you have an unpartitioned table and would like to partition it on some
specific columns of the source table, use a dynamic partition insert.
That would get the source data into separate partitions of a partitioned target
table.
http://kickstarthadoop.blogspot.com/2011/
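A minimal sketch of such a dynamic partition insert, with hypothetical table and column names:
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
-- the partition column (country) goes last in the SELECT list
INSERT OVERWRITE TABLE sales_partitioned PARTITION (country)
SELECT id, amount, country FROM sales_staging;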
Hive gets the JobTracker from the mapred-site.xml specified within your
$HADOOP_HOME/conf.
Does the $HADOOP_HOME/conf/mapred-site.xml on the node that runs hive have the
correct value for the JobTracker?
If not, changing it to the right one might resolve your issue.
Regards
Bejoy KS
Sent from re
Hi
Can you try doing the import again after assigning 'DS12' as the default schema
for the user doing the import? Your DB admin should be able to do this in
Oracle.
Regards
Bejoy KS
Sent from remote device, Please excuse typos
-Original Message-
From: Raj Hadoop
Date: Sat, 25 May 2013
These are the defaults; add Snappy as well along with them:
io.compression.codecs
org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec
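Assuming the Snappy libraries are installed on the cluster, the combined value would look something like:
io.compression.codecs
org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec,org.apache.hadoop.io.compress.SnappyCodec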
Regards
Bejoy KS
Sent from remote device, Please excuse typos
-Original Message-
Fr
Hi
Please find responses below.
Do I have to give some INPUTFORMAT directive to make the Hive table read Snappy
codec files?
For example, for LZO it's
STORED AS INPUTFORMAT "com.hadoop.mapred.DeprecatedLzoTextInputFormat"
Bejoy: No custom input format required. Add the snappy codec in
io.compr
Go to $HADOOP_HOME/conf and open core-site.xml for editing.
Add a new property 'io.compression.codecs' and assign the required compression
codecs as its value.
Regards
Bejoy KS
Sent from remote device, Please excuse typos
-Original Message-
From: Sachin Sudarshana
Date: Thu, 23 May 2013 19
Hi Raj
Which jar you need depends on what version of Oracle you are using. The jar
version corresponding to each Oracle release would be listed in the Oracle
documentation online.
The JDBC jars should be available from the Oracle website as a free download.
Regards
Bejoy KS
Sent from remote device, Please ex
Hi
The procedure is the same as setting up a MySQL metastore. You need to use the
JDBC driver/jar corresponding to the Oracle version/release you intend to use.
Regards
Bejoy KS
Sent from remote device, Please excuse typos
-Original Message-
From: Raj Hadoop
Date: Fri, 17 May 2013 1
Hi
Since you are on a pseudo-distributed/single-node environment, the hadoop
mapreduce parallelism is limited.
You might have just a few map slots, and map tasks might be queued up waiting
for others to complete. In a larger cluster your job should be faster.
As a side note, certain SQL que
Hi
Since you are on a pseudo-distributed/single-node environment, the hadoop
mapreduce parallelism is limited.
You might have just a few map slots, and map tasks might be queued up waiting
for others to complete. In a larger cluster your job should be faster.
Certain SQL queries that utiliz
Hi Suresh
AFAIK, as of now a partition cannot contain subdirectories; it can contain only
files.
You may have to move the subdirectories out of the parent dir 'a' and create
separate partitions for them.
Regards
Bejoy KS
Sent from remote device, Please excuse typos
-Original Message-
Fro
Hi Sai
Local mode is just for trials; for any pre-prod/production environment you need
MR jobs.
Under the hood Hive stores its data in HDFS (mostly), and we definitely use
hadoop/hive for larger data volumes, so MR should be there to process them.
Regards
Bejoy KS
Sent from remote device, Ple
Hi Sai
You can do it as
Select address.country from employees;
Regards
Bejoy KS
Sent from remote device, Please excuse typos
-Original Message-
From: Bennie Schut
Date: Fri, 8 Mar 2013 09:09:49
To: user@hive.apache.org; 'Sai Sai'
Reply-To: user@hive.apache.org
Subject: RE: Accessi
Hi Sachin
You could get the detailed steps from the hive wiki itself:
https://cwiki.apache.org/Hive/hiveplugins.html
Regards
Bejoy KS
Sent from remote device, Please excuse typos
-Original Message-
From: Sachin Sudarshana
Date: Fri, 1 Mar 2013 22:37:54
To: ;
Reply-To: user@hive.apache.o
Hi Sachin
AFAIK there isn't one at the moment, but you can easily achieve this using a
custom UDF.
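A hedged sketch of how a custom UDF would be wired in once written; the jar path, class and function names here are hypothetical:
ADD JAR /path/to/my_udfs.jar;
CREATE TEMPORARY FUNCTION my_custom_func AS 'com.example.hive.udf.MyCustomUDF';
SELECT my_custom_func(some_column) FROM some_table;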
Regards
Bejoy KS
Sent from remote device, Please excuse typos
-Original Message-
From: Sachin Sudarshana
Date: Fri, 1 Mar 2013 22:16:37
To:
Reply-To: user@hive.apache.org
Subject: Fin
Hi Cyril
I believe you are using the derby metastore, and in that case it should be an
issue with the hive configs.
Derby tries to create a metastore in the current dir from where you are
starting hive. The tables exported by sqoop would be inside HIVE_HOME, and
hence you are not able to see the t
Hi Austin
AFAIK, at the moment you can control permissions gracefully only at the data
level, not at the metadata level, i.e. you can play with the hdfs permissions.
Regards
Bejoy KS
Sent from remote device, Please excuse typos
-Original Message-
From: Austin Chungath
Date: Fri, 22 Feb 20
Hi Sachin
Currently there is no such admin user concept in hive.
Regards
Bejoy KS
Sent from remote device, Please excuse typos
-Original Message-
From: Sachin Sudarshana
Date: Fri, 22 Feb 2013 16:40:49
To:
Reply-To: user@hive.apache.org
Subject: Re: Security for Hive
Hi,
I have rea
Hi Gupta
Try out
DESCRIBE EXTENDED or DESCRIBE FORMATTED on the table.
I vaguely recall an operation like this.
Please check the hive wiki for the exact syntax.
Regards
Bejoy KS
Sent from remote device, Please excuse typos
-Original Message-
From: Chunky Gupta
Date: Thu, 21 Feb 2013 17:15:37
To: ; ;
Reply-To:
Hi Gupta
You can get the describe output in a formatted way using
DESCRIBE FORMATTED ;
Regards
Bejoy KS
Sent from remote device, Please excuse typos
-Original Message-
From: Chunky Gupta
Date: Thu, 21 Feb 2013 16:46:30
To:
Reply-To: user@hive.apache.org
Subject: Adding comment to a tab
Hi Hamad
Fully distributed is a proper cluster where the daemons are not all on the same
machine.
You can have hadoop installed in three modes:
- Stand-alone
- Pseudo-distributed (all daemons on the same machine)
- Fully distributed
Regards
Bejoy KS
Sent from remote device, Please excuse typos
--
Hi
Hive uses the hadoop installation specified in HADOOP_HOME. If your hadoop home
is configured for fully distributed operation it'll utilize the cluster itself.
Regards
Bejoy KS
Sent from remote device, Please excuse typos
-Original Message-
From: Hamza Asad
Date: Thu, 21 Feb 2013
Hi Li
The major consideration you should give is to the size of a bucket. One bucket
corresponds to a file in hdfs, and you should ensure that every bucket is at
least a block in size, or in the worst case that at least the majority of
buckets are.
So based on the data size you should derive on
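As an illustration only (the table, column and bucket count are hypothetical; derive the bucket count from your data size as above):
-- make inserts write one file per bucket
SET hive.enforce.bucketing=true;
CREATE TABLE user_events (user_id BIGINT, event STRING)
CLUSTERED BY (user_id) INTO 64 BUCKETS;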
Hi Joseph
There are differences in the following ls commands:
[cloudera@localhost data]$ hdfs dfs -ls /715
This would list all the contents of /715 in hdfs, if it is a dir.
Found 1 items
-rw-r--r-- 1 cloudera supergroup 7853975 2013-02-14 17:03 /715
This output clearly shows that it is a file.
Hi
In later versions of hive you actually don't need a map join hint in your
query. Just the following would suffice:
SET hive.auto.convert.join=true;
Regards
Bejoy KS
Sent from remote device, Please excuse typos
-Original Message-
From: Mayuresh Kunjir
Date: Fri, 15 Fe
Hi Venkataraman
You can just create an external table and give it a location pointing to the
hdfs dir where the data resides.
No need to perform an explicit LOAD operation here.
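A minimal sketch, assuming hypothetical column names and an hdfs path:
CREATE EXTERNAL TABLE my_ext_table (id INT, name STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/user/hive/input_data';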
Regards
Bejoy KS
Sent from remote device, Please excuse typos
-Original Message-
From: venkatramanan
Date: Fri, 25
Hi David,
The default partitioner used in map reduce is the hash partitioner, so records
are sent to a particular reducer based on their keys.
Maybe in your current data set the keys that have no values in the table are
all falling into the same hash bucket, and hence are being processed by the
same reducer
Hi David
An EXPLAIN EXTENDED would give you the exact pointer.
From my understanding, this is how it could work:
you have two tables, so two different map reduce jobs would be processing
them. Based on the join keys, a combination of the corresponding columns would
be chosen as the key from mapper1 a
Hi Ibrahim,
SQOOP is used to import data from an rdbms into hbase in your case.
Please get the schema of your corresponding table from hbase and post it here;
we can point out what your mapping could look like.
Regards
Bejoy KS
Sent from remote device, Please excuse typos
-Original Message-
Fr
Looks like there is a bug with mapjoin + view. Please check the hive jira to
see if there is an issue open against this; if not, file a new jira.
From my understanding, when you enable map join, the hive parser creates backup
jobs. These backup jobs are executed only if the map join fails. In normal
cases
Hi Ibrahim
The hive hbase integration totally depends on the hbase table schema and not
the schema of the source table in mysql.
You need to provide the column family qualifier mapping in there.
Get the hbase table's schema from the hbase shell.
Suppose you have the schema as:
Id
CF1.qualifier1
CF1
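A hedged sketch of such a mapping; the hive table name, hbase table name, column family and qualifier below are all hypothetical:
CREATE EXTERNAL TABLE hbase_mapped (id STRING, val1 STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,CF1:qualifier1")
TBLPROPERTIES ("hbase.table.name" = "my_hbase_table");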
Hi Santhosh
As long as the smaller table's size is in the range of a few MBs, it is a good
candidate for a map join.
If the smaller table is still bigger than that, you can take a look at bucketed
map joins.
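For reference, a sketch of the relevant session settings:
-- plain map join when one side is small enough to be held in memory
SET hive.auto.convert.join=true;
-- bucketed map join when both tables are bucketed on the join key
SET hive.optimize.bucketmapjoin=true;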
Regards
Bejoy KS
Sent from remote device, Please excuse typos
-Original Message-
F
Sorry, I didn't understand your query on first look.
Like Jagat said, you may need to go with a temp table for this.
Do a hadoop fs -cp ../../a.*
Create an external table with its location set to the 'destn dir':
CREATE EXTERNAL TABLE LIKE LOCATION '' ;
NB: I just gave the syntax from memory. Please
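For reference, a hedged sketch of that DDL with hypothetical names filled in:
CREATE EXTERNAL TABLE tmp_copy LIKE source_table LOCATION '/user/hive/destn_dir';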
Hi Oded
If you have created the directories manually, they become visible to the hive
table only once the partitions/sub dirs are added to the metadata using
'ALTER TABLE ... ADD PARTITION'.
Partitions are not picked up implicitly by the hive table even if you have a
proper sub dir structure.
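For example, with a hypothetical table, partition column and location:
ALTER TABLE logs ADD PARTITION (dt='2013-01-01') LOCATION '/data/logs/dt=2013-01-01';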
Hi Souvik
To have the new hdfs block size take effect on already existing files, you need
to re-copy them into hdfs.
To play with the number of mappers you can set a lower value, like 64 MB, for
the min and max split size:
mapred.min.split.size and mapred.max.split.size
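A sketch of those settings (values are in bytes; 64 MB shown here):
SET mapred.min.split.size=67108864;
SET mapred.max.split.size=67108864;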
Regards
Bejoy KS
Sent from
Hi Souvik
Are your input files compressed using some non-splittable compression codec?
Do you have enough free slots while this job is running?
Make sure that the job is not running locally.
Regards
Bejoy KS
Sent from remote device, Please excuse typos
-Original Message-
From: Souvik
Hi Souvik
Apart from hive jobs, are normal mapreduce jobs like wordcount running
fine on your cluster?
If they are, then for the hive jobs are you seeing anything suspicious in the
task, TaskTracker or JobTracker logs?
Regards
Bejoy KS
Sent from remote device, Please excuse typos
-Ori
Hi Souvik
In earlier versions of hive you had to give the map join hint, but in later
versions you just set hive.auto.convert.join=true; and
hive automatically selects the smaller table. It is better to give the smaller
table as the first one in the join.
You can use a map join if you are joining a smal
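For reference, a sketch of both styles, with hypothetical table names and aliases:
-- older releases: explicit hint naming the small table
SELECT /*+ MAPJOIN(s) */ b.id, s.name
FROM big_table b JOIN small_table s ON (b.id = s.id);
-- later releases: let hive decide
SET hive.auto.convert.join=true;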
Hi Rinku
Were you able to create a normal table within hive without any
issues? By normal table I mean one whose data dir is in hdfs, not in HBase.
Regards
Bejoy K S
From handheld, Please excuse typos.
-Original Message-
From: "Garg, Rinku"
Date: Wed, 29 Feb 2012 05:29:
Hi John
Yes, insert is parallel by default in hive. HiveQL gets transformed into
mapreduce jobs, and hence it is definitely parallel. The only case where it is
not parallel is when you have just 1 reducer. It is just reading and processing
the input files in parallel using map reduce jobs fr
Bhavesh
In this case, if you are not using INSERT INTO, you may need some tmp
table; write the query output to that, then load the data from there into your
target table's data dir.
You are not writing that to any file while doing the LOAD DATA operation;
rather, you are just moving the files (in
Hi Bhavesh
INSERT INTO is supported in hive 0.8; an upgrade would get things
rolling.
LOAD DATA inefficient? What was the performance overhead you were facing here?
Regards
Bejoy K S
From handheld, Please excuse typos.
-Original Message-
From: Bhavesh Shah
Date: Wed, 15 Fe
Hi Koert
As you are creating dir/sub dirs using mapreduce jobs out of hive, hive
is unaware of these sub dirs. There is no other way in such cases other than an
add partition DDL to register the dir with a hive partition.
If you are using oozie or a shell script to trigger your jobs, you can accom
Hi
One of your jars is not available, and maybe that one has the required UDF or
related methods.
Hive was not able to locate your first jar:
'/scripts/hiveMd5.jar does not exist'
Just fix this with the correct location and everything should work fine.
Regards
Bejoy K S
From handheld, Please
Real time? Definitely not hive. Go for HBase, but don't expect HBase to be
as flexible as an RDBMS. You need to choose your row key and column families
wisely, as per your requirements.
For data mining and analytics you can mount a Hive table over the corresponding
HBase table and play on with SQL-li
Corrected a few typos in previous mail
Hi Avrila
AFAIK the bucketed map join is not the default in hive; it happens only
when the configuration parameter hive.optimize.bucketmapjoin is set to true.
You may be getting the same execution plan because hive.optimize.bucketmapjoin
I agree with Matt on that aspect. The solution I proposed was purely based
on the sample data provided, where there were 3-digit comma-separated values.
If there is a chance of 4-digit values as well in event_list, you may need to
revisit the solution.
Regards
Bejoy K S
-Original Messag
Multiple databases have also proved helpful for me in organizing tables into
corresponding databases when there are quite a large number of tables to manage.
I also believe it'd be helpful in providing access restrictions.
Regards
Bejoy K S
-Original Message-
From: bejoy...@yahoo.com
Da
Ranjith
Hive does support multiple databases. If you are on one of the
recent versions of hive, try:
Create database testdb;
Use testdb;
It should give you what you are looking for.
Regards
Bejoy K S
-Original Message-
From: "Raghunath, Ranjith"
Date: Thu, 22 Dec 2011 17:02:09
To:
Adithya
The answer is yes. SQOOP is the tool you are looking for. It has an
import option to load data from any jdbc-compliant database into hive. It
even creates the hive table for you by referring to the source db table.
Hope it helps!
Regards
Bejoy K S
-Original Message-
Looks like a data problem. Were you using the GROUP BY query on the same data
set?
But if count(*) also throws an error, then we are back to square one: an
installation/configuration problem with hive or map reduce.
Regards
Bejoy K S
-Original Message-
From: Mark Kerzner
Date: Wed, 19 Oct 2011
Mark
To ensure your hive installation is fine, run two queries:
SELECT * FROM trans LIMIT 10;
SELECT * FROM trans WHERE ***;
You can try this for a couple of different tables. If these queries return
results and work as desired, then your hive should be working fine.
If it works well as the s
Hi Mark
What do your map reduce job logs say? Try figuring out the error from
there; from the hive CLI you can hardly find the root cause of your errors.
From the JobTracker web UI <http://hostname:50030/jobtracker.jsp> you can
easily browse to the failed tasks and get the actual exception fr
Hi Li
AFAIK 0.21 is not really a stable version of hadoop, so if this upgrade
is on a production cluster it'd be better to go with 0.20.203.
Regards
Bejoy K S
-Original Message-
From: Shouguo Li
Date: Thu, 1 Sep 2011 11:41:46
To:
Reply-To: user@hive.apache.org
Subject: upgrad
Hi Daniel
In the hadoop ecosystem the number of map tasks is actually decided
by the job, basically based on the number of input splits. Setting
mapred.map.tasks doesn't guarantee that only that many map tasks are
triggered. What worked out here for you is that you were specifying that a
A small correction to my previous post: the CDH version is CDH u1, not u0.
Sorry for the confusion.
Regards
Bejoy K S
-Original Message-
From: Bejoy Ks
Date: Thu, 18 Aug 2011 05:51:58
To: hive user group
Reply-To: user@hive.apache.org
Subject: Hive crashing after an upgrade - issue with ex
Yes, I very much agree with you on those lines. Using the basic constructs
would literally run into memory issues with large datasets. I had some of those
resolved by using the DISTRIBUTE BY clause and the like. In short, a little
reworking of your hive queries could help you out in some cases.
Regards
B
Hi Daniel
Having a look at your requirement: to load data into a partitioned
hive table from any input file, the most hassle-free approach would be:
1. Load the data into a non-partitioned table that shares a similar structure
with the target table.
2. Populate the target table with t
Hi
Hive queries are parsed into hadoop map reduce jobs. In map reduce jobs,
between the map and reduce tasks there are two phases, the copy phase and the
sort phase, together known as the sort-and-shuffle phase. So the copy task
indicated in the hive job here should be the copy phase of map reduce. It does
the co
Thanks Amareshwari, the article gave me some valuable hints for deciding my
choice. But out of curiosity, does hive support stage-by-stage iterative
processing? If so, how?
Thank You
Regards
Bejoy K S
-Original Message-
From: Amareshwari Sri Ramadasu
Date: Mon, 8 Aug 2011 17:14:21
To: user@h
Hi
I've been successful using hive for the past few projects. Now for a
particular use case I'm a bit confused about what to choose, Hive or Pig. My
project involves a step-by-step sequential workflow. In every step I retrieve
some values based on a query, and use these values as input to new queries
Hi Ayon
AFAIK hive is supposed to behave that way. If you set
hive.cli.print.header=true to enable column headers, then some commands like
'desc' are not expected to work. Not sure whether there is a patch recently
out for this.
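For reference, the setting in question:
-- enables column headers in query output; may break 'desc' in the affected versions
SET hive.cli.print.header=true;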
Regards
Bejoy K S
-Original Message-
From: Ayon Sin
Hi Travis
From my understanding of your requirement, dynamic partitions in hive
are the most suitable solution.
I have written a blog post on such requirements; please refer to
http://kickstarthadoop.blogspot.com/2011/06/how-to-speed-up-your-hive-queries-in.html
for an understanding of the
Hi Jinhang
I don't think hive supports multi-character delimiters. The hassle-free
option here would be to preprocess the data using mapreduce, replacing the
multi-character delimiter with another permissible one that suits your data.
Regards
Bejoy K S
-Original Message-
From: jin
Thanks for your reply, Viral. However, in later versions of hive you don't have
to tell hive anything (i.e. which is the smaller table). At runtime hive itself
identifies the smaller table and runs the local map task on it, irrespective
of whether it comes on the left or right side of the join. Th
Thanks Yongqiang. It worked for me and I was able to evaluate the performance.
It proved to be expensive :)
Regards
Bejoy K S
-Original Message-
From: yongqiang he
Date: Thu, 31 Mar 2011 22:27:26
To: ;
Reply-To: user@hive.apache.org
Subject: Re: Hive map join - process a little larger
Thanks Yongqiang for your reply. I'm running a hive script which has nearly 10
joins in it. Of those joins, all the map joins involving smaller tables (9 of
them involve one small table) are running fine. Just 1 join is on two larger
tables and this map join fails; however, since the back up task (c
Try out CDH3b4; it has hive 0.7 and the latest of the other hadoop tools. When
you work with open source it is definitely good practice to upgrade to the
latest versions. With newer versions bugs are fewer, performance is better and
you get more functionality. Your query looks f