Hello there,
I have two tables:
CREATE TABLE data(calling STRING COMMENT 'Calling number',
volumn_download BIGINT COMMENT 'Volume download',
volumn_upload BIGINT COMMENT 'Volume upload')
PARTITIONED BY(ds STRING)
CLUSTERED BY (calling) INTO 100 BUCKETS;
CREATE TABLE sub(isdn STRING, sub_id STRI
Whenever you submit a SQL query, a job gets generated. You can open the JobTracker at
localhost:50030/jobtracker.jsp
It shows the running jobs and the rest of the details.
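The same list is also available from the command line; a small sketch (assumes the `hadoop` client is on the PATH):

```shell
# Print the currently running MapReduce jobs, same info as the JobTracker page.
list_running_jobs() {
  hadoop job -list
}
```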
Thanks,
Manish
Sent from my BlackBerry, pls excuse typo
-Original Message-
From: Felix.徐
Date: Tue, 20 Mar 2012 12:58:53
Thanks for the response.
Cheers!
On Mar 19, 2012, at 16:42 , Maxime Brugidou wrote:
> From my experience, if you can fit data in a SQL without sharding or
> anything, don't ever think twice. Hive is not even comparable.
Is there a way to prevent LOAD DATA LOCAL INPATH from appending _copy_1 to logs
that already exist in a partition? If the log is already in HDFS/Hive, I'd
rather it fail and give me a return code or output saying that the log already
exists.
For example, if I run these queries:
/usr/local/hive
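One workaround sketch: check HDFS for the file before issuing the LOAD, and fail with a non-zero return code if it is already there. The wrapper below is hypothetical glue (warehouse layout and names are assumptions), built on the real `hadoop fs -test -e` and `hive -e` commands:

```shell
# Hypothetical guard around LOAD DATA: refuse to load a log that already
# exists in the target partition instead of letting Hive append _copy_1.
load_if_absent() {
  src="$1"        # local log file
  part_dir="$2"   # partition directory in the warehouse, e.g. .../data/ds=2012-03-20
  stmt="$3"       # full LOAD DATA statement to run if the file is absent
  if hadoop fs -test -e "$part_dir/$(basename "$src")"; then
    echo "ERROR: $(basename "$src") already exists in $part_dir" >&2
    return 1
  fi
  hive -e "$stmt"
}
```

The non-zero return code then signals "already loaded" to the calling script.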
From my experience, if you can fit your data in a SQL database without sharding or
anything, don't ever think twice. Hive is not even comparable.
I would rather say that Hive is a nice SQL interface over Hadoop M/R rather
than any SQL replacement. If you are running a DWH in SQL and you don't
need to grow your
I haven't had an opportunity to set up a huge Hive database yet because
exporting CSV files from our SQL database is, in itself, a rather laborious
task. I was just curious how I might expect Hive to perform vs. SQL on large
databases and large queries. I realize Hive is pretty "latent" since
I am not trying to knock Oozie, but:
MapReduce action: would be great, but the Hadoop docs taught me the proper
way to write Hadoop programs was with Tool and Configured. 90% of our
legacy jobs are Tools, and the MapReduce action cannot launch Tools. So
JavaMain...
The SSH action is something I would never allow on
Eduardo,
Besides the mapreduce/streaming/hive/pig/sqoop/distcp actions, Oozie has a
JAVA action (to execute a Java main class in the cluster), an SSH action (to
execute a script via SSH on a remote host), and a SHELL action (to execute
a script in the cluster).
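For context, a workflow built from those actions is launched through the Oozie CLI; a minimal sketch (the server URL and properties file name are assumptions):

```shell
# Hypothetical wrapper around the real `oozie job` command: submit and start
# a workflow whose configuration lives in a job.properties file.
submit_workflow() {
  props="$1"
  oozie job -oozie "${OOZIE_URL:-http://localhost:11000/oozie}" -config "$props" -run
}
```

Running `oozie job -info <job-id>` against the same server then reports the workflow's status.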
Would you mind explaining what does yo
This is a bit of a problem. Oozie is great for workflow scheduling, but
it does not have "actions" for everything, and adding actions is
non-trivial in current versions.
I have created some "bootleg/generic" Oozie actions that make it easy
to exec pretty much anything and treat it as an action.
Great topic, as I was wondering a similar thing this morning... I want to use
Oozie to execute my Hive job, but I have to pass job parameters that I
generate with a shell script. Some of the literature I've seen says that
Oozie may or may not allow calling shell scripts. Is that tru
Hi LakshmiKanth
In production systems, if you have a sequence of commands to be executed,
pack them in order in a file. Then execute it as
hive -f <file name>;
For simplicity, you can use a cron job to run it on a schedule. Just
put this command in a .sh file and call the file from cron.
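A minimal sketch of that setup (the script and log file names are assumptions): wrap the `hive -f` call in a small .sh file, then point an hourly crontab entry at it:

```shell
# Hypothetical wrapper: run a file of Hive statements with `hive -f`.
run_hive_script() {
  hive -f "$1"   # executes the statements in the file, in order
}
# Example crontab entry (crontab -e) to run the wrapper at minute 0 of every hour:
# 0 * * * * /path/to/run_weblogs.sh >> /var/log/hive_weblogs.log 2>&1
```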
Hi
I need to schedule my Hive scripts, which need to process incoming weblogs
on an hourly basis.
Currently, I can process my weblog files by executing my scripts from the
Hive command line interface. Now I want to keep my scripts in a file and
invoke them at regular intervals.
Hi Bruce
From my understanding, that formula is not for CombineFileInputFormat but
for other basic input formats.
I'll just brief you on CombineFileInputFormat to make things clearer.
In the default TextInputFormat, every HDFS block is processed by one mapper.
But if the files are smal
Hi Bejoy,
Thanks for your reply.
The formula is from the book Hadoop: The Definitive Guide, 2nd edition. On
page 203 there is:
"The split size is calculated by the formula (see the computeSplitSize()
method in FileInputFormat): max(minimumSize, min(maximumSize, blockSize));
by default: minimumSize < blockSize < maximumSize"
Hi Bruce
In a map-side join, the smaller table is loaded into memory, and hence the
number of mappers depends only on the data in the larger table. Say
CombineHiveInputFormat is used and we have an HDFS block size of 32 MB, a min
split size of 1 B, and a max split size of 256 MB. Which means one
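The arithmetic behind those numbers, sketched in shell (the 96 MB larger-table size is a made-up illustration; the formula is the computeSplitSize() one quoted earlier in the thread):

```shell
# Split size per FileInputFormat: max(minimumSize, min(maximumSize, blockSize)).
compute_split_size() {
  min_size=$1; max_size=$2; block_size=$3
  s=$(( block_size < max_size ? block_size : max_size ))
  echo $(( s > min_size ? s : min_size ))
}
# With min=1 B, max=256 MB, block=32 MB, the split size is the 32 MB block size,
# so a hypothetical 96 MB larger table would get 96 MB / 32 MB = 3 mappers.
```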
Hi there,
when I execute the following queries in Hive:
set hive.auto.convert.join = true;
CREATE TABLE IDAP_ROOT AS
SELECT a.*, b.acnt_no
FROM idap_pi_root a LEFT OUTER JOIN idap_pi_root_acnt b ON
a.acnt_id = b.acnt_id;
the number of mappers run in the map-side join is 3. How is it
determined?
I am trying Hive using the Java JDBC client. I can execute simple queries like
SELECT * FROM table and SELECT * FROM table WHERE something="something", but
when I go for join queries it throws the following error.
In my NetBeans IDE, this is the exception:
Running: SELECT * FROM samplet