Fail to create temporary directory when execute bucket map join

2012-03-19 Thread binhnt22
Hello there, I have 2 tables: CREATE TABLE data(calling STRING COMMENT 'Calling number', volumn_download BIGINT COMMENT 'Volume download', volumn_upload BIGINT COMMENT 'Volume upload') PARTITIONED BY(ds STRING) CLUSTERED BY (calling) INTO 100 BUCKETS; CREATE TABLE sub(isdn STRING, sub_id STRI

Re: How to get job names and stages of a query?

2012-03-19 Thread Manish Bhoge
Whenever you submit a SQL query, a job ID gets generated. You can open the job tracker at localhost:50030/jobtracker.jsp. It shows the jobs that are running and the rest of the details. Thanks, Manish Sent from my BlackBerry, pls excuse typo -Original Message- From: Felix.徐 Date: Tue, 20 Mar 2012 12:58:53

Re: Hive performance vs. SQL?

2012-03-19 Thread Keith Wiley
Thanks for the response. Cheers! On Mar 19, 2012, at 16:42, Maxime Brugidou wrote: > From my experience, if you can fit data in a SQL without sharding or > anything, don't ever think twice. Hive is not even comparable.

LOAD DATA problem

2012-03-19 Thread Sean McNamara
Is there a way to prevent LOAD DATA LOCAL INPATH from appending _copy_1 to logs that already exist in a partition? If the log is already in hdfs/hive I'd rather it fail and give me a return code or output saying that the log already exists. For example, if I run these queries: /usr/local/hive

Re: Hive performance vs. SQL?

2012-03-19 Thread Maxime Brugidou
From my experience, if you can fit data in a SQL without sharding or anything, don't ever think twice. Hive is not even comparable. I would rather say that Hive is a nice SQL interface over Hadoop M/R rather than any SQL replacement. If you are running a DWH in SQL and you don't need to grow your

Hive performance vs. SQL?

2012-03-19 Thread Keith Wiley
I haven't had an opportunity to set up a huge Hive database yet because exporting csv files from our SQL database is, in itself, a rather laborious task. I was just curious how I might expect Hive to perform vs. SQL on large databases and large queries? I realize Hive is pretty "latent" since

Re: Hive CLI and Standalone Server : Need Suggestion

2012-03-19 Thread Edward Capriolo
I am not trying to knock oozie, but MapReduce Action: would be great, but the hadoop docs taught me the proper way to write hadoop programs was Tool and Configured. 90% of our legacy jobs are Tools. MapReduce action cannot launch Tools. So JavaMain... SSH action is something I would never allow on

Re: Hive CLI and Standalone Server : Need Suggestion

2012-03-19 Thread Alejandro Abdelnur
Eduardo, Besides the mapreduce/streaming/hive/pig/sqoop/distcp actions, Oozie has a JAVA action (to execute a Java Main class in the cluster), an SSH action (to execute a script via SSH on a remote host), and a SHELL action (to execute a script in the cluster). Would you mind explaining what does yo

Re: Hive CLI and Standalone Server : Need Suggestion

2012-03-19 Thread Edward Capriolo
This is a bit of a problem. oozie is great for workflow scheduling but oozie does not have "actions" for everything and adding actions is non-trivial in current versions. I have created some "bootleg/generic" oozie actions that make it easy to exec pretty much anything and treat it as an action.

RE: Hive CLI and Standalone Server : Need Suggestion

2012-03-19 Thread carla.staeben
Great topic as I was wondering a similar thing this morning...I want to use oozie to execute my hive job, but I have to pass the job parameters that I generate with a shell script. Some of the literature that I've seen says that oozie may or may not allow for calling shell scripts. Is that tru

Re: Hive CLI and Standalone Server : Need Suggestion

2012-03-19 Thread Bejoy Ks
Hi LakshmiKanth, In production systems, if you have a sequence of commands to be executed, pack them in order in a file. Then execute the command as hive -f ; For simplicity, you can use a cron job to run it in a scheduled manner. Just put this command in a .sh file and call the file from cron.

Hive CLI and Standalone Server : Need Suggestion

2012-03-19 Thread LakshmiKanth P
Hi, I need to schedule my hive scripts, which need to process incoming weblogs on an hourly basis. Currently, I can process my weblog files by executing my scripts from the hive command line interface. Now I want to keep my scripts in a file and invoke them at regular intervals.

Re: how is number of mappers determined in mapside join?

2012-03-19 Thread Bejoy Ks
Hi Bruce, From my understanding, that formula is not for CombineFileInputFormat but for the other basic Input Formats. I'll brief you on CombineFileInputFormat to make things clearer. In the default TextInputFormat every hdfs block is processed by a mapper. But if the files are smal

Re: how is number of mappers determined in mapside join?

2012-03-19 Thread Bruce Bian
Hi Bejoy, Thanks for your reply. The function is from the book, Hadoop The Definitive Guide 2nd edition. On page 203 there is "The split size is calculated by the formula (see the computeSplitSize() method in FileInputFormat): max(minimumSize, min(maximumSize, blockSize)) by default:minimumSize < b
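
The formula quoted in this message can be tried out with a minimal Python sketch. This is only an illustration under stated assumptions: sizes are in bytes, the function name compute_split_size and the sample values are made up for the example, and it is not Hadoop's own code (the message refers to the Java computeSplitSize() method in FileInputFormat).

```python
# Illustrative sketch of the quoted formula:
#   max(minimumSize, min(maximumSize, blockSize))
# The function name and sample values are assumptions for this example.

MB = 1024 * 1024


def compute_split_size(block_size: int, min_size: int, max_size: int) -> int:
    """Return the split size in bytes according to the quoted formula."""
    return max(min_size, min(max_size, block_size))


if __name__ == "__main__":
    # A minimum below the block size and a maximum above it:
    # the split size equals the block size.
    print(compute_split_size(block_size=64 * MB, min_size=1, max_size=2**63 - 1))

    # A minimum raised above the block size: the minimum wins.
    print(compute_split_size(block_size=64 * MB, min_size=128 * MB, max_size=2**63 - 1))

    # A maximum lowered below the block size: the maximum wins.
    print(compute_split_size(block_size=64 * MB, min_size=1, max_size=32 * MB))
```

Reading the formula from the inside out: the block size is first capped by the maximum, and the result is then raised to at least the minimum, so the minimum takes precedence if the two settings conflict.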

Re: how is number of mappers determined in mapside join?

2012-03-19 Thread Bejoy Ks
Hi Bruce, In a map side join the smaller table is loaded into memory and hence the number of mappers depends only on the data in the larger table. Say CombineHiveInputFormat is used and we have our hdfs block size as 32 MB, min split size as 1 B and max split size as 256 MB. Which means one

how is number of mappers determined in mapside join?

2012-03-19 Thread Bruce Bian
Hi there, when I'm executing the following queries in hive set hive.auto.convert.join = true; CREATE TABLE IDAP_ROOT as SELECT a.*,b.acnt_no FROM idap_pi_root a LEFT OUTER JOIN idap_pi_root_acnt b ON a.acnt_id=b.acnt_id the number of mappers to run in the mapside join is 3, how is it determined?

Trying jdbc:hive client

2012-03-19 Thread shashwat shriparv
I am trying hive using a java jdbc client. I can execute simple queries like select * from table and select * from table where someting="someting", but when I go for join queries it throws the following error: * In my NetBeans IDE code this is the exception*: Running: SELECT * FROM samplet