Re: help on failed MR jobs (big hive files)

2012-12-12 Thread Mark Grover
Elaine, Nitin raises some good points. Continuing on the same lines, let's take a closer look at the query: insert overwrite table B select a, b, c from table A where datediff(to_date(from_unixtime(unix_timestamp('${logdate}'))), request_date) <= 30. In the above query, "datediff(to_date(from_unix
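
For readability, the query under discussion can be laid out like this (reformatted only; it is the same statement Elaine posted, with the first argument to datediff being a constant derived from '${logdate}'):

    INSERT OVERWRITE TABLE B
    SELECT a, b, c
    FROM A
    WHERE datediff(to_date(from_unixtime(unix_timestamp('${logdate}'))),
                   request_date) <= 30;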

Re: help on failed MR jobs (big hive files)

2012-12-12 Thread Nitin Pawar
A 6GB file is nothing; we have done this with a few TB of data in Hive. The error you are seeing is on the Hadoop side. You can always optimize your query based on the Hadoop compute capacity you have, and you will also need to design your schema based on the patterns in the data. The problem here can be
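
As a concrete illustration of the schema-design point (a sketch only; the partition key 'request_dt' and the '${cutoffdate}' variable are assumptions, not from the thread): if table A were partitioned by the request date, the 30-day filter could prune partitions instead of scanning the full data set.

    -- Hypothetical partitioned layout; a, b, c kept as strings for simplicity.
    CREATE TABLE a_partitioned (a STRING, b STRING, c STRING)
    PARTITIONED BY (request_dt STRING);

    -- Filtering on the partition column lets Hive read only the matching partitions;
    -- '${cutoffdate}' would be logdate minus 30 days, computed by the calling script.
    SELECT a, b, c
    FROM a_partitioned
    WHERE request_dt >= '${cutoffdate}';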

Re: REST API for Hive queries?

2012-12-12 Thread Nitin Pawar
Hive takes longer to respond to queries as the data gets larger. The best way to handle this is to process the data in Hive and store the results in an RDBMS like MySQL. On top of that, you can write your own API or use a Pentaho-like interface where they can write queries or see predefined re
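
A minimal sketch of the pipeline Nitin describes (the table, database, and file names are made up for illustration): run the Hive query on a schedule, dump the result to a file, and load it into MySQL for the UI to read.

    # Hypothetical nightly export: aggregate in Hive, then land the result in MySQL.
    hive -e "SELECT dt, COUNT(*) FROM page_views GROUP BY dt" > /tmp/daily_report.tsv
    mysql -u report_user -p report_db \
      -e "LOAD DATA LOCAL INFILE '/tmp/daily_report.tsv' INTO TABLE daily_report FIELDS TERMINATED BY '\t'"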

Re: map side join with group by

2012-12-12 Thread Nitin Pawar
I think Chen wanted to know why this is a two-phase query, if I understood it correctly. When you run a map-side join, it just performs the join; after that, to execute the group by part, it launches a second job. I may be wrong, but this is how I saw it whenever I executed group by queries
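
One way to confirm the two stages (a sketch; the tables and columns here are invented) is to run EXPLAIN on such a query: the plan should show a map-only stage for the join followed by a map-reduce stage for the group by.

    -- 'd' is the alias of the small table hinted into memory.
    EXPLAIN
    SELECT /*+ MAPJOIN(d) */ f.key, COUNT(*)
    FROM fact f JOIN small_dim d ON (f.key = d.key)
    GROUP BY f.key;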

Re: Array index support non-constant expression

2012-12-12 Thread Navis류승우
The error messages are different but seem to come from the same problem. Could you try that with a later version of Hive? I think these kinds of bugs have been fixed. 2012/12/13 java8964 java8964 : > ExprNodeGenericFuncEvaluator

Re: Hive Thrift upgrade to 0.9.0

2012-12-12 Thread Shreepadma Venugopalan
On Tue, Dec 11, 2012 at 12:07 PM, Shangzhong zhu wrote: > We are using Hive 0.9.0, and we have seen frequent Thrift Metastore > timeout issues probably due to the Thrift memory leak reported in > THRIFT-1468. > > The current solution is to upgrade Thrift to 0.9.0 > > I am trying to use the patch (

RE: Array index support non-constant expression

2012-12-12 Thread java8964 java8964
Hi, Navis: If I disable both CP/PPD, it gets worse, as neither query 1) nor 2) works. The interesting thing is that for both queries I get the same error message, but one that differs from my original error message: 2012-12-12 20:36:21,362 WARN org.apache.hadoop.mapred.Child: Error running

Re: map side join with group by

2012-12-12 Thread Mark Grover
Hi Chen, I think we would need some more information. The query refers to a table called "d" in the MAPJOIN hint, but there is no such table in the query. Moreover, map joins only make sense when the right table is the one being "mapped" (in other words, kept in memory) in case of a Le

Re: Array index support non-constant expression

2012-12-12 Thread Navis류승우
Could you try it with CP/PPD disabled? set hive.optimize.cp=false; set hive.optimize.ppd=false; 2012/12/13 java8964 java8964 : > Hi, > > I played my query further, and found out it is very puzzle to explain the > following behaviors: > > 1) The following query works: > > select c_poi.provider_str

map side join with group by

2012-12-12 Thread Chen Song
I have a silly question about how Hive interprets a simple query with both a map-side join and a group by. The query below translates into two jobs: the first is a map-only job doing the join and storing the output in an intermediate location, and the second is a map-reduce job taking the output

Re: Map side join

2012-12-12 Thread Souvik Banerjee
Hi Bejoy, Yes, I ran the pi example and it was fine. Regarding the Hive job, what I found is that it took 4 hrs for the first map job to complete. Those map tasks were doing their job and only reported status after completion. It is indeed taking too long to finish. Nothing I could find relev

REST API for Hive queries?

2012-12-12 Thread Leena Gupta
Hi, We are using Hive as our data warehouse to run various queries on large amounts of data. There are some users who would like to get access to the output of these queries and display the data on an existing UI application. What is the best way to give them the output of these queries? Should we

RE: Array index support non-constant expression

2012-12-12 Thread java8964 java8964
Hi, I played with my query further, and found it very puzzling to explain the following behaviors: 1) The following query works: select c_poi.provider_str, c_poi.name from (select darray(search_results, c.rank) as c_poi from nulf_search lateral view explode(search_clicks) clickTable as c) a I g

RE: Array index support non-constant expression

2012-12-12 Thread java8964 java8964
OK. I followed the Hive source code of org.apache.hadoop.hive.ql.udf.generic.GenericUDFArrayContains and wrote the UDF. It is quite simple. It works fine, as I expected, for simple cases, but when I try to run it in a more complex query, the Hive MR jobs fail with some strange errors. What I
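
For context, a UDF like this is typically packaged into a jar, added to the session, and registered before use. A minimal sketch, where the jar path and class name are assumptions and 'darray' is the function name used later in this thread:

    -- Hypothetical jar path and class name for the array-indexing UDF.
    ADD JAR /path/to/my-udfs.jar;
    CREATE TEMPORARY FUNCTION darray AS 'com.example.hive.udf.GenericUDFDArray';

    -- Usage, following the query shape quoted elsewhere in the thread.
    SELECT darray(search_results, c.rank)
    FROM nulf_search LATERAL VIEW explode(search_clicks) clickTable AS c;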

Re: Modify the number of map tasks

2012-12-12 Thread imen Megdiche
Do you have a page in which you explain the steps? 2012/12/12 Mohammad Tariq > Hi Imen, > > I am sorry, I didn't get the question. Are you asking about > creating a distributed cluster? Yeah, I have done that. > > Regards, > Mohammad Tariq > > > > On Wed, Dec 12, 2012 at 7:45 PM, ime

Re: Modify the number of map tasks

2012-12-12 Thread Mohammad Tariq
Hi Imen, I am sorry, I didn't get the question. Are you asking about creating a distributed cluster? Yeah, I have done that. Regards, Mohammad Tariq On Wed, Dec 12, 2012 at 7:45 PM, imen Megdiche wrote: > have you please commented the configuration of hadoop on cluster > > thanks

Re: Modify the number of map tasks

2012-12-12 Thread imen Megdiche
Could you please explain the configuration of Hadoop on a cluster? Thanks. 2012/12/12 Mohammad Tariq > You are always welcome. If you still need any help, you can go here : > http://cloudfront.blogspot.in/2012/07/how-to-configure-hadoop.html > I have outlined the entire process here along with fe

Re: Map side join

2012-12-12 Thread bejoy_ks
Hi Souvik, Apart from Hive jobs, are normal mapreduce jobs like wordcount running fine on your cluster? If they are working, are you seeing anything suspicious in the task, TaskTracker, or JobTracker logs for the Hive jobs? Regards Bejoy KS Sent from remote device, Please excuse typos -Ori

Re: Modify the number of map tasks

2012-12-12 Thread imen Megdiche
Thank you very much, you're awesome. Fixed. 2012/12/12 Mohammad Tariq > Uncomment the property in core-site.xml. That is a must. After doing this > you have to restart the daemons. > > Regards, > Mohammad Tariq > > > > On Wed, Dec 12, 2012 at 7:08 PM, imen Megdiche wrote: > >> I changed the
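
For the record, the restart Mohammad refers to is just stopping and starting the Hadoop daemons with the scripts mentioned elsewhere in this thread:

    # Restart all daemons after editing core-site.xml / mapred-site.xml.
    stop-all.sh
    start-all.sh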

Re: Modify the number of map tasks

2012-12-12 Thread Mohammad Tariq
I wonder how you are able to run the job without a JT. You must have this in your mapred-site.xml file : mapred.job.tracker localhost:9001 Also add "hadoop.tmp.dir" in core-site.xml, and "dfs.name.dir" & "dfs.data.dir" in hdfs-site.xml. Regards, Mohammad Tariq On Wed, Dec 12, 20
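
Spelled out as config snippets (the JobTracker address is the localhost:9001 value Mohammad mentions; the directory paths are placeholders, not values from the thread):

    <!-- mapred-site.xml -->
    <property>
      <name>mapred.job.tracker</name>
      <value>localhost:9001</value>
    </property>

    <!-- core-site.xml (placeholder path) -->
    <property>
      <name>hadoop.tmp.dir</name>
      <value>/app/hadoop/tmp</value>
    </property>

    <!-- hdfs-site.xml (placeholder paths) -->
    <property>
      <name>dfs.name.dir</name>
      <value>/app/hadoop/name</value>
    </property>
    <property>
      <name>dfs.data.dir</name>
      <value>/app/hadoop/data</value>
    </property>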

Re: Modify the number of map tasks

2012-12-12 Thread imen Megdiche
For mapred-site.xml : mapred.map.tasks 6 for core-site.xml : on hdfs-site.xml nothing 2012/12/12 Mohammad Tariq > Can I have a look at your config files? > > Regards, > Mohammad Tariq > > > > On Wed, Dec 12, 2012 at 6:31 PM, imen Megdiche wrote: > >> i run the start-all.s

Re: Modify the number of map tasks

2012-12-12 Thread Mohammad Tariq
Can I have a look at your config files? Regards, Mohammad Tariq On Wed, Dec 12, 2012 at 6:31 PM, imen Megdiche wrote: > i run the start-all.sh and all daemons starts without problems. But i the > log of the tasktracker look like this : > > > 2012-12-12 13:53:45,495 INFO org.apache.hadoop.m

Re: Modify the number of map tasks

2012-12-12 Thread imen Megdiche
I run start-all.sh and all daemons start without problems. But the log of the TaskTracker looks like this: 2012-12-12 13:53:45,495 INFO org.apache.hadoop.mapred.TaskTracker: STARTUP_MSG: / STARTUP_MSG: Starting TaskTracker STARTUP

Re: Modify the number of map tasks

2012-12-12 Thread Mohammad Tariq
Before anything else, I would check whether all the daemons are running properly. If some problem is found, the next place to look is the log of each daemon. The correct command to check the status of a job from the command line is : hadoop job -status jobID. (Mind the 'space' after job and remove 'com
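
For example (the job id below is a placeholder):

    # List known jobs, then query one by its id.
    hadoop job -list
    hadoop job -status job_201212121200_0001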

Re: Modify the number of map tasks

2012-12-12 Thread imen Megdiche
My goal is to analyze the response time of MapReduce depending on the size of the input files. I need to change the number of map and/or reduce tasks and capture the execution time. So it turns out that nothing works locally on my PC: neither the hadoop job -status job_local_0001 command (which return

Re: Modify the number of map tasks

2012-12-12 Thread Mohammad Tariq
Are you working locally? What exactly is the issue? Regards, Mohammad Tariq On Wed, Dec 12, 2012 at 6:00 PM, imen Megdiche wrote: > no > > > 2012/12/12 Mohammad Tariq > >> Any luck with "localhost:50030"?? >> >> Regards, >> Mohammad Tariq >> >> >> >> On Wed, Dec 12, 2012 at 5:53 PM, im

Re: Modify the number of map tasks

2012-12-12 Thread imen Megdiche
no 2012/12/12 Mohammad Tariq > Any luck with "localhost:50030"?? > > Regards, > Mohammad Tariq > > > > On Wed, Dec 12, 2012 at 5:53 PM, imen Megdiche wrote: > >> i run the job through the command line >> >> >> 2012/12/12 Mohammad Tariq >> >>> You have to replace "JobTrackerHost" in "JobTra

Re: Modify the number of map tasks

2012-12-12 Thread Mohammad Tariq
Any luck with "localhost:50030"?? Regards, Mohammad Tariq On Wed, Dec 12, 2012 at 5:53 PM, imen Megdiche wrote: > i run the job through the command line > > > 2012/12/12 Mohammad Tariq > >> You have to replace "JobTrackerHost" in "JobTrackerHost:50030" with the >> actual name of the machi

Re: Modify the number of map tasks

2012-12-12 Thread imen Megdiche
i run the job through the command line 2012/12/12 Mohammad Tariq > You have to replace "JobTrackerHost" in "JobTrackerHost:50030" with the > actual name of the machine where JobTracker is running. For example, If > you are working on a local cluster, you have to use "localhost:50030". > > Are y

Re: Modify the number of map tasks

2012-12-12 Thread Mohammad Tariq
You have to replace "JobTrackerHost" in "JobTrackerHost:50030" with the actual name of the machine where JobTracker is running. For example, if you are working on a local cluster, you have to use "localhost:50030". Are you running your job through the command line or some IDE? Regards, Mohamm

Re: Modify the number of map tasks

2012-12-12 Thread imen Megdiche
Excuse me, the data size is 98 MB. 2012/12/12 imen Megdiche > the size of data 49 MB and n of map 4 > the web UI JobTrackerHost:50030 does not wok, what should i do to make > this appear , i work on ubuntu > > > 2012/12/12 Mohammad Tariq > >> Hi Imen, >> >> You can visit the MR web UI at "J

Re: Modify the number of map tasks

2012-12-12 Thread imen Megdiche
The data size is 49 MB and the number of maps is 4. The web UI JobTrackerHost:50030 does not work; what should I do to make it appear? I work on Ubuntu. 2012/12/12 Mohammad Tariq > Hi Imen, > > You can visit the MR web UI at "JobTrackerHost:50030" and see all the > useful information like no. of mapper

Re: Modify the number of map tasks

2012-12-12 Thread Mohammad Tariq
Hi Imen, You can visit the MR web UI at "JobTrackerHost:50030" and see all the useful information like no. of mappers, no. of reducers, time taken for the execution, etc. One quick question for you: what is the size of your data and what is the no. of maps you are getting right now? Reg

Re: Modify the number of map tasks

2012-12-12 Thread imen Megdiche
Thank you Mohammad, but the number of map tasks is still the same during execution. Do you know how to capture the time spent on execution? 2012/12/12 Mohammad Tariq > Hi Imen, > > You can add "mapred.map.tasks" property in your mapred-site.xml file. > > But, it is just a hint for the InputForm
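
A simple way to capture wall-clock time (a sketch; the examples jar name and paths are placeholders) is to wrap the job submission in the shell's time command; the JobTracker web UI and hadoop job -status also report per-job timing.

    # Hypothetical wordcount run timed from the shell.
    time hadoop jar hadoop-examples-1.0.4.jar wordcount /input /output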

Re: Modify the number of map tasks

2012-12-12 Thread Mohammad Tariq
Hi Imen, You can add the "mapred.map.tasks" property to your mapred-site.xml file. But it is just a hint for the InputFormat; the no. of maps is actually determined by the no. of InputSplits created by the InputFormat. HTH Regards, Mohammad Tariq On Wed, Dec 12, 2012 at 4:11 PM, ime
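
A hedged illustration of that hint (the jar name and paths are placeholders): it can also be passed per job on the command line, but the InputFormat may still override it based on the number of input splits it creates from the block/split size.

    # mapred.map.tasks is only a hint; the actual map count follows the input splits.
    hadoop jar hadoop-examples-1.0.4.jar wordcount -D mapred.map.tasks=6 /input /output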

Re: Re: Number of mapreduce job and the time spent

2012-12-12 Thread imen Megdiche
The command hadoop job -status works fine, but the problem is that it cannot find the job: "Could not find job job_local_0001". I don't understand why it does not find it. 2012/12/12 long > Sorry for my mistake. > if $HADOOP_HOME is set, run as follow, or not just find the path for your > 'hadoop' comma

Re:Re: Number of mapreduce job and the time spent

2012-12-12 Thread long
Sorry for my mistake. If $HADOOP_HOME is set, run as follows; otherwise just find the path to your 'hadoop' command and use that instead: $HADOOP_HOME/bin/hadoop job -status job_xxx -- Best Regards, longmans At 2012-12-12 17:56:45,"imen Megdiche" wrote: I think that my job id is in this line : 12/12/1

Re: Number of mapreduce job and the time spent

2012-12-12 Thread imen Megdiche
I think that my job id is in this line: 12/12/12 10:43:00 INFO mapred.JobClient: Running job: job_local_0001, but I get this response when I execute hadoop job -status job_local_0001: Warning: $HADOOP_HOME is deprecated. Could not find job job_local_0001 2012/12/12 long > get you job

Re:Number of mapreduce job and the time spent

2012-12-12 Thread long
Get your jobid and use this command: $HADOOP_HOME/hadoop job -status job_xxx -- Best Regards, longmans At 2012-12-12 17:23:39,"imen Megdiche" wrote: Hi, I want to know, from the output of the execution of the mapreduce wordcount example on hadoop, the number of mapreduce jobs and the ti

help on failed MR jobs (big hive files)

2012-12-12 Thread Elaine Gan
Hi, I'm trying to run a program on Hadoop. [Input] tsv file My program does the following: (1) Load the tsv into Hive: load data local inpath 'tsvfile' overwrite into table A partitioned by xx (2) insert overwrite table B select a, b, c from table A where datediff(to_date(from_unixtime(unix_ti
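
For reference, step (1) as valid HiveQL would look roughly like this (a sketch; LOAD DATA needs an explicit PARTITION clause, and the partition column 'xx' and its value here are placeholders). Step (2) is the datediff query quoted in full in Mark Grover's reply at the top of this digest.

    -- Hypothetical partition value; adjust to the actual partition column and format.
    LOAD DATA LOCAL INPATH 'tsvfile'
    OVERWRITE INTO TABLE A PARTITION (xx='2012-12-12');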