Elaine,
Nitin raises some good points.
Continuing along the same lines, let's take a closer look at the query:
insert overwrite table B select a, b, c from A where
datediff(to_date(from_unixtime(unix_timestamp('${logdate}'))),
request_date) <= 30
In the above query,
"datediff(to_date(from_unixtime(unix_timestamp('${logdate}'))), request_date) <= 30"
6 GB is nothing; we have done this with a few TB of data in Hive.
The error you are seeing is on the Hadoop side.
You can always optimize your query based on the Hadoop compute capacity you
have, and you will need to design your schema based on the patterns in your
data.
The problem here can be that Hive takes longer to respond to queries as the
data gets larger.
The best way to handle this is to process the data in Hive and store the
results in an RDBMS like MySQL.
On top of that you can then write your own API, or use a Pentaho-like
interface where users can write queries or see predefined reports
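A minimal sketch of that pipeline (the report table, paths, and columns
below are illustrative, not from this thread) would be to aggregate in Hive,
export the result, and load it into MySQL for the UI to read:

-- Hive: run the heavy aggregation and write the result to an HDFS directory
insert overwrite directory '/tmp/report_out'
select request_date, count(*)
from A
group by request_date;

-- MySQL: after copying the files out of HDFS (hadoop fs -get),
-- load them; Hive's default field delimiter is ^A (0x01)
load data local infile '/tmp/report_out/000000_0'
into table report
fields terminated by x'01';

A tool like Sqoop can replace the manual copy-and-load step.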
I think Chen wanted to know why this is a two-phase query, if I understood
it correctly.
When you run a map-side join, it just performs the join query; after
that, to execute the group-by part, it launches a second job.
I may be wrong, but this is how I saw it whenever I executed group-by
queries.
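For what it's worth, EXPLAIN makes the two stages visible; here is a sketch
with hypothetical tables a and b:

explain
select /*+ MAPJOIN(b) */ a.k, count(*)
from a join b on (a.k = b.k)
group by a.k;

The plan should list a map-only stage for the join followed by a map-reduce
stage for the group by, matching the two jobs described above.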
Different error messages, but they seem to come from the same problem.
Could you do that with a later version of Hive? I think these kinds of
bugs are fixed.
2012/12/13 java8964 java8964 :
> ExprNodeGenericFuncEvaluator
On Tue, Dec 11, 2012 at 12:07 PM, Shangzhong zhu wrote:
> We are using Hive 0.9.0, and we have seen frequent Thrift Metastore
> timeout issues probably due to the Thrift memory leak reported in
> THRIFT-1468.
>
> The current solution is to upgrade Thrift to 0.9.0
>
> I am trying to use the patch (
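(An aside not suggested in this thread: while waiting on the upgrade, the
metastore client's socket timeout can be raised in hive-site.xml so slow
calls fail less often. The property exists in Hive; the value below, in
seconds, is only an example.)

<property>
  <name>hive.metastore.client.socket.timeout</name>
  <value>200</value>
</property>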
Hi Navis,
If I disable both CP and PPD, it is worse, as neither query 1) nor query 2)
works.
But the interesting thing is that for both queries I got the same error
message, which is different from my original error message:
2012-12-12 20:36:21,362 WARN org.apache.hadoop.mapred.Child: Error running
Hi Chen,
I think we would need some more information.
The query is referring to a table called "d" in the MAPJOIN hint, but
there is no such table in the query. Moreover, map joins only make
sense when the right table is the one being "mapped" (in other words,
being kept in memory) in case of a Left Outer Join.
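For reference, a minimal well-formed version would look like this, with
hypothetical tables big and small: the hint names an alias that actually
appears in the query, and for a left outer join the mapped (in-memory)
table is the one on the right:

select /*+ MAPJOIN(small) */ big.id, small.label
from big left outer join small on (big.id = small.id);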
Could you try it with CP/PPD disabled?
set hive.optimize.cp=false;
set hive.optimize.ppd=false;
2012/12/13 java8964 java8964 :
> Hi,
>
> I played with my query further, and found it very puzzling to explain the
> following behavior:
>
> 1) The following query works:
>
> select c_poi.provider_str
I have a silly question on how Hive interprets a simple query with both a
map-side join and a group by.
The query below translates into two jobs: the first is a map-only
job doing the join and storing the output in an intermediate location, and
the second is a map-reduce job taking the output of the first job and
performing the group by.
Hi Bejoy,
Yes, I ran the pi example. It was fine.
Regarding the Hive job, what I found is that it took 4 hrs for the first map
job to complete.
Those map tasks were doing their job and only reported status after
completion. It is indeed taking too long to finish. Nothing I could
find relevant in the logs.
Hi,
We are using Hive as our data warehouse to run various queries on large
amounts of data. There are some users who would like to get access to the
output of these queries and display the data on an existing UI application.
What is the best way to give them the output of these queries? Should we
Hi,
I played with my query further, and found it very puzzling to explain the
following behavior:
1) The following query works:
select c_poi.provider_str, c_poi.name from (select darray(search_results,
c.rank) as c_poi from nulf_search lateral view explode(search_clicks)
clickTable as c) a
I g
OK.
I followed the Hive source code of
org.apache.hadoop.hive.ql.udf.generic.GenericUDFArrayContains and wrote the
UDF. It is quite simple.
It works fine as I expected for simple cases, but when I try to run it in
some complex queries, the Hive MR jobs fail with some strange errors. What I
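For context, this is roughly how such a UDF gets wired in for a test run.
The jar path and class name are placeholders; only the darray function and
the nulf_search table come from the query earlier in this thread:

add jar /path/to/darray-udf.jar;
create temporary function darray as 'com.example.hive.udf.GenericUDFDArray';
select darray(search_results, 1) from nulf_search limit 10;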
Do you have a page in which you explain the steps?
2012/12/12 Mohammad Tariq
> Hi Imen,
>
> I am sorry, I didn't get the question. Are you asking about
> creating a distributed cluster? Yeah, I have done that.
>
> Regards,
> Mohammad Tariq
>
>
>
> On Wed, Dec 12, 2012 at 7:45 PM, ime
Hi Imen,
I am sorry, I didn't get the question. Are you asking about
creating a distributed cluster? Yeah, I have done that.
Regards,
Mohammad Tariq
On Wed, Dec 12, 2012 at 7:45 PM, imen Megdiche wrote:
> Have you documented the configuration of Hadoop on a cluster, please?
>
> Thanks
Have you documented the configuration of Hadoop on a cluster, please?
Thanks
2012/12/12 Mohammad Tariq
> You are always welcome. If you still need any help, you can go here :
> http://cloudfront.blogspot.in/2012/07/how-to-configure-hadoop.html
> I have outlined the entire process here along with fe
Hi Souvik
Apart from Hive jobs, are normal MapReduce jobs like wordcount running
fine on your cluster?
If they are, then for the Hive jobs, are you seeing anything suspicious in
the task, TaskTracker, or JobTracker logs?
Regards
Bejoy KS
Sent from remote device, Please excuse typos
Thank you very much, you're awesome.
Fixed
2012/12/12 Mohammad Tariq
> Uncomment the property in core-site.xml. That is a must. After doing this
> you have to restart the daemons.
>
> Regards,
> Mohammad Tariq
>
>
>
> On Wed, Dec 12, 2012 at 7:08 PM, imen Megdiche wrote:
>
>> I changed the
I wonder how you are able to run the job without a JT. You must have this
in your mapred-site.xml file:
<property>
  <name>mapred.job.tracker</name>
  <value>localhost:9001</value>
</property>
Also add "hadoop.tmp.dir" in core-site.xml, and "dfs.name.dir" &
"dfs.data.dir" in hdfs-site.xml.
Regards,
Mohammad Tariq
On Wed, Dec 12, 20
For mapred-site.xml:
<property>
  <name>mapred.map.tasks</name>
  <value>6</value>
</property>
For core-site.xml:
In hdfs-site.xml: nothing
2012/12/12 Mohammad Tariq
> Can I have a look at your config files?
>
> Regards,
> Mohammad Tariq
>
>
>
> On Wed, Dec 12, 2012 at 6:31 PM, imen Megdiche wrote:
>
>> i run the start-all.s
Can I have a look at your config files?
Regards,
Mohammad Tariq
On Wed, Dec 12, 2012 at 6:31 PM, imen Megdiche wrote:
> I ran start-all.sh and all daemons started without problems. But the
> log of the TaskTracker looks like this:
>
>
> 2012-12-12 13:53:45,495 INFO org.apache.hadoop.m
I ran start-all.sh and all daemons started without problems. But the
log of the TaskTracker looks like this:
2012-12-12 13:53:45,495 INFO org.apache.hadoop.mapred.TaskTracker:
STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting TaskTracker
STARTUP
I would check whether all the daemons are running properly, before
anything else. If some problem is found, the next place to look is the log of
each daemon.
The correct command to check the status of a job from the command line is:
hadoop job -status jobID
(Mind the 'space' after job and remove 'command').
My goal is to analyze the response time of MapReduce depending on the size
of the input files. I need to change the number of map and/or reduce tasks
and retrieve the execution time. So it turns out that nothing works locally
on my PC:
neither hadoop job-status command job_local_0001 (which return
Are you working locally? What exactly is the issue?
Regards,
Mohammad Tariq
On Wed, Dec 12, 2012 at 6:00 PM, imen Megdiche wrote:
> no
>
>
> 2012/12/12 Mohammad Tariq
>
>> Any luck with "localhost:50030"??
>>
>> Regards,
>> Mohammad Tariq
>>
>>
>>
>> On Wed, Dec 12, 2012 at 5:53 PM, im
no
2012/12/12 Mohammad Tariq
> Any luck with "localhost:50030"??
>
> Regards,
> Mohammad Tariq
>
>
>
> On Wed, Dec 12, 2012 at 5:53 PM, imen Megdiche wrote:
>
>> i run the job through the command line
>>
>>
>> 2012/12/12 Mohammad Tariq
>>
>>> You have to replace "JobTrackerHost" in "JobTra
Any luck with "localhost:50030"??
Regards,
Mohammad Tariq
On Wed, Dec 12, 2012 at 5:53 PM, imen Megdiche wrote:
> i run the job through the command line
>
>
> 2012/12/12 Mohammad Tariq
>
>> You have to replace "JobTrackerHost" in "JobTrackerHost:50030" with the
>> actual name of the machi
i run the job through the command line
2012/12/12 Mohammad Tariq
> You have to replace "JobTrackerHost" in "JobTrackerHost:50030" with the
> actual name of the machine where JobTracker is running. For example, If
> you are working on a local cluster, you have to use "localhost:50030".
>
> Are y
You have to replace "JobTrackerHost" in "JobTrackerHost:50030" with the
actual name of the machine where the JobTracker is running. For example, if
you are working on a local cluster, you have to use "localhost:50030".
Are you running your job through the command line or some IDE?
Regards,
Mohamm
Excuse me, the data size is 98 MB.
2012/12/12 imen Megdiche
> the size of the data is 49 MB and the no. of maps is 4
> the web UI JobTrackerHost:50030 does not work; what should I do to make
> this appear? I work on Ubuntu
>
>
> 2012/12/12 Mohammad Tariq
>
>> Hi Imen,
>>
>> You can visit the MR web UI at "J
The size of the data is 49 MB and the no. of maps is 4.
The web UI JobTrackerHost:50030 does not work; what should I do to make this
appear? I work on Ubuntu.
2012/12/12 Mohammad Tariq
> Hi Imen,
>
> You can visit the MR web UI at "JobTrackerHost:50030" and see all the
> useful information like no. of mapper
Hi Imen,
You can visit the MR web UI at "JobTrackerHost:50030" and see all the
useful information like the no. of mappers, no. of reducers, time taken for
the execution, etc.
One quick question for you: what is the size of your data, and what is the
no. of maps you are getting right now?
Reg
Thank you Mohammad, but the number of map tasks is still the same during
execution. Do you know how to capture the time spent on execution?
2012/12/12 Mohammad Tariq
> Hi Imen,
>
> You can add "mapred.map.tasks" property in your mapred-site.xml file.
>
> But, it is just a hint for the InputForm
Hi Imen,
You can add the "mapred.map.tasks" property in your mapred-site.xml file.
But it is just a hint for the InputFormat. The actual no. of maps is
determined by the no. of InputSplits created by the InputFormat.
HTH
Regards,
Mohammad Tariq
On Wed, Dec 12, 2012 at 4:11 PM, ime
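Following on from that: to really change the number of maps you change the
split size, not the hint. The old-API FileInputFormat picks a split size of
roughly max(minSplitSize, min(totalSize/requestedMaps, blockSize)), so for
the 98 MB input mentioned earlier, a setting like this in mapred-site.xml
(the value is an example only) would cut the job to about two maps:

<property>
  <name>mapred.min.split.size</name>
  <value>50000000</value>
</property>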
The command hadoop job -status works fine, but the problem is that it cannot
find the job:
Could not find job job_local_0001
I don't understand why it does not find it.
2012/12/12 long
> Sorry for my mistake.
> If $HADOOP_HOME is set, run as follows; if not, just find the path to your
> 'hadoop' command
Sorry for my mistake.
If $HADOOP_HOME is set, run as follows; if not, just find the path to your
'hadoop' command instead:
$HADOOP_HOME/bin/hadoop job -status job_xxx
--
Best Regards,
longmans
At 2012-12-12 17:56:45,"imen Megdiche" wrote:
I think that my job id is in this line :
12/12/12 10:43:00 INFO mapred.JobClient: Running job: job_local_0001
I think that my job id is in this line :
12/12/12 10:43:00 INFO mapred.JobClient: Running job: job_local_0001
but I get this response when I execute:
hadoop job -status job_local_0001
Warning: $HADOOP_HOME is deprecated.
Could not find job job_local_0001
2012/12/12 long
> Get your jobid and use this command:
Get your jobid and use this command:
$HADOOP_HOME/hadoop job -status job_xxx
--
Best Regards,
longmans
At 2012-12-12 17:23:39,"imen Megdiche" wrote:
Hi,
I want to know, from the output of the execution of the MapReduce wordcount
example on Hadoop: the number of MapReduce jobs and the time taken
Hi,
I'm trying to run a program on Hadoop.
[Input] tsv file
My program does the following.
(1) Load the tsv file into Hive:
load data local inpath 'tsvfile' overwrite into table A partition (xx=...)
(2) insert overwrite table B select a, b, c from A where
datediff(to_date(from_unixtime(unix_timestamp('${logdate}'))), request_date) <= 30