follow this link -
http://hadoop.apache.org/common/docs/r0.20.203.0/single_node_setup.html..it
worked for most of us without any problem.
do all the things required to configure hadoop on linux in pseudo
distributed mode as given in this link..start with a simple setup as
shown there..then we'll
ok..we'll give it a final shot..then i'll email the configured hadoop to
your email address..delete the hdfs directory which contains tmp, data
and name..recreate it..format hdfs again and then start the processes.
Regards,
Mohammad Tariq
On Thu, Jun 7, 2012 at 2:22 AM, Babak Bastan wrote:
> I
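A rough sketch of that clean-slate sequence, assuming the hdfs directory sits at /home/username/hdfs as described further down the thread (adjust the path to your own setup):

bin/stop-all.sh
rm -rf /home/username/hdfs
mkdir -p /home/username/hdfs/name /home/username/hdfs/data /home/username/hdfs/tmp
bin/hadoop namenode -format
bin/start-dfs.sh
bin/start-mapred.sh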
actually this blog post explains how to install cloudera's hadoop
distribution...if you have followed this post and installed cloudera's
distribution then your logs should ideally be inside
/usr/lib/hadoop/logs (if everything was fine)..anyway try the steps I
have given and let me know.
Regards,
need not worry.. i am also a student..just keep your calm..start fresh
and follow these steps -
1 - download hadoop from apache using this link -
http://apache.techartifact.com/mirror/hadoop/common/hadoop-0.20.205.0/hadoop-0.20.205.0.tar.gz
2 - untar it - right click+extract here
3 - set JAVA_HOME
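A rough command-line version of steps 1-3, assuming hadoop is unpacked in your home directory and Java is at /usr/lib/jvm/java-6-sun (both are just examples, adjust to your machine):

cd ~
wget http://apache.techartifact.com/mirror/hadoop/common/hadoop-0.20.205.0/hadoop-0.20.205.0.tar.gz
tar -xzf hadoop-0.20.205.0.tar.gz
cd hadoop-0.20.205.0
# set JAVA_HOME in conf/hadoop-env.sh, for example:
# export JAVA_HOME=/usr/lib/jvm/java-6-sun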
check your /var/log/hadoop/...also when you do something wrong you
will find your terminal full of error messages, you can use them
as well..and by the way, learning something new requires a great deal of
patience
Regards,
Mohammad Tariq
On Thu, Jun 7, 2012 at 1:25 AM, Babak Bastan wrote
go to your HADOOP_HOME i.e. your hadoop directory (that includes bin,
conf etc)..you can find the logs directory there..
Regards,
Mohammad Tariq
On Thu, Jun 7, 2012 at 1:09 AM, Babak Bastan wrote:
> how can I get my log Mohammad?
>
>
> On Wed, Jun 6, 2012 at 9:36 PM, Mohammad Tariq wrote:
>>
>>
how can I get my log Mohammad?
On Wed, Jun 6, 2012 at 9:36 PM, Mohammad Tariq wrote:
> could you post your logs??? That would help me in understanding the
> problem properly.
>
> Regards,
> Mohammad Tariq
>
>
> On Thu, Jun 7, 2012 at 1:02 AM, Babak Bastan wrote:
> > Thank you very much moham
could you post your logs??? That would help me in understanding the
problem properly.
Regards,
Mohammad Tariq
On Thu, Jun 7, 2012 at 1:02 AM, Babak Bastan wrote:
> Thank you very much Mohammad for your attention. I followed the steps but the
> error is the same as the last time.
> and there is
Thank you very much Mohammad for your attention. I followed the steps but the
error is the same as the last time.
and here is my hosts file:
127.0.0.1 localhost
#127.0.0.1 ubuntu.ubuntu-domain ubuntu
# The following lines are desirable for IPv6 capable hosts
#::1 ip6-localhost
also change the permissions of these directories to 777.
Regards,
Mohammad Tariq
On Wed, Jun 6, 2012 at 11:54 PM, Mohammad Tariq wrote:
> create a directory "/home/username/hdfs" (or at some place of your
> choice)..inside this hdfs directory create three sub directories -
> name, data, and
create a directory "/home/username/hdfs" (or at some place of your
choice)..inside this hdfs directory create three subdirectories -
name, data, and temp, then follow these steps:
add the following property to your core-site.xml -
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000/</value>
</property>
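Presumably the matching entries for the name, data and temp directories go into the config as well; with Hadoop 0.20 property names and the example path used above, that would look something like:

in hdfs-site.xml -
<property>
  <name>dfs.name.dir</name>
  <value>/home/username/hdfs/name</value>
</property>
<property>
  <name>dfs.data.dir</name>
  <value>/home/username/hdfs/data</value>
</property>

and in core-site.xml -
<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/username/hdfs/temp</value>
</property>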
thanks Mohammad
with this command:
babak@ubuntu:~/Downloads/hadoop/bin$ hadoop namenode -format
this is my output:
12/06/06 20:05:20 INFO namenode.NameNode: STARTUP_MSG:
/
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = ubuntu/12
But it may pay off by saving on network IO while copying the data during
the reduce phase, though it will vary from case to case. We had good results by
using the Snappy codec for compressing map output. Snappy provides reasonably
good compression at a faster rate.
Thanks,
Vinod
http://blog.vinodsingh.com/
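For reference, map output compression with Snappy is usually switched on with these two job properties (assuming a Hadoop build that bundles the Snappy codec):

mapred.compress.map.output=true
mapred.map.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec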
once we are done with the configuration, we need to format the file
system..use this command to do that-
bin/hadoop namenode -format
after this, hadoop daemon processes should be started using the following commands -
bin/start-dfs.sh (it'll start NN & DN)
bin/start-mapred.sh (it'll start JT & TT)
af
*were you able to format hdfs properly???*
I didn't get your question. Do you mean HADOOP_HOME, or where did I install
Hadoop?
On Wed, Jun 6, 2012 at 7:49 PM, Mohammad Tariq wrote:
> if you are getting only this, it means your hadoop is not
> running..were you able to format hdfs properly???
>
>
if you are getting only this, it means your hadoop is not
running..were you able to format hdfs properly???
Regards,
Mohammad Tariq
On Wed, Jun 6, 2012 at 11:17 PM, Babak Bastan wrote:
> Hi Mohammad, if I run jps in my shell I can see this result:
> 2213 Jps
>
>
> On Wed, Jun 6, 2012 at 7:44 PM
Hi Mohammad, if I run jps in my shell I can see this result:
2213 Jps
On Wed, Jun 6, 2012 at 7:44 PM, Mohammad Tariq wrote:
> you can also use "jps" command at your shell to see whether Hadoop
> processes are running or not.
>
> Regards,
> Mohammad Tariq
>
>
> On Wed, Jun 6, 2012 at 11:12 PM, M
you can also use "jps" command at your shell to see whether Hadoop
processes are running or not.
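On a pseudo-distributed setup with everything running, jps should list all five daemons plus Jps itself, something like this (the pids will of course differ):

2987 NameNode
3121 DataNode
3260 SecondaryNameNode
3342 JobTracker
3475 TaskTracker
3561 Jps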
Regards,
Mohammad Tariq
On Wed, Jun 6, 2012 at 11:12 PM, Mohammad Tariq wrote:
> Hi Babak,
>
> You have to type it in your web browser..Hadoop provides us a web GUI
> that not only allows us to
Hi Babak,
You have to type it in your web browser..Hadoop provides us a web GUI
that not only allows us to browse through the file system, but to
download the files as well..Apart from that it also provides a web GUI
that can be used to see the status of Jobtracker and Tasktracker..When
you run a
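On a default install those web GUIs are usually at:

http://localhost:50070  (NameNode - browse HDFS, download files)
http://localhost:50030  (JobTracker - job and Tasktracker status)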
Thank you shashwat for the answer,
where should I type http://localhost:50070?
I typed it here: hive> http://localhost:50070 but got nothing as a result
On Wed, Jun 6, 2012 at 3:32 PM, shashwat shriparv wrote:
> first type http://localhost:50070 whether this is opening or not and
> check how many nodes ar
first type http://localhost:50070 and see whether it opens or not, and check
how many nodes are available; check some of the hadoop shell commands from
http://hadoop.apache.org/common/docs/r0.18.3/hdfs_shell.html and run an example
mapreduce task on hadoop, taking an example from here :
http://www.michael-noll.c
In one similar use case I worked on, the record timestamps were not
guaranteed to arrive in any particular order. So we used Pig to do some
processing similar to what your custom code is doing, and after the records
are in the required timestamp order we push them to Hive.
---
Sent from Mobile , s
Hi all,
I'm interested in knowing how everyone is importing their data into
their production Hive clusters.
Let me explain a little more. At the moment, I have log files (which
are divided into 5 minute chunks, per event type (of which there are
around 10), per server (a few 10s) arriving on one
Compression is an overhead when you have a CPU intensive job.

Debarshi Basak
Tata Consultancy Services
Hi Sreenath
Output compression is more useful at the storage level; when a larger file is
compressed it saves on hdfs blocks and thereby the cluster becomes more
scalable in terms of the number of files.
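A rough illustration with made-up numbers: a 1 GB file with a 64 MB block size occupies 16 blocks (16 objects in the namenode's memory); gzipped down to, say, 256 MB it occupies only 4.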
Yes, the lzo libraries need to be there on all task tracker nodes as well as the node
that hosts the h
Hi Bejoy
I would like to make this clear.
There is no gain in processing throughput/time from compressing the data
stored in HDFS (not talking about intermediate compression)...right??
And do I need to add the lzo libraries in Hadoop_Home/lib/native for all
the nodes (including the slave nodes)??
Hi Sreenath
The default compression codec used in hadoop is
org.apache.hadoop.io.compress.DefaultCodec
To use gzip as compression
mapred.output.compress=true
mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec
Regards
Bejoy KS
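If you are setting this from the Hive CLI, one way to do it per session (hive.exec.compress.output is the Hive-side switch) would be:

SET hive.exec.compress.output=true;
SET mapred.output.compress=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;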
Hi Sreenath
The lzo error is because you don't have the lzo libraries in the
Hadoop_Home/lib/native folder. You need to package/build lzo for the OS you are
using.
On compression, as you mentioned, there is an overhead in decompressing while
processing the records. HDFS is used to store large amounts of
Ok...understood...so you load the compressed data into memory (thereby
decreasing the size of the file that needs to be loaded) and then apply a
decompression algorithm to get the uncompressed data. Is this what happens?
Thanks for the response.
1) How do I use Gz compression, and does it come with Hadoop? Otherwise, how
do I build a compression method for use in Hive? I would like to run an
evaluation across compression methods.
What is the default compression used in Hadoop?
2) Kindly bear with me if this question
There is something you gain and something you lose.
Compression would reduce IO through increased cpu work. Also you would see
different behaviour for different tasks, i.e. HDFS read, HDFS write, shuffle
and sort. So whether to go for compression or not depends on your usage.
Sent from my N8
Basically, when your data is compressed you have less IO than with uncompressed data. During job execution it doesn't decompress. This would be a more relevant question on Hadoop's mailing list than Hive's.

Debarshi Basak
Tata Consultancy Services
Yes, performance is better because your IO is less when your data is smaller.

Debarshi Basak
Tata Consultancy Services
LZO doesn't ship with apache hadoop, you need to build it..try GZ

Debarshi Basak
Tata Consultancy Services
Thanks all
All help is greatly appreciated. Please feel free to post whatever comes to
your mind.
Learned a lot from this conversation.
Please post any findings on this topic: Hive as a warehouse - limitations
Thanks
Hello Rafel,
I assume that you have downloaded and configured Hadoop successfully. If not,
then please tell me and I can give the steps for that also.
For Hive, the easiest way for me is to download the Hive tar from the Apache
website,
extract the Hive tar into some location,
then I would set the environment variables
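A rough sketch of those steps (the Hive version and paths are only examples):

tar -xzf hive-0.9.0-bin.tar.gz -C ~
export HIVE_HOME=~/hive-0.9.0-bin
export PATH=$PATH:$HIVE_HOME/bin
export HADOOP_HOME=~/hadoop-0.20.205.0   # hive needs to find a working hadoop
hive                                     # should drop you into the hive> prompt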
Hi Mark,
Thanks for all your help. I tried to run a series of tests with various
settings of hive.optimize.ppd and various queries (see it here:
http://pastebin.com/E89p9Ubx) and now I'm even more confused than
before. In all cases, regardless of whether the WHERE clause asks about
partitioned or regular