python streaming error
Hi,

When I run the code below as a streaming job, the job fails with error "N/A" and is killed. Running it step by step, I find it fails at "file_obj = open(file)". When I run the same code outside of Hadoop, everything is OK.

    #!/bin/env python

    import sys

    for line in sys.stdin:
        offset,filename = line.split("\t")
        file = "hdfs://user/hdfs/catalog3/" + filename
        print line
        print filename
        print file
        file_obj = open(file)
        ...
Re:Re: python streaming error
Hi,

I modified the file as below, but there is still an error.

    #!/bin/env python

    import sys

    for line in sys.stdin:
        offset,filename = line.split("\t")
        file = "hdfs://computeb-13:9100/user/hdfs/catalog3/" + filename
        print line
        print filename
        print file
        file_obj = open(file)

At 2013-01-12 16:34:37, "Nitin Pawar" wrote:
> Is this the correct path for writing onto HDFS?
>
> "hdfs://user/hdfs/catalog3"
>
> I don't see the namenode info in the path. Can this cause any issue? Just
> making a guess - something like hdfs://host:port/path
>
> On Sat, Jan 12, 2013 at 12:30 AM, springring wrote:
>
>> hdfs://user/hdfs/catalog3/
>
> --
> Nitin Pawar
Re:Re: Re: python streaming error
Hi,

I found the key point. It is not the hostname; that part is right. I just changed

    offset,filename = line.split("\t")

to

    offset,filename = line.strip().split("\t")

and now it passes.

At 2013-01-12 16:58:29, "Nitin Pawar" wrote:
> computedb-13 is not a valid host name
>
> maybe if you have a local hadoop then you can refer to it with
> hdfs://localhost:9100/ or hdfs://127.0.0.1:9100
>
> if it's on another machine then just try the IP address of that machine
>
> --
> Nitin Pawar
Re:Re:Re: Re: python streaming error
Sorry, the error keeps occurring, even after I modified the code to

    offset,filename = line.strip().split("\t")
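For what it's worth, Python's built-in open() only handles local files, so passing it an hdfs:// URL will fail inside the streaming task even when the path itself is correct. Below is a minimal workaround sketch, assuming the hadoop client is on the PATH of the task nodes and reusing the catalog path from the messages above; it shells out to "hadoop fs -cat" instead of calling open():

    #!/bin/env python
    # Sketch: read an HDFS file from inside a streaming mapper by piping
    # "hadoop fs -cat", since the built-in open() cannot read hdfs:// URLs.
    import subprocess
    import sys

    for line in sys.stdin:
        offset, filename = line.strip().split("\t")
        path = "/user/hdfs/catalog3/" + filename
        cat = subprocess.Popen(["hadoop", "fs", "-cat", path],
                               stdout=subprocess.PIPE)
        for file_line in cat.stdout:
            # process each line of the HDFS file here
            print file_line.strip()
        cat.wait()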
Hive utf8
Hi,

I put some files that contain Chinese text into HDFS. Reading a file with "hadoop fs -cat /user/hive/warehouse/..." is OK, and I can see the Chinese. But when I open the table in Hive, I cannot read the Chinese (English is OK). Why?
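One way to narrow this down is to check whether the bytes stored in the warehouse file are valid UTF-8, or whether only the Hive client/terminal fails to render them. A rough diagnostic sketch follows; the file path is just a placeholder, and it assumes the hadoop CLI is on the PATH:

    #!/bin/env python
    # Check whether a file stored in HDFS decodes cleanly as UTF-8.
    import subprocess

    path = "/user/hive/warehouse/mytable/part-00000"  # placeholder path
    data = subprocess.Popen(["hadoop", "fs", "-cat", path],
                            stdout=subprocess.PIPE).communicate()[0]
    try:
        data.decode("utf-8")
        print "file decodes cleanly as UTF-8"
    except UnicodeDecodeError, e:
        print "not valid UTF-8:", e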
WholeFileInputFormat with streaming
Hi,

I want to run:

    hadoop jar /hadoop-streaming-0.20.2-cdh3u3.jar -inputformat org.apache.hadoop.streaming.WholeFileInputFormat ...

so I downloaded WholeFileInputFormat.java and WholeFileRecordReader.java from
https://github.com/tomwhite/hadoop-book/tree/master/ch07/src/main/java
and packaged the java files with "package org.apache.hadoop.streaming;".

Solution A: copy WholeFileInputFormat.java and WholeFileRecordReader.java to hadoop-0.20.2-cdh3u3/src/contrib/streaming/src/java/org/apache/hadoop/streaming/, then:

    javac -classpath /usr/lib/hadoop-0.20/hadoop-0.20.2-cdh3u3-core.jar:/usr/lib/hadoop-0.20:/usr/lib/hadoop-0.20/lib/* -d WFInputFormatClassNew hadoop-0.20.2-cdh3u3/src/contrib/streaming/src/java/org/apache/hadoop/streaming/*.java

There are a lot of errors.

Solution B: compile just the two files WholeFileInputFormat.java and WholeFileRecordReader.java:

    javac -classpath /usr/lib/hadoop-0.20/hadoop-0.20.2-cdh3u3-core.jar:/usr/lib/hadoop-0.20/*:/usr/lib/hadoop-0.20/lib/* -d WFInputFormatClass WholeFileInputFormat.java WholeFileRecordReader.java

copy /usr/lib/hadoop/contrib/streaming/hadoop-streaming-0.20.2-cdh3u3.jar over, then:

    jar uf hadoop-streaming-0.20.2-cdh3u3.jar WFInputFormatClass/org/apache/hadoop/streaming/WholeFileRecordReader.class
    jar uf hadoop-streaming-0.20.2-cdh3u3.jar WFInputFormatClass/org/apache/hadoop/streaming/WholeFileInputFormat.class

There is no error, but when I run:

    hadoop jar /hadoop-streaming-0.20.2-cdh3u3.jar -inputformat org.apache.hadoop.streaming.WholeFileInputFormat ...

there is an error:

    -inputformat : class not found : org.apache.hadoop.streaming.WholeFileInputFormat

What is wrong with the two solutions? Or is there another solution?

Thx.
Ring
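One thing worth checking in solution B: "jar uf" stores each entry under the exact path given on the command line, so the two classes may have gone into the jar as WFInputFormatClass/org/apache/hadoop/streaming/... rather than org/apache/hadoop/streaming/..., which would explain the "class not found". A quick way to inspect the jar is sketched below, using Python's zipfile module on the jar name from above:

    #!/bin/env python
    # List where the WholeFile* classes actually sit inside the streaming jar.
    import zipfile

    jar = zipfile.ZipFile("hadoop-streaming-0.20.2-cdh3u3.jar")
    for name in jar.namelist():
        if "WholeFile" in name:
            print name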
how to define new InputFormat with streaming?
Hi,

My Hadoop version is Hadoop 0.20.2-cdh3u3 and I want to define the new InputFormat from the Hadoop book, but there is an error:
"class org.apache.hadoop.streaming.WholeFileInputFormat not org.apache.hadoop.mapred.InputFormat"
The Hadoop version is 0.20, but does streaming still depend on the old 0.10 mapred API?

The details:

    javac -classpath /usr/lib/hadoop/hadoop-core-0.20.2-cdh3u3.jar:/usr/lib/hadoop/lib/* -d class7 ./*.java
    cd class7
    jar uf /usr/lib/hadoop-0.20/contrib/streaming/hadoop-streaming-0.20.2-cdh3u3.jar org/apache/hadoop/streaming/*.class

    hadoop jar /usr/lib/hadoop/contrib/streaming/hadoop-streaming-0.20.2-cdh3u3.jar -inputformat WholeFileInputFormat -mapper xmlmappertest.py -file xmlmappertest.py -input /user/hdfs/tarcatalog -output /user/hive/external/catalog -jobconf mapred.map.tasks=108

    13/03/15 16:27:51 WARN streaming.StreamJob: -jobconf option is deprecated, please use -D instead.
    Exception in thread "main" java.lang.RuntimeException: class org.apache.hadoop.streaming.WholeFileInputFormat not org.apache.hadoop.mapred.InputFormat
        at org.apache.hadoop.conf.Configuration.setClass(Configuration.java:1070)
        at org.apache.hadoop.mapred.JobConf.setInputFormat(JobConf.java:609)
        at org.apache.hadoop.streaming.StreamJob.setJobConf(StreamJob.java:707)
        at org.apache.hadoop.streaming.StreamJob.run(StreamJob.java:122)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at org.apache.hadoop.streaming.HadoopStreaming.main(HadoopStreaming.java:50)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:601)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:197)

The code from the Hadoop book:

WholeFileInputFormat.java

    // cc WholeFileInputFormat An InputFormat for reading a whole file as a record
    import java.io.IOException;

    import org.apache.hadoop.fs.*;
    import org.apache.hadoop.io.*;
    import org.apache.hadoop.mapreduce.InputSplit;
    import org.apache.hadoop.mapreduce.JobContext;
    import org.apache.hadoop.mapreduce.RecordReader;
    import org.apache.hadoop.mapreduce.TaskAttemptContext;
    import org.apache.hadoop.mapreduce.lib.input.*;

    // vv WholeFileInputFormat
    public class WholeFileInputFormat
        extends FileInputFormat<NullWritable, BytesWritable> {

      @Override
      protected boolean isSplitable(JobContext context, Path file) {
        return false;
      }

      @Override
      public RecordReader<NullWritable, BytesWritable> createRecordReader(
          InputSplit split, TaskAttemptContext context) throws IOException,
          InterruptedException {
        WholeFileRecordReader reader = new WholeFileRecordReader();
        reader.initialize(split, context);
        return reader;
      }
    }
    // ^^ WholeFileInputFormat

WholeFileRecordReader.java

    // cc WholeFileRecordReader The RecordReader used by WholeFileInputFormat for reading a whole file as a record
    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.IOUtils;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.mapreduce.InputSplit;
    import org.apache.hadoop.mapreduce.RecordReader;
    import org.apache.hadoop.mapreduce.TaskAttemptContext;
    import org.apache.hadoop.mapreduce.lib.input.FileSplit;

    // vv WholeFileRecordReader
    class WholeFileRecordReader extends RecordReader<NullWritable, BytesWritable> {

      private FileSplit fileSplit;
      private Configuration conf;
      private BytesWritable value = new BytesWritable();
      private boolean processed = false;

      @Override
      public void initialize(InputSplit split, TaskAttemptContext context)
          throws IOException, InterruptedException {
        this.fileSplit = (FileSplit) split;
        this.conf = context.getConfiguration();
      }

      @Override
      public boolean nextKeyValue() throws IOException, InterruptedException {
        if (!processed) {
          byte[] contents = new byte[(int) fileSplit.getLength()];
          Path file = fileSplit.getPath();
          FileSystem fs = file.getFileSystem(conf);
          FSDataInputStream in = null;
          try {
            in = fs.open(file);
            IOUtils.readFully(in, contents, 0, contents.length);
            value.set(contents, 0, contents.length);
          } finally {
            IOUtils.closeStream(in);
          }
          processed = true;
          return true;
        }
        return false;
      }

      @Override
      public NullWritable getCurrentKey() throws IOException, InterruptedException {
        return NullWritable.get();
      }

      @Override
      public BytesWritable getCurrentValue() throws IOException, InterruptedException {
        return value;
      }

      @Override
      public float getProgress() throws IOException {
        return processed ? 1.0f : 0.0f;
      }

      @Override
      public void close() throws IOException {
        // do nothing
      }
    }
    // ^^ WholeFileRecordReader
Re:Re: how to define new InputFormat with streaming?
Thanks.

I modified the java file to use the old "mapred" API, but there is still an error:

    javac -classpath /usr/lib/hadoop/hadoop-core-0.20.2-cdh3u3.jar:/usr/lib/hadoop/lib/* -d class9 ./*.java
    ./WholeFileInputFormat.java:16: error: package org.apache.hadoop.mapred.lib.input does not exist
    import org.apache.hadoop.mapred.lib.input.*;

Is it because hadoop-0.20.2-cdh3u3 does not include the "mapred" API?

At 2013-03-17 14:22:43, "Harsh J" wrote:
> The issue is that Streaming expects the old/stable MR API
> (org.apache.hadoop.mapred.InputFormat) as its input format class, but your
> WholeFileInputFormat is using the new MR API
> (org.apache.hadoop.mapreduce.lib.input.InputFormat). Using the older form
> will let you pass.
>
> This has nothing to do with your version/distribution of Hadoop.
>
> On Fri, Mar 15, 2013 at 4:28 PM, Steve Loughran wrote:
>
>> 1. please don't spam all the lists
>> 2. grab a later version of the apache releases if you want help on them on
>> these mailing lists, or go to the cloudera lists, where they will probably
>> say "upgrade to CDH 4.x" before asking questions.
>>
>> thanks
>
> --
> Harsh J
Re:Re: Re: how to define new InputFormat with streaming?
You are right! Now the import path is all right.

At 2013-03-18 09:57:33, "Harsh J" wrote:
> It isn't as easy as changing that import line:
>
>> package org.apache.hadoop.mapred.lib.input does not exist
>
> The right package is org.apache.hadoop.mapred.
>
> --
> Harsh J
is that a mistake in Hadoop Tutorial?
Hi,

As for the word marked in red in the attached file, page 7: I think it should be "combine" instead of "map", or is it my misunderstanding?

br

Springring.Xu
Re: is that a mistake in Hadoop Tutorial?
I mean the passage below:

    WordCount also specifies a combiner (line 46). Hence, the output of each
    map is passed through the local combiner (which is same as the Reducer as
    per the job configuration) for local aggregation, after being sorted on
    the keys.

    The output of the first map:   (why not "combine"?)
    < Bye, 1>
    < Hello, 1>
    < World, 2>

    The output of the second map:   (why not "combine"?)
    < Goodbye, 1>
    < Hadoop, 2>
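As a concrete illustration of the passage being quoted, here is a small sketch of a map step followed by a local combine; the two input lines are assumed to be the tutorial's sample files, which is what makes <World, 2> appear already in the "output of the first map":

    #!/bin/env python
    # Each map emits (word, 1) pairs; the combiner then aggregates them
    # locally per map task, before anything reaches the reducer.
    def map_phase(line):
        return [(word, 1) for word in line.split()]

    def combine(pairs):
        counts = {}
        for word, n in pairs:
            counts[word] = counts.get(word, 0) + n
        return sorted(counts.items())

    for line in ["Hello World Bye World", "Hello Hadoop Goodbye Hadoop"]:
        print combine(map_phase(line))
        # first line  -> [('Bye', 1), ('Hello', 1), ('World', 2)]
        # second line -> [('Goodbye', 1), ('Hadoop', 2), ('Hello', 1)]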
BlocksMap.map
Hi,

I've been puzzling over hadoop-0.17.1\src\java\org\apache\hadoop\dfs\BlocksMap.java, line 291:

    private Map<Block, BlockInfo> map = new HashMap<Block, BlockInfo>();

Why is the map not keyed by BlockInfo instead? I think if the key were BlockInfo, meaning {INodeFile, Datanode}, the map could be reduced by INodeFile and/or Datanode.

Springring.Xu
[paper]deconstruct Hadoop Distributed File System
Hi All,

The attached file is my paper on deconstructing the Hadoop Distributed File System and a Cluster triplet-Space Model. I look forward to your comments or suggestions.

Springring.Xu
Re: [paper]deconstruct Hadoop Distributed File System
Sorry~

Perhaps the *.pdf file was processed as spam? So I'm trying again with the paper attached as a .rar file.
[help!] [paper]deconstruct Hadoop Distributed File System
一 一||

o~~ failed again.

Is there a limit on the size or on the mail address? My mail is springr...@126.com.
Anyone interested in the subject, please mail me at the address above. You are welcome~

In addition, can an administrator help me send the attached file? Thanks.
Re: [help!] [paper]deconstruct Hadoop Distributed File System
I have uploaded the paper to Gmail:

    mail.google.com
    login ID: hadoopcn
    password: mapreduce
[paper] deconstruct HDFS
Hi All,

I have uploaded the English version of the paper, about the "Cluster triplet-Space Model by deconstructing the Hadoop Distributed File System", to Gmail. Although it took me a long time to translate the paper from Chinese to English, there is still some "Chinglish" ^ ^. Anyway, I hope the new version will be helpful to our communication. Any suggestions or questions are welcome, on either the subject matter or the grammar.

Springring.Xu

Download from: mail.google.com

    login ID: hadoopcn
    password: mapreduce
Re: [paper] deconstruct HDFS
Konstantin, thanks.

I have uploaded the paper to Google Docs; the two links are below, one for the Chinese version and one for the English version. Moreover, the paper in Gmail has been updated too; the former version had something wrong with the PDF, so the figures were out of shape.

Chinese:
http://docs.google.com/fileview?id=0B8EK5k9okfTZYzIzMWMxNWMtMjA1OS00ZWMzLWE0OWMtNjMwMDU3OTQ2OWUw&hl=en&invite=CPDe7NcO

English:
http://docs.google.com/fileview?id=0B8EK5k9okfTZN2Q1ZWQ3ZjMtY2RlMC00ZDJiLWJjM2ItZmJjZjYwMmFlMjRj&hl=en&invite=CMzh3-0P

----- Original Message -----
From: "Konstantin Boudnik"
Sent: Saturday, February 27, 2010 2:29 AM
Subject: Re: [paper] deconstruct HDFS

> Somewhat changing the topic: email isn't really a good way of storing
> documents.
>
> Just to make your life easier and since you're using google services anyway
> you might consider putting this paper to Google Docs and share it with
> everyone - that'd be easier than 'email sharing' :)
Re: Namespace partitioning using Locality Sensitive Hashing
I have a question. Now that Hadoop has "mapper" and "reducer", how about a solution like Map and Reduce, or directly Reduce, where a node pair can be looked at as a branch...

----- Original Message -----
From: "Konstantin Shvachko"
Sent: Tuesday, March 02, 2010 10:21 AM
Subject: Re: Namespace partitioning using Locality Sensitive Hashing

> Symlinks is a brand new feature in HDFS.
> You can read about it in
> https://issues.apache.org/jira/browse/HDFS-245
> Documentation is here:
> https://issues.apache.org/jira/secure/attachment/12434745/design-doc-v4.txt
>
> Symbolic links in HDFS can point to a directory in a different file system,
> particularly on a different HDFS cluster. They are like mount points in this
> case.
> So you can create a symlink on cluster C1 pointing to the root of cluster C2.
> This makes C2 a sub-namespace of C1.
>
> --Konstantin
>
> On 3/1/2010 5:42 PM, Ketan Dixit wrote:
>> Hello,
>> Thank you Konstantin and Allen for your reply. The information
>> provided really helped to improve my understanding.
>> However I still have a few questions.
>> How are symlinks / soft links used to solve the problem of partitioning?
>> (Where do the symlinks point to? All the mapping is stored in memory but
>> symlinks point to file objects? This is a little confusing to me.)
>> Can you please provide insight into this?
>>
>> Thanks,
>> Ketan
>>
>> On Mon, Mar 1, 2010 at 3:26 PM, Konstantin Shvachko wrote:
>>>
>>> Hi Ketan,
>>>
>>> AFAIU, hashing is used to map files and directories into different
>>> name-nodes.
>>> Suppose you use a simple hash function on a file path h(path), and that
>>> files with the same hash value (or within a hash range) are mapped to the
>>> same name-node.
>>> Then files with the same parent will be randomly mapped into different
>>> name-nodes: Pr(h(/dir/file1) = h(/dir/file2)) is small.
>>>
>>> The idea with LSH is to add some locality factor to the hash function in
>>> order to increase the probability of placing files from the same directory
>>> (or a subtree) into the same name-node.
>>>
>>> Example 1.
>>> Suppose that you apply MD5 only to the path of the parent rather
>>> than to the entire file path: h(/root/dir/file) = MD5(/root/dir)
>>> Then all files of the same directory will have the same hash value and
>>> therefore will be mapped into the same name-node.
>>>
>>> Example 2.
>>> If a path consists of path components pi, where i = 1,..,n,
>>> let's define the following hash function:
>>> h(/p1/.../pn) = 0, if n < 7
>>> h(/p1/.../pn) = MD5(/p1/.../p7), if n >= 7
>>> With this hash function each subtree rooted at level 7 of the namespace
>>> hierarchy will be entirely mapped to the same name-node.
>>>
>>> There could be more elaborate examples.
>>>
>>> Symlinks do provide a way to partition the namespace, as Allen points out,
>>> although this is a static partitioning. Static partitions as opposed to
>>> dynamic ones do not guarantee that the partitions will be "equal" in size,
>>> where "size" may have different meanings (like number of files, or space
>>> occupied by the files, or number of blocks).
>>> A good hash function needs to conform to some equal-partitioning
>>> requirement. The function from Example 2 would be considered bad in this
>>> sense, while Example 1 defines a good one.
>>>
>>> This is my take on the problem. Hope it makes sense,
>>> --Konstantin
>>>
>>>
>>> On 3/1/2010 8:48 AM, Ketan Dixit wrote:
>>>> Hi,
>>>> I am a graduate student in the Computer Science department at SUNY Stony
>>>> Brook. I am thinking of doing a project on Hadoop for my course "Cloud
>>>> Computing" conducted by Prof. Radu Sion. While going through the links of
>>>> the "Yahoo open source projects for students" page I found the idea
>>>> "Research on new hashing schemes for filesystem namespace partitioning"
>>>> interesting.
>>>> It looks to me that the idea is to assign one subtree of the whole
>>>> namespace to one namenode and another subtree to another namenode. How is
>>>> LSH better than normal hashing? Because still, a client or a fixed
>>>> namenode has to take the decision of which namenode to contact in
>>>> whatever hashing scheme.
>>>> It looks to me that if requests to files under the same subtree are
>>>> directed to the same namenode then the performance will be faster, as the
>>>> requests to the same namenode are clustered around a part of the
>>>> namespace subtree (for example, a part on which the client is doing some
>>>> operation). Is this assumption correct? Can I have more insight in this
>>>> regard?
>>>>
>>>> Thanks,
>>>> Ketan
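To make the two examples above concrete, here is a small sketch of how such locality-preserving path hashes could map paths to name-nodes. The namenode count and the helper names are illustrative only, not from any Hadoop API:

    #!/bin/env python
    # Sketch of the two locality-preserving path hashes described above.
    import hashlib

    NUM_NAMENODES = 4  # illustrative

    def hash_by_parent(path):
        # Example 1: hash only the parent directory, so all files in the
        # same directory land on the same name-node.
        parent = path.rsplit("/", 1)[0] or "/"
        return int(hashlib.md5(parent).hexdigest(), 16) % NUM_NAMENODES

    def hash_by_level7(path):
        # Example 2: hash the first 7 path components, so each subtree
        # rooted at level 7 maps entirely to one name-node.
        parts = [p for p in path.split("/") if p]
        if len(parts) < 7:
            return 0
        prefix = "/" + "/".join(parts[:7])
        return int(hashlib.md5(prefix).hexdigest(), 16) % NUM_NAMENODES

    print hash_by_parent("/root/dir/file1"), hash_by_parent("/root/dir/file2")  # same node
    print hash_by_level7("/a/b/c/d/e/f/g/h/file")  # one node for the whole level-7 subtree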
web access interface for HDFS
Hi all,

I want to make sure of one thing: is there a web page for HDFS to access files? I know there are commands like "fs -put" and "fs -get", and we can even download files from the web interface like "slave:50075". But is there a way to put a file into HDFS through the web? Additionally, is there a function to authenticate web users and limit the space they can use?

Thks.

Springring.Xu
question about CDH3
Hi,

I installed CDH3 following the manual in the attached file, but when I run the command "su -s /bin/bash -hdfs -c 'hadoop namenode -format'" from page 25, it shows "su: invalid option --h". So I changed the command to "su -s /bin/bash -hdfs -c'hadoop namenode -format'", and the message is "May not run daemons as root. Please specify HADOOP_NAMENODE_USER". So, is there anything wrong in my operation?

Thanks.

Springring.Xu
how to create a group in hdfs
Hi,

How do I create a user group in HDFS?

    hadoop fs -?

Ring
Is there "useradd" in Hadoop
Hi,

There are "chmod", "chown", and "chgrp" in HDFS. Is there some command like "useradd -g" to add a user to a group? Furthermore, is there a "hadoop group", as opposed to a "linux group"?

Ring
Re: how to create a group in hdfs
Segel,

I got it. And sorry, I sent this mail by "reply all" to another mail and forgot that it included hbase.

Thanks.
Ring

----- Original Message -----
From: "Segel, Mike"
Sent: Wednesday, March 23, 2011 10:32 PM
Subject: RE: how to create a group in hdfs

Not sure why this has anything to do with hbase...

The short answer...
Outside of the supergroup, which is controlled by dfs.permissions.supergroup, Hadoop apparently checks to see if the owner is a member of the group you want to use.
This could be controlled by the local machine's /etc/group file, or if you're using NIS or LDAP, it's controlled there.

So you can run the unix shell command groups to find out which group(s) you belong to, and then switch to one of those.

HTH

-Mike
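Mike's suggestion of running the unix "groups" command to see which groups the owner belongs to can also be done programmatically; below is a small sketch using Python's standard pwd/grp modules, with a placeholder username:

    #!/bin/env python
    # List the local groups a user belongs to (what "groups <user>" shows);
    # per Mike's answer, this is what Hadoop consults when checking the group.
    import grp
    import pwd

    user = "hdfs"  # placeholder username
    primary = grp.getgrgid(pwd.getpwnam(user).pw_gid).gr_name
    others = [g.gr_name for g in grp.getgrall() if user in g.gr_mem]
    print sorted(set([primary] + others))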