python streaming error

2013-01-12 Thread springring
Hi,

 When I run the code below as a streaming job, the job errors out with status N/A and is killed.  Running it
step by step, I find that it fails at
"file_obj = open(file)".  When I run the same code outside of Hadoop, everything
is ok.

#!/bin/env python

import sys

for line in sys.stdin:
    offset,filename = line.split("\t")
    file = "hdfs://user/hdfs/catalog3/" + filename
    print line
    print filename
    print file
    file_obj = open(file)
..
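A side note on the failing line: Python's built-in open() only understands local filesystem paths, so an hdfs:// URI is treated as a non-existent local path. A minimal sketch, with a made-up file name:

# sketch: open() knows nothing about HDFS URIs
try:
    open("hdfs://user/hdfs/catalog3/some_file.txt")   # "some_file.txt" is hypothetical
except IOError, e:
    print e    # e.g. [Errno 2] No such file or directory: 'hdfs://...'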



Re:Re: python streaming error

2013-01-12 Thread springring
hi,

I modified the file as below, but there is still an error.

#!/bin/env python

import sys

for line in sys.stdin:
    offset,filename = line.split("\t")
    file = "hdfs://computeb-13:9100/user/hdfs/catalog3/" + filename
    print line
    print filename
    print file
    file_obj = open(file)









At 2013-01-12 16:34:37,"Nitin Pawar"  wrote:
>is this the correct path for writing onto hdfs?
>
>"hdfs://user/hdfs/catalog3."
>
>I don't see the namenode info in the path. Can this cause any issue? Just
>making a guess -
>something like hdfs://host:port/path
>
>On Sat, Jan 12, 2013 at 12:30 AM, springring  wrote:
>
>> hdfs://user/hdfs/catalog3/
>
>
>
>
>
>-- 
>Nitin Pawar


Re:Re: Re: python streaming error

2013-01-13 Thread springring
hi,
 I found the key point: it is not the hostname, that part is right.
Just change "offset,filename = line.split("\t")" to
"offset,filename = line.strip().split("\t")"
and now it passes.
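For reference, a tiny sketch of what split() alone leaves behind ("hello.txt" is just a made-up file name):

# the trailing newline stays glued to the last field without strip()
line = "0\thello.txt\n"                      # a typical streaming input line
offset, filename = line.split("\t")
print repr(filename)                         # 'hello.txt\n'  <- breaks the path
offset, filename = line.strip().split("\t")
print repr(filename)                         # 'hello.txt'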







At 2013-01-12 16:58:29,"Nitin Pawar"  wrote:
>computedb-13 is not a valid host name
>
>maybe if you have a local hadoop then you can refer to it with
>hdfs://localhost:9100/ or hdfs://127.0.0.1:9100
>
>if it's on another machine then just try the IP address of that machine
>
>
>-- 
>Nitin Pawar


Re:Re:Re: Re: python streaming error

2013-01-13 Thread springring
Sorry,
the error keeps on, even after I modify the code to

"offset,filename = line.strip().split("\t")"










Hive utf8

2013-01-15 Thread springring
Hi,

   I put some files that include Chinese text into HDFS.
   Reading a file with "hadoop fs -cat /user/hive/warehouse/..." is ok; I
can see the Chinese.

  But when I open the table in Hive, I can't read the Chinese (English is ok).
Why?
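One common cause (an assumption, not something confirmed in this thread) is that the data files are in a local encoding such as GBK while Hive reads them as UTF-8. A quick sketch to check the raw bytes; "/user/hive/warehouse/mytable/part-00000" is a made-up path:

# sketch: see whether the file decodes as UTF-8 or GBK
import subprocess

p = subprocess.Popen(["hadoop", "fs", "-cat",
                      "/user/hive/warehouse/mytable/part-00000"],
                     stdout=subprocess.PIPE)
data, _ = p.communicate()
for enc in ("utf-8", "gbk"):
    try:
        data.decode(enc)
        print "decodes cleanly as", enc
    except UnicodeDecodeError:
        print "does not decode as", enc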


WholeFileInputFormat with streaming

2013-03-02 Thread springring
Hi,

I want to use:
hadoop jar /hadoop-streaming-0.20.2-cdh3u3.jar -inputformat 
org.apache.hadoop.streaming.WholeFileInputFormat

so I downloaded the code from:
https://github.com/tomwhite/hadoop-book/tree/master/ch07/src/main/java
WholeFileInputFormat.java
WholeFileRecordReader.java

and declared the java files with the package statement:
package org.apache.hadoop.streaming;

solution A:
copy WholeFileInputFormat.java , WholeFileRecordReader.java  to  
hadoop-0.20.2-cdh3u3/src/contrib/streaming/src/java/org/apache/hadoop/streaming/
then
javac -classpath 
/usr/lib/hadoop-0.20/hadoop-0.20.2-cdh3u3-core.jar:/usr/lib/hadoop-0.20:/usr/lib/hadoop-0.20/lib/*
 -d WFInputFormatClassNew 
hadoop-0.20.2-cdh3u3/src/contrib/streaming/src/java/org/apache/hadoop/streaming/*.java

this produces a lot of errors.

solution B:
compile the java files WholeFileInputFormat.java and WholeFileRecordReader.java:
javac -classpath 
/usr/lib/hadoop-0.20/hadoop-0.20.2-cdh3u3-core.jar:/usr/lib/hadoop-0.20/*:/usr/lib/hadoop-0.20/lib/*
 -d WFInputFormatClass 
copy /usr/lib/hadoop/contrib/streaming/hadoop-streaming-0.20.2-cdh3u3.jar   to 

then :
jar uf hadoop-streaming-0.20.2-cdh3u3.jar 
WFInputFormatClass/org/apache/hadoop/streaming/WholeFileRecordReader.class
jar uf hadoop-streaming-0.20.2-cdh3u3.jar 
WFInputFormatClass/org/apache/hadoop/streaming/WholeFileInputFormat.class
there is no error, but when I run:
hadoop jar /hadoop-streaming-0.20.2-cdh3u3.jar -inputformat 
org.apache.hadoop.streaming.WholeFileInputFormat ...
there is an error:
-inputformat : class not found : 
org.apache.hadoop.streaming.WholeFileInputFormat

What's wrong with the two solutions? Or is there a better approach?

thx.

Ring


how to define new InputFormat with streaming?

2013-03-15 Thread springring
 Hi,

 my hadoop version is Hadoop 0.20.2-cdh3u3 and I want to define the new
InputFormat from the hadoop book, but there is an error:
"class org.apache.hadoop.streaming.WholeFileInputFormat not
org.apache.hadoop.mapred.InputFormat"

The Hadoop version is 0.20, but does streaming still depend on the 0.10 mapred api?

the detail:
*
javac -classpath 
/usr/lib/hadoop/hadoop-core-0.20.2-cdh3u3.jar:/usr/lib/hadoop/lib/* -d class7 
./*.java
cd class7
jar uf 
/usr/lib/hadoop-0.20/contrib/streaming/hadoop-streaming-0.20.2-cdh3u3.jar 
org/apache/hadoop/streaming/*.class

 hadoop jar 
/usr/lib/hadoop/contrib/streaming/hadoop-streaming-0.20.2-cdh3u3.jar 
-inputformat WholeFileInputFormat -mapper xmlmappertest.py -file 
xmlmappertest.py -input /user/hdfs/tarcatalog -output 
/user/hive/external/catalog -jobconf mapred.map.tasks=108
13/03/15 16:27:51 WARN streaming.StreamJob: -jobconf option is deprecated, 
please use -D instead.
Exception in thread "main" java.lang.RuntimeException: class 
org.apache.hadoop.streaming.WholeFileInputFormat not 
org.apache.hadoop.mapred.InputFormat
at 
org.apache.hadoop.conf.Configuration.setClass(Configuration.java:1070)
at org.apache.hadoop.mapred.JobConf.setInputFormat(JobConf.java:609)
at org.apache.hadoop.streaming.StreamJob.setJobConf(StreamJob.java:707)
at org.apache.hadoop.streaming.StreamJob.run(StreamJob.java:122)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at 
org.apache.hadoop.streaming.HadoopStreaming.main(HadoopStreaming.java:50)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
***** the code from the hadoop book *****

WholeFileInputFormat.java

// cc WholeFileInputFormat An InputFormat for reading a whole file as a record
import java.io.IOException;
import org.apache.hadoop.fs.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.*;

// vv WholeFileInputFormat
public class WholeFileInputFormat
    extends FileInputFormat<NullWritable, BytesWritable> {

  @Override
  protected boolean isSplitable(JobContext context, Path file) {
    return false;
  }

  @Override
  public RecordReader<NullWritable, BytesWritable> createRecordReader(
      InputSplit split, TaskAttemptContext context) throws IOException,
      InterruptedException {
    WholeFileRecordReader reader = new WholeFileRecordReader();
    reader.initialize(split, context);
    return reader;
  }
}
// ^^ WholeFileInputFormat


WholeFileRecordReader.java

// cc WholeFileRecordReader The RecordReader used by WholeFileInputFormat for
// reading a whole file as a record
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

// vv WholeFileRecordReader
class WholeFileRecordReader extends RecordReader<NullWritable, BytesWritable> {

  private FileSplit fileSplit;
  private Configuration conf;
  private BytesWritable value = new BytesWritable();
  private boolean processed = false;

  @Override
  public void initialize(InputSplit split, TaskAttemptContext context)
      throws IOException, InterruptedException {
    this.fileSplit = (FileSplit) split;
    this.conf = context.getConfiguration();
  }

  @Override
  public boolean nextKeyValue() throws IOException, InterruptedException {
    if (!processed) {
      byte[] contents = new byte[(int) fileSplit.getLength()];
      Path file = fileSplit.getPath();
      FileSystem fs = file.getFileSystem(conf);
      FSDataInputStream in = null;
      try {
        in = fs.open(file);
        IOUtils.readFully(in, contents, 0, contents.length);
        value.set(contents, 0, contents.length);
      } finally {
        IOUtils.closeStream(in);
      }
      processed = true;
      return true;
    }
    return false;
  }

  @Override
  public NullWritable getCurrentKey() throws IOException, InterruptedException {
    return NullWritable.get();
  }

  @Override
  public BytesWritable getCurrentValue() throws IOException,
      InterruptedException {
    return value;
  }

  @Override
  public float getProgress() throws IOException {
    return processed ? 1.0f : 0.0f;
  }

  @Override
  public void close() throws IOException {
    // do nothing
  }
}
// ^^ WholeFileRecordReader

Re:Re: how to define new InputFormat with streaming?

2013-03-17 Thread springring
thanks,
I modified the java file to use the old "mapred" API, but there is still an error:

 javac -classpath 
/usr/lib/hadoop/hadoop-core-0.20.2-cdh3u3.jar:/usr/lib/hadoop/lib/* -d class9 
./*.java
./WholeFileInputFormat.java:16: error: package 
org.apache.hadoop.mapred.lib.input does not exist
import org.apache.hadoop.mapred.lib.input.*;

Is it because hadoop-0.20.2-cdh3u3 does not include the "mapred" API?






At 2013-03-17 14:22:43,"Harsh J"  wrote:
>The issue is that Streaming expects the old/stable MR API
>(org.apache.hadoop.mapred.InputFormat) as its input format class, but your
>WholeFileInputFormat is using the new MR API
>(org.apache.hadoop.mapreduce.lib.input.InputFormat). Using the older form
>will let you pass.
>
>This has nothing to do with your version/distribution of Hadoop.
>
>
>On Fri, Mar 15, 2013 at 4:28 PM, Steve Loughran wrote:
>
>> On 15 March 2013 09:18, springring  wrote:
>>
>> >  Hi,
>> >
>> >  my hadoop version is Hadoop 0.20.2-cdh3u3 and I want to define new
>> > InputFormat in hadoop book , but there is error
>> > "class org.apache.hadoop.streaming.WholeFileInputFormat not
>> > org.apache.hadoop.mapred.InputFormat"
>> >
>> > Hadoop version is 0.20, but the streaming still depend on 0.10 mapred
>> api?
>> >
>>
>>
>> 1. please don't spam all the lists
>> 2. grab a later version of the apache releases if you want help on them on
>> these mailing lists, or go to the cloudera lists, where they will probably
>> say "upgrade to CDH 4.x" before asking questions.
>>
>> thanks
>>
>
>
>
>-- 
>Harsh J


Re:Re: Re: how to define new InputFormat with streaming?

2013-03-17 Thread springring
you are right!

Now the import path is all right.








At 2013-03-18 09:57:33,"Harsh J"  wrote:
>It isn't as easy as changing that import line:
>
>> package org.apache.hadoop.mapred.lib.input does not exist
>
>The right package is package org.apache.hadoop.mapred.
>
>
>
>
>--
>Harsh J


is that a mistake in Hadoop Tutorial?

2009-09-26 Thread springring
Hi,
Regarding the word marked in red on page 7 of the attached file: I think it should be
"combine" instead of "map",
or is it my misunderstanding?

br

Springring.Xu

Re: is that a mistake in Hadoop Tutorial?

2009-09-26 Thread springring
I mean like below:



WordCount also specifies a combiner (line 46). Hence, the output of each map is 
passed through the local combiner (which is same as the Reducer as per the job 
configuration) for local aggregation, after being sorted on the keys.

The output of the first map: (why not "combine"?)
< Bye, 1> 
< Hello, 1> 
< World, 2> 

The output of the second map: (why not "combine"?)
< Goodbye, 1> 
< Hadoop, 2>
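To spell out the comparison (my own sketch, not text from the tutorial): the raw output of the first map still contains repeated keys; the listed pairs only appear after the local combiner has run.

# sketch: raw map output vs. combiner output for the first input file
# ("Hello World Bye World", as in the tutorial's WordCount example)
from collections import defaultdict

words = "Hello World Bye World".split()
map_output = [(w, 1) for w in words]
print map_output                 # [('Hello', 1), ('World', 1), ('Bye', 1), ('World', 1)]

combined = defaultdict(int)      # what the local combiner produces
for w, c in map_output:
    combined[w] += c
print sorted(combined.items())   # [('Bye', 1), ('Hello', 1), ('World', 2)]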


BlocksMap.map

2009-09-27 Thread springring
Hi,
I've been puzzled by this.

hadoop-0.17.1\src\java\org\apache\hadoop\dfs\BlocksMap.java line 291

  private Map map = new HashMap();

why is it not
--  private Map map = new HashMap();

I think if the key of the Map were BlockInfo, meaning
{INodeFile, Datanode},
then it could be reduced by INodeFile and/or Datanode.

Springring.Xu



[paper]deconstruct Hadoop Distributed File System

2010-02-20 Thread springring
Hi All,
The attached file is my paper about deconstructing the Hadoop Distributed File
System and a Cluster triplet-Space Model.
I look forward to your comments or suggestions.



Springring.Xu
 

Re: [paper]deconstruct Hadoop Distributed File System

2010-02-21 Thread springring
sorry~ 

perhaps the *.pdf file was processed as spam?  So I am trying again with an attached .rar file.



[help!] [paper]deconstruct Hadoop Distributed File System

2010-02-21 Thread springring
一 一||

o~~  failed again 

Is there a limit on the size, or on the mail address?
my mail is springr...@126.com

Anyone who takes an interest in the subject, please mail me at the address above.
You are welcome~

In addition, can an administrator help me to send the attached file? Thanks.



Re: [help!] [paper]deconstruct Hadoop Distributed File System

2010-02-21 Thread springring
I have uploaded the paper to gmail.

mail.google.com

login ID: hadoopcn
password:  mapreduce



[paper] deconstruct HDFS

2010-02-26 Thread springring
Hi All,
I have uploaded the English version of the paper to gmail; it is about the
"Cluster triplet-Space Model by deconstructing the Hadoop Distributed File System".
Although it took me a long time to translate the paper from Chinese to
English, there is still some "chinglish"  ^ ^. Anyway, I hope the new
version will be helpful for our communication.
 Any suggestions or questions are welcome, either on the subject
or on the grammar.

Springring.Xu

Download from:   mail.google.com

login ID: hadoopcn
password:  mapreduce




Re: [paper] deconstruct HDFS

2010-02-26 Thread springring
Konstantin,
thanks. 
I have uploaded the paper to Google Docs; the two links are below, one is the
Chinese version and the other is the English
version.
   Moreover, the paper in Gmail has been re-uploaded too; the former version had something
wrong with the pdf, so the figures
were out of shape.

Chinese:
http://docs.google.com/fileview?id=0B8EK5k9okfTZYzIzMWMxNWMtMjA1OS00ZWMzLWE0OWMtNjMwMDU3OTQ2OWUw&hl=en&invite=CPDe7NcO

English:
http://docs.google.com/fileview?id=0B8EK5k9okfTZN2Q1ZWQ3ZjMtY2RlMC00ZDJiLWJjM2ItZmJjZjYwMmFlMjRj&hl=en&invite=CMzh3-0P


- Original Message - 
From: "Konstantin Boudnik" 
To: 
Sent: Saturday, February 27, 2010 2:29 AM
Subject: Re: [paper] deconstruct HDFS


> Somewhat changing the topic: email isn't really a good way of storing
> documents.
> 
> Just to make your life easier, and since you're using google services anyway,
> you might consider putting this paper on Google Docs and sharing it with
> everyone - that'd be easier than 'email sharing' :)
> 

Re: Namespace partitioning using Locality Sensitive Hashing

2010-03-02 Thread springring
I have a question.

Now that hadoop has "mapper" and "reducer", how about a solution like
Map and Reduce,
or directly Reduce, where the node pair can be looked at as a
branch...


- Original Message - 
From: "Konstantin Shvachko" 
To: 
Sent: Tuesday, March 02, 2010 10:21 AM
Subject: Re: Namespace partitioning using Locality Sensitive Hashing


> Symlinks is a brand new feature in HDFS.
> You can read about it in
> https://issues.apache.org/jira/browse/HDFS-245
> Documentation is here:
> https://issues.apache.org/jira/secure/attachment/12434745/design-doc-v4.txt
> 
> Symbolic links in HDFS can point to a directory in a different file system,
> particularly on a different HDSF cluster. They are like mount points in this 
> case.
> So you can create a symlink on cluster C1 pointing to the root of cluster C2.
> This makes C2 a sub-namespace of C1.
> 
> --Konstantin
> 
> On 3/1/2010 5:42 PM, Ketan Dixit wrote:
>> Hello,
>> Thank you Konstantin and  Allen for your reply. The information
>> provided really helped to improve my understanding.
>> However I still have few questions.
>> How Symlinks/ soft links are used to solve the probem of partitioning.
>> (Where do the symlinks point to? All the mapping is
>> stored in memory but symlinks point to file objects? This is little
>> confusing to me)
>> Can you please provide insight into this?
>>
>> Thanks,
>> Ketan
>>
>> On Mon, Mar 1, 2010 at 3:26 PM, Konstantin Shvachko  
>> wrote:
>>>
>>> Hi Ketan,
>>>
>>> AFAIU, hashing is used to map files and directories into different 
>>> name-nodes.
>>> Suppose you use a simple hash function on a file path h(path), and that 
>>> files
>>> with the same hash value (or within a hash range) are mapped to the same 
>>> name-node.
>>> Then files with the same parent will be randomly mapped into different
>>> name-nodes: Pr(h(/dir/file1) = h(/dir/file2)) - is small.
>>>
>>> The idea with LSH is to add some locality factor in the hash function in 
>>> order
>>> to increase probability of placing files from the same directory (or a 
>>> subtree)
>>> into the same name-node.
>>>
>>> Example 1.
>>> Suppose that you apply MD5 only to the path to the parent rather
>>> than to the entire file path: h(/root/dir/file) = MD5(/root/dir)
>>> Then all files of the same directory will have the same hash value and 
>>> therefore
>>> will be mapped into the same name-node.
>>>
>>> Example 2.
>>> If a path consists of path components pi, where i = 1,..,n
>>> Lets define the following hash function:
>>> h(/p1/.../pn) = 0, if n < 7
>>> h(/p1/.../pn) = MD5(/p1/.../p7), if n >= 7
>>> With this hash function each subtree rooted at level 7 of the namepsace 
>>> hierarchy
>>> will be entirely mapped to the same name-node.
>>>
>>> There could be more elaborate examples.
>>>
>>> Symlinks do provide a way to partition the namespace, as Allen points out,
>>> although this is a static partitioning. Static partitions as opposed to 
>>> dynamic
>>> ones do not guarantee that the partitions will be "equal" in size, where 
>>> "size"
>>> may have different meanings (like number of files, or space occupied by the
>>> files, or number of blocks).
>>> A good hash function need to conform to some equal partitioning requirement.
>>> Function from Example 2 would be considered bad in this sense, while 
>>> Example 1
>>> defines a good function.
>>>
>>> This is my take on the problem. Hope it makes sense,
>>> --Konstantin
>>>
>>>
>>> On 3/1/2010 8:48 AM, Ketan Dixit wrote:

 Hi,
 I am a graduate student in Computer Science department at SUNY Stony Brook.
   I am thinking of doing a project on Hadoop for my course "Cloud 
 Computing"
 conducted by Prof. Radu Sion.
 While going through the links of the "Yahoo open source projects for
 students"  page I found the idea
 "Research on new hashing schemes for filesystem namespace partitioning"
 interesting. It looks to me the idea is
 to assign subtree of the whole namespace to one namenode and another 
 subtree
 to another namenode.
 How  LSH is better than normal hashing?  Because still, a client or a fixed
 namenode has to take decision of which namenode to contact in whatever
 hashing ? It looks to me that requests to files under same subtree are
 directed to the same namenode then the performance will be faster as the
 requests to the same namenode are clustered around the a part of namespace
 subtree
 (For example a part of on which client is doing some operation.) Is this
 assumption correct? Can I have more insight in this regard.



 Thanks,
 Ketan

>>>
>>
>
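As a concrete reading of Konstantin's Example 2 (my own sketch, not code from HDFS): paths shallower than 7 components hash to 0, deeper paths hash only their first 7 components, so a whole subtree rooted at level 7 lands on one name-node.

import hashlib

def lsh(path, depth=7):
    # locality-sensitive hash over the first `depth` path components
    parts = [p for p in path.split("/") if p]
    if len(parts) < depth:
        return 0
    prefix = "/" + "/".join(parts[:depth])
    return int(hashlib.md5(prefix).hexdigest(), 16)

print lsh("/a/b/c")                                                     # 0 (shallow path)
print lsh("/p1/p2/p3/p4/p5/p6/p7/x") == lsh("/p1/p2/p3/p4/p5/p6/p7/y")  # True: same subtree, same name-node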

web access interface for HDFS

2010-12-10 Thread Springring
hi all,
   I want to make sure of one thing -- is there a web page for HDFS to access
files?
I know that there are commands like "fs -put" and "fs -get", and we can even
download
a file from the web, for example via "slave:50075". But is there a way to put a file into HDFS through
the web?
Additionally, is there a function to authenticate web users and limit the
space size
they can use? Thanks.


Springring.Xu

question about CDH3

2011-02-15 Thread springring
Hi,
I installed CDH3 following the manual in the attached file,
but when I run the command
"su -s /bin/bash -hdfs -c 'hadoop namenode -format'"
on page 25, it shows "su: invalid option --h",
so I changed the command to
"su -s /bin/bash -hdfs -c'hadoop namenode -format'"
and the message is
"May not run daemons as root. Please specify HADOOP_NAMENODE_USER".
So, is there anything wrong in my operation?
Thanks.

Springring.Xu

how to create a group in hdfs

2011-03-22 Thread springring
Hi,

How do I create a user group in hdfs?

hadoop fs -?

Ring

Is there "useradd" in Hadoop

2011-03-22 Thread springring
Hi,

There are "chmod"、"chown"、"chgrp" in HDFS,
is there some command like "useradd -g" to add a 
user in a group,? Even more, is there  "hadoop's
group", not "linux's group"?


Ring

Re: how to create a group in hdfs

2011-03-23 Thread springring
Segel,

I got it, and sorry, I just sent this mail by "answer all" to another mail
and forgot that it included hbase.
Thanks.

Ring 


- Original Message - 
From: "Segel, Mike" 
To: ; "Ryan Rawson" 
Cc: ; 
Sent: Wednesday, March 23, 2011 10:32 PM
Subject: RE: how to create a group in hdfs


Not sure why this has anything to do with hbase...

The short answer... 
Outside of the supergroup which is controlled by dfs.permissions.supergroup, 
Hadoop apparently checks to see if the owner is a member of the group you want 
to use.
This could be controlled by the local machine's /etc/group file, or if you're 
using NIS or LDAP, its controlled there.

So you can run the unix shell command groups to find out which group(s) you 
belong to, and then switch to one of those.

HTH

-Mike
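A small illustration of the check Mike describes (my own sketch, not Hadoop code; "hdfs" is just an example user name): the local group database decides which groups a user belongs to, which is roughly what the shell command "groups" reports.

import grp, pwd

def groups_of(user):
    # primary group from /etc/passwd, secondary groups from /etc/group (or NIS/LDAP)
    primary = grp.getgrgid(pwd.getpwnam(user).pw_gid).gr_name
    secondary = [g.gr_name for g in grp.getgrall() if user in g.gr_mem]
    return [primary] + secondary

print groups_of("hdfs")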


-----Original Message-----
From: springring [mailto:springr...@126.com] 
Sent: Tuesday, March 22, 2011 11:29 PM
To: common-dev@hadoop.apache.org; Ryan Rawson
Cc: u...@hbase.apache.org; common-u...@hadoop.apache.org; 
common-dev@hadoop.apache.org
Subject: how to create a group in hdfs

Hi,

how to create a user group in hdfs?

hadoop fs -?

Ring

