Re: What if an XML file crosses the boundary of HDFS chunks?

2009-10-31 Thread Jason Venner
I use the StreamXmlRecordReader from the streaming contrib package, and it works very well. Your key becomes the stanza you are looking for.

On Sat, Oct 31, 2009 at 7:38 AM, Oliver B. Fischer wrote:
> Hello Jeff,
>
> does it mean that there is
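A record reader built on begin/end markers copes with XML that straddles HDFS chunks because split boundaries act only as hints. The general ownership rule is that a reader owns every record whose begin marker starts inside its split, and it keeps reading past the split end to finish the last one. Below is a minimal pure-Java sketch of that rule, assuming simple non-nested markers such as `<page>` and `</page>`, with a `String` standing in for the file's bytes. The class `MarkerSplitReader` and the method `readSplit` are hypothetical, and this is not the StreamXmlRecordReader source. In a real streaming job the markers are passed with `-inputreader "StreamXmlRecord,begin=...,end=..."`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of begin/end-marker split ownership;
// not the Hadoop StreamXmlRecordReader source.
final class MarkerSplitReader {

    private MarkerSplitReader() {}

    /**
     * Returns the records owned by the character range [start, end) of data.
     * A record belongs to the split in which its begin marker starts, so a
     * record that crosses `end` is returned whole by this split and skipped
     * by the next one.
     */
    static List<String> readSplit(String data, int start, int end,
                                  String beginMark, String endMark) {
        List<String> records = new ArrayList<>();
        // Skip any partial record at the start of the split: it began in the
        // previous split, whose reader will finish it.
        int pos = data.indexOf(beginMark, start);
        while (pos >= 0 && pos < end) {
            int close = data.indexOf(endMark, pos + beginMark.length());
            if (close < 0) {
                break; // unterminated record: stop (a real reader might report an error)
            }
            int recordEnd = close + endMark.length(); // may lie past `end`
            records.add(data.substring(pos, recordEnd));
            pos = data.indexOf(beginMark, recordEnd);
        }
        return records;
    }
}
```

Take `<page>a</page><page>bb</page><page>c</page>` split at offset 20. The first reader returns the first two records, even though the second one runs past offset 20. The second reader skips ahead to `<page>c</page>`, so every record is read exactly once.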

Re: how to set one map task for each input key-value pair

2009-11-26 Thread Jason Venner
The only thing that comes immediately to mind is to write your own custom input format that knows how to tell where the boundaries are in your data set, and uses those to specify the beginning and end of the input splits. You can also tell the framework not to split your individual input files by
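One way to get one map task per key-value pair, following the suggestion above, is to derive split boundaries from the record boundaries themselves. The sketch below is plain Java with no Hadoop dependency and assumes newline-terminated records. `RecordAlignedSplitter` and its `Split` type are hypothetical stand-ins for the (start, length) ranges that a custom InputFormat's `getSplits()` would turn into `FileSplit` objects.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: one split per record, aligned to record boundaries.
final class RecordAlignedSplitter {

    private RecordAlignedSplitter() {}

    /** Byte range of one split, like the (start, length) pair a FileSplit carries. */
    static final class Split {
        final long start;
        final long length;

        Split(long start, long length) {
            this.start = start;
            this.length = length;
        }

        @Override
        public String toString() {
            return "[" + start + "," + (start + length) + ")";
        }
    }

    /** Emits one split per record; each record ends with (and includes) `delimiter`. */
    static List<Split> oneSplitPerRecord(byte[] data, byte delimiter) {
        List<Split> splits = new ArrayList<>();
        int recordStart = 0;
        for (int i = 0; i < data.length; i++) {
            if (data[i] == delimiter) {
                splits.add(new Split(recordStart, i + 1 - recordStart));
                recordStart = i + 1;
            }
        }
        if (recordStart < data.length) { // trailing record with no delimiter
            splits.add(new Split(recordStart, data.length - recordStart));
        }
        return splits;
    }
}
```

Separately, FileInputFormat has an overridable `isSplitable` method. Returning `false` from it keeps each input file in a single split.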

Re: how to set one map task for each input key-value pair

2009-11-28 Thread Jason Venner
> … the records and
> then set the setMaxInputSplitSize() and setMinInputSplitSize() of
> FileInputFormat class to this value? The mapper will extract the input after
> ignoring the dummy characters. Do you think this could work? Thanks.
>
> Regards,
> Upendra
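The fixed-width proposal quoted above depends on how the new-API FileInputFormat sizes splits. The split size is `max(minSize, min(maxSize, blockSize))`, so setting both the minimum and the maximum to the padded record length should make each split one record long, provided the record reader stops at the split end. The sketch below models that formula and the pad/strip step in plain Java. It assumes single-byte (ASCII) records, so character counts equal on-disk byte counts. `FixedWidthSplits`, its methods, and the NUL padding character are hypothetical choices and are not part of Hadoop.

```java
import java.util.Arrays;

// Hypothetical sketch of the fixed-width padding idea; not a Hadoop API.
final class FixedWidthSplits {

    private FixedWidthSplits() {}

    static final char PAD = '\0'; // hypothetical "dummy" filler character

    /** Pads a record to exactly `width` characters; rejects records that are too long. */
    static String pad(String record, int width) {
        if (record.length() > width) {
            throw new IllegalArgumentException("record longer than width " + width);
        }
        char[] out = new char[width];
        Arrays.fill(out, PAD);
        record.getChars(0, record.length(), out, 0);
        return new String(out);
    }

    /** What the mapper would do: drop the trailing filler characters. */
    static String unpad(String padded) {
        int end = padded.length();
        while (end > 0 && padded.charAt(end - 1) == PAD) {
            end--;
        }
        return padded.substring(0, end);
    }

    /** Same formula FileInputFormat uses: max(minSize, min(maxSize, blockSize)). */
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }
}
```

For example, with a 64 MB block size and both limits set to 100 bytes, `computeSplitSize` returns 100.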

Re: Hadoop on Sun Solaris

2009-11-30 Thread Jason Venner
I had to hard-code the OS name in the build.xml file to get the native compression codec shared libraries to build for Hadoop 0.19.

On Mon, Nov 30, 2009 at 7:09 AM, Daniel Templeton wrote:
> I'm using it on Solaris without any problem. Of course, I'm just using the
> provided JAR files. As long as

Re: Planet hadoop

2010-01-07 Thread Jason Venner
www.prohadoopbook aggregates some of them; if you send me any lists you find, I will fold them in.

On Thu, Jan 7, 2010 at 8:14 AM, Leen Toelen wrote:
> Hi,
>
> is there a Planet hadoop somewhere (a blog aggregating all blogs from the
> hadoop community)? Couldn't find it on Google yet.