Re: New InfoQ article on HBase and Lucene

2011-12-31 Thread Ted Dunning
tx On Fri, Dec 30, 2011 at 10:19 AM, Michel Segel wrote: > Hi, > > Just FYI... Boris Lublinsky released a new article that starts to talk > about a PoC we did earlier in the year. > > Without spoiling the article, the reason I wanted to point this out is > that there have been a couple of posts a

Re: FileSystem contract of listStatus

2011-11-02 Thread Ted Dunning
I think that the API docs actually say globStatus is ordered and leave the ordering semantics for listStatus undefined. http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/fs/FileSystem.html#globStatus(org.apache.hadoop.fs.Path) http://hadoop.apache.org/common/docs/r0.20.2/api/org/

Re: Web Crawler in hadoop - Unresponsive after a while

2011-10-14 Thread Ted Dunning
You would probably be happier using an industrial strength crawler. Check out Bixo. http://bixolabs.com/about/focused-crawler/ On Thu, Oct 13, 2011 at 5:13 PM, Aishwarya Venkataraman < avenk...@cs.ucsd.edu> wrote: > Hello, > > I am trying to make my web crawling go faster with hadoop. My mapper

Re: kfs and hdfs

2011-10-09 Thread Ted Dunning
On Sun, Oct 9, 2011 at 12:33 AM, gschen wrote: > > what are the differences between hdfs and kfs (kosmos file system)? > > The biggest difference is that kfs is not very active (but not quite dead!) and hdfs has a pretty active development community. If you are looking for a file system that has a

Re: Hadoop for unstructured data storage

2011-10-06 Thread Ted Dunning
HDFS does not really meet your needs. I think that MapR's solution would. I will contact you off-line to give details. On Thu, Oct 6, 2011 at 3:35 PM, Hemant kulkarni wrote: > Hi all, > We are a small software development firm working on data backup > software. We have a backup product which copies

Re: making file system block size bigger to improve hdfs performance ?

2011-10-03 Thread Ted Dunning
The MapR system allocates files with 8K blocks internally, so I doubt that any improvement that you see with a larger block size on HDFS is going to matter much and it could seriously confuse your underlying file system. The performance advantage for MapR has more to do with a better file system d

Re: Adding Elasticity to Hadoop MapReduce

2011-09-14 Thread Ted Dunning
This makes a bit of sense, but you have to worry about the inertia of the data. Adding compute resources is easy. Adding data resources, not so much. And if the computation is not near the data, then it is likely to be much less effective. On Wed, Sep 14, 2011 at 4:27 PM, Bharath Ravi wrote: >

Re: Platform MapReduce - Enterprise Features

2011-09-12 Thread Ted Dunning
See mapr.com. We have added many enterprise features onto Hadoop including snapshots, mirroring, NFS access, high availability and higher performance. Since this mailing list is primarily for Apache Hadoop, you should contact me off-line if you would like more information. On Mon, Sep 12, 2011 at

Re: JIRA attachments order

2011-09-09 Thread Ted Dunning
Review Board already works. HBase uses it extensively. On Fri, Sep 9, 2011 at 2:15 PM, Kirby Bohling wrote: > On Fri, Sep 9, 2011 at 4:04 PM, Doug Cutting wrote: > > On 09/09/2011 01:38 PM, Kirby Bohling wrote: > >> Someday I wish Apache would find/adopt a distributed version control > >> syste

Re: JIRA attachments order

2011-09-09 Thread Ted Dunning
If you post the same patch with the same name, JIRA helps you out by greying all the earlier versions out. On Fri, Sep 9, 2011 at 7:03 AM, John George wrote: > +1. Changing default to 'sorted by date' helps. > > John Vijoe George Edackattukudy > > On Sep 9, 2011, at 9:01 AM, "Uma Maheswara Rao G

Re: Hadoop Master and Slave Discovery

2011-07-04 Thread Ted Dunning
One reasonable suggestion that I have heard recently was to do like Google does and put a DNS front end onto Zookeeper. Machines would need to have DNS set up properly and requests for a special ZK-based domain would have to be delegated to the fancy DNS setup, but this would allow all kinds of

Re: Hadoop-common-trunk-Commit is failing since 01/19/2011

2011-01-31 Thread Ted Dunning
There has been a problem with more than one build failing (Mahout is the one that I saw first) due to a change in Maven version which meant that the Clover license isn't being found properly. At least, that is the tale I heard from infra. On Mon, Jan 31, 2011 at 1:31 PM, Eli Collins wrote: > Hey

Re: debugging hadoop

2011-01-26 Thread Ted Dunning
Konstantin has good advice here, but the reader should note that "remove" should be read as "remote". Easy typo to make, but this one changes meaning. On Wed, Jan 26, 2011 at 12:27 PM, Konstantin Boudnik wrote: > Another way is to use Java remove debugging feature, which allows you > to keep yo

Re: Hey Cloudera can you help us In beating Google Yahoo Facebook?

2009-10-02 Thread Ted Dunning
e that above mentioned giants use Hadoop via Cloudera? > Yahoo sponsored most of the writing of Hadoop and does not use Cloudera's distribution. Facebook sponsored the writing of Hive and probably still runs their own version of Hadoop. Why do you care if they use Cloudera's distributi

Re: last map task taking too long

2009-09-29 Thread Ted Dunning
> http://www.nabble.com/last-map-task-taking-too-long-tp25673359p25673359.html > Sent from the Hadoop core-dev mailing list archive at Nabble.com. > > -- Ted Dunning, CTO DeepDyve

Re: [VOTE] Push back code freeze for 0.21

2009-07-24 Thread Ted Dunning
October? On Fri, Jul 24, 2009 at 5:11 PM, Eric Baldeschwieler wrote: > I'd suggest oct 31st. -- Ted Dunning, CTO DeepDyve

Re: Current security implementation in Hadoop

2009-07-23 Thread Ted Dunning
Thu, Jul 23, 2009 at 6:44 AM, Giovanni Tusa wrote: > Could you also suggest me some other useful links, maybe with examples if > any, on how to implement such a mechanism? > -- Ted Dunning, CTO DeepDyve

Re: Need help understanding the source

2009-07-06 Thread Ted Dunning
I would consider this to be a very delicate optimization with little utility in the real world. It is very, very rare to reliably know how many records the reducer will see. Getting this wrong would be a disaster. Getting it right would be very difficult in almost all cases. Moreover, this assu