Ted Dunning
tx
On Fri, Dec 30, 2011 at 10:19 AM, Michel Segel wrote:
> Hi,
>
> Just FYI... Boris Lublinsky released a new article that starts to talk
> about a PoC we did earlier in the year.
>
> Without spoiling the article, the reason I wanted to point this out is
> that there have been a couple of posts a
I think that the API docs actually say globStatus is ordered and leave the
ordering semantics for listStatus undefined.
http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/fs/FileSystem.html#globStatus(org.apache.hadoop.fs.Path)
http://hadoop.apache.org/common/docs/r0.20.2/api/org/
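If the ordering of a listing matters and the API leaves it undefined, the safe move is to sort the result yourself rather than depend on whatever order comes back. Here is a minimal sketch of that idea in plain Java. It uses java.io.File as a stand-in for the Hadoop call (File.list() likewise makes no promise about order), and the class and method names are made up for illustration; this is not the Hadoop API.

```java
import java.io.File;
import java.util.Arrays;

public class SortedListing {
    /** Returns a sorted copy of the given names; the input array is left untouched. */
    public static String[] sortedCopy(String[] names) {
        String[] copy = names.clone();
        Arrays.sort(copy); // lexicographic order, independent of listing order
        return copy;
    }

    /** Lists a directory and imposes an explicit order on the result. */
    public static String[] sortedNames(File dir) {
        String[] names = dir.list(); // order is unspecified by the File API
        if (names == null) {
            return new String[0]; // not a directory, or it could not be read
        }
        return sortedCopy(names);
    }

    public static void main(String[] args) {
        File dir = new File(args.length > 0 ? args[0] : ".");
        for (String name : sortedNames(dir)) {
            System.out.println(name);
        }
    }
}
```

The same pattern should carry over to an array returned by a file system listing call: sort it on the key you care about before relying on its order.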
You would probably be happier using an industrial strength crawler.
Check out Bixo.
http://bixolabs.com/about/focused-crawler/
On Thu, Oct 13, 2011 at 5:13 PM, Aishwarya Venkataraman <
avenk...@cs.ucsd.edu> wrote:
> Hello,
>
> I am trying to make my web crawling go faster with hadoop. My mapper
On Sun, Oct 9, 2011 at 12:33 AM, gschen wrote:
>
> what are the differences between hdfs and kfs (kosmos file system)?
>
>
The biggest difference is that kfs is not very active (but not quite dead!)
and hdfs has a pretty active development community.
If you are looking for a file system that has a
HDFS does not really meet your needs. I think that MapR's solution would.
I will contact you off-line to give details.
On Thu, Oct 6, 2011 at 3:35 PM, Hemant kulkarni wrote:
> Hi all,
> We are a small software development firm working on data backup
> software. We have a backup product which copies
The MapR system allocates files with 8K blocks internally, so I doubt that
any improvement that you see with a larger block size on HDFS is going to
matter much and it could seriously confuse your underlying file system.
The performance advantage for MapR has more to do with a better file system
d
This makes a bit of sense, but you have to worry about the inertia of the
data. Adding compute resources is easy. Adding data resources, not so
much. And if the computation is not near the data, then it is likely to be
much less effective.
On Wed, Sep 14, 2011 at 4:27 PM, Bharath Ravi wrote:
>
See mapr.com
We have added many enterprise features onto Hadoop including snapshots,
mirroring, NFS access,
high availability and higher performance.
Since this mailing list is primarily for Apache Hadoop, you should contact
me off-line if you would like more information.
On Mon, Sep 12, 2011 at
Review Board already works. HBase uses it extensively.
On Fri, Sep 9, 2011 at 2:15 PM, Kirby Bohling wrote:
> On Fri, Sep 9, 2011 at 4:04 PM, Doug Cutting wrote:
> > On 09/09/2011 01:38 PM, Kirby Bohling wrote:
> >> Someday I wish Apache would find/adopt a distributed version control
> >> syste
If you post the same patch with the same name, JIRA helps you out by greying
all the earlier versions out.
On Fri, Sep 9, 2011 at 7:03 AM, John George wrote:
> +1. Changing default to 'sorted by date' helps.
>
> John Vijoe George Edackattukudy
>
> On Sep 9, 2011, at 9:01 AM, "Uma Maheswara Rao G
One reasonable suggestion that I have heard recently was to do as Google
does and put a DNS front end onto ZooKeeper. Machines would need to have
DNS set up properly and requests for a special ZK-based domain would have
to be delegated to the fancy DNS setup, but this would allow all kinds of
There has been a problem with more than one build failing (Mahout is the one
that I saw first) due to a change in Maven version which meant that the
clover license isn't being found properly. At least, that is the tale I
heard from infra.
On Mon, Jan 31, 2011 at 1:31 PM, Eli Collins wrote:
> Hey
Konstantin has good advice here, but the reader should note that "remove"
should be read as "remote".
Easy typo to make, but this one changes meaning.
On Wed, Jan 26, 2011 at 12:27 PM, Konstantin Boudnik wrote:
> Another way is to use Java remove debugging feature, which allows you
> to keep yo
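For readers who have not used the remote debugging feature under discussion: the JVM is started with a JDWP agent option, and a debugger then attaches over a socket. The sketch below shows a typical form of that option in a comment and a small check for whether the running JVM was started with it. The class name, method name and port number are illustrative assumptions, and how the option gets passed to Hadoop task JVMs is not shown here.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

public class DebugAgentCheck {
    // Typical JDWP option for remote debugging (port 8000 is an arbitrary choice):
    //   -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=8000
    // Older JVMs used the equivalent form: -Xdebug -Xrunjdwp:...

    /** Returns true if any of the given JVM arguments enables the JDWP agent. */
    public static boolean hasJdwpAgent(List<String> jvmArgs) {
        for (String arg : jvmArgs) {
            if (arg.startsWith("-agentlib:jdwp") || arg.startsWith("-Xrunjdwp")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Arguments the current JVM was launched with (not the program arguments).
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("JDWP agent enabled: " + hasJdwpAgent(jvmArgs));
    }
}
```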
e that above mentioned giants use Hadoop via Cloudera?
>
Yahoo sponsored most of the writing of Hadoop and does not use Cloudera's
distribution.
Facebook sponsored the writing of Hive and probably still runs their own
version of Hadoop.
Why do you care if they use Cloudera's distributi
> http://www.nabble.com/last-map-task-taking-too-long-tp25673359p25673359.html
> Sent from the Hadoop core-dev mailing list archive at Nabble.com.
>
>
--
Ted Dunning, CTO
DeepDyve
October?
On Fri, Jul 24, 2009 at 5:11 PM, Eric Baldeschwieler
wrote:
> I'd suggest oct 31st.
--
Ted Dunning, CTO
DeepDyve
On Thu, Jul 23, 2009 at 6:44 AM, Giovanni Tusa wrote:
> Could you also suggest some other useful links, maybe with examples if
> any, on how to implement such a mechanism?
>
--
Ted Dunning, CTO
DeepDyve
I would consider this to be a very delicate optimization with little utility
in the real world. It is very, very rare to reliably know how many records
the reducer will see. Getting this wrong would be a disaster. Getting it
right would be very difficult in almost all cases.
Moreover, this assu