RE: kfs and hdfs

2011-10-10 Thread Segel, Mike
nnels. These are dev. mailing lists. > Cos > On Mon, Oct 10, 2011 at 01:59PM, Segel, Mike wrote: >> Owen, Are you still a bit touchy over Mike Olson's rebuttal to your blog? >> :-P (I kee-id, I kee-id) >> -Original Mess

RE: kfs and hdfs

2011-10-10 Thread Segel, Mike
Owen, Are you still a bit touchy over Mike Olson's rebuttal to your blog? :-P (I kee-id, I kee-id) -Original Message- From: Owen O'Malley [mailto:o...@hortonworks.com] Sent: Monday, October 10, 2011 1:14 PM To: common-dev@hadoop.apache.org Subject: Re: kfs and hdfs Ted, Please keep th

RE: Research projects for hadoop

2011-09-09 Thread Segel, Mike
Why would you want to take a perfectly good machine and then try to virtualize it? I mean, if I have 4 quad-core CPUs, I can run a lot of simultaneous map tasks. However, if I virtualize the box, I lose at least 1 core per VM, so I end up with 4 nodes that have less capability and performance tha

RE: Architectural Questions

2011-08-09 Thread Segel, Mike
Message- From: Harsh J [mailto:ha...@cloudera.com] Sent: Tuesday, August 09, 2011 3:43 PM To: common-dev@hadoop.apache.org Subject: Re: Architectural Questions Hey Mike, On Wed, Aug 10, 2011 at 2:08 AM, Segel, Mike wrote: > Uhm... > Ok... I just did a quick Google search on Hadoop and

RE: Architectural Questions

2011-08-09 Thread Segel, Mike
Uhm... Ok... I just did a quick Google search on Hadoop and Derby... https://issues.apache.org/jira/browse/HADOOP-4133 Then there's this subproject called Hive... :-) Oh and then there's this other subproject called Oozie... :-) Now Cloudera and others support MySQL in the role of Derby aka Clouds

Re: google snappy

2011-03-23 Thread Segel, Mike
Lol... It's already in the works... Sent from my Palm Pre, please excuse any spelling errors. On Mar 23, 2011 11:53 AM, Weishung Chung wrote: Hey my fellow hadoop/hbase developers, I just came across this Google compression/decompression package yesterday, c

RE: how to create a group in hdfs

2011-03-23 Thread Segel, Mike
Not sure why this has anything to do with HBase... The short answer... Outside of the supergroup, which is controlled by dfs.permissions.supergroup, Hadoop apparently checks to see if the owner is a member of the group you want to use. This could be controlled by the local machine's /etc/group f
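For anyone who would rather script the group change than use the shell, a minimal sketch using the FileSystem API is below. The path /user/shared and the group name analysts are hypothetical placeholders, and the group still has to resolve on the NameNode (e.g. via its /etc/group) for the permission checks to pass.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ChangeGroup {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Hypothetical path and group name, purely for illustration.
        Path dir = new Path("/user/shared");

        // Passing null for the owner leaves it unchanged; only the group is set.
        fs.setOwner(dir, null, "analysts");

        FileStatus status = fs.getFileStatus(dir);
        System.out.println(dir + " now has group " + status.getGroup());
    }
}
```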

RE: Stopping datanodes dynamically.

2011-01-31 Thread Segel, Mike
nodes dynamically. If you want to decommission a datanode, http://hadoop.apache.org/common/docs/r0.20.0/hdfs_user_guide.html#DFSAdmin+Command -refreshNodes briefly explains how it works. Koji On 1/31/11 4:35 AM, "Segel, Mike" wrote: James, Remove the node without stopping what? If you m
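For reference, a minimal sketch of triggering that same -refreshNodes step programmatically is below. It assumes dfs.hosts.exclude already points at an exclude file listing the datanode(s) to decommission; the usual route is simply running `hadoop dfsadmin -refreshNodes` from the shell.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.tools.DFSAdmin;
import org.apache.hadoop.util.ToolRunner;

public class RefreshNodes {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Assumes dfs.hosts.exclude already points at a file listing the
        // datanode(s) to decommission; this just tells the namenode to re-read it.
        int rc = ToolRunner.run(conf, new DFSAdmin(conf), new String[] {"-refreshNodes"});
        System.exit(rc);
    }
}
```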

RE: Stopping datanodes dynamically.

2011-01-31 Thread Segel, Mike
James, Remove the node without stopping what? If you mean you want to remove the datanode without stopping the master, you have a couple of ways... First, if you're running Cloudera's CDH3b3 release, you have /etc/init.d scripts where you can issue a stop command. (Stopping the datanode and th

RE: Hadoop - Eclipse Plugin Error

2011-01-18 Thread Segel, Mike
Raghu, Are you running your cluster on localhost? (Meaning, are you running a pseudo-cluster on the same machine as your Eclipse session?) HTH -Mike -Original Message- From: Raghu R [mailto:raghu@gmail.com] Sent: Tuesday, January 18, 2011 11:56 AM To: common-u...@hadoop.apache.or

RE: Hadoop use direct I/O in Linux?

2011-01-05 Thread Segel, Mike
You are mixing a few things up. You're testing your I/O using C. What do you see if you try testing your direct I/O from Java? I'm guessing that you'll keep your I/O piece in place, wrap it within some JNI code, and then re-write the test in Java? Also, are you testing large streams or random
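To make the comparison concrete, a minimal sketch of the Java side of such a test is below: it simply streams a file through the page cache with buffered reads and reports throughput. The file path is a placeholder; the C counterpart would be the version that opens the file with O_DIRECT.

```java
import java.io.FileInputStream;
import java.io.IOException;

public class ReadThroughput {
    public static void main(String[] args) throws IOException {
        // Path to a large test file; placeholder, pass your own on the command line.
        String path = args.length > 0 ? args[0] : "/tmp/testfile";

        byte[] buf = new byte[1 << 20];   // 1 MB read buffer
        long bytes = 0;
        long start = System.nanoTime();

        FileInputStream in = new FileInputStream(path);
        try {
            int n;
            while ((n = in.read(buf)) > 0) {
                bytes += n;
            }
        } finally {
            in.close();
        }

        double secs = (System.nanoTime() - start) / 1e9;
        System.out.printf("%d bytes in %.2f s = %.1f MB/s%n",
                bytes, secs, bytes / secs / 1e6);
    }
}
```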

RE: Hadoop use direct I/O in Linux?

2011-01-04 Thread Segel, Mike
All, While this is an interesting topic for debate, I think it's a moot point. A lot of DBAs (especially Informix DBAs) don't agree with Linus. (I'm referring to an earlier post in this thread that referenced a quote from Linus T.) Direct I/O is a good thing. But if Linus is removing it from Li

RE: IOException: Owner 'mapred' for path XY not match expected owner 'AB'

2010-10-26 Thread Segel, Mike
Yeah... You need to go through each node and check to make sure all of your ownerships and permission levels are set correctly. It's a pain in the ass, but look on the bright side. You only have to do it once. :-) -Mike -Original Message- From: patrickange...@gmail.com [mailto:patric

RE: [jira] Created: (HADOOP-6923) Native Libraries do not load if a different platform signature is returned from org.apache.hadoop.util.PlatformName

2010-08-25 Thread Segel, Mike
This may seem like a silly question... Is there anyone using a 32-bit JRE environment these days? Also, I notice this is for 0.20.3; is this still an issue in 0.89 or 0.20.6? I really am curious if anyone outside of IBM is using 32-bit Java in any capacity with Hadoop? Thx -Mike -Original Me

RE: Starting a job on a hadoop cluster remotly

2010-07-28 Thread Segel, Mike
Hi, Since you didn't get an answer... yes, you can. I'm working from memory, so I may be a bit fuzzy on the details... Your external app has to be 'cloud aware'. Essentially, create a config file for your application that you read in, which lets your app know where the JT and NN are. Then you
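A minimal sketch of the 'cloud aware' client end is below, using the old mapred API that was current at the time. The host names, ports and paths are placeholders; in practice those values (fs.default.name, mapred.job.tracker) would come from the config file the app reads in.

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class RemoteSubmit {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(RemoteSubmit.class);

        // Point the client at the remote cluster; host names and ports are placeholders.
        conf.set("fs.default.name", "hdfs://namenode.example.com:8020");
        conf.set("mapred.job.tracker", "jobtracker.example.com:8021");

        conf.setJobName("remote-submit-test");
        // Identity mapper/reducer by default; TextInputFormat yields LongWritable/Text.
        conf.setOutputKeyClass(LongWritable.class);
        conf.setOutputValueClass(Text.class);
        FileInputFormat.setInputPaths(conf, new Path("/user/mike/input"));
        FileOutputFormat.setOutputPath(conf, new Path("/user/mike/output"));

        JobClient.runJob(conf);   // blocks until the job finishes on the cluster
    }
}
```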

RE: Hadoop Compression - Current Status

2010-07-14 Thread Segel, Mike
compress the SequenceFile with a codec, be it gzip, bz2 or lzo. SequenceFiles do get you splittability, which you won't get with just Gzip (until we get MAPREDUCE-491) or the hadoop-lzo InputFormats. cheers, - Patrick On Mon, Jul 12, 2010 at 2:42 PM, Segel, Mike wrote: > How can you say z
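A minimal sketch of writing a block-compressed SequenceFile along those lines is below. The output path is a placeholder, and GzipCodec stands in for whichever codec (gzip, bz2, lzo) is actually in use.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.GzipCodec;

public class WriteCompressedSeqFile {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path out = new Path("/user/mike/data.seq");   // placeholder output path

        GzipCodec codec = new GzipCodec();
        codec.setConf(conf);

        // BLOCK compression compresses runs of records together, which usually
        // gives better ratios than per-record compression and keeps the file splittable.
        SequenceFile.Writer writer = SequenceFile.createWriter(
                fs, conf, out, LongWritable.class, Text.class,
                SequenceFile.CompressionType.BLOCK, codec);
        try {
            writer.append(new LongWritable(1L), new Text("example record"));
        } finally {
            writer.close();
        }
    }
}
```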

RE: Hadoop Compression - Current Status

2010-07-12 Thread Segel, Mike
How can you say zip files are 'best codecs' to use? Call me silly, but I seem to recall that if you're using a zip'd file for input, you can't really use a file splitter? (Going from memory, which isn't the best thing to do...) -Mike -Original Message- From: Stephen Watt [mailto:sw...@us

RE: Task scheduler

2010-05-17 Thread Segel, Mike
+1 I agree with Steve that sometimes you need to redirect where you want the work to occur. Over time, your cloud will not have homogeneous data nodes. You may end up with a cluster of nodes that have Fermi cards (NVIDIA CUDA-enabled cards) where you want to do some serious number crunching. [ I

RE: [jira] Created: (HADOOP-6720) 'Killed' jobs and 'Failed' jobs should be displayed seperately in JT UI

2010-04-22 Thread Segel, Mike
I would disagree. It's important to avoid 'information overload'. Since you can drill down by following the link, you can see the task's state, whether it was killed or it failed. So it is currently possible to see this in the JT UI. -Mike -Original Message- From: Subramaniam Krishnan

RE: Map Reduce in heterogeneous environ..

2010-03-11 Thread Segel, Mike
Steve, I agree that this may not be a problem with Hadoop, but more of an issue of how to manage Hadoop. So what are you suggesting? If I understand your comments, would the following be a good idea? In a common directory, we have a hadoop.conf directory which contains all of the configuration

RE: Jobtrackers scalability

2010-03-03 Thread Segel, Mike
Hi, What exactly do you mean by 'commodity' hardware? I mean, if you have the budget to build out a 10,000-node cloud, don't you think you'd have the money to beef up your job tracker? With respect to the job tracker, it's more memory-intensive than disk, right? So how many nodes can you run in a

RE: Map-Reduce for Security

2010-03-01 Thread Segel, Mike
Suji, I'm only a couple of months into using Map/Reduce, but I think you have a couple of issues. What do you mean by 'security'? ('Security' can mean different things to different people.) Map/Reduce works at the 'row' level. So if you were to encrypt the data, you'd have to encrypt it on a r
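To illustrate what 'row level' means in practice, a rough sketch of an old-API mapper that decrypts each record independently is below. It is purely illustrative: the BytesWritable input, AES/ECB mode and hard-coded key are assumptions for the sake of the example, and a real deployment would handle key distribution outside the job entirely.

```java
import java.io.IOException;
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class DecryptingMapper extends MapReduceBase
        implements Mapper<LongWritable, BytesWritable, LongWritable, Text> {

    private Cipher cipher;

    @Override
    public void configure(JobConf job) {
        try {
            // Illustrative key handling only: a real deployment would fetch the key
            // from a secured store, never hard-code it or ship it with the job.
            byte[] key = "0123456789abcdef".getBytes("UTF-8");
            cipher = Cipher.getInstance("AES/ECB/PKCS5Padding");
            cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"));
        } catch (Exception e) {
            throw new RuntimeException("cipher setup failed", e);
        }
    }

    public void map(LongWritable key, BytesWritable value,
                    OutputCollector<LongWritable, Text> output, Reporter reporter)
            throws IOException {
        try {
            // Each record (row) is decrypted independently inside the mapper.
            byte[] plain = cipher.doFinal(value.getBytes(), 0, value.getLength());
            output.collect(key, new Text(plain));
        } catch (Exception e) {
            throw new IOException("decrypt failed: " + e.getMessage());
        }
    }
}
```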

RE: Hadoop Security

2010-02-22 Thread Segel, Mike
Hi, Sorry for jumping into this late, but has anyone thought about how this could be extended into HBase? I realize this is Hadoop security, but eventually HBase and other apps that sit on top of Hadoop will have to deal with security issues too. I'm not suggesting that a solution be worked