> Your cluster is an insecure HBase deployment, right?

Yes.

> Are all files under /apps/hbase/data/archive/data/default owned by user
> 'hdfs'?

No. However, the ownership failure isn't what I'm concerned about; I
understand what caused that. But the stack trace illustrates behavior of
initTableSnapshotMapperJob that I didn't expect, and I'm just trying to
understand what it's doing.
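
For reference, the call under discussion looks roughly like this. This is
a minimal sketch, not my actual job: the class names SnapshotScanSketch
and HostMapper, the snapshot name 'host_snapshot', and the restore path
/tmp/snapshot_restore are all placeholders.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
    import org.apache.hadoop.hbase.mapreduce.TableMapper;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;

    public class SnapshotScanSketch {

      // Placeholder mapper; the real per-row logic is omitted.
      public static class HostMapper extends TableMapper<Text, Text> {
        @Override
        protected void map(ImmutableBytesWritable row, Result value, Context ctx) {
          // process one snapshot row here
        }
      }

      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = Job.getInstance(conf, "snapshot-scan");
        job.setJarByClass(SnapshotScanSketch.class);
        TableMapReduceUtil.initTableSnapshotMapperJob(
            "host_snapshot",       // snapshot to read (hypothetical name)
            new Scan(),            // scan applied to the snapshot regions
            HostMapper.class,      // mapper
            Text.class,            // map output key class
            Text.class,            // map output value class
            job,
            true,                  // ship HBase dependency jars with the job
            new Path("/tmp/snapshot_restore"));  // restore dir handed to
                                                 // TableSnapshotInputFormat.setInput
        job.setNumReduceTasks(0);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }

Given that last Path argument, I would have expected every write to stay
under /tmp/snapshot_restore, but the trace below shows cloneHdfsRegions
doing a mkdirs under the archive tree instead.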
> BTW, in the tip of 0.98, with HBASE-11742, the related code looks a bit
> different.
>
> Cheers
>
> On Sun, Sep 7, 2014 at 8:27 AM, Brian Jeltema <[email protected]> wrote:
>
>>> Eclipse doesn't show that RestoreSnapshotHelper.restoreHdfsRegions() is
>>> called by initTableSnapshotMapperJob (in the master branch).
>>>
>>> Looking at TableMapReduceUtil.java in 0.98, I don't see a direct
>>> relation between the two.
>>>
>>> Do you have a stack trace or something else showing the relationship?
>>
>> Right. That's what I meant by 'indirectly'. This is a stack trace that
>> was caused by an ownership conflict:
>>
>> java.io.IOException: java.util.concurrent.ExecutionException: org.apache.hadoop.security.AccessControlException: Permission denied: user=hbase, access=WRITE, inode="/apps/hbase/data/archive/data/default/Host/c41d632d5eee02e1883215460e5c261d/p":hdfs:hdfs:drwxr-xr-x
>>   at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkFsPermission(FSPermissionChecker.java:265)
>>   at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:251)
>>   at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:232)
>>   at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:176)
>>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:5509)
>>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:5491)
>>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAncestorAccess(FSNamesystem.java:5465)
>>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:3608)
>>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:3578)
>>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:3552)
>>   at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:754)
>>   at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:558)
>>   at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>>   at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
>>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
>>   at java.security.AccessController.doPrivileged(Native Method)
>>   at javax.security.auth.Subject.doAs(Subject.java:396)
>>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
>>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
>>   at org.apache.hadoop.hbase.util.ModifyRegionUtils.createRegions(ModifyRegionUtils.java:131)
>>   at org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper.cloneHdfsRegions(RestoreSnapshotHelper.java:475)
>>   at org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper.restoreHdfsRegions(RestoreSnapshotHelper.java:208)
>>   at org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper.copySnapshotForScanner(RestoreSnapshotHelper.java:733)
>>   at org.apache.hadoop.hbase.mapreduce.TableSnapshotInputFormat.setInput(TableSnapshotInputFormat.java:397)
>>   at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableSnapshotMapperJob(TableMapReduceUtil.java:301)
>>   at net.digitalenvoy.hp.job.ParseHostnamesJob.run(ParseHostnamesJob.java:77)
>>   at net.digitalenvoy.hp.HostProcessor.run(HostProcessor.java:165)
>>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>>   at net.digitalenvoy.hp.HostProcessor.main(HostProcessor.java:47)
>>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>   at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
>>
>>> Cheers
>>>
>>> On Sun, Sep 7, 2014 at 5:48 AM, Brian Jeltema <[email protected]> wrote:
>>>
>>>> initTableSnapshotMapperJob writes into this directory (indirectly) via
>>>> RestoreSnapshotHelper.restoreHdfsRegions.
>>>>
>>>> Is this expected? I would have expected writes to be limited to the
>>>> temp directory passed in the init call.
>>>>
>>>> Brian
>>>>
>>>> On Sep 7, 2014, at 8:17 AM, Ted Yu <[email protected]> wrote:
>>>>
>>>>> The files under the archive directory are referenced by snapshots.
>>>>> Please don't delete them manually.
>>>>>
>>>>> You can delete unused snapshots.
>>>>>
>>>>> Cheers
>>>>>
>>>>> On Sep 7, 2014, at 4:08 AM, Brian Jeltema <[email protected]> wrote:
>>>>>
>>>>>> On Sep 6, 2014, at 9:32 AM, Ted Yu <[email protected]> wrote:
>>>>>>
>>>>>>> Can you post your hbase-site.xml?
>>>>>>>
>>>>>>> /apps/hbase/data/archive/data/default is where HFiles are archived
>>>>>>> (e.g. when a column family is deleted, HFiles for this column
>>>>>>> family are stored here).
>>>>>>> /apps/hbase/data/data/default seems to be your hbase.rootdir
>>>>>>
>>>>>> hbase.rootdir is defined to be hdfs://foo:8020/apps/hbase/data. I
>>>>>> think that's the default that Ambari creates.
>>>>>>
>>>>>> So the HFiles in the archive subdirectory have been discarded and
>>>>>> can be deleted safely?
>>>>>>
>>>>>>> bq. a problem I'm having running map/reduce jobs against snapshots
>>>>>>>
>>>>>>> Can you describe the problem in a bit more detail?
>>>>>>
>>>>>> I don't understand what I'm seeing well enough to ask an intelligent
>>>>>> question yet. I appear to be scanning duplicate rows when using
>>>>>> initTableSnapshotMapperJob, but I'm trying to get a better
>>>>>> understanding of how this works, since it's probably just something
>>>>>> I'm doing wrong.
>>>>>>
>>>>>> Brian
>>>>>>
>>>>>>> Cheers
>>>>>>>
>>>>>>> On Sat, Sep 6, 2014 at 6:09 AM, Brian Jeltema <[email protected]> wrote:
>>>>>>>
>>>>>>>> I'm trying to track down a problem I'm having running map/reduce
>>>>>>>> jobs against snapshots.
>>>>>>>> Can someone explain the difference between files stored in
>>>>>>>>
>>>>>>>> /apps/hbase/data/archive/data/default
>>>>>>>>
>>>>>>>> and files stored in
>>>>>>>>
>>>>>>>> /apps/hbase/data/data/default
>>>>>>>>
>>>>>>>> (Hadoop 2.4, HBase 0.98)
>>>>>>>>
>>>>>>>> Thanks
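
P.S. On Ted's earlier point about deleting unused snapshots rather than
touching the archive directory by hand: with the 0.98 client API that
would look roughly like the sketch below. This is an illustration only;
the class name SnapshotCleanupSketch and the snapshot name
'obsolete_snapshot' are hypothetical.

    import java.util.List;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.protobuf.generated.HBaseProtos.SnapshotDescription;

    public class SnapshotCleanupSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        try {
          // See what exists before removing anything.
          List<SnapshotDescription> snapshots = admin.listSnapshots();
          for (SnapshotDescription sd : snapshots) {
            System.out.println(sd.getName());
          }
          // Dropping a snapshot releases its references into the archive
          // tree; my understanding is the master's cleaner chore can then
          // reclaim HFiles that no remaining snapshot references.
          admin.deleteSnapshot("obsolete_snapshot");
        } finally {
          admin.close();
        }
      }
    }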
