Re: Using HDFS as a secondary FS

Denis Magda Tue, 15 Dec 2015 01:36:12 -0800

Hi Ivan,

Thanks for the clarification.


I’ve updated the content of the following pages:

- Added an “Automatic Hadoop Configuration” section that describes how to use
setup-hadoop, with its pros and cons, for Apache Hadoop and CDH:
http://apacheignite.gridgain.org/v1.5/docs/installing-on-apache-hadoop#automatic-hadoop-configuration
http://apacheignite.gridgain.org/docs/installing-on-cloudera-cdh

- Added more information on how to use HDFS as a secondary file system for IGFS,
based on your answer from yesterday and referring to the updated configuration guides:
http://apacheignite.gridgain.org/docs/secondary-file-system

As our IGFS and Hadoop expert, please review my changes and edit them wherever
required.

In addition, I noticed that we have a disabled, empty article for the BigTop
distribution. Is that OK?

—
Denis

> On 15 Dec 2015, at 12:10, Ivan V. <[email protected]> wrote:
> 
> Denis, good question.
> Yes, there are several reasons.
> 1) setup-hadoop is suitable for the Apache Hadoop distribution, but not for all
> others (e.g. BigTop).
> 2) setup-hadoop rewrites global configs (core-site.xml, mapred-site.xml),
> which prevents further use of the cluster without Ignite.
> 3) setup-hadoop needs write permission to all the folders it writes files
> to.
> 4) It is possible to provide all the required functionality without modifying
> any files in the existing Hadoop cluster at all, see
> https://issues.apache.org/jira/browse/IGNITE-483.
> 
> There were plans to remove "setup-hadoop", but that has not been done yet.
> In any case, I 100% agree that having several different versions of the
> documentation is quite confusing and misleading.
> 
> 
> On Mon, Dec 14, 2015 at 10:58 PM, Denis Magda <[email protected]> wrote:
> 
>> Ivan,
>> 
>> Is there any reason why we don’t recommend using
>> apache-ignite-hadoop-{version}/bin/setup-hadoop.sh/bat in our Hadoop
>> Accelerator articles?
>> 
>> With setup-hadoop.sh I was able to build a valid classpath, automatically
>> create symlinks to the accelerator's jars from Hadoop’s libs folder, and
>> start an Ignite node that uses HDFS as a secondary FS in less than 10
>> minutes.
>> 
>> I just followed the instructions from
>> apache-ignite-hadoop-{version}/HADOOP_README.txt. The instructions on
>> readme.io <http://readme.io/> look much more complex to me: they don’t
>> mention setup-hadoop.sh/bat at all, forcing the end user to perform a
>> manual setup.
>> 
>> —
>> Denis
>> 
>>> On 14 Dec 2015, at 20:24, Dmitriy Setrakyan <[email protected]> wrote:
>>> 
>>> On Mon, Dec 14, 2015 at 7:28 AM, Denis Magda <[email protected]> wrote:
>>> 
>>>> Yes, this will be documented tomorrow. I want to go through all the steps
>>>> myself, checking for any other obstacles the user may run into.
>>>> 
>>> 
>>> Thanks, Denis!
>>> 
>>> 
>>>> 
>>>> —
>>>> Denis
>>>> 
>>>>> On 14 Dec 2015, at 18:11, Dmitriy Setrakyan <[email protected]> wrote:
>>>>> 
>>>>> Ivan, I think this should be documented, no?
>>>>> 
>>>>> On Mon, Dec 14, 2015 at 2:25 AM, Ivan V. <[email protected]> wrote:
>>>>> 
>>>>>> To enable just IGFS persistence there is no need to use HDFS (this
>>>>>> requires a Hadoop dependency, a configured HDFS cluster, etc.).
>>>>>> We have requests https://issues.apache.org/jira/browse/IGNITE-1120 and
>>>>>> https://issues.apache.org/jira/browse/IGNITE-1926 to implement
>>>>>> persistence on top of the local file system, and we are already close to
>>>>>> a solution.
>>>>>> 
>>>>>> Regarding the secondary FS doc page
>>>>>> (http://apacheignite.gridgain.org/docs/secondary-file-system), I would
>>>>>> suggest adding the following text there:
>>>>>> ------------------------
>>>>>> If an Ignite node with a secondary file system configured runs on a
>>>>>> machine with a Hadoop distribution, make sure Ignite is able to find the
>>>>>> appropriate Hadoop libraries: set the HADOOP_HOME environment variable for
>>>>>> the Ignite process if you're using the Apache Hadoop distribution, or, if
>>>>>> you use another distribution (HDP, Cloudera, BigTop, etc.), make sure the
>>>>>> /etc/default/hadoop file exists and has the appropriate contents.
>>>>>> 
>>>>>> If an Ignite node with a secondary file system configured runs on a
>>>>>> machine without a Hadoop distribution, you can manually add the necessary
>>>>>> Hadoop dependencies to the Ignite node classpath: these are the
>>>>>> dependencies with groupId "org.apache.hadoop" listed in the file
>>>>>> modules/hadoop/pom.xml. Currently they are:
>>>>>> 
>>>>>> 1. hadoop-annotations
>>>>>> 2. hadoop-auth
>>>>>> 3. hadoop-common
>>>>>> 4. hadoop-hdfs
>>>>>> 5. hadoop-mapreduce-client-common
>>>>>> 6. hadoop-mapreduce-client-core
>>>>>> 
>>>>>> ------------------------
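For the manual route just described, a quick way to see what is still missing is to scan the classpath the node will use. The Java sketch below is a hypothetical helper (HadoopDepsCheck is not part of Ignite). It assumes the JARs keep the Maven file-name convention <artifactId>-<version>.jar, so renamed or repackaged JARs would be reported as missing even if their classes are present.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic, not part of Ignite: reports which of the
// org.apache.hadoop artifacts listed above have no JAR on a classpath string.
public class HadoopDepsCheck {

    // Artifact IDs as listed in the message above (modules/hadoop/pom.xml, Dec 2015).
    static final String[] REQUIRED = {
        "hadoop-annotations",
        "hadoop-auth",
        "hadoop-common",
        "hadoop-hdfs",
        "hadoop-mapreduce-client-common",
        "hadoop-mapreduce-client-core"
    };

    // ASSUMPTION: JAR names follow <artifactId>-<version>.jar, so the character
    // after "<artifactId>-" must be a digit ("hadoop-hdfs-client-2.8.0.jar"
    // therefore does NOT count as "hadoop-hdfs").
    static boolean isJarOf(String fileName, String artifact) {
        String prefix = artifact + "-";
        return fileName.endsWith(".jar")
            && fileName.startsWith(prefix)
            && fileName.length() > prefix.length()
            && Character.isDigit(fileName.charAt(prefix.length()));
    }

    // Returns the required artifacts for which no matching JAR entry was found.
    static List<String> missing(String classpath) {
        List<String> result = new ArrayList<>();
        String[] entries = classpath.split(File.pathSeparator);
        for (String artifact : REQUIRED) {
            boolean found = false;
            for (String entry : entries) {
                if (isJarOf(new File(entry).getName(), artifact)) {
                    found = true;
                    break;
                }
            }
            if (!found) {
                result.add(artifact);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Check the classpath of the current JVM (e.g. started with the node's -cp value).
        List<String> absent = missing(System.getProperty("java.class.path"));
        if (absent.isEmpty()) {
            System.out.println("All listed Hadoop artifacts found on the classpath.");
        } else {
            System.out.println("Missing Hadoop artifacts: " + absent);
        }
    }
}
```

An empty result only means all six artifacts were matched by file name; it says nothing about whether their versions are compatible with each other.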
>>>>>> 
>>>>>> On Mon, Dec 14, 2015 at 11:21 AM, Valentin Kulichenko <[email protected]> wrote:
>>>>>> 
>>>>>>> Guys,
>>>>>>> 
>>>>>>> Why don't we include the ignite-hadoop module in Fabric? This user simply
>>>>>>> wants to configure HDFS as a secondary file system to ensure persistence.
>>>>>>> Not being able to do this with Fabric looks weird to me. And actually
>>>>>>> I don't think this is a use case for the Hadoop Accelerator.
>>>>>>> 
>>>>>>> -Val
>>>>>>> 
>>>>>>> On Mon, Dec 14, 2015 at 12:11 AM, Denis Magda <[email protected]> wrote:
>>>>>>> 
>>>>>>>> Hi Ivan,
>>>>>>>> 
>>>>>>>> 1) Yes, I think it makes sense to keep the old versions of the docs
>>>>>>>> while an old version is still likely to be used by someone.
>>>>>>>> 
>>>>>>>> 2) Absolutely, it is time to add a corresponding article on readme.io.
>>>>>>>> It's not the first time I've seen a question related to HDFS as a
>>>>>>>> secondary FS.
>>>>>>>> Both before and now, it has not been clear to me what exact steps I
>>>>>>>> should follow to enable such a configuration. Our current suggestions
>>>>>>>> look like a puzzle. I'll assemble the puzzle on my side and prepare the
>>>>>>>> article. Ivan, if you don't mind, I would reach out to you directly for
>>>>>>>> technical assistance if needed.
>>>>>>>> 
>>>>>>>> Regards,
>>>>>>>> Denis
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On 12/14/2015 10:25 AM, Ivan V. wrote:
>>>>>>>> 
>>>>>>>>> Hi, Valentin,
>>>>>>>>> 
>>>>>>>>> 1) First of all, note that the author of the question is not using the
>>>>>>>>> latest doc page, namely
>>>>>>>>> http://apacheignite.gridgain.org/v1.0/docs/igfs-secondary-file-system .
>>>>>>>>> This is version 1.0, while the latest is 1.5:
>>>>>>>>> https://apacheignite.readme.io/docs/hadoop-accelerator. Besides, it
>>>>>>>>> turned out that some links from the latest doc version point to the 1.0
>>>>>>>>> doc version. I fixed that in the several places where I found it. Do we
>>>>>>>>> really need the old doc versions (1.0-1.4)?
>>>>>>>>> 
>>>>>>>>> 2) Our documentation
>>>>>>>>> (http://apacheignite.gridgain.org/docs/secondary-file-system) does not
>>>>>>>>> provide any special setup instructions for configuring HDFS as a
>>>>>>>>> secondary file system in Ignite. Our docs assume that if a user wants to
>>>>>>>>> integrate with Hadoop, (s)he follows the generic Hadoop integration
>>>>>>>>> instructions (e.g.
>>>>>>>>> http://apacheignite.gridgain.org/docs/installing-on-apache-hadoop). It
>>>>>>>>> looks like the page
>>>>>>>>> http://apacheignite.gridgain.org/docs/secondary-file-system should be
>>>>>>>>> clearer about the required configuration steps (in fact, setting up the
>>>>>>>>> HADOOP_HOME variable for the Ignite node process).
>>>>>>>>> 
>>>>>>>>> 3) Hadoop jars are correctly found by Ignite if the following conditions
>>>>>>>>> are met:
>>>>>>>>> (a) The "Hadoop Edition" distribution is used (not the "Fabric" edition).
>>>>>>>>> (b) Either the HADOOP_HOME environment variable is set (for the Apache
>>>>>>>>> Hadoop distribution), or the file "/etc/default/hadoop" exists and
>>>>>>>>> matches the Hadoop distribution used (BigTop, Cloudera, HDP, etc.).
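Condition (b) can be checked before the node is started. The following Java sketch is a hypothetical helper (HadoopLocationCheck is not part of Ignite; the real lookup is done by the shell scripts under IGNITE_HOME/bin/include). It only reports whether HADOOP_HOME points to an existing directory and whether /etc/default/hadoop exists; it does not check that the file's contents match the installed distribution.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical diagnostic, not part of Ignite: reports which of the two
// lookup routes from condition (b) look usable for the current process.
public class HadoopLocationCheck {

    static List<String> availableRoutes(Map<String, String> env, Path etcDefaultHadoop) {
        List<String> routes = new ArrayList<>();
        String home = env.get("HADOOP_HOME");
        // Route 1 (Apache Hadoop): HADOOP_HOME is set and points to an existing directory.
        if (home != null && !home.isEmpty() && Files.isDirectory(Paths.get(home))) {
            routes.add("HADOOP_HOME");
        }
        // Route 2 (BigTop, Cloudera, HDP, ...): the defaults file exists.
        // Whether its contents match the installed distribution is NOT checked here.
        if (Files.isRegularFile(etcDefaultHadoop)) {
            routes.add(etcDefaultHadoop.toString());
        }
        return routes;
    }

    public static void main(String[] args) {
        List<String> routes = availableRoutes(System.getenv(), Paths.get("/etc/default/hadoop"));
        if (routes.isEmpty()) {
            System.out.println("Neither HADOOP_HOME nor /etc/default/hadoop is usable;"
                + " Hadoop JARs will likely not be found.");
        } else {
            System.out.println("Usable Hadoop lookup routes: " + routes);
        }
    }
}
```

The environment is passed in as a map so the check can be exercised without changing the real process environment.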
>>>>>>>>> 
>>>>>>>>> The exact mechanism of the Hadoop classpath composition can be found in
>>>>>>>>> the files
>>>>>>>>> IGNITE_HOME/bin/include/hadoop-classpath.sh
>>>>>>>>> IGNITE_HOME/bin/include/setenv.sh
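Ivan points to hadoop-classpath.sh for the exact mechanism. Purely to illustrate the general idea, and not as a port of that script, the sketch below collects JARs from a Hadoop home directory into a classpath string. The directory list is an assumption based on the Apache Hadoop 2.x binary tarball layout, and HadoopClasspathSketch is a hypothetical name; packaged distributions (BigTop, CDH, HDP) may lay out their JARs differently.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, NOT a port of hadoop-classpath.sh: collects JARs from
// an Apache Hadoop home directory into a classpath string.
public class HadoopClasspathSketch {

    // ASSUMPTION: Apache Hadoop 2.x binary tarball layout.
    static final String[] JAR_DIRS = {
        "share/hadoop/common",
        "share/hadoop/common/lib",
        "share/hadoop/hdfs",
        "share/hadoop/mapreduce"
    };

    static String compose(File hadoopHome) {
        List<String> jars = new ArrayList<>();
        for (String sub : JAR_DIRS) {
            File[] files = new File(hadoopHome, sub).listFiles((dir, name) -> name.endsWith(".jar"));
            if (files == null) {
                continue; // directory missing or unreadable: skip it
            }
            Arrays.sort(files); // deterministic order within one directory
            for (File f : files) {
                jars.add(f.getAbsolutePath());
            }
        }
        return String.join(File.pathSeparator, jars);
    }

    public static void main(String[] args) {
        String home = System.getenv("HADOOP_HOME");
        if (home == null || home.isEmpty()) {
            System.out.println("HADOOP_HOME is not set.");
            return;
        }
        System.out.println(compose(new File(home)));
    }
}
```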
>>>>>>>>> 
>>>>>>>>> The issue is discussed in
>>>>>>>>> https://issues.apache.org/jira/browse/IGNITE-372 and
>>>>>>>>> https://issues.apache.org/jira/browse/IGNITE-483.
>>>>>>>>> 
>>>>>>>>> On Sat, Dec 12, 2015 at 3:45 AM, Valentin Kulichenko <[email protected]> wrote:
>>>>>>>>> 
>>>>>>>>>> Igniters,
>>>>>>>>>> 
>>>>>>>>>> I'm looking at the question on SO [1] and I'm a bit confused.
>>>>>>>>>> 
>>>>>>>>>> We ship the ignite-hadoop module only in the Hadoop Accelerator, and
>>>>>>>>>> without Hadoop JARs, assuming that the user will include them from the
>>>>>>>>>> Hadoop distribution he uses. That seems OK to me when the accelerator is
>>>>>>>>>> plugged into Hadoop to run MapReduce jobs, but I can't figure out the
>>>>>>>>>> steps required to configure HDFS as a secondary FS for IGFS. Which Hadoop
>>>>>>>>>> JARs should be on the classpath? Is the user supposed to add them
>>>>>>>>>> manually?
>>>>>>>>>> 
>>>>>>>>>> Can someone with more expertise in our Hadoop integration clarify this?
>>>>>>>>>> I believe there is not enough documentation on this topic.
>>>>>>>>>> 
>>>>>>>>>> BTW, any ideas why the user gets an exception for the JobConf class,
>>>>>>>>>> which is in the 'mapred' package? Why is a MapReduce class being used?
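The thread does not answer this question directly. One hedged observation: org.apache.hadoop.mapred.JobConf ships in hadoop-mapreduce-client-core, which is on the dependency list Ivan gives, so a missing-class error for it is consistent with that JAR being absent from the node's classpath. The hypothetical probe below (HadoopClassProbe is not part of Ignite) checks whether a few Hadoop classes can be loaded; the artifact names in its comments refer to Hadoop 2.x releases of that time.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical diagnostic, not part of Ignite: checks whether a few Hadoop
// classes can be loaded by a given class loader, without initializing them.
public class HadoopClassProbe {

    // Classes to probe; comments name the Hadoop 2.x artifact that ships each one.
    static final String[] CLASSES = {
        "org.apache.hadoop.conf.Configuration",          // hadoop-common
        "org.apache.hadoop.hdfs.DistributedFileSystem",  // hadoop-hdfs (2.7 and earlier)
        "org.apache.hadoop.mapred.JobConf"               // hadoop-mapreduce-client-core
    };

    static Map<String, Boolean> probe(ClassLoader loader) {
        Map<String, Boolean> result = new LinkedHashMap<>();
        for (String name : CLASSES) {
            boolean loadable;
            try {
                // initialize=false: resolve the class only, do not run static initializers.
                Class.forName(name, false, loader);
                loadable = true;
            } catch (ClassNotFoundException | LinkageError e) {
                // LinkageError covers e.g. NoClassDefFoundError when one of the
                // class's own dependencies is missing from the classpath.
                loadable = false;
            }
            result.put(name, loadable);
        }
        return result;
    }

    public static void main(String[] args) {
        probe(HadoopClassProbe.class.getClassLoader())
            .forEach((name, ok) -> System.out.println((ok ? "found   " : "MISSING ") + name));
    }
}
```

Passing false for the initialize argument keeps the probe free of side effects: only class loading is attempted, and no static initializers run.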
>>>>>>>>>> 
>>>>>>>>>> [1]
>>>>>>>>>> http://stackoverflow.com/questions/34221355/apache-ignite-what-are-the-dependencies-of-ignitehadoopigfssecondaryfilesystem
>>>>>>>>>> 
>>>>>>>>>> -Val
