Re: Using HDFS as a secondary FS

Ivan V. Tue, 15 Dec 2015 05:33:07 -0800

Hi, Denis,
1) In my opinion, we'd better not mention the 'setup-hadoop' script at all
(for the reasons mentioned earlier) and remove it in the next release.
2) Ignite is now part of the BigTop distribution (see
https://issues.apache.org/jira/browse/IGNITE-665), so the old BigTop
instructions are no longer relevant. I guess that is the reason.



On Tue, Dec 15, 2015 at 12:35 PM, Denis Magda <[email protected]> wrote:

> Hi Ivan,
>
> Thanks for the clarification.
>
> Actually I’ve modified the content of the following pages:
>
> - Added an “Automatic Hadoop Configuration” section that describes the usage
> of setup-hadoop with all its pros and cons for Apache Hadoop and CDH
>
> http://apacheignite.gridgain.org/v1.5/docs/installing-on-apache-hadoop#automatic-hadoop-configuration
> http://apacheignite.gridgain.org/docs/installing-on-cloudera-cdh
>
> - Provided more info on how to use ‘HDFS’ as a secondary file system for
> ‘IGFS’, based on your answer from yesterday and referring to the updated
> configuration guides
> http://apacheignite.gridgain.org/docs/secondary-file-system
>
> As an IGFS & Hadoop expert, please review my changes and edit them wherever
> required.
>
> In addition, I noticed that we have a disabled and empty article for the
> BigTop distribution. Is this OK?
>
> —
> Denis
>
> > On Dec 15, 2015, at 12:10, Ivan V. <[email protected]> wrote:
> >
> > Denis, good question.
> > Yes, there are several reasons.
> > 1) setup-hadoop is suitable for the Apache Hadoop distribution, but not for
> > all others (e.g. BigTop).
> > 2) setup-hadoop rewrites global configs (core-site.xml, mapred-site.xml),
> > which prevents further use of the cluster without Ignite.
> > 3) setup-hadoop needs write permission to all the folders it writes files
> > to.
> > 4) It is possible to provide all the required functionality without any
> > file modifications in the existing Hadoop cluster at all, see
> > https://issues.apache.org/jira/browse/IGNITE-483.
> >
> > There were plans to remove "setup-hadoop", but that is not yet done.
> > In any case, I 100% agree that the presence of several different versions
> > of the documentation is quite confusing and misleading.
> >
> >
> > On Mon, Dec 14, 2015 at 10:58 PM, Denis Magda <[email protected]> wrote:
> >
> >> Ivan,
> >>
> >> Is there any reason why we don’t recommend using
> >> apache-ignite-hadoop-{version}/bin/setup-hadoop.sh/bat in our Hadoop
> >> Accelerator articles?
> >>
> >> With setup-hadoop.sh I was able to build a valid classpath, automatically
> >> create symlinks to the accelerator's jars from Hadoop’s libs folder, and
> >> start an Ignite node that uses HDFS as a secondary FS in less than 10
> >> minutes.
> >>
> >> I just followed the instructions from
> >> apache-ignite-hadoop-{version}/HADOOP_README.txt. The instructions on
> >> readme.io <http://readme.io/> look much more complex to me; they don’t
> >> mention setup-hadoop.sh/bat at all, forcing the end user to perform a
> >> manual setup.
> >>
> >> —
> >> Denis
> >>
> >>> On Dec 14, 2015, at 20:24, Dmitriy Setrakyan <[email protected]> wrote:
> >>>
> >>> On Mon, Dec 14, 2015 at 7:28 AM, Denis Magda <[email protected]> wrote:
> >>>
> >>>> Yes, this will be documented tomorrow. I want to go through all the
> >>>> steps myself, checking for any other obstacles the user may face.
> >>>>
> >>>
> >>> Thanks, Denis!
> >>>
> >>>
> >>>>
> >>>> —
> >>>> Denis
> >>>>
> >>>>> On Dec 14, 2015, at 18:11, Dmitriy Setrakyan <[email protected]> wrote:
> >>>>>
> >>>>> Ivan, I think this should be documented, no?
> >>>>>
> >>>>> On Mon, Dec 14, 2015 at 2:25 AM, Ivan V. <[email protected]> wrote:
> >>>>>
> >>>>>> To enable just IGFS persistence there is no need to use HDFS (which
> >>>>>> requires a Hadoop dependency, a configured HDFS cluster, etc.).
> >>>>>> We have requests https://issues.apache.org/jira/browse/IGNITE-1120 and
> >>>>>> https://issues.apache.org/jira/browse/IGNITE-1926 to implement
> >>>>>> persistence on top of the local file system, and we are already close
> >>>>>> to a solution.
> >>>>>>
> >>>>>> Regarding the secondary FS doc page (
> >>>>>> http://apacheignite.gridgain.org/docs/secondary-file-system), I would
> >>>>>> suggest adding the following text there:
> >>>>>> ------------------------
> >>>>>> If an Ignite node with a secondary file system is configured on a
> >>>>>> machine with a Hadoop distribution, make sure Ignite is able to find
> >>>>>> the appropriate Hadoop libraries: set the HADOOP_HOME environment
> >>>>>> variable for the Ignite process if you're using the Apache Hadoop
> >>>>>> distribution, or, if you use another distribution (HDP, Cloudera,
> >>>>>> BigTop, etc.), make sure the /etc/default/hadoop file exists and has
> >>>>>> appropriate contents.
> >>>>>>
> >>>>>> If an Ignite node with a secondary file system is configured on a
> >>>>>> machine without a Hadoop distribution, you can manually add the
> >>>>>> necessary Hadoop dependencies to the Ignite node classpath: these are
> >>>>>> the dependencies with groupId "org.apache.hadoop" listed in the file
> >>>>>> modules/hadoop/pom.xml . Currently they are:
> >>>>>>
> >>>>>> 1. hadoop-annotations
> >>>>>> 2. hadoop-auth
> >>>>>> 3. hadoop-common
> >>>>>> 4. hadoop-hdfs
> >>>>>> 5. hadoop-mapreduce-client-common
> >>>>>> 6. hadoop-mapreduce-client-core
> >>>>>>
> >>>>>> ------------------------
> >>>>>>
> >>>>>> On Mon, Dec 14, 2015 at 11:21 AM, Valentin Kulichenko <
> >>>>>> [email protected]> wrote:
> >>>>>>
> >>>>>>> Guys,
> >>>>>>>
> >>>>>>> Why don't we include the ignite-hadoop module in Fabric? This user
> >>>>>>> simply wants to configure HDFS as a secondary file system to ensure
> >>>>>>> persistence. Not having the opportunity to do this in Fabric looks
> >>>>>>> weird to me. And actually I don't think this is a use case for
> >>>>>>> Hadoop Accelerator.
> >>>>>>>
> >>>>>>> -Val
> >>>>>>>
> >>>>>>> On Mon, Dec 14, 2015 at 12:11 AM, Denis Magda <[email protected]> wrote:
> >>>>>>>
> >>>>>>>> Hi Ivan,
> >>>>>>>>
> >>>>>>>> 1) Yes, I think it makes sense to keep the old versions of the docs
> >>>>>>>> as long as an old version may still be in use by someone.
> >>>>>>>>
> >>>>>>>> 2) Absolutely, the time to add a corresponding article on readme.io
> >>>>>>>> has come. It's not the first time I've seen a question related to
> >>>>>>>> HDFS as a secondary FS.
> >>>>>>>> Neither before nor now has it been clear to me what exact steps I
> >>>>>>>> should follow to enable such a configuration. Our current
> >>>>>>>> suggestions look like a puzzle.
> >>>>>>>> I'll assemble the puzzle on my side and prepare the article. Ivan,
> >>>>>>>> if you don't mind, I will reach out to you directly for technical
> >>>>>>>> assistance if needed.
> >>>>>>>>
> >>>>>>>> Regards,
> >>>>>>>> Denis
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On 12/14/2015 10:25 AM, Ivan V. wrote:
> >>>>>>>>
> >>>>>>>>> Hi, Valentin,
> >>>>>>>>>
> >>>>>>>>> 1) First of all, note that the author of the question is not using
> >>>>>>>>> the latest doc page, but
> >>>>>>>>> http://apacheignite.gridgain.org/v1.0/docs/igfs-secondary-file-system .
> >>>>>>>>> This is version 1.0, while the latest is 1.5:
> >>>>>>>>> https://apacheignite.readme.io/docs/hadoop-accelerator. Besides, it
> >>>>>>>>> turned out that some links from the latest doc version point to the
> >>>>>>>>> 1.0 doc version. I fixed that in the several places where I found
> >>>>>>>>> it. Do we really need the old doc versions (1.0-1.4)?
> >>>>>>>>>
> >>>>>>>>> 2) Our documentation (
> >>>>>>>>> http://apacheignite.gridgain.org/docs/secondary-file-system) does not
> >>>>>>>>> provide any special setup instructions for configuring HDFS as a
> >>>>>>>>> secondary file system in Ignite. Our docs assume that a user who
> >>>>>>>>> wants to integrate with Hadoop follows the generic Hadoop
> >>>>>>>>> integration instructions (e.g.
> >>>>>>>>> http://apacheignite.gridgain.org/docs/installing-on-apache-hadoop ).
> >>>>>>>>> It looks like the page
> >>>>>>>>> http://apacheignite.gridgain.org/docs/secondary-file-system should be
> >>>>>>>>> clearer about the required configuration steps (in fact, setting the
> >>>>>>>>> HADOOP_HOME variable for the Ignite node process).
> >>>>>>>>>
> >>>>>>>>> 3) Hadoop jars are correctly found by Ignite if the following
> >>>>>>>>> conditions are met:
> >>>>>>>>> (a) The "Hadoop Edition" distribution is used (not the "Fabric"
> >>>>>>>>> edition).
> >>>>>>>>> (b) Either the HADOOP_HOME environment variable is set (for the
> >>>>>>>>> Apache Hadoop distribution), or the file "/etc/default/hadoop" exists
> >>>>>>>>> and matches the Hadoop distribution used (BigTop, Cloudera, HDP,
> >>>>>>>>> etc.).
> >>>>>>>>>
> >>>>>>>>> The exact mechanism of the Hadoop classpath composition can be found
> >>>>>>>>> in files
> >>>>>>>>> IGNITE_HOME/bin/include/hadoop-classpath.sh
> >>>>>>>>> IGNITE_HOME/bin/include/setenv.sh .
> >>>>>>>>>
> >>>>>>>>> The issue is discussed in
> >>>>>>>>> https://issues.apache.org/jira/browse/IGNITE-372 ,
> >>>>>>>>> https://issues.apache.org/jira/browse/IGNITE-483 .
> >>>>>>>>>
> >>>>>>>>> On Sat, Dec 12, 2015 at 3:45 AM, Valentin Kulichenko <
> >>>>>>>>> [email protected]> wrote:
> >>>>>>>>>
> >>>>>>>>>> Igniters,
> >>>>>>>>>>
> >>>>>>>>>> I'm looking at the question on SO [1] and I'm a bit confused.
> >>>>>>>>>>
> >>>>>>>>>> We ship the ignite-hadoop module only in Hadoop Accelerator and
> >>>>>>>>>> without Hadoop JARs, assuming that the user will include them from
> >>>>>>>>>> the Hadoop distribution he uses. It seems OK to me when the
> >>>>>>>>>> accelerator is plugged in to Hadoop to run mapreduce jobs, but I
> >>>>>>>>>> can't figure out the steps required to configure HDFS as a
> >>>>>>>>>> secondary FS for IGFS. Which Hadoop JARs should be on the
> >>>>>>>>>> classpath? Is the user supposed to add them manually?
> >>>>>>>>>>
> >>>>>>>>>> Can someone with more expertise in our Hadoop integration clarify
> >>>>>>>>>> this? I believe there is not enough documentation on this topic.
> >>>>>>>>>>
> >>>>>>>>>> BTW, any ideas why the user gets an exception for the JobConf
> >>>>>>>>>> class, which is in the 'mapred' package? Why is a map-reduce class
> >>>>>>>>>> being used?
> >>>>>>>>>>
> >>>>>>>>>> [1]
> >>>>>>>>>> http://stackoverflow.com/questions/34221355/apache-ignite-what-are-the-dependencies-of-ignitehadoopigfssecondaryfilesystem
> >>>>>>>>>>
> >>>>>>>>>> -Val
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>
> >>>>
> >>
> >>
>
>
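
As an illustration of the two classpath conditions discussed in this thread
(HADOOP_HOME set, or /etc/default/hadoop present) and of the six
org.apache.hadoop artifacts listed in it, here is a minimal Python sketch of a
pre-flight check. It is not Ignite's own logic, which lives in the shell
scripts under IGNITE_HOME/bin/include. The function names
hadoop_location_hint and missing_artifacts, and the assumed
"<artifact>-<version>.jar" file naming, are hypothetical and chosen only for
this example.

```python
import os
from pathlib import Path

# Artifact names copied from the list in the thread (groupId org.apache.hadoop).
REQUIRED_ARTIFACTS = [
    "hadoop-annotations",
    "hadoop-auth",
    "hadoop-common",
    "hadoop-hdfs",
    "hadoop-mapreduce-client-common",
    "hadoop-mapreduce-client-core",
]


def hadoop_location_hint(env=None, defaults_file="/etc/default/hadoop"):
    """Report which of the two conditions from the thread holds, or None.

    Only checks presence; it does not validate the contents of either source.
    """
    env = os.environ if env is None else env
    if env.get("HADOOP_HOME"):
        return "HADOOP_HOME"
    if Path(defaults_file).is_file():
        return defaults_file
    return None


def missing_artifacts(jar_names):
    """Return required artifacts that have no '<artifact>-<version>.jar' match.

    Assumption: the version starts with a digit, so 'hadoop-hdfs-nfs-2.7.1.jar'
    is not mistaken for 'hadoop-hdfs'.
    """
    missing = []
    for artifact in REQUIRED_ARTIFACTS:
        prefix = artifact + "-"
        found = any(
            name.startswith(prefix)
            and name.endswith(".jar")
            and name[len(prefix):len(prefix) + 1].isdigit()
            for name in jar_names
        )
        if not found:
            missing.append(artifact)
    return missing


if __name__ == "__main__":
    print("Hadoop location source:", hadoop_location_hint())
    # Hypothetical directory listing; replace with os.listdir(<your lib dir>).
    print("Missing:", missing_artifacts(["hadoop-common-2.7.1.jar"]))
```

Passing the file names of a node's lib directory to missing_artifacts shows
which of the listed dependencies would still have to be added manually on a
machine without a Hadoop distribution.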
