On Mon, Dec 14, 2015 at 7:28 AM, Denis Magda <dma...@gridgain.com> wrote:
> Yes, this will be documented tomorrow. I want to go through all the
> steps myself, checking for any other obstacles the user may face.

Thanks, Denis!

> —
> Denis
>
> On 14 Dec 2015, at 18:11, Dmitriy Setrakyan <dsetrak...@apache.org>
> wrote:
>
> > Ivan, I think this should be documented, no?
> >
> > On Mon, Dec 14, 2015 at 2:25 AM, Ivan V. <iveselovs...@gridgain.com>
> > wrote:
> >
> >> To enable just IGFS persistence there is no need to use HDFS (this
> >> requires a Hadoop dependency, a configured HDFS cluster, etc.). We
> >> have requests https://issues.apache.org/jira/browse/IGNITE-1120 and
> >> https://issues.apache.org/jira/browse/IGNITE-1926 to implement
> >> persistence on top of the local file system, and we are already
> >> close to a solution.
> >>
> >> Regarding the secondary FS doc page
> >> (http://apacheignite.gridgain.org/docs/secondary-file-system), I
> >> would suggest adding the following text there:
> >> ------------------------
> >> If an Ignite node with a secondary file system is configured on a
> >> machine with a Hadoop distribution, make sure Ignite is able to find
> >> the appropriate Hadoop libraries: set the HADOOP_HOME environment
> >> variable for the Ignite process if you're using the Apache Hadoop
> >> distribution, or, if you use another distribution (HDP, Cloudera,
> >> BigTop, etc.), make sure the /etc/default/hadoop file exists and has
> >> appropriate contents.
> >>
> >> If an Ignite node with a secondary file system is configured on a
> >> machine without a Hadoop distribution, you can manually add the
> >> necessary Hadoop dependencies to the Ignite node classpath: these
> >> are the dependencies of groupId "org.apache.hadoop" listed in
> >> modules/hadoop/pom.xml. Currently they are:
> >>
> >> 1. hadoop-annotations
> >> 2. hadoop-auth
> >> 3. hadoop-common
> >> 4. hadoop-hdfs
> >> 5. hadoop-mapreduce-client-common
> >> 6. hadoop-mapreduce-client-core
> >> ------------------------
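For illustration, here is a sketch of how the six dependencies above
might be declared in Maven form. The version shown (2.4.1) is only a
placeholder assumption; modules/hadoop/pom.xml remains the authoritative
source for the exact artifacts and versions:

    <!-- Hypothetical pom.xml fragment; version 2.4.1 is a placeholder. -->
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-annotations</artifactId>
        <version>2.4.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-auth</artifactId>
        <version>2.4.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>2.4.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-hdfs</artifactId>
        <version>2.4.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-mapreduce-client-common</artifactId>
        <version>2.4.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-mapreduce-client-core</artifactId>
        <version>2.4.1</version>
    </dependency>

Copying the corresponding jars into IGNITE_HOME/libs achieves the same
effect for a standalone node, since the startup scripts add everything
under libs/ to the node classpath.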
> >> On Mon, Dec 14, 2015 at 11:21 AM, Valentin Kulichenko
> >> <valentin.kuliche...@gmail.com> wrote:
> >>
> >>> Guys,
> >>>
> >>> Why don't we include the ignite-hadoop module in Fabric? This user
> >>> simply wants to configure HDFS as a secondary file system to ensure
> >>> persistence. Not having the opportunity to do this in Fabric looks
> >>> weird to me. And actually I don't think this is a use case for the
> >>> Hadoop Accelerator.
> >>>
> >>> -Val
> >>>
> >>> On Mon, Dec 14, 2015 at 12:11 AM, Denis Magda <dma...@gridgain.com>
> >>> wrote:
> >>>
> >>>> Hi Ivan,
> >>>>
> >>>> 1) Yes, I think it makes sense to keep the old versions of the
> >>>> docs while an old version is still considered to be in use by
> >>>> someone.
> >>>>
> >>>> 2) Absolutely, the time has come to add a corresponding article on
> >>>> readme.io. It's not the first time I've seen a question related to
> >>>> HDFS as a secondary FS. Neither before nor now has it been clear
> >>>> to me what exact steps I should follow to enable such a
> >>>> configuration. Our current suggestions look like a puzzle. I'll
> >>>> assemble the puzzle on my side and prepare the article. Ivan, if
> >>>> you don't mind, I'll reach out to you directly for any technical
> >>>> assistance if needed.
> >>>>
> >>>> Regards,
> >>>> Denis
> >>>>
> >>>> On 12/14/2015 10:25 AM, Ivan V. wrote:
> >>>>
> >>>>> Hi, Valentin,
> >>>>>
> >>>>> 1) First of all, note that the author of the question is not
> >>>>> using the latest doc page, namely
> >>>>> http://apacheignite.gridgain.org/v1.0/docs/igfs-secondary-file-system .
> >>>>> This is version 1.0, while the latest is 1.5:
> >>>>> https://apacheignite.readme.io/docs/hadoop-accelerator. Besides,
> >>>>> it turned out that some links in the latest doc version point to
> >>>>> the 1.0 docs; I fixed that in the several places where I found
> >>>>> it. Do we really need the old doc versions (1.0-1.4)?
> >>>>>
> >>>>> 2) Our documentation
> >>>>> (http://apacheignite.gridgain.org/docs/secondary-file-system)
> >>>>> does not provide any special setup instructions for configuring
> >>>>> HDFS as a secondary file system in Ignite. Our docs assume that
> >>>>> if a user wants to integrate with Hadoop, (s)he follows the
> >>>>> generic Hadoop integration instructions (e.g.
> >>>>> http://apacheignite.gridgain.org/docs/installing-on-apache-hadoop).
> >>>>> It looks like the page
> >>>>> http://apacheignite.gridgain.org/docs/secondary-file-system
> >>>>> should be clearer about the required configuration steps (in
> >>>>> essence, setting the HADOOP_HOME variable for the Ignite node
> >>>>> process).
> >>>>>
> >>>>> 3) Hadoop jars are correctly found by Ignite if the following
> >>>>> conditions are met:
> >>>>> (a) The "Hadoop Edition" distribution is used (not the "Fabric"
> >>>>> edition).
> >>>>> (b) Either the HADOOP_HOME environment variable is set (for the
> >>>>> Apache Hadoop distribution), or the file "/etc/default/hadoop"
> >>>>> exists and matches the Hadoop distribution used (BigTop,
> >>>>> Cloudera, HDP, etc.).
> >>>>>
> >>>>> The exact mechanism of Hadoop classpath composition can be found
> >>>>> in the files
> >>>>> IGNITE_HOME/bin/include/hadoop-classpath.sh
> >>>>> IGNITE_HOME/bin/include/setenv.sh .
> >>>>>
> >>>>> The issue is discussed in
> >>>>> https://issues.apache.org/jira/browse/IGNITE-372 and
> >>>>> https://issues.apache.org/jira/browse/IGNITE-483 .
> >>>>>
> >>>>> On Sat, Dec 12, 2015 at 3:45 AM, Valentin Kulichenko
> >>>>> <valentin.kuliche...@gmail.com> wrote:
> >>>>>
> >>>>>> Igniters,
> >>>>>>
> >>>>>> I'm looking at the question on SO [1] and I'm a bit confused.
> >>>>>>
> >>>>>> We ship the ignite-hadoop module only in the Hadoop Accelerator
> >>>>>> and without Hadoop JARs, assuming that the user will include
> >>>>>> them from the Hadoop distribution he uses. That seems OK to me
> >>>>>> when the accelerator is plugged into Hadoop to run MapReduce
> >>>>>> jobs, but I can't figure out the steps required to configure
> >>>>>> HDFS as a secondary FS for IGFS. Which Hadoop JARs should be on
> >>>>>> the classpath? Is the user supposed to add them manually?
> >>>>>>
> >>>>>> Can someone with more expertise in our Hadoop integration
> >>>>>> clarify this? I believe there is not enough documentation on
> >>>>>> this topic.
> >>>>>>
> >>>>>> BTW, any ideas why the user gets an exception for the JobConf
> >>>>>> class, which is in the 'mapred' package? Why is a MapReduce
> >>>>>> class being used?
> >>>>>>
> >>>>>> [1]
> >>>>>> http://stackoverflow.com/questions/34221355/apache-ignite-what-are-the-dependencies-of-ignitehadoopigfssecondaryfilesystem
> >>>>>>
> >>>>>> -Val
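To make the configuration under discussion concrete, here is a minimal
sketch of a Spring XML file that exposes IGFS with HDFS as the secondary
file system. It assumes the Ignite 1.5-era API; the HDFS URI, IGFS name,
and cache settings are placeholder assumptions, and the ignite-hadoop
module plus the Hadoop jars discussed above must be on the node
classpath:

    <!-- Minimal sketch, assuming the Ignite 1.5-era API. The HDFS URI
         and cache settings are placeholders, not prescribed values. -->
    <bean class="org.apache.ignite.configuration.IgniteConfiguration">
        <property name="fileSystemConfiguration">
            <list>
                <bean class="org.apache.ignite.configuration.FileSystemConfiguration">
                    <property name="name" value="igfs"/>
                    <property name="metaCacheName" value="igfs-meta"/>
                    <property name="dataCacheName" value="igfs-data"/>
                    <!-- DUAL_SYNC reads/writes through to the secondary
                         file system synchronously. -->
                    <property name="defaultMode" value="DUAL_SYNC"/>
                    <property name="secondaryFileSystem">
                        <bean class="org.apache.ignite.hadoop.fs.IgniteHadoopIgfsSecondaryFileSystem">
                            <!-- HDFS namenode URI; a placeholder. -->
                            <constructor-arg value="hdfs://localhost:9000/"/>
                        </bean>
                    </property>
                </bean>
            </list>
        </property>
        <!-- IGFS needs its meta and data caches configured on the node. -->
        <property name="cacheConfiguration">
            <list>
                <bean class="org.apache.ignite.configuration.CacheConfiguration">
                    <property name="name" value="igfs-meta"/>
                    <property name="cacheMode" value="REPLICATED"/>
                    <property name="atomicityMode" value="TRANSACTIONAL"/>
                </bean>
                <bean class="org.apache.ignite.configuration.CacheConfiguration">
                    <property name="name" value="igfs-data"/>
                    <property name="cacheMode" value="PARTITIONED"/>
                    <property name="atomicityMode" value="TRANSACTIONAL"/>
                </bean>
            </list>
        </property>
    </bean>

A node started with such a file (e.g. bin/ignite.sh path/to/config.xml)
then serves IGFS as a caching layer in front of the HDFS cluster, which
also gives the persistence the Stack Overflow question is after.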