Hi Ivan, thanks for the clarification.
Actually, I’ve modified the content of the following pages:

- Added an “Automatic Hadoop Configuration” section that describes the usage of setup-hadoop, with all its pros and cons, for Apache Hadoop and CDH:
  http://apacheignite.gridgain.org/v1.5/docs/installing-on-apache-hadoop#automatic-hadoop-configuration
  http://apacheignite.gridgain.org/docs/installing-on-cloudera-cdh
- Provided more info on how to use HDFS as a secondary file system for IGFS, based on your answer from yesterday and referring to the updated configuration guides:
  http://apacheignite.gridgain.org/docs/secondary-file-system

As an IGFS and Hadoop expert, please review my changes and edit them wherever required.

In addition, I noticed that we have a disabled and empty article for the BigTop distribution. Is this OK?

--
Denis

> On Dec 15, 2015, at 12:10, Ivan V. <iveselovs...@gridgain.com> wrote:
>
> Denis, good question.
> Yes, there are several reasons.
> 1) setup-hadoop is suitable for the Apache Hadoop distribution, but not for all others (e.g. BigTop).
> 2) setup-hadoop rewrites global configs (core-site.xml, mapred-site.xml), which prevents further cluster usage without Ignite.
> 3) setup-hadoop needs write permission to all the folders it writes files to.
> 4) It is possible to provide all the required functionality without any file modifications in the existing Hadoop cluster at all, see https://issues.apache.org/jira/browse/IGNITE-483.
>
> There were plans to remove "setup-hadoop", but that is not yet done.
> In any case, I 100% agree that the presence of several different versions of the documentation is quite confusing and misleading.
>
> On Mon, Dec 14, 2015 at 10:58 PM, Denis Magda <dma...@gridgain.com> wrote:
>
>> Ivan,
>>
>> Is there any reason why we don’t recommend using apache-ignite-hadoop-{version}/bin/setup-hadoop.sh/bat in our Hadoop Accelerator articles?
>>
>> With setup-hadoop.sh I was able to build a valid classpath, automatically create symlinks to the accelerator's jars from Hadoop’s libs folder, and start an Ignite node that uses HDFS as a secondary FS in less than 10 minutes.
>>
>> I just followed the instructions from apache-ignite-hadoop-{version}/HADOOP_README.txt. The instructions on readme.io <http://readme.io/> look much more complex to me; they don’t mention setup-hadoop.sh/bat at all, forcing the end user to perform a manual setup.
>>
>> --
>> Denis
>>
>>> On Dec 14, 2015, at 20:24, Dmitriy Setrakyan <dsetrak...@apache.org> wrote:
>>>
>>> On Mon, Dec 14, 2015 at 7:28 AM, Denis Magda <dma...@gridgain.com> wrote:
>>>
>>>> Yes, this will be documented tomorrow. I want to go through all the steps myself, checking for any other obstacles the user may face.
>>>
>>> Thanks, Denis!
>>>
>>>> --
>>>> Denis
>>>>
>>>>> On Dec 14, 2015, at 18:11, Dmitriy Setrakyan <dsetrak...@apache.org> wrote:
>>>>>
>>>>> Ivan, I think this should be documented, no?
>>>>>
>>>>> On Mon, Dec 14, 2015 at 2:25 AM, Ivan V. <iveselovs...@gridgain.com> wrote:
>>>>>
>>>>>> To enable just IGFS persistence there is no need to use HDFS (this requires a Hadoop dependency, a configured HDFS cluster, etc.).
>>>>>> We have requests https://issues.apache.org/jira/browse/IGNITE-1120 , https://issues.apache.org/jira/browse/IGNITE-1926 to implement persistence on top of the local file system, and we are already close to the solution.
>>>>>>
>>>>>> Regarding the secondary FS doc page (http://apacheignite.gridgain.org/docs/secondary-file-system), I would suggest adding the following text there:
>>>>>> ------------------------
>>>>>> If an Ignite node with a secondary file system is configured on a machine with a Hadoop distribution, make sure Ignite is able to find the appropriate Hadoop libraries: set the HADOOP_HOME environment variable for the Ignite process if you're using the Apache Hadoop distribution, or, if you use another distribution (HDP, Cloudera, BigTop, etc.), make sure the /etc/default/hadoop file exists and has appropriate contents.
>>>>>>
>>>>>> If an Ignite node with a secondary file system is configured on a machine without a Hadoop distribution, you can manually add the necessary Hadoop dependencies to the Ignite node classpath: these are the dependencies with groupId "org.apache.hadoop" listed in the file modules/hadoop/pom.xml . Currently they are:
>>>>>>
>>>>>> 1. hadoop-annotations
>>>>>> 2. hadoop-auth
>>>>>> 3. hadoop-common
>>>>>> 4. hadoop-hdfs
>>>>>> 5. hadoop-mapreduce-client-common
>>>>>> 6. hadoop-mapreduce-client-core
>>>>>>
>>>>>> ------------------------
>>>>>>
>>>>>> On Mon, Dec 14, 2015 at 11:21 AM, Valentin Kulichenko <valentin.kuliche...@gmail.com> wrote:
>>>>>>
>>>>>>> Guys,
>>>>>>>
>>>>>>> Why don't we include the ignite-hadoop module in Fabric? This user simply wants to configure HDFS as a secondary file system to ensure persistence. Not having the opportunity to do this in Fabric looks weird to me. And actually I don't think this is a use case for Hadoop Accelerator.
>>>>>>>
>>>>>>> -Val
>>>>>>>
>>>>>>> On Mon, Dec 14, 2015 at 12:11 AM, Denis Magda <dma...@gridgain.com> wrote:
>>>>>>>
>>>>>>>> Hi Ivan,
>>>>>>>>
>>>>>>>> 1) Yes, I think that it makes sense to keep the old versions of the docs while an old version may still be in use by someone.
>>>>>>>>
>>>>>>>> 2) Absolutely, the time has come to add a corresponding article on readme.io. It's not the first time I've seen a question related to HDFS as a secondary FS.
>>>>>>>> It has never been clear to me what exact steps I should follow to enable such a configuration. Our current suggestions look like a puzzle.
>>>>>>>> I'll assemble the puzzle on my side and prepare the article. Ivan, if you don't mind, I would reach out to you directly for technical assistance if needed.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Denis
>>>>>>>>
>>>>>>>> On 12/14/2015 10:25 AM, Ivan V. wrote:
>>>>>>>>
>>>>>>>>> Hi, Valentin,
>>>>>>>>>
>>>>>>>>> 1) First of all, note that the author of the question is not using the latest doc page, but http://apacheignite.gridgain.org/v1.0/docs/igfs-secondary-file-system . This is version 1.0, while the latest is 1.5: https://apacheignite.readme.io/docs/hadoop-accelerator. Besides, it turned out that some links from the latest doc version point to the 1.0 doc version. I fixed that in the several places where I found it. Do we really need the old doc versions (1.0 - 1.4)?
>>>>>>>>>
>>>>>>>>> 2) Our documentation (http://apacheignite.gridgain.org/docs/secondary-file-system) does not provide any special setup instructions to configure HDFS as a secondary file system in Ignite.
>>>>>>>>> Our docs assume that if a user wants to integrate with Hadoop, (s)he follows the generic Hadoop integration instructions (e.g. http://apacheignite.gridgain.org/docs/installing-on-apache-hadoop ). It looks like the page http://apacheignite.gridgain.org/docs/secondary-file-system should be clearer regarding the required configuration steps (in fact, setting up the HADOOP_HOME variable for the Ignite node process).
>>>>>>>>>
>>>>>>>>> 3) Hadoop jars are correctly found by Ignite if the following conditions are met:
>>>>>>>>> (a) The "Hadoop Edition" distribution is used (not the "Fabric" edition).
>>>>>>>>> (b) Either the HADOOP_HOME environment variable is set (for the Apache Hadoop distribution), or the file "/etc/default/hadoop" exists and matches the Hadoop distribution used (BigTop, Cloudera, HDP, etc.)
>>>>>>>>>
>>>>>>>>> The exact mechanism of the Hadoop classpath composition can be found in the files
>>>>>>>>> IGNITE_HOME/bin/include/hadoop-classpath.sh
>>>>>>>>> IGNITE_HOME/bin/include/setenv.sh .
>>>>>>>>>
>>>>>>>>> The issue is discussed in https://issues.apache.org/jira/browse/IGNITE-372 , https://issues.apache.org/jira/browse/IGNITE-483 .
>>>>>>>>>
>>>>>>>>> On Sat, Dec 12, 2015 at 3:45 AM, Valentin Kulichenko <valentin.kuliche...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Igniters,
>>>>>>>>>>
>>>>>>>>>> I'm looking at the question on SO [1] and I'm a bit confused.
>>>>>>>>>>
>>>>>>>>>> We ship the ignite-hadoop module only in Hadoop Accelerator and without Hadoop JARs, assuming that the user will include them from the Hadoop distribution he uses.
>>>>>>>>>> It seems OK to me when the accelerator is plugged into Hadoop to run mapreduce jobs, but I can't figure out the steps required to configure HDFS as a secondary FS for IGFS. Which Hadoop JARs should be on the classpath? Is the user supposed to add them manually?
>>>>>>>>>>
>>>>>>>>>> Can someone with more expertise in our Hadoop integration clarify this? I believe there is not enough documentation on this topic.
>>>>>>>>>>
>>>>>>>>>> BTW, any ideas why the user gets an exception for the JobConf class, which is in the 'mapred' package? Why is a map-reduce class being used?
>>>>>>>>>>
>>>>>>>>>> [1] http://stackoverflow.com/questions/34221355/apache-ignite-what-are-the-dependencies-of-ignitehadoopigfssecondaryfilesystem
>>>>>>>>>>
>>>>>>>>>> -Val
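
For anyone following the thread, condition (b) from Ivan's message (HADOOP_HOME set, or /etc/default/hadoop present) can be checked before starting a node with something like the sketch below. This is only an illustration of the two conditions as stated in the thread, not a port of IGNITE_HOME/bin/include/hadoop-classpath.sh. The function name `hadoop_location_hint`, its parameters, and the choice to test HADOOP_HOME first are assumptions made for the example; it also does not verify that the defaults file "matches the Hadoop distribution used".

```python
import os
from pathlib import Path


def hadoop_location_hint(env=None, defaults_file="/etc/default/hadoop"):
    """Report which of the two conditions from the thread holds, if any.

    Hypothetical helper: returns ("HADOOP_HOME", value) when the variable
    is set and non-empty, ("defaults_file", path) when the defaults file
    exists, and None when neither condition is met.
    """
    # Use the real process environment unless a mapping is passed in.
    env = os.environ if env is None else env

    # Condition for the Apache Hadoop distribution: HADOOP_HOME is set.
    hadoop_home = env.get("HADOOP_HOME")
    if hadoop_home:
        return ("HADOOP_HOME", hadoop_home)

    # Condition for other distributions (BigTop, Cloudera, HDP, etc.):
    # the defaults file exists. Its contents are not inspected here.
    if Path(defaults_file).is_file():
        return ("defaults_file", defaults_file)

    return None


if __name__ == "__main__":
    hint = hadoop_location_hint()
    if hint is None:
        print("Neither HADOOP_HOME nor /etc/default/hadoop was found.")
    else:
        print(f"Hadoop location hint: {hint[0]} -> {hint[1]}")
```

If the function returns None on a machine without a Hadoop distribution, the remaining option described in the thread is to add the org.apache.hadoop dependencies listed in modules/hadoop/pom.xml to the Ignite node classpath manually.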