Hi Denis,

1) In my opinion, we'd better not mention the 'setup-hadoop' script at all (for the reasons mentioned earlier in this thread) and remove it in the nearest release.

2) Ignite is now part of the BigTop distribution (see https://issues.apache.org/jira/browse/IGNITE-665), so the old BigTop instructions are no longer relevant. I guess that is why the BigTop article is disabled and empty.

On Tue, Dec 15, 2015 at 12:35 PM, Denis Magda <dma...@gridgain.com> wrote:

> Hi Ivan,
>
> Thanks for the clarification.
>
> Actually, I've modified the content of the following pages:
>
> - Added an "Automatic Hadoop Configuration" section that describes the
> usage of setup-hadoop, with all its pros and cons, for Apache Hadoop and CDH:
> http://apacheignite.gridgain.org/v1.5/docs/installing-on-apache-hadoop#automatic-hadoop-configuration
> http://apacheignite.gridgain.org/docs/installing-on-cloudera-cdh
>
> - Provided more info on how to use HDFS as a secondary file system for
> IGFS, using your answer from yesterday and referring to the updated
> configuration guides:
> http://apacheignite.gridgain.org/docs/secondary-file-system
>
> As an IGFS & Hadoop expert, please review my changes and edit them wherever
> required.
>
> In addition, I noticed that we have a disabled and empty article for the
> BigTop distribution. Is this OK?
>
> --
> Denis
>
> > On Dec 15, 2015, at 12:10, Ivan V. <iveselovs...@gridgain.com> wrote:
> >
> > Denis, good question.
> > Yes, there are several reasons.
> > 1) setup-hadoop is suitable for the Apache Hadoop distribution, but not for
> > all others (e.g. BigTop).
> > 2) setup-hadoop rewrites global configs (core-site.xml, mapred-site.xml),
> > which prevents further use of the cluster without Ignite.
> > 3) setup-hadoop needs write permission to all the folders it writes files
> > to.
> > 4) It is possible to provide all the required functionality without any
> > file modifications in the existing Hadoop cluster at all, see
> > https://issues.apache.org/jira/browse/IGNITE-483.
> >
> > There were plans to remove "setup-hadoop", but that is not yet done.
> > In any case, I 100% agree that the presence of several different versions
> > of the documentation is quite confusing and misleading.
> >
> > On Mon, Dec 14, 2015 at 10:58 PM, Denis Magda <dma...@gridgain.com> wrote:
> >
> >> Ivan,
> >>
> >> Is there any reason why we don't recommend using
> >> apache-ignite-hadoop-{version}/bin/setup-hadoop.sh/bat in our Hadoop
> >> Accelerator articles?
> >>
> >> With setup-hadoop.sh I was able to build a valid classpath, create
> >> symlinks to the accelerator's jars from Hadoop's libs folder automatically,
> >> and start an Ignite node that uses HDFS as a secondary FS in less than 10
> >> minutes.
> >>
> >> I just followed the instructions from
> >> apache-ignite-hadoop-{version}/HADOOP_README.txt. The instructions on
> >> readme.io <http://readme.io/> look much more complex to me; they don't
> >> mention setup-hadoop.sh/bat at all, forcing the end user to perform a
> >> manual setup.
> >>
> >> --
> >> Denis
> >>
> >>> On Dec 14, 2015, at 20:24, Dmitriy Setrakyan <dsetrak...@apache.org> wrote:
> >>>
> >>> On Mon, Dec 14, 2015 at 7:28 AM, Denis Magda <dma...@gridgain.com> wrote:
> >>>
> >>>> Yes, this will be documented tomorrow. I want to go through all the
> >>>> steps myself, checking for any other obstacles the user may face.
> >>>>
> >>>
> >>> Thanks, Denis!
> >>>
> >>>>
> >>>> --
> >>>> Denis
> >>>>
> >>>>> On Dec 14, 2015, at 18:11, Dmitriy Setrakyan <dsetrak...@apache.org> wrote:
> >>>>>
> >>>>> Ivan, I think this should be documented, no?
> >>>>>
> >>>>> On Mon, Dec 14, 2015 at 2:25 AM, Ivan V. <iveselovs...@gridgain.com> wrote:
> >>>>>
> >>>>>> To enable just IGFS persistence, there is no need to use HDFS (this
> >>>>>> requires a Hadoop dependency, a configured HDFS cluster, etc.).
> >>>>>> We have requests https://issues.apache.org/jira/browse/IGNITE-1120 and
> >>>>>> https://issues.apache.org/jira/browse/IGNITE-1926 to implement
> >>>>>> persistence on top of the local file system, and we are already close
> >>>>>> to a solution.
> >>>>>>
> >>>>>> Regarding the secondary FS doc page
> >>>>>> (http://apacheignite.gridgain.org/docs/secondary-file-system), I would
> >>>>>> suggest adding the following text there:
> >>>>>> ------------------------
> >>>>>> If an Ignite node with a secondary file system is configured on a
> >>>>>> machine with a Hadoop distribution, make sure Ignite is able to find
> >>>>>> the appropriate Hadoop libraries: set the HADOOP_HOME environment
> >>>>>> variable for the Ignite process if you're using the Apache Hadoop
> >>>>>> distribution, or, if you use another distribution (HDP, Cloudera,
> >>>>>> BigTop, etc.), make sure the /etc/default/hadoop file exists and has
> >>>>>> appropriate contents.
> >>>>>>
> >>>>>> If an Ignite node with a secondary file system is configured on a
> >>>>>> machine without a Hadoop distribution, you can manually add the
> >>>>>> necessary Hadoop dependencies to the Ignite node classpath: these are
> >>>>>> the dependencies with groupId "org.apache.hadoop" listed in the file
> >>>>>> modules/hadoop/pom.xml. Currently they are:
> >>>>>>
> >>>>>> 1. hadoop-annotations
> >>>>>> 2. hadoop-auth
> >>>>>> 3. hadoop-common
> >>>>>> 4. hadoop-hdfs
> >>>>>> 5. hadoop-mapreduce-client-common
> >>>>>> 6. hadoop-mapreduce-client-core
> >>>>>>
> >>>>>> ------------------------
> >>>>>>
> >>>>>> On Mon, Dec 14, 2015 at 11:21 AM, Valentin Kulichenko <
> >>>>>> valentin.kuliche...@gmail.com> wrote:
> >>>>>>
> >>>>>>> Guys,
> >>>>>>>
> >>>>>>> Why don't we include the ignite-hadoop module in Fabric? This user
> >>>>>>> simply wants to configure HDFS as a secondary file system to ensure
> >>>>>>> persistence. Not having the opportunity to do this in Fabric looks
> >>>>>>> weird to me. And actually, I don't think this is a use case for
> >>>>>>> Hadoop Accelerator.
> >>>>>>>
> >>>>>>> -Val
> >>>>>>>
> >>>>>>> On Mon, Dec 14, 2015 at 12:11 AM, Denis Magda <dma...@gridgain.com> wrote:
> >>>>>>>
> >>>>>>>> Hi Ivan,
> >>>>>>>>
> >>>>>>>> 1) Yes, I think it makes sense to keep the old versions of the docs
> >>>>>>>> while an old version may still be in use by someone.
> >>>>>>>>
> >>>>>>>> 2) Absolutely, the time has come to add a corresponding article on
> >>>>>>>> readme.io. It's not the first time I've seen a question related to
> >>>>>>>> HDFS as a secondary FS.
> >>>>>>>> Both before and now, it's not clear to me what exact steps I should
> >>>>>>>> follow to enable such a configuration. Our current suggestions look
> >>>>>>>> like a puzzle.
> >>>>>>>> I'll assemble the puzzle on my side and prepare the article. Ivan, if
> >>>>>>>> you don't mind, I will reach out to you directly for technical
> >>>>>>>> assistance if needed.
> >>>>>>>>
> >>>>>>>> Regards,
> >>>>>>>> Denis
> >>>>>>>>
> >>>>>>>> On 12/14/2015 10:25 AM, Ivan V. wrote:
> >>>>>>>>
> >>>>>>>>> Hi Valentin,
> >>>>>>>>>
> >>>>>>>>> 1) First of all, note that the author of the question is not using
> >>>>>>>>> the latest doc page, but this one:
> >>>>>>>>> http://apacheignite.gridgain.org/v1.0/docs/igfs-secondary-file-system
> >>>>>>>>> This is version 1.0, while the latest is 1.5:
> >>>>>>>>> https://apacheignite.readme.io/docs/hadoop-accelerator. Besides, it
> >>>>>>>>> turned out that some links in the latest doc version point to the
> >>>>>>>>> 1.0 doc version. I fixed that in the several places where I found
> >>>>>>>>> it. Do we really need the old doc versions (1.0-1.4)?
> >>>>>>>>>
> >>>>>>>>> 2) Our documentation
> >>>>>>>>> (http://apacheignite.gridgain.org/docs/secondary-file-system) does
> >>>>>>>>> not provide any special setup instructions for configuring HDFS as
> >>>>>>>>> a secondary file system in Ignite. Our docs assume that a user who
> >>>>>>>>> wants to integrate with Hadoop follows the generic Hadoop
> >>>>>>>>> integration instructions (e.g.
> >>>>>>>>> http://apacheignite.gridgain.org/docs/installing-on-apache-hadoop).
> >>>>>>>>> It looks like the page
> >>>>>>>>> http://apacheignite.gridgain.org/docs/secondary-file-system should
> >>>>>>>>> be clearer about the required configuration steps (in fact, setting
> >>>>>>>>> the HADOOP_HOME variable for the Ignite node process).
> >>>>>>>>>
> >>>>>>>>> 3) Hadoop jars are correctly found by Ignite if the following
> >>>>>>>>> conditions are met:
> >>>>>>>>> (a) The "Hadoop Edition" distribution is used (not the "Fabric"
> >>>>>>>>> edition).
> >>>>>>>>> (b) Either the HADOOP_HOME environment variable is set (for the
> >>>>>>>>> Apache Hadoop distribution), or the file "/etc/default/hadoop"
> >>>>>>>>> exists and matches the Hadoop distribution used (BigTop, Cloudera,
> >>>>>>>>> HDP, etc.).
> >>>>>>>>>
> >>>>>>>>> The exact mechanism of the Hadoop classpath composition can be
> >>>>>>>>> found in the files
> >>>>>>>>> IGNITE_HOME/bin/include/hadoop-classpath.sh
> >>>>>>>>> IGNITE_HOME/bin/include/setenv.sh
> >>>>>>>>>
> >>>>>>>>> The issue is discussed in
> >>>>>>>>> https://issues.apache.org/jira/browse/IGNITE-372 and
> >>>>>>>>> https://issues.apache.org/jira/browse/IGNITE-483
> >>>>>>>>>
> >>>>>>>>> On Sat, Dec 12, 2015 at 3:45 AM, Valentin Kulichenko <
> >>>>>>>>> valentin.kuliche...@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>>> Igniters,
> >>>>>>>>>>
> >>>>>>>>>> I'm looking at the question on SO [1] and I'm a bit confused.
> >>>>>>>>>>
> >>>>>>>>>> We ship the ignite-hadoop module only in Hadoop Accelerator and
> >>>>>>>>>> without Hadoop JARs, assuming that the user will include them from
> >>>>>>>>>> the Hadoop distribution he uses. That seems OK to me when the
> >>>>>>>>>> accelerator is plugged into Hadoop to run MapReduce jobs, but I
> >>>>>>>>>> can't figure out the steps required to configure HDFS as a
> >>>>>>>>>> secondary FS for IGFS. Which Hadoop JARs should be on the
> >>>>>>>>>> classpath? Is the user supposed to add them manually?
> >>>>>>>>>>
> >>>>>>>>>> Can someone with more expertise in our Hadoop integration clarify
> >>>>>>>>>> this? I believe there is not enough documentation on this topic.
> >>>>>>>>>>
> >>>>>>>>>> BTW, any ideas why the user gets an exception for the JobConf
> >>>>>>>>>> class, which is in the 'mapred' package? Why is a MapReduce class
> >>>>>>>>>> being used?
> >>>>>>>>>>
> >>>>>>>>>> [1]
> >>>>>>>>>> http://stackoverflow.com/questions/34221355/apache-ignite-what-are-the-dependencies-of-ignitehadoopigfssecondaryfilesystem
> >>>>>>>>>>
> >>>>>>>>>> -Val
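
For anyone who wants to sanity-check a machine against the conditions discussed in this thread, here is a minimal Python sketch. It is only an illustration, not how Ignite itself composes the classpath (that logic lives in hadoop-classpath.sh and setenv.sh). The function names are hypothetical, and the jar naming pattern <artifact>-<version>.jar is an assumption; only the HADOOP_HOME variable, the /etc/default/hadoop path and the six artifact names come from the messages above.

```python
import re
from pathlib import Path

# Artifact names (groupId org.apache.hadoop) as listed in this thread.
REQUIRED_ARTIFACTS = [
    "hadoop-annotations",
    "hadoop-auth",
    "hadoop-common",
    "hadoop-hdfs",
    "hadoop-mapreduce-client-common",
    "hadoop-mapreduce-client-core",
]


def hadoop_discovery_mode(env, defaults_file="/etc/default/hadoop"):
    """Report which of the two conditions from the thread holds, if any.

    Hypothetical helper: returns "HADOOP_HOME" if that variable is set,
    "defaults-file" if the defaults file exists, otherwise None.
    It does not validate the contents of the defaults file.
    """
    if env.get("HADOOP_HOME"):
        return "HADOOP_HOME"
    if Path(defaults_file).is_file():
        return "defaults-file"
    return None


def missing_artifacts(jar_names):
    """Return the required artifacts that have no matching jar file name.

    Assumes jars are named <artifact>-<version>.jar with a version that
    starts with a digit, e.g. hadoop-common-2.6.0.jar.
    """
    missing = []
    for artifact in REQUIRED_ARTIFACTS:
        pattern = re.compile(r"^" + re.escape(artifact) + r"-\d.*\.jar$")
        if not any(pattern.match(name) for name in jar_names):
            missing.append(artifact)
    return missing
```

A typical use would be to call hadoop_discovery_mode(os.environ) on the node's host and, for the "no Hadoop distribution" case, pass the file names found in the directory you add to the Ignite classpath to missing_artifacts().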