Ivan,

Is there any reason why we don’t recommend using apache-ignite-hadoop-{version}/bin/setup-hadoop.sh/bat in our Hadoop Accelerator articles?
With setup-hadoop.sh I was able to build a valid classpath, automatically create symlinks in Hadoop’s libs folder that point to the accelerator's jars, and start an Ignite node that uses HDFS as a secondary FS in less than 10 minutes. I just followed the instructions from apache-ignite-hadoop-{version}/HADOOP_README.txt. The instructions on readme.io look much more complex to me: they don’t mention setup-hadoop.sh/bat at all, forcing the end user to perform a manual setup.

—
Denis

> On 14 Dec 2015, at 20:24, Dmitriy Setrakyan <dsetrak...@apache.org> wrote:
>
> On Mon, Dec 14, 2015 at 7:28 AM, Denis Magda <dma...@gridgain.com> wrote:
>
>> Yes, this will be documented tomorrow. I want to go through all the steps
>> myself, checking for any other obstacles the user may face.
>>
>
> Thanks, Denis!
>
>>
>> —
>> Denis
>>
>>> On 14 Dec 2015, at 18:11, Dmitriy Setrakyan <dsetrak...@apache.org> wrote:
>>>
>>> Ivan, I think this should be documented, no?
>>>
>>> On Mon, Dec 14, 2015 at 2:25 AM, Ivan V. <iveselovs...@gridgain.com> wrote:
>>>
>>>> To enable just IGFS persistence there is no need to use HDFS (this
>>>> requires a Hadoop dependency, a configured HDFS cluster, etc.).
>>>> We have requests https://issues.apache.org/jira/browse/IGNITE-1120 and
>>>> https://issues.apache.org/jira/browse/IGNITE-1926 to implement
>>>> persistence on top of the local file system, and we are already close
>>>> to the solution.
>>>>
>>>> Regarding the secondary FS doc page
>>>> (http://apacheignite.gridgain.org/docs/secondary-file-system), I would
>>>> suggest adding the following text there:
>>>> ------------------------
>>>> If an Ignite node with a secondary file system is configured on a machine
>>>> with a Hadoop distribution, make sure Ignite is able to find the
>>>> appropriate Hadoop libraries: set the HADOOP_HOME environment variable for
>>>> the Ignite process if you're using the Apache Hadoop distribution, or, if
>>>> you use another distribution (HDP, Cloudera, BigTop, etc.), make sure the
>>>> /etc/default/hadoop file exists and has the appropriate contents.
>>>>
>>>> If an Ignite node with a secondary file system is configured on a machine
>>>> without a Hadoop distribution, you can manually add the necessary Hadoop
>>>> dependencies to the Ignite node classpath: these are the dependencies of
>>>> groupId "org.apache.hadoop" listed in the file modules/hadoop/pom.xml.
>>>> Currently they are:
>>>>
>>>> 1. hadoop-annotations
>>>> 2. hadoop-auth
>>>> 3. hadoop-common
>>>> 4. hadoop-hdfs
>>>> 5. hadoop-mapreduce-client-common
>>>> 6. hadoop-mapreduce-client-core
>>>>
>>>> ------------------------
>>>>
>>>> On Mon, Dec 14, 2015 at 11:21 AM, Valentin Kulichenko <valentin.kuliche...@gmail.com> wrote:
>>>>
>>>>> Guys,
>>>>>
>>>>> Why don't we include the ignite-hadoop module in Fabric? This user simply
>>>>> wants to configure HDFS as a secondary file system to ensure persistence.
>>>>> Not having the opportunity to do this in Fabric looks weird to me. And
>>>>> actually I don't think this is a use case for the Hadoop Accelerator.
>>>>>
>>>>> -Val
>>>>>
>>>>> On Mon, Dec 14, 2015 at 12:11 AM, Denis Magda <dma...@gridgain.com> wrote:
>>>>>
>>>>>> Hi Ivan,
>>>>>>
>>>>>> 1) Yes, I think it makes sense to keep the old versions of the docs
>>>>>> while an old version is still in use by someone.
>>>>>>
>>>>>> 2) Absolutely, the time has come to add a corresponding article on
>>>>>> readme.io. It's not the first time I've seen a question about HDFS as a
>>>>>> secondary FS. It's still not clear to me what exact steps I should
>>>>>> follow to enable such a configuration; our current suggestions look
>>>>>> like a puzzle. I'll assemble the puzzle on my side and prepare the
>>>>>> article. Ivan, if you don't mind, I will reach out to you directly for
>>>>>> technical assistance if needed.
>>>>>>
>>>>>> Regards,
>>>>>> Denis
>>>>>>
>>>>>> On 12/14/2015 10:25 AM, Ivan V. wrote:
>>>>>>
>>>>>>> Hi, Valentin,
>>>>>>>
>>>>>>> 1) First of all, note that the author of the question is not using the
>>>>>>> latest doc page, namely
>>>>>>> http://apacheignite.gridgain.org/v1.0/docs/igfs-secondary-file-system.
>>>>>>> This is version 1.0, while the latest is 1.5:
>>>>>>> https://apacheignite.readme.io/docs/hadoop-accelerator. Besides, it
>>>>>>> turned out that some links in the latest doc version point to the 1.0
>>>>>>> doc version. I fixed that in the several places where I found it. Do we
>>>>>>> really need the old doc versions (1.0-1.4)?
>>>>>>>
>>>>>>> 2) Our documentation
>>>>>>> (http://apacheignite.gridgain.org/docs/secondary-file-system) does not
>>>>>>> provide any special setup instructions for configuring HDFS as a
>>>>>>> secondary file system in Ignite. Our docs assume that if a user wants
>>>>>>> to integrate with Hadoop, (s)he follows the generic Hadoop integration
>>>>>>> instructions (e.g.
>>>>>>> http://apacheignite.gridgain.org/docs/installing-on-apache-hadoop). It
>>>>>>> looks like the page
>>>>>>> http://apacheignite.gridgain.org/docs/secondary-file-system should be
>>>>>>> clearer about the required configuration steps (in fact, setting the
>>>>>>> HADOOP_HOME variable for the Ignite node process).
>>>>>>>
>>>>>>> 3) Hadoop jars are correctly found by Ignite if the following
>>>>>>> conditions are met:
>>>>>>> (a) The "Hadoop Edition" distribution is used (not the "Fabric" edition).
>>>>>>> (b) Either the HADOOP_HOME environment variable is set (for the Apache
>>>>>>> Hadoop distribution), or the file "/etc/default/hadoop" exists and
>>>>>>> matches the Hadoop distribution used (BigTop, Cloudera, HDP, etc.).
>>>>>>>
>>>>>>> The exact mechanism of the Hadoop classpath composition can be found in
>>>>>>> the files
>>>>>>> IGNITE_HOME/bin/include/hadoop-classpath.sh
>>>>>>> IGNITE_HOME/bin/include/setenv.sh
>>>>>>>
>>>>>>> The issue is discussed in
>>>>>>> https://issues.apache.org/jira/browse/IGNITE-372 and
>>>>>>> https://issues.apache.org/jira/browse/IGNITE-483.
>>>>>>>
>>>>>>> On Sat, Dec 12, 2015 at 3:45 AM, Valentin Kulichenko <valentin.kuliche...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Igniters,
>>>>>>>>
>>>>>>>> I'm looking at the question on SO [1] and I'm a bit confused.
>>>>>>>>
>>>>>>>> We ship the ignite-hadoop module only in the Hadoop Accelerator, and
>>>>>>>> without Hadoop JARs, assuming that the user will include them from the
>>>>>>>> Hadoop distribution he uses. That seems OK to me when the accelerator
>>>>>>>> is plugged into Hadoop to run MapReduce jobs, but I can't figure out
>>>>>>>> the steps required to configure HDFS as a secondary FS for IGFS. Which
>>>>>>>> Hadoop JARs should be on the classpath? Is the user supposed to add
>>>>>>>> them manually?
>>>>>>>>
>>>>>>>> Can someone with more expertise in our Hadoop integration clarify
>>>>>>>> this? I believe there is not enough documentation on this topic.
>>>>>>>>
>>>>>>>> BTW, any ideas why the user gets an exception for the JobConf class,
>>>>>>>> which is in the 'mapred' package? Why is a MapReduce class being used?
>>>>>>>>
>>>>>>>> [1]
>>>>>>>> http://stackoverflow.com/questions/34221355/apache-ignite-what-are-the-dependencies-of-ignitehadoopigfssecondaryfilesystem
>>>>>>>>
>>>>>>>> -Val
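The lookup rules Ivan describes in this thread (HADOOP_HOME for the Apache Hadoop distribution, /etc/default/hadoop for other distributions, and otherwise the six org.apache.hadoop artifacts added by hand) can be sketched as a small pre-flight check. This is a hypothetical Python sketch, not the logic of IGNITE_HOME/bin/include/hadoop-classpath.sh, setenv.sh or setup-hadoop.sh. The function names are made up for illustration; it assumes /etc/default/hadoop assigns HADOOP_HOME on a plain `HADOOP_HOME=...` or `export HADOOP_HOME=...` line, and that the jars are named `<artifact>-<version>.jar` somewhere under the Hadoop home.

```python
import os
import re
from pathlib import Path

# Hypothetical pre-flight check: a simplified illustration of the rules in
# this thread, NOT the logic of hadoop-classpath.sh, setenv.sh or setup-hadoop.sh.

# groupId "org.apache.hadoop" artifacts listed in modules/hadoop/pom.xml
# at the time of this thread (the list may change between releases).
REQUIRED_ARTIFACTS = [
    "hadoop-annotations",
    "hadoop-auth",
    "hadoop-common",
    "hadoop-hdfs",
    "hadoop-mapreduce-client-common",
    "hadoop-mapreduce-client-core",
]

# Assumption: /etc/default/hadoop assigns HADOOP_HOME on a line like
# "export HADOOP_HOME=/usr/lib/hadoop" or 'HADOOP_HOME="/usr/lib/hadoop"'.
_HOME_LINE = re.compile(r"""^\s*(?:export\s+)?HADOOP_HOME=["']?([^"'\s]+)""")


def parse_defaults(text):
    """Return the HADOOP_HOME value assigned in a defaults file, or None."""
    for line in text.splitlines():
        match = _HOME_LINE.match(line)
        if match:
            return match.group(1)
    return None


def resolve_hadoop_home(env, defaults_file="/etc/default/hadoop"):
    """Apply the two lookup rules from the thread, in order.

    1. HADOOP_HOME in the Ignite process environment (Apache Hadoop).
    2. Otherwise /etc/default/hadoop (BigTop, Cloudera, HDP, etc.).
    Returns (hadoop_home, source), or (None, None) if neither rule applies.
    """
    home = env.get("HADOOP_HOME")
    if home:
        return home, "HADOOP_HOME"
    path = Path(defaults_file)
    if path.is_file():
        home = parse_defaults(path.read_text())
        if home:
            return home, defaults_file
    return None, None


def missing_artifacts(jar_names):
    """Return required artifacts with no '<artifact>-<version>.jar' file name.

    The match is anchored at the start of the name, so 'hadoop-common' is
    not satisfied by 'hadoop-mapreduce-client-common-2.7.1.jar'.
    """
    missing = []
    for artifact in REQUIRED_ARTIFACTS:
        pattern = re.compile(re.escape(artifact) + r"-\d[\w.-]*\.jar$")
        if not any(pattern.match(name) for name in jar_names):
            missing.append(artifact)
    return missing


def check(env):
    """Print where Hadoop libraries would be taken from and what is missing."""
    home, source = resolve_hadoop_home(env)
    if home is None or not Path(home).is_dir():
        print("No usable Hadoop home found; add the REQUIRED_ARTIFACTS jars "
              "to the Ignite node classpath manually.")
        return
    # Assumption: the needed jars live somewhere under the Hadoop home.
    jars = [p.name for p in Path(home).rglob("*.jar")]
    print(f"Hadoop home: {home} (from {source})")
    for artifact in missing_artifacts(jars):
        print(f"  missing: {artifact}")


if __name__ == "__main__":
    check(dict(os.environ))
```

Run as a script, it checks the current process environment. On a machine without a Hadoop distribution, `missing_artifacts` can instead be given the jar file names that are actually on the Ignite node classpath.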