Re: Using HDFS as a secondary FS

Denis Magda Mon, 14 Dec 2015 11:59:37 -0800

Ivan,

Is there any reason why we don’t recommend using 
apache-ignite-hadoop-{version}/bin/setup-hadoop.sh/bat in our Hadoop 
Accelerator articles?


With setup-hadoop.sh I was able to build a valid classpath, automatically 
create symlinks to the accelerator's jars from Hadoop’s libs folder, and start 
an Ignite node that uses HDFS as a secondary FS, all in less than 10 minutes.

I just followed the instructions in 
apache-ignite-hadoop-{version}/HADOOP_README.txt. The instructions on 
readme.io <http://readme.io/> look much more complex to me: they don’t mention 
setup-hadoop.sh/bat at all, which forces the end user to perform a manual setup.
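
For anyone who ends up doing the manual setup anyway, the conditions Ivan lists 
further down the thread (HADOOP_HOME set, or /etc/default/hadoop present, and 
the six org.apache.hadoop artifacts on the classpath) could be checked with a 
small script before starting the node. This is only a rough sketch, not 
something shipped with Ignite: the function names are made up for 
illustration, and the assumption that jars are named 
<artifact>-<version>.jar is mine.

```python
import os
import re

# Artifact names as listed in this thread (groupId "org.apache.hadoop",
# taken from modules/hadoop/pom.xml at the time of writing).
REQUIRED_ARTIFACTS = [
    "hadoop-annotations",
    "hadoop-auth",
    "hadoop-common",
    "hadoop-hdfs",
    "hadoop-mapreduce-client-common",
    "hadoop-mapreduce-client-core",
]


def hadoop_location_hint(env, defaults_file="/etc/default/hadoop"):
    """Report which of the two mechanisms from the thread is available.

    Hypothetical helper: returns "HADOOP_HOME" if that variable is set in
    `env`, the path of the defaults file if it exists, otherwise None.
    It does not validate the contents of either.
    """
    if env.get("HADOOP_HOME"):
        return "HADOOP_HOME"
    if os.path.isfile(defaults_file):
        return defaults_file
    return None


def missing_artifacts(lib_dir):
    """Return the required artifacts that have no jar in `lib_dir`.

    Assumption: jars are named <artifact>-<version>.jar and the version
    starts with a digit, so hadoop-hdfs-nfs-*.jar does not count as
    hadoop-hdfs.
    """
    names = os.listdir(lib_dir)
    missing = []
    for artifact in REQUIRED_ARTIFACTS:
        pattern = re.compile(r"^" + re.escape(artifact) + r"-\d.*\.jar$")
        if not any(pattern.match(name) for name in names):
            missing.append(artifact)
    return missing
```

Calling hadoop_location_hint(os.environ) and missing_artifacts() on the 
directory that holds the node's extra jars gives a quick answer to "which 
Hadoop JARs are still missing?" without starting Ignite.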

—
Denis

> On 14 Dec 2015, at 20:24, Dmitriy Setrakyan <[email protected]> wrote:
> 
> On Mon, Dec 14, 2015 at 7:28 AM, Denis Magda <[email protected]> wrote:
> 
>> Yes, this will be documented tomorrow. I want to go through all the steps
>> myself, checking for any other obstacles the user may face.
>> 
> 
> Thanks, Denis!
> 
> 
>> 
>> —
>> Denis
>> 
>>> On 14 Dec 2015, at 18:11, Dmitriy Setrakyan <[email protected]> wrote:
>>> 
>>> Ivan, I think this should be documented, no?
>>> 
>>> On Mon, Dec 14, 2015 at 2:25 AM, Ivan V. <[email protected]> wrote:
>>> 
>>>> To enable just IGFS persistence there is no need to use HDFS (this
>>>> requires a Hadoop dependency, a configured HDFS cluster, etc.).
>>>> We have requests https://issues.apache.org/jira/browse/IGNITE-1120 ,
>>>> https://issues.apache.org/jira/browse/IGNITE-1926 to implement
>>>> persistence on the local file system, and we are already close to the
>>>> solution.
>>>> 
>>>> Regarding the secondary FS doc page (
>>>> http://apacheignite.gridgain.org/docs/secondary-file-system) I would
>>>> suggest adding the following text there:
>>>> ------------------------
>>>> If the Ignite node with a secondary file system is configured on a machine
>>>> with a Hadoop distribution, make sure Ignite is able to find the appropriate
>>>> Hadoop libraries: set the HADOOP_HOME environment variable for the Ignite
>>>> process if you're using the Apache Hadoop distribution, or, if you use
>>>> another distribution (HDP, Cloudera, BigTop, etc.), make sure the
>>>> /etc/default/hadoop file exists and has appropriate contents.
>>>> 
>>>> If the Ignite node with a secondary file system is configured on a machine
>>>> without a Hadoop distribution, you can manually add the necessary Hadoop
>>>> dependencies to the Ignite node classpath: these are the dependencies with
>>>> groupId "org.apache.hadoop" listed in the file modules/hadoop/pom.xml .
>>>> Currently they are:
>>>> 
>>>>  1. hadoop-annotations
>>>>  2. hadoop-auth
>>>>  3. hadoop-common
>>>>  4. hadoop-hdfs
>>>>  5. hadoop-mapreduce-client-common
>>>>  6. hadoop-mapreduce-client-core
>>>> 
>>>> ------------------------
>>>> 
>>>> On Mon, Dec 14, 2015 at 11:21 AM, Valentin Kulichenko <
>>>> [email protected]> wrote:
>>>> 
>>>>> Guys,
>>>>> 
>>>>> Why don't we include the ignite-hadoop module in Fabric? This user simply
>>>>> wants to configure HDFS as a secondary file system to ensure persistence.
>>>>> Not having the opportunity to do this in Fabric looks weird to me. And
>>>>> actually I don't think this is a use case for Hadoop Accelerator.
>>>>> 
>>>>> -Val
>>>>> 
>>>>> On Mon, Dec 14, 2015 at 12:11 AM, Denis Magda <[email protected]> wrote:
>>>>> 
>>>>>> Hi Ivan,
>>>>>> 
>>>>>> 1) Yes, I think that it makes sense to keep the old versions of the
>>>>>> docs while an old version may still be in use by someone.
>>>>>> 
>>>>>> 2) Absolutely, the time has come to add a corresponding article on
>>>>>> readme.io. It's not the first time I've seen a question related to
>>>>>> HDFS as a secondary FS.
>>>>>> It was not clear to me before, and still isn't, what exact steps I
>>>>>> should follow to enable such a configuration. Our current suggestions
>>>>>> look like a puzzle.
>>>>>> I'll assemble the puzzle on my side and prepare the article. Ivan, if
>>>>>> you don't mind, I will reach out to you directly for technical
>>>>>> assistance if needed.
>>>>>> 
>>>>>> Regards,
>>>>>> Denis
>>>>>> 
>>>>>> 
>>>>>> On 12/14/2015 10:25 AM, Ivan V. wrote:
>>>>>> 
>>>>>>> Hi, Valentin,
>>>>>>> 
>>>>>>> 1) First of all, note that the author of the question is not using
>>>>>>> the latest doc page, but
>>>>>>> http://apacheignite.gridgain.org/v1.0/docs/igfs-secondary-file-system .
>>>>>>> This is version 1.0, while the latest is 1.5:
>>>>>>> https://apacheignite.readme.io/docs/hadoop-accelerator. Besides, it
>>>>>>> turned out that some links in the latest doc version point to the 1.0
>>>>>>> doc version. I fixed that in the several places where I found it. Do
>>>>>>> we really need the old doc versions (1.0 - 1.4)?
>>>>>>> 
>>>>>>> 2) Our documentation (
>>>>>>> http://apacheignite.gridgain.org/docs/secondary-file-system) does not
>>>>>>> provide any special setup instructions for configuring HDFS as a
>>>>>>> secondary file system in Ignite. Our docs assume that if a user wants
>>>>>>> to integrate with Hadoop, (s)he follows the generic Hadoop integration
>>>>>>> instructions (e.g.
>>>>>>> http://apacheignite.gridgain.org/docs/installing-on-apache-hadoop). It
>>>>>>> looks like the page
>>>>>>> http://apacheignite.gridgain.org/docs/secondary-file-system should be
>>>>>>> clearer about the required configuration steps (in fact, setting the
>>>>>>> HADOOP_HOME variable for the Ignite node process).
>>>>>>> 
>>>>>>> 3) Hadoop jars are correctly found by Ignite if the following
>>>>>>> conditions are met:
>>>>>>> (a) The "Hadoop Edition" distribution is used (not the "Fabric"
>>>>>>> edition).
>>>>>>> (b) Either the HADOOP_HOME environment variable is set (for the Apache
>>>>>>> Hadoop distribution), or the file "/etc/default/hadoop" exists and
>>>>>>> matches the Hadoop distribution used (BigTop, Cloudera, HDP, etc.)
>>>>>>> 
>>>>>>> The exact mechanism of the Hadoop classpath composition can be found
>>>>>>> in the files
>>>>>>> IGNITE_HOME/bin/include/hadoop-classpath.sh
>>>>>>> IGNITE_HOME/bin/include/setenv.sh .
>>>>>>> 
>>>>>>> The issue is discussed in
>>>>>>> https://issues.apache.org/jira/browse/IGNITE-372 ,
>>>>>>> https://issues.apache.org/jira/browse/IGNITE-483 .
>>>>>>> 
>>>>>>> On Sat, Dec 12, 2015 at 3:45 AM, Valentin Kulichenko <
>>>>>>> [email protected]> wrote:
>>>>>>> 
>>>>>>> Igniters,
>>>>>>>> 
>>>>>>>> I'm looking at the question on SO [1] and I'm a bit confused.
>>>>>>>> 
>>>>>>>> We ship the ignite-hadoop module only in Hadoop Accelerator and
>>>>>>>> without Hadoop JARs, assuming that the user will include them from
>>>>>>>> the Hadoop distribution he uses. That seems OK to me when the
>>>>>>>> accelerator is plugged into Hadoop to run mapreduce jobs, but I
>>>>>>>> can't figure out the steps required to configure HDFS as a secondary
>>>>>>>> FS for IGFS. Which Hadoop JARs should be on the classpath? Is the
>>>>>>>> user supposed to add them manually?
>>>>>>>> 
>>>>>>>> Can someone with more expertise in our Hadoop integration clarify
>>>>>>>> this? I believe there is not enough documentation on this topic.
>>>>>>>> 
>>>>>>>> BTW, any ideas why the user gets an exception for the JobConf class,
>>>>>>>> which is in the 'mapred' package? Why is a map-reduce class being
>>>>>>>> used?
>>>>>>>> 
>>>>>>>> [1]
>>>>>>>> http://stackoverflow.com/questions/34221355/apache-ignite-what-are-the-dependencies-of-ignitehadoopigfssecondaryfilesystem
>>>>>>>> 
>>>>>>>> -Val
>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>> 
>>
