Yes, this will be documented tomorrow. I want to go through all the steps myself, checking for any other obstacles the user may face.
— Denis

> On 14 Dec 2015, at 18:11, Dmitriy Setrakyan <dsetrak...@apache.org> wrote:
>
> Ivan, I think this should be documented, no?
>
> On Mon, Dec 14, 2015 at 2:25 AM, Ivan V. <iveselovs...@gridgain.com> wrote:
>
>> To enable IGFS persistence alone there is no need to use HDFS (that
>> requires a Hadoop dependency, a configured HDFS cluster, etc.). We have
>> the requests https://issues.apache.org/jira/browse/IGNITE-1120 and
>> https://issues.apache.org/jira/browse/IGNITE-1926 to implement
>> persistence on top of the local file system, and we are already close
>> to a solution.
>>
>> Regarding the secondary file system doc page
>> (http://apacheignite.gridgain.org/docs/secondary-file-system), I would
>> suggest adding the following text there:
>> ------------------------
>> If an Ignite node with a secondary file system is configured on a
>> machine with a Hadoop distribution, make sure Ignite is able to find the
>> appropriate Hadoop libraries: set the HADOOP_HOME environment variable
>> for the Ignite process if you're using the Apache Hadoop distribution,
>> or, if you use another distribution (HDP, Cloudera, BigTop, etc.), make
>> sure the /etc/default/hadoop file exists and has appropriate contents.
>>
>> If an Ignite node with a secondary file system is configured on a
>> machine without a Hadoop distribution, you can manually add the
>> necessary Hadoop dependencies to the Ignite node classpath: these are
>> the dependencies with groupId "org.apache.hadoop" listed in the file
>> modules/hadoop/pom.xml. Currently they are:
>>
>> 1. hadoop-annotations
>> 2. hadoop-auth
>> 3. hadoop-common
>> 4. hadoop-hdfs
>> 5. hadoop-mapreduce-client-common
>> 6. hadoop-mapreduce-client-core
>>
>> ------------------------
>>
>> On Mon, Dec 14, 2015 at 11:21 AM, Valentin Kulichenko <
>> valentin.kuliche...@gmail.com> wrote:
>>
>>> Guys,
>>>
>>> Why don't we include the ignite-hadoop module in Fabric? This user
>>> simply wants to configure HDFS as a secondary file system to ensure
>>> persistence. Not having the opportunity to do this in Fabric looks
>>> weird to me, and I actually don't think this is a use case for the
>>> Hadoop Accelerator.
>>>
>>> -Val
>>>
>>> On Mon, Dec 14, 2015 at 12:11 AM, Denis Magda <dma...@gridgain.com> wrote:
>>>
>>>> Hi Ivan,
>>>>
>>>> 1) Yes, I think it makes sense to keep the old versions of the docs
>>>> as long as an old version is still in use by someone.
>>>>
>>>> 2) Absolutely, the time to add a corresponding article on readme.io
>>>> has come. It's not the first time I've seen a question related to
>>>> HDFS as a secondary FS. Neither before nor now has it been clear to
>>>> me what exact steps I should follow to enable such a configuration.
>>>> Our current suggestions look like a puzzle. I'll assemble the puzzle
>>>> on my side and prepare the article. Ivan, if you don't mind, I will
>>>> reach out to you directly for technical assistance if needed.
>>>>
>>>> Regards,
>>>> Denis
>>>>
>>>>
>>>> On 12/14/2015 10:25 AM, Ivan V. wrote:
>>>>
>>>>> Hi, Valentin,
>>>>>
>>>>> 1) First of all, note that the author of the question is not using
>>>>> the latest doc page, namely
>>>>> http://apacheignite.gridgain.org/v1.0/docs/igfs-secondary-file-system .
>>>>> This is version 1.0, while the latest is 1.5:
>>>>> https://apacheignite.readme.io/docs/hadoop-accelerator. Besides, it
>>>>> appeared that some links from the latest doc version point to the
>>>>> 1.0 doc version. I fixed that in the several places where I found
>>>>> it. Do we really need the old doc versions (1.0 - 1.4)?
>>>>>
>>>>> 2) Our documentation
>>>>> (http://apacheignite.gridgain.org/docs/secondary-file-system) does
>>>>> not provide any special setup instructions for configuring HDFS as a
>>>>> secondary file system in Ignite. Our docs assume that if a user
>>>>> wants to integrate with Hadoop, (s)he follows the generic Hadoop
>>>>> integration instructions (e.g.
>>>>> http://apacheignite.gridgain.org/docs/installing-on-apache-hadoop).
>>>>> It looks like the page
>>>>> http://apacheignite.gridgain.org/docs/secondary-file-system should
>>>>> be clearer about the required configuration steps (in fact, setting
>>>>> the HADOOP_HOME variable for the Ignite node process).
>>>>>
>>>>> 3) Hadoop jars are correctly found by Ignite if the following
>>>>> conditions are met:
>>>>> (a) The "Hadoop Edition" distribution is used (not the "Fabric"
>>>>> edition).
>>>>> (b) Either the HADOOP_HOME environment variable is set (for the
>>>>> Apache Hadoop distribution), or the file "/etc/default/hadoop"
>>>>> exists and matches the Hadoop distribution used (BigTop, Cloudera,
>>>>> HDP, etc.).
>>>>>
>>>>> The exact mechanism of the Hadoop classpath composition can be found
>>>>> in the files
>>>>> IGNITE_HOME/bin/include/hadoop-classpath.sh
>>>>> IGNITE_HOME/bin/include/setenv.sh .
>>>>>
>>>>> The issue is discussed in
>>>>> https://issues.apache.org/jira/browse/IGNITE-372 and
>>>>> https://issues.apache.org/jira/browse/IGNITE-483 .
>>>>>
>>>>> On Sat, Dec 12, 2015 at 3:45 AM, Valentin Kulichenko <
>>>>> valentin.kuliche...@gmail.com> wrote:
>>>>>
>>>>>> Igniters,
>>>>>>
>>>>>> I'm looking at the question on SO [1] and I'm a bit confused.
>>>>>>
>>>>>> We ship the ignite-hadoop module only in the Hadoop Accelerator and
>>>>>> without Hadoop JARs, assuming that the user will include them from
>>>>>> the Hadoop distribution he uses. That seems OK to me when the
>>>>>> accelerator is plugged into Hadoop to run mapreduce jobs, but I
>>>>>> can't figure out the steps required to configure HDFS as a
>>>>>> secondary FS for IGFS. Which Hadoop JARs should be on the
>>>>>> classpath? Is the user supposed to add them manually?
>>>>>>
>>>>>> Can someone with more expertise in our Hadoop integration clarify
>>>>>> this? I believe there is not enough documentation on this topic.
>>>>>>
>>>>>> BTW, any ideas why the user gets an exception for the JobConf
>>>>>> class, which is in the 'mapred' package? Why is a map-reduce class
>>>>>> being used?
>>>>>>
>>>>>> [1]
>>>>>> http://stackoverflow.com/questions/34221355/apache-ignite-what-are-the-dependencies-of-ignitehadoopigfssecondaryfilesystem
>>>>>>
>>>>>> -Val
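
For context, the configuration the thread converges on (an IGFS instance with
HDFS as its secondary, persistent file system) can also be expressed
programmatically. Below is a minimal sketch only, assuming the Ignite 1.5 Java
API with the ignite-hadoop module and the Hadoop jars from the dependency list
above on the node classpath; the IGFS name, cache names, and HDFS URI are
illustrative placeholders, not values taken from the thread.

------------------------
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.CacheAtomicityMode;
import org.apache.ignite.cache.CacheMode;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.FileSystemConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.hadoop.fs.IgniteHadoopIgfsSecondaryFileSystem;
import org.apache.ignite.igfs.IgfsMode;

public class IgfsSecondaryFsExample {
    public static void main(String[] args) {
        // Caches backing IGFS metadata and data (required in Ignite 1.x);
        // the metadata cache must be transactional.
        CacheConfiguration metaCache = new CacheConfiguration("igfs-meta");
        metaCache.setCacheMode(CacheMode.REPLICATED);
        metaCache.setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL);

        CacheConfiguration dataCache = new CacheConfiguration("igfs-data");
        dataCache.setCacheMode(CacheMode.PARTITIONED);

        // IGFS instance with HDFS as the secondary file system;
        // DUAL_SYNC writes through to HDFS synchronously.
        FileSystemConfiguration fsCfg = new FileSystemConfiguration();
        fsCfg.setName("igfs");
        fsCfg.setMetaCacheName("igfs-meta");
        fsCfg.setDataCacheName("igfs-data");
        fsCfg.setDefaultMode(IgfsMode.DUAL_SYNC);
        fsCfg.setSecondaryFileSystem(
            new IgniteHadoopIgfsSecondaryFileSystem(
                "hdfs://namenode.example.com:9000"));  // placeholder HDFS URI

        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setCacheConfiguration(metaCache, dataCache);
        cfg.setFileSystemConfiguration(fsCfg);

        // Start an Ignite node exposing the IGFS instance.
        Ignition.start(cfg);
    }
}
------------------------

The same settings are more commonly supplied through the Spring XML passed to
ignite.sh; either way, the node still needs HADOOP_HOME or a suitable
/etc/default/hadoop, as described earlier in the thread, so that the Hadoop
classes resolve at runtime.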