Good idea, thanks!

But unfortunately that's not possible: all containers are connected to
an overlay network.

Is there any other way to tell Spark that it is on the same *NODE*
as an HDFS data node?
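
As far as I understand, node-level locality in standalone mode comes down
to the executor's host name matching the host names that the namenode
reports as block locations. So one idea would be to give both containers
the physical host's name. A rough sketch, untested with an overlay network
(the image names are placeholders):

    # Give both containers the physical host's FQDN so that the executor
    # host and the datanode host compare equal:
    docker run --hostname "$(hostname -f)" my-spark-worker-image
    docker run --hostname "$(hostname -f)" my-hdfs-datanode-image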


On 28.12.2016 12:00, Miguel Morales wrote:
> It might have to do with your container IPs; it depends on your
> networking setup. You might want to try host networking so that the
> containers share the IP with the host.
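>
> Something along these lines, for example (the image name is a
> placeholder):
>
>     docker run --net=host my-spark-worker-image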
>
> On Wed, Dec 28, 2016 at 1:46 AM, Karamba <phantom...@web.de> wrote:
>> Hi Sun Rui,
>>
>> thanks for answering!
>>
>>
>>> Although the Spark task scheduler is aware of rack-level data locality, it
>>> seems that only YARN implements support for it.
>> This explains why the script that I configured via
>> topology.script.file.name in core-site.xml is not called by the Spark
>> container. But when a Spark program reads from HDFS, the script is
>> called in my HDFS namenode container.
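>>
>> For reference, the relevant snippet of my core-site.xml looks roughly
>> like this (the script path is an example):
>>
>>     <property>
>>       <name>topology.script.file.name</name>
>>       <value>/etc/hadoop/topology.sh</value>
>>     </property>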
>>
>>> However, node-level locality can still work for Standalone.
>> I have a couple of physical hosts that run Spark and HDFS docker
>> containers. How does Spark standalone know that the Spark and HDFS
>> containers are on the same host?
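>>
>> (To compare the two sides, I can at least dump the block locations that
>> the namenode reports, e.g. with a placeholder path:
>>
>>     hdfs fsck /some/path -files -blocks -locations
>>
>> and check those host names against the hosts the workers register with.)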
>>
>>> Data locality involves both task data locality and executor data
>>> locality. Executor data locality is only supported on YARN with executor
>>> dynamic allocation enabled. For standalone mode, by default, a Spark
>>> application will acquire all available cores in the cluster, which
>>> generally means there is at least one executor on each node; in that case
>>> task data locality can work, because a task can be dispatched to an
>>> executor on any of its preferred nodes.
>>>
>>> For your case, have you set spark.cores.max to limit the cores to acquire,
>>> meaning executors are available on only a subset of the cluster nodes?
>> I set "--total-executor-cores 1" in order to use only a small subset of
>> the cluster.
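>>
>> The full submit looks roughly like this (master URL, class and jar are
>> placeholders):
>>
>>     spark-submit --master spark://spark-master:7077 \
>>       --total-executor-cores 1 \
>>       --class com.example.MyJob my-job.jar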
>>
>>
>>
>> On 28.12.2016 02:58, Sun Rui wrote:
>>> Although the Spark task scheduler is aware of rack-level data locality, it
>>> seems that only YARN implements support for it. However, node-level
>>> locality can still work for Standalone.
>>>
>>> It is not necessary to copy the Hadoop config files into the Spark conf
>>> directory. Set HADOOP_CONF_DIR to point to the conf directory of your
>>> Hadoop installation.
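>>>
>>> For example, in spark-env.sh (the path is just an example):
>>>
>>>     export HADOOP_CONF_DIR=/opt/hadoop/etc/hadoop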
>>>
>>> Data locality involves both task data locality and executor data
>>> locality. Executor data locality is only supported on YARN with executor
>>> dynamic allocation enabled. For standalone mode, by default, a Spark
>>> application will acquire all available cores in the cluster, which
>>> generally means there is at least one executor on each node; in that case
>>> task data locality can work, because a task can be dispatched to an
>>> executor on any of its preferred nodes.
>>>
>>> For your case, have you set spark.cores.max to limit the cores to acquire,
>>> meaning executors are available on only a subset of the cluster nodes?
>>>
>>>> On Dec 27, 2016, at 01:39, Karamba <phantom...@web.de> wrote:
>>>>
>>>> Hi,
>>>>
>>>> I am running a couple of docker hosts, each with an HDFS node and a Spark
>>>> worker in a Spark standalone cluster.
>>>> In order to get data locality awareness, I would like to configure racks
>>>> for each host, so that a Spark worker container knows from which HDFS
>>>> node container it should load its data. Does this make sense?
>>>>
>>>> I configured the HDFS container nodes via core-site.xml in
>>>> $HADOOP_HOME/etc, and this works: hdfs dfsadmin -printTopology shows my
>>>> setup.
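>>>>
>>>> The topology script itself is along these lines (the address ranges
>>>> are examples):
>>>>
>>>>     #!/bin/bash
>>>>     # Print one rack path per datanode address given as an argument.
>>>>     for node in "$@"; do
>>>>       case "$node" in
>>>>         10.0.0.*) echo "/rack1" ;;
>>>>         10.0.1.*) echo "/rack2" ;;
>>>>         *)        echo "/default-rack" ;;
>>>>       esac
>>>>     done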
>>>>
>>>> I configured Spark the same way: I placed core-site.xml and
>>>> hdfs-site.xml in SPARK_CONF_DIR ... BUT this has no effect.
>>>>
>>>> A Spark job that loads from HDFS, submitted via spark-submit to the
>>>> Spark master, only ever shows data locality ANY.
>>>>
>>>> It would be great if anybody could help me get the right configuration!
>>>>
>>>> Thanks and best regards,
>>>> on


---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org
