Good idea, thanks! But unfortunately that's not possible: all
containers are connected to an overlay network.

Is there any other possibility to tell Spark that it is on the same
*NODE* as an HDFS data node?
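One thing I am considering (only a sketch, untested, and the image
names are made up): Spark standalone appears to match node locality by
comparing the hostname/IP an executor registers with the hostnames
that HDFS reports for block locations. If both containers on a
physical host present the host's own hostname, the two might line up
even on an overlay network:

    # Give the HDFS data node and the Spark worker on one physical
    # host the same hostname, so both report it upstream
    # (image names are illustrative):
    docker run -d --hostname "$(hostname)" my-hdfs-datanode-image
    docker run -d --hostname "$(hostname)" my-spark-worker-image

Whether HDFS and Spark then resolve that name correctly depends on the
DNS setup inside the overlay network.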
On 28.12.2016 12:00, Miguel Morales wrote:
> It might have to do with your container IPs; it depends on your
> networking setup. You might want to try host networking so that the
> containers share the IP with the host.
>
> On Wed, Dec 28, 2016 at 1:46 AM, Karamba <phantom...@web.de> wrote:
>> Hi Sun Rui,
>>
>> thanks for answering!
>>
>>> Although the Spark task scheduler is aware of rack-level data
>>> locality, it seems that only YARN implements the support for it.
>> This explains why the script that I configured as
>> topology.script.file.name in core-site.xml is not called by the
>> Spark container. But when reading from HDFS in a Spark program,
>> the script is called in my HDFS namenode container.
>>
>>> However, node-level locality can still work for Standalone.
>> I have a couple of physical hosts that run Spark and HDFS docker
>> containers. How does Spark standalone know that the Spark and HDFS
>> containers are on the same host?
>>
>>> Data locality involves both task data locality and executor data
>>> locality. Executor data locality is only supported on YARN with
>>> executor dynamic allocation enabled. For Standalone, by default,
>>> a Spark application will acquire all available cores in the
>>> cluster, generally meaning there is at least one executor on each
>>> node, in which case task data locality can work because a task
>>> can be dispatched to an executor on any of the preferred nodes of
>>> the task for execution.
>>>
>>> For your case, have you set spark.cores.max to limit the cores to
>>> acquire, which means executors are available on only a subset of
>>> the cluster nodes?
>> I set "--total-executor-cores 1" in order to use only a small
>> subset of the cluster.
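For reference, "--total-executor-cores" and "spark.cores.max" express
the same limit on a standalone cluster; a minimal sketch (master URL
and job file are placeholders):

    # Cap the application at one core via the submit flag ...
    spark-submit --master spark://spark-master:7077 \
        --total-executor-cores 1 \
        my_job.py

    # ... or equivalently via the configuration property:
    spark-submit --master spark://spark-master:7077 \
        --conf spark.cores.max=1 \
        my_job.py

With the limit at 1 there is a single executor on a single worker, so
any task whose HDFS block lives on another node can only run with
locality ANY.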
>>
>> On 28.12.2016 02:58, Sun Rui wrote:
>>> Although the Spark task scheduler is aware of rack-level data
>>> locality, it seems that only YARN implements the support for it.
>>> However, node-level locality can still work for Standalone.
>>>
>>> It is not necessary to copy the Hadoop config files into the
>>> Spark conf directory. Set HADOOP_CONF_DIR to point to the conf
>>> directory of your Hadoop.
>>>
>>> Data locality involves both task data locality and executor data
>>> locality. Executor data locality is only supported on YARN with
>>> executor dynamic allocation enabled. For Standalone, by default,
>>> a Spark application will acquire all available cores in the
>>> cluster, generally meaning there is at least one executor on each
>>> node, in which case task data locality can work because a task
>>> can be dispatched to an executor on any of the preferred nodes of
>>> the task for execution.
>>>
>>> For your case, have you set spark.cores.max to limit the cores to
>>> acquire, which means executors are available on only a subset of
>>> the cluster nodes?
>>>
>>>> On Dec 27, 2016, at 01:39, Karamba <phantom...@web.de> wrote:
>>>>
>>>> Hi,
>>>>
>>>> I am running a couple of docker hosts, each with an HDFS node
>>>> and a Spark worker in a Spark standalone cluster.
>>>> In order to get data locality awareness, I would like to
>>>> configure racks for each host, so that a Spark worker container
>>>> knows from which HDFS data node container it should load its
>>>> data. Does this make sense?
>>>>
>>>> I configured the HDFS container nodes via core-site.xml in
>>>> $HADOOP_HOME/etc and this works: "hdfs dfsadmin -printTopology"
>>>> shows my setup.
>>>>
>>>> I configured Spark the same way: I placed core-site.xml and
>>>> hdfs-site.xml in SPARK_CONF_DIR ... BUT this has no effect.
>>>>
>>>> Submitting a Spark job via spark-submit to the Spark master that
>>>> loads from HDFS just gets data locality ANY.
>>>>
>>>> It would be great if anybody could help me find the right
>>>> configuration!
>>>>
>>>> Thanks and best regards,
>>>> on
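A minimal sketch of the configuration Sun Rui describes, assuming
Hadoop 2.x (paths and rack names are illustrative; Hadoop 2.x calls
the property "net.topology.script.file.name", while older releases
used "topology.script.file.name" as quoted above):

    # 1) $SPARK_HOME/conf/spark-env.sh on each worker: point Spark at
    #    the Hadoop configuration instead of copying the XML files
    #    into SPARK_CONF_DIR.
    export HADOOP_CONF_DIR=/opt/hadoop/etc/hadoop

    # 2) core-site.xml: register a rack-mapping script.
    <property>
      <name>net.topology.script.file.name</name>
      <value>/opt/hadoop/etc/hadoop/topology.sh</value>
    </property>

    # 3) topology.sh: Hadoop passes one or more hostnames/IPs as
    #    arguments and expects one rack path per argument on stdout.
    #!/bin/bash
    for host in "$@"; do
      case "$host" in
        10.0.1.*) echo "/rack1" ;;
        10.0.2.*) echo "/rack2" ;;
        *)        echo "/default-rack" ;;
      esac
    done

As noted above, Standalone only honors node-level locality, so the
rack script mainly affects what HDFS itself reports.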