Sending on behalf of a colleague whose mail isn’t reaching the dev list for some reason 😊
=======================================================================================================================================================

Hi Spark developers,

I want to hint Spark to run tasks on a particular list of hosts. I see that getBlockLocations is used to get the list of hosts from HDFS:

https://github.com/apache/spark/blob/7955b3962ac46b89564e0613db7bea98a1478bf2/sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala#L386

Providing a custom getBlockLocations that returns an Array of BlockLocations containing the hosts' IP addresses doesn't help: Spark still schedules the tasks on executors running on other hosts. Is there something I am doing wrong?

Test: spark.read.csv()

Appreciate your inputs 😊

Thanks,
Nasrulla
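For context on the code path linked above: roughly, Spark takes the BlockLocations a FileSystem reports for a file and, for each file split, uses the hosts of the block that overlaps that split the most as the split's preferred locations. Below is a minimal, dependency-free Scala sketch of that selection step, written as a simplified model rather than Spark's actual code; `BlockInfo` and `BlockHosts.preferredHosts` are hypothetical names introduced only for illustration.

```scala
// Hypothetical, simplified model of a file block: byte offset, length, and the
// host strings a FileSystem would report for it (cf. BlockLocation.getHosts).
final case class BlockInfo(offset: Long, length: Long, hosts: Seq[String])

object BlockHosts {
  // Number of bytes of the split [start, start + len) that fall inside block b.
  private def overlap(b: BlockInfo, start: Long, len: Long): Long = {
    val lo = math.max(b.offset, start)
    val hi = math.min(b.offset + b.length, start + len)
    math.max(0L, hi - lo)
  }

  // Hosts of the block with the largest overlap with the split; empty if no
  // block overlaps. On ties, the first such block wins (maxBy keeps the first).
  // The returned strings are only a scheduling hint: the scheduler compares
  // them with the host names that executors registered under.
  def preferredHosts(blocks: Seq[BlockInfo], start: Long, len: Long): Seq[String] = {
    val candidates = blocks
      .map(b => b.hosts -> overlap(b, start, len))
      .filter { case (_, bytes) => bytes > 0L }
    if (candidates.isEmpty) Seq.empty
    else candidates.maxBy { case (_, bytes) => bytes }._1
  }
}
```

For example, with two 128-byte blocks on `host-a` and `host-b`, a split covering bytes 100 to 199 overlaps the second block more, so it prefers `host-b`. Two hedged caveats relevant to the question: preferred locations are hints that the scheduler relaxes after `spark.locality.wait` (3s by default), and since the match is done on host strings, an IP address may not match an executor registered under a hostname.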