Bare-metal servers: two dedicated clusters (Spark and Cassandra) versus one cluster with Spark and Cassandra colocated. In both cases, a dedicated 10 Gbps network.
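To make the colocated versus dedicated-cluster comparison concrete, here is a minimal Python sketch of locality-aware task placement. It is a simplified, hypothetical model. It is not Spark's scheduler, which also uses delay scheduling tuned by `spark.locality.wait`, and it is not the Spark Cassandra Connector's code. Each data partition lists the hosts that store its replicas, and each task goes to an executor on one of those hosts when possible. Otherwise it goes to any executor, which means a read over the network. The host names, replica layout and function names are made up for illustration.

```python
# Simplified, hypothetical model of locality-aware task placement.
# Not Spark's real scheduler: no delay scheduling, no rack awareness.

def place_tasks(partition_hosts, executor_hosts):
    """Assign each partition to an executor host.

    Prefer a host that stores a replica of the partition (a local read);
    otherwise fall back to any executor (a remote read over the network).
    Among candidates, pick the least-loaded host, breaking ties by name.
    Returns {partition_id: (host, is_local)}.
    """
    executors = sorted(executor_hosts)
    load = {host: 0 for host in executors}
    placement = {}
    for pid in sorted(partition_hosts):
        replicas = partition_hosts[pid]
        local = [h for h in executors if h in replicas]
        candidates = local or executors
        chosen = min(candidates, key=lambda h: (load[h], h))
        load[chosen] += 1
        placement[pid] = (chosen, bool(local))
    return placement

def local_fraction(placement):
    """Share of partitions that are read from a local replica."""
    return sum(is_local for _, is_local in placement.values()) / len(placement)

# Made-up layout: 4 partitions, replication factor 2 on three storage nodes.
partitions = {
    0: {"node1", "node2"},
    1: {"node2", "node3"},
    2: {"node3", "node1"},
    3: {"node1", "node2"},
}

# Colocated: Spark executors run on the storage nodes themselves.
colocated = place_tasks(partitions, {"node1", "node2", "node3"})
# Dedicated: Spark runs on separate hosts, so every read crosses the network.
dedicated = place_tasks(partitions, {"spark1", "spark2", "spark3"})

print(local_fraction(colocated))  # 1.0
print(local_fraction(dedicated))  # 0.0
```

With separate Spark and Cassandra hosts, no task can read locally, so every partition is fetched over the 10 Gbps link. How much that costs in practice depends on the job. The 20x figure mentioned in this thread is an observation from real jobs, not something this toy model predicts.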
On Sat, 14 Apr 2018 at 23:17, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
> Thanks Vincent. You mean a 20 times improvement with the data being local, as
> opposed to Spark running on compute nodes?
>
> Dr Mich Talebzadeh
>
> LinkedIn https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
> On 14 April 2018 at 21:06, vincent gromakowski <vincent.gromakow...@gmail.com> wrote:
>> Not with Hadoop but with Cassandra: I have seen a 20x improvement from data
>> locality on partition-optimized Spark jobs.
>>
>> On Sat, 14 Apr 2018 at 21:17, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>> Hi,
>>>
>>> This is a "your mileage may vary" sort of question.
>>>
>>> In a classic Hadoop cluster, one has data locality when each node
>>> includes both the Spark libraries and the HDFS data. This helps certain
>>> queries, such as interactive BI.
>>>
>>> However, running Spark over remote storage, say an Isilon scale-out NAS,
>>> instead of local HDFS becomes problematic. The full scans Spark needs to
>>> do will take much longer when done over the network (accessing the
>>> remote Isilon storage) instead of through local I/O requests to HDFS.
>>>
>>> Has anyone done some comparative studies on this?
>>>
>>> Thanks
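Mich's point about full scans can be put as back-of-the-envelope arithmetic. The sketch below is a simplified model under assumed numbers; none of them were measured in this thread. With local storage, every node reads its share in parallel from its own disks, so aggregate bandwidth grows with the node count. With remote storage, each node is capped by its network link, and the whole cluster is capped by what the storage system can serve. The function names and all figures are hypothetical.

```python
# Back-of-the-envelope full-scan times. All inputs are hypothetical;
# sizes in gigabytes (GB), bandwidths in gigabits per second (Gbps).

def local_scan_seconds(dataset_gb, nodes, disk_gbps_per_node):
    """Data locality: each node scans its own share from local disks,
    so the aggregate read rate is nodes * per-node disk bandwidth."""
    aggregate_gbps = nodes * disk_gbps_per_node
    return dataset_gb * 8 / aggregate_gbps

def remote_scan_seconds(dataset_gb, nodes, nic_gbps, storage_gbps):
    """Remote storage: each node is limited by its network link, and the
    cluster as a whole by the total throughput the storage can serve."""
    aggregate_gbps = min(nodes * nic_gbps, storage_gbps)
    return dataset_gb * 8 / aggregate_gbps

# Hypothetical example: 1,000 GB table, 10 nodes, 2 GB/s (16 Gbps) of
# local disk per node, 10 Gbps NICs, remote storage serving 40 Gbps total.
local = local_scan_seconds(1000, 10, 16)
remote = remote_scan_seconds(1000, 10, 10, 40)
print(f"local: {local:.0f} s, remote: {remote:.0f} s, ratio: {remote / local:.1f}x")
# prints "local: 50 s, remote: 200 s, ratio: 4.0x"
```

Under these made-up figures, the remote scan is 4x slower because the shared 40 Gbps storage cap is the bottleneck, not the 10 Gbps links. A storage back end able to serve the full 100 Gbps would narrow the gap to 1.6x (80 s versus 50 s). Real workloads also depend on caching, compression and CPU cost, which this model ignores.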