Re: DataSourceV2 APIs creating multiple instances of DataSourceReader and hence not preserving the state

2018-10-19 Thread Ryan Blue
I think this is expected behavior, though not what I think is reasonable in the long term. To my knowledge, this is how the v1 sources behave, and v2 just reuses the same mechanism to instantiate sources and uses a new interface for v2 features. I think that the right approach is to use catalogs,
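
For context, a minimal sketch (hypothetical MyStatefulSource, written against the Spark 2.3-era DataSourceV2/ReadSupport interfaces) of where per-reader state lives; since Spark may instantiate the source class and call createReader() more than once for the same table, anything cached on a reader instance is not carried over between those calls:

import java.util.Collections
import org.apache.spark.sql.Row
import org.apache.spark.sql.sources.v2.{DataSourceOptions, DataSourceV2, ReadSupport}
import org.apache.spark.sql.sources.v2.reader.{DataReaderFactory, DataSourceReader}
import org.apache.spark.sql.types.StructType

class MyStatefulSource extends DataSourceV2 with ReadSupport {
  override def createReader(options: DataSourceOptions): DataSourceReader =
    new DataSourceReader {
      // State initialized here (schemas, connections, snapshots) lives only as long
      // as this particular reader instance; Spark may build a fresh one on each resolution.
      private val schema = new StructType().add("value", "string")
      override def readSchema(): StructType = schema
      // Partition planning elided; an empty list keeps the sketch compilable.
      override def createDataReaderFactories(): java.util.List[DataReaderFactory[Row]] =
        Collections.emptyList[DataReaderFactory[Row]]()
    }
}

Reading it with spark.read.format(classOf[MyStatefulSource].getName).load() can go through that instantiation path more than once for a single query, which is the behavior described above.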

Re: Spark In Memory Shuffle / 5403

2018-10-19 Thread Peter Liu
Hi Peter, Thanks for the additional information - this is really helpful (I definitely got more than I was looking for :-) Cheers, Peter On Fri, Oct 19, 2018 at 12:53 PM Peter Rudenko wrote: > Hi Peter, we're using a part of Crail - its core library, called disni ( > https://github.com/zr

Re: Spark In Memory Shuffle / 5403

2018-10-19 Thread Peter Rudenko
Hi Peter, we're using a part of Crail - its core library, called disni ( https://github.com/zrlio/disni/). We couldn't reproduce the results from that blog post; in any case, Crail is a more platform-like approach (it comes with its own file system), while SparkRDMA is a pluggable approach - it's just a plugi
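
For reference, a rough sketch of how a pluggable shuffle implementation like this gets wired in, via Spark's spark.shuffle.manager setting; the RdmaShuffleManager class name and the jar path below are assumptions taken from the SparkRDMA README, not verified here:

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

val conf = new SparkConf()
  .setAppName("rdma-shuffle-example")
  // Swap the default sort-based shuffle for the plugin's ShuffleManager implementation.
  .set("spark.shuffle.manager", "org.apache.spark.shuffle.rdma.RdmaShuffleManager")
  // The plugin jar has to be on the driver and executor classpaths (path is hypothetical).
  .set("spark.driver.extraClassPath", "/opt/sparkrdma/spark-rdma.jar")
  .set("spark.executor.extraClassPath", "/opt/sparkrdma/spark-rdma.jar")

val spark = SparkSession.builder().config(conf).getOrCreate()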

Re: Spark In Memory Shuffle / 5403

2018-10-19 Thread Peter Liu
Hi Peter, thank you for the reply and detailed information! Would this be something comparable with Crail? ( http://crail.incubator.apache.org/blog/2017/11/rdmashuffle.html) I was more looking for something simple/quick to make the shuffle between the local JVMs quicker (like the idea of using local

Re: Spark In Memory Shuffle / 5403

2018-10-19 Thread Peter Rudenko
Hey Peter, in the SparkRDMA shuffle plugin ( https://github.com/Mellanox/SparkRDMA) we're using mmap of the shuffle file to do Remote Direct Memory Access. If the shuffle data is bigger than RAM, Mellanox NICs support On-Demand Paging, where the OS invalidates translations which are no longer valid due to eith
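
As a rough illustration of the mmap part only (the RDMA registration with the NIC is not shown), here is a minimal Scala sketch using java.nio to memory-map a shuffle file; the file path is hypothetical:

import java.nio.channels.FileChannel
import java.nio.file.{Paths, StandardOpenOption}

// Hypothetical path to one shuffle data file written by the block manager.
val path = Paths.get("/tmp/blockmgr-example/shuffle_0_0_0.data")
val channel = FileChannel.open(path, StandardOpenOption.READ)
try {
  // Pages are faulted in lazily, so a file larger than RAM can still be mapped;
  // on the NIC side that is where On-Demand Paging comes in.
  val buffer = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size())
  println(s"mapped ${buffer.capacity()} bytes")
} finally {
  channel.close()
}

(A single java.nio mapping is capped at 2 GiB, so a real implementation would map a large file in chunks.)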

[Spark for kubernetes] Azure Blob Storage credentials issue

2018-10-19 Thread Oscar Bonilla
Hello, I'm having the following issue while trying to run Spark on Kubernetes:
2018-10-18 08:48:54 INFO DAGScheduler:54 - Job 0 failed: reduce at SparkPi.scala:38, took 1.743177 s
Exception in thread "main" org.apache.spark.SparkE
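
One common way to pass Azure Blob Storage (wasb://) credentials to Spark is to set the hadoop-azure account key on the Hadoop configuration. A hedged sketch with placeholder account, container, and key values; it also assumes the hadoop-azure and azure-storage jars are already on the classpath of the driver and executor images:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("azure-blob-example").getOrCreate()

// hadoop-azure property for a storage account key; it can equally be passed as
// --conf spark.hadoop.fs.azure.account.key.<account>.blob.core.windows.net=<key>.
spark.sparkContext.hadoopConfiguration.set(
  "fs.azure.account.key.mystorageaccount.blob.core.windows.net",
  "<storage-account-key>")

val df = spark.read.text("wasb://mycontainer@mystorageaccount.blob.core.windows.net/data/input.txt")
df.show()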