Re: Data source V2 in spark 2.4.0

2018-10-04 Thread Ryan Blue
Assaf, thanks for the feedback. The InternalRow issue is one we know about. If it helps, I wrote up some docs for InternalRow as part of SPAR

Re: Data source V2 in spark 2.4.0

2018-10-04 Thread assaf.mendelson
Thanks for the info. I have been converting an internal data source to V2 and am now preparing it for 2.4.0. I have a couple of suggestions from my experience so far. First, I believe we are missing documentation on this. I am currently writing an internal tutorial based on what I am learning, I

Re: Data source V2 in spark 2.4.0

2018-10-01 Thread Wenchen Fan
Ryan, thanks for putting up a list! Generally, there are a few tweaks to the data source v2 API in 2.4, and it shouldn't be too hard to upgrade to Spark 2.4 if you already have a data source v2 implementation. However, we do want to do some big API changes for data source v2 in the ne

Re: Data source V2 in spark 2.4.0

2018-10-01 Thread Ryan Blue
Hi Assaf, The major changes to the V2 API that you linked to aren’t going into 2.4. Those will be in the next release because they weren’t finished in time for 2.4. Here are the major updates that will be in 2.4: - SPARK-23323: The output

Data source V2 in spark 2.4.0

2018-10-01 Thread assaf.mendelson
Hi all, I understood from previous threads that the Data source V2 API will see some changes in Spark 2.4.0. However, I can't seem to find what these changes are. Is there any documentation that summarizes the changes? The only mention I can find is this pull request: https://github.com/apa