Re: Thoughts on Spark 3 release, or a preview release

2019-09-15 Thread Wenchen Fan
I don't expect to see a large DS V2 API change from now on. But we may update the API a little bit if we find problems during the preview. On Sat, Sep 14, 2019 at 10:16 PM Sean Owen wrote: > I don't think this suggests anything is finalized, including APIs. I > would not guess there will be majo

Re: Ask for ARM CI for spark

2019-09-15 Thread Tianhua huang
@Sean Owen, sorry for the late reply; we had a Mid-Autumn holiday :) If you want to integrate ARM CI into the AMPLab Jenkins, we can offer the ARM instance, and then the ARM job will run together with the other x86 jobs. Is there a guideline for doing this? @shane knapp, would you help us? On Thu, Sep

Re: [DISCUSS][SPIP][SPARK-29031] Materialized columns

2019-09-15 Thread Wenchen Fan
> 1. It is a waste of IO. The whole column (in Map format) has to be read and Spark extracts the required keys from the map, even though the query requires only one or a few keys in the map. This sounds like a similar use case to nested column pruning. We should push down the map key extraction to t

Documentation on org.apache.spark.sql.functions backend.

2019-09-15 Thread Vipul Rajan
I am trying to create a function that reads data from Kafka, communicates with Confluent Schema Registry, and decodes Avro data with evolving schemas. I am trying not to create hack-ish patches and to write proper code that I could maybe even create pull requests for. Looking at the code I have been