Re: Pushdown in DataSourceV2 question

2018-12-09 Thread Alessandro Solimando
Hello, that's an interesting question, but after Jörn Franke's reply I am a bit puzzled. If there is no control over the pushdown status, how can Spark guarantee the correctness of the final query? Consider a filter pushed down to the data source: either Spark has to know whether it has been applied or not,
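For context on what "control over the pushdown status" looks like in practice, here is a minimal sketch against the Spark 2.4-era DataSourceV2 interfaces (SupportsPushDownFilters); the reader class, schema, and column names are invented for illustration, and the partition planning is stubbed out. The key point is that pushFilters returns the filters the source does not handle, and Spark re-applies exactly those after the scan, which is how correctness is preserved even for a partially supported predicate set.

```scala
import java.util.Collections

import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.sources.{EqualTo, Filter}
import org.apache.spark.sql.sources.v2.reader.{DataSourceReader, InputPartition, SupportsPushDownFilters}
import org.apache.spark.sql.types.{LongType, StringType, StructType}

// Hypothetical reader that can only evaluate equality predicates on "id".
class HypotheticalReader extends DataSourceReader with SupportsPushDownFilters {

  private var pushed: Array[Filter] = Array.empty

  override def readSchema(): StructType =
    new StructType().add("id", LongType).add("value", StringType)

  // No real data in this sketch; partition planning is stubbed out.
  override def planInputPartitions(): java.util.List[InputPartition[InternalRow]] =
    Collections.emptyList()

  // Spark hands over all candidate filters; the source keeps what it can
  // handle and RETURNS the rest. Spark re-applies only the returned ones,
  // so unsupported filters are still enforced after the scan.
  override def pushFilters(filters: Array[Filter]): Array[Filter] = {
    val (supported, unsupported) = filters.partition {
      case EqualTo("id", _) => true
      case _                => false
    }
    pushed = supported
    unsupported
  }

  // What the source claims it has fully applied. Spark trusts this list
  // and does not evaluate these predicates again.
  override def pushedFilters(): Array[Filter] = pushed
}
```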

Re: Pushdown in DataSourceV2 question

2018-12-09 Thread Jörn Franke
Well, even if it has to apply it again: with pushdown activated, it costs Spark much less to check whether the filter has been applied or not. Applying the filter itself is negligible; what pushdown really avoids, if the file format implements it, is IO cost (for reading) as well as the cost of converting

Re: Pushdown in DataSourceV2 question

2018-12-09 Thread Wenchen Fan
expressions/functions can be expensive, and I do think Spark should trust the data source and not re-apply pushed filters. If the data source lies, many things can go wrong...
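Schematically, the trust boundary Wenchen describes comes down to the two arrays exchanged at planning time. The snippet below reuses the hypothetical reader from the sketch above and is only a schematic of the handshake, not Spark's actual planner code:

```scala
import org.apache.spark.sql.sources.{EqualTo, Filter, GreaterThan}

val reader = new HypotheticalReader
val candidates: Array[Filter] = Array(EqualTo("id", 42L), GreaterThan("value", "a"))

// residual == Array(GreaterThan("value", "a")): came back, so Spark re-applies it.
val residual = reader.pushFilters(candidates)
// trusted == Array(EqualTo("id", 42L)): the source claims it applied this one.
val trusted  = reader.pushedFilters()

// A post-scan Filter is kept only for `residual`; everything in `trusted`
// is dropped from the plan and never re-evaluated, so a source that
// mis-reports pushedFilters() silently returns wrong results -- hence the
// trust discussion in this thread.
```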

Re: Pushdown in DataSourceV2 question

2018-12-09 Thread Jörn Franke
It is not about lying or trust. Some or all filters may not be supported by a data source. Some might only be applied under certain environmental conditions (e.g. enough memory). It is much more expensive to communicate between Spark and a data source which filters have been applied
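To illustrate the "environmental conditions" point, a source can simply decline to push anything when conditions are unfavourable, and correctness is unaffected because Spark evaluates whatever comes back from pushFilters. A sketch, again building on the hypothetical reader above, with a made-up memory threshold:

```scala
import org.apache.spark.sql.sources.Filter

// Variant of the HypotheticalReader: decline the whole push when conditions
// are unfavourable (the memory threshold is a made-up placeholder).
class ConditionalPushReader extends HypotheticalReader {

  override def pushFilters(filters: Array[Filter]): Array[Filter] = {
    val enoughMemory = Runtime.getRuntime.freeMemory() > 512L * 1024 * 1024
    if (enoughMemory) {
      super.pushFilters(filters) // push what the parent reader supports
    } else {
      // Push nothing: the parent's pushedFilters() stays empty, every filter
      // is returned, and Spark evaluates all of them itself after the scan.
      // The result is still correct, only less efficient.
      filters
    }
  }
}
```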

Why not setup a Gitter chatroom for Spark contributors

2018-12-09 Thread Darcy Shen
Gitter is cool and convenient.

Re: Why not setup a Gitter chatroom for Spark contributors

2018-12-09 Thread Sean Owen
I think this has come up before, and the issue is really that it adds yet another channel for people to follow to get 100% of the discussion about the project. I don't believe the project would bless an official chat channel, but anyone can run an unofficial one, of course.