Follow up on Jira Issue 39549

2022-06-24 Thread Chenyang Zhang
Hi, thanks so much for the response on https://issues.apache.org/jira/browse/SPARK-39549. I am curious what you mean by writing to a table and reading it from a different Spark application. Do you mean a table in a database or the Spark to_table() API? Could I read the table created in di

Re: Follow up on Jira Issue 39549

2022-06-24 Thread Sean Owen
Spark is decoupled from storage. You can write data to any storage you like. Anything that can read that data can read it, whether it is Spark or not, and whether it is a different session or not. Temp views are specific to a session and do not store data. I think this is trivial and no problem at all, or else I'm not clea

Re: Follow up on Jira Issue 39549

2022-06-24 Thread Chenyang Zhang
Thanks for the response. I want to figure out whether there is a way to share data without writing it to disk, because of performance issues. Chenyang Zhang Software Engineering Intern, Platform Redwood City, California

Re: Follow up on Jira Issue 39549

2022-06-24 Thread Sean Owen
No, programs can't read information out of other processes' memory - this is true of all software. A cached DataFrame is tied to a Spark application, but many things could be running inside that app. You may be thinking of something like exposing some kind of service on the driver that responds t

Spark Doubts

2022-06-24 Thread Sid
Hi Team, I have several questions, as below: 1) Can I apply predicate pushdown filters if I have data stored in S3, or should they be used only when reading from DBs? 2) While processing the data in distributed form, is my code copied to each and every executor? As I understand it, it should be the case since cod