Bobby Wang created SPARK-50168:
----------------------------------

             Summary: Connect session is not released if not calling 
spark.stop() explicitly
                 Key: SPARK-50168
                 URL: https://issues.apache.org/jira/browse/SPARK-50168
             Project: Spark
          Issue Type: Bug
          Components: Connect
    Affects Versions: 4.0.0
            Reporter: Bobby Wang


Hi,

I found that a Spark Connect session is not released unless spark.stop() is 
called explicitly.
h2. *Repro:*

I have a Python file with the code below:

test.py
{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.remote("sc://localhost").getOrCreate()
spark.range(10).show()
{code}
After running it with
{code:bash}
python test.py
{code}
the Spark web UI shows that the corresponding Connect session is still alive. 
See the row with session ID 96260131-a22c-4342-92df-8dc7ace5d1de in the table below.

But if spark.stop() is called explicitly in the Python file:
{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.remote("sc://localhost").getOrCreate()
spark.range(10).show()
# Explicitly stop the session before the script exits.
spark.stop()
{code}
the Connect session is released. See the row with session ID 
4e8ffb4e-7684-4fa7-b750-814f9a23f2d0, which has a Finish Time.

 
||User||Session ID||Start Time||Finish Time||Duration||Total Execute||
|xxx|[4e8ffb4e-7684-4fa7-b750-814f9a23f2d0|http://localhost:4040/connect/session/?id=4e8ffb4e-7684-4fa7-b750-814f9a23f2d0]|2024/10/30 11:05:25|2024/10/30 11:05:25|78 ms|1|
|xxx|[96260131-a22c-4342-92df-8dc7ace5d1de|http://localhost:4040/connect/session/?id=96260131-a22c-4342-92df-8dc7ace5d1de]|2024/10/30 11:04:41| |13 minutes 55 seconds|0|


So I'm wondering whether this is by design or a potential bug. If the Connect 
session is not released, the Connect server keeps holding that session's 
caches, which are then not freed. That could exhaust the memory of the Connect 
server/driver.
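
As a possible client-side workaround until this is clarified, a script can make sure stop() runs even when the job raises. The sketch below is only an illustration: managed_session and run_job are hypothetical helper names, not part of PySpark, and it assumes that calling stop() is what releases the session on the server, as observed in the repro above.

```python
from contextlib import contextmanager


@contextmanager
def managed_session(create_session):
    """Yield a session from create_session() and always call stop() on exit.

    Hypothetical helper: create_session is any zero-argument callable that
    returns an object with a stop() method, such as a Spark Connect session.
    """
    spark = create_session()
    try:
        yield spark
    finally:
        # Runs on normal exit and on exceptions, so the session is stopped
        # even if the job fails partway through.
        spark.stop()


def run_job(remote="sc://localhost"):
    """Run the repro job, stopping the session when done (hypothetical)."""
    # Imported here so the helper above can be used without PySpark installed.
    from pyspark.sql import SparkSession

    def create():
        return SparkSession.builder.remote(remote).getOrCreate()

    with managed_session(create) as spark:
        spark.range(10).show()
```

Calling run_job() from test.py would then behave like the second repro, with stop() invoked automatically. This does not address sessions left behind by clients that crash or are killed before the finally block runs.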



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
