Bobby Wang created SPARK-50168:
----------------------------------

             Summary: Connect session is not released if not calling spark.stop() explicitly
                 Key: SPARK-50168
                 URL: https://issues.apache.org/jira/browse/SPARK-50168
             Project: Spark
          Issue Type: Bug
          Components: Connect
    Affects Versions: 4.0.0
            Reporter: Bobby Wang

Hi,

I found that a Spark Connect session is not released unless spark.stop() is called explicitly.

h2. *Repro:*

I have a Python file, test.py, with the code below:

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.remote("sc://localhost").getOrCreate()
spark.range(10).show()
{code}

After running it with

{code:bash}
python test.py
{code}

the corresponding Connect session is still shown as alive in the Spark web UI. See the entry with session ID 96260131-a22c-4342-92df-8dc7ace5d1de in the table below.

But if {{spark.stop()}} is called explicitly in the Python file:

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.remote("sc://localhost").getOrCreate()
spark.range(10).show()
# Explicitly stop the session so the server can release it
spark.stop()
{code}

the Connect session is released. See the 4e8ffb4e-7684-4fa7-b750-814f9a23f2d0 entry.

||User||Session ID||Start Time||Finish Time||Duration||Total Execute||
|xxx|[4e8ffb4e-7684-4fa7-b750-814f9a23f2d0|http://localhost:4040/connect/session/?id=4e8ffb4e-7684-4fa7-b750-814f9a23f2d0]|2024/10/30 11:05:25|2024/10/30 11:05:25|78 ms|1|
|xxx|[96260131-a22c-4342-92df-8dc7ace5d1de|http://localhost:4040/connect/session/?id=96260131-a22c-4342-92df-8dc7ace5d1de]|2024/10/30 11:04:41| |13 minutes 55 seconds|0|

So I'm wondering whether this is by design or a potential bug. If the Connect session is not released, the Connect server keeps holding that session's caches, which are never freed.
That could blow up the connect server/driver memory.
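
Until the behavior is clarified, one possible client-side workaround is to make sure spark.stop() runs even when the script fails partway through. The sketch below is only an illustration of that idea: {{stopping}} and {{run_repro}} are hypothetical helper names that do not exist in PySpark, and this does not change anything on the server side or help when the client process is killed abruptly.

```python
from contextlib import contextmanager


@contextmanager
def stopping(session):
    """Yield `session` and always call session.stop() on exit.

    Hypothetical helper (not a PySpark API): it only relies on the
    object exposing a stop() method, as SparkSession does.
    """
    try:
        yield session
    finally:
        # Runs on normal exit and when the body raises an exception.
        session.stop()


def run_repro(remote="sc://localhost"):
    """Same steps as test.py from the report, with a guaranteed stop()."""
    # Imported lazily so that defining the helper does not require pyspark.
    from pyspark.sql import SparkSession

    with stopping(SparkSession.builder.remote(remote).getOrCreate()) as spark:
        spark.range(10).show()
```

Calling {{run_repro()}} against a running Connect server should then behave like the second snippet in the report, where spark.stop() is called explicitly, including when {{show()}} raises.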