Hi,

I wrote some Python code that does a calculation on a Spark stream. The code runs fine for about half an hour, and then the executor's memory usage becomes very high. I assign 4 GB in the submit command, but the executor ends up using 80% of my physical memory, which is 16 GB (I can see this with the top command). At that point the job just hangs.

You might say the workload is too big and that is what causes the memory issue, but it is not. My batch interval is 30 seconds, and the only source generates a file with 10,000 lines every 10 seconds, so one batch interval covers about 30,000 lines of CSV, only a few KB. So it cannot be the workload.

The cluster is a Spark standalone cluster on a single node. The submit command I use is:

./bin/spark-submit --master spark://ES01:7077 --executor-memory 4G --num-executors 1 --total-executor-cores 1 ./latest5min.py 1>a.log 2>b.log

All of the code is in latest5min.py. The logic is very simple and the file contains fewer than 100 lines. I will attach the file here:

latest5min.py <http://apache-spark-user-list.1001560.n3.nabble.com/file/n26904/latest5min.py>

I know it is not a pleasant experience to read other people's code. I will try to reduce my code to see where the problem is, but each attempt means waiting half an hour or longer to hit the error, so it will take some time. Please help check the current code first if possible. I will be happy to answer any questions.

I really appreciate the help. This problem is a real headache, and I have no clue what is happening.
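P.S. In case it helps while looking at the attachment, here is a minimal sketch of the general shape of the job I am describing. It assumes the stream is read with textFileStream from a directory where the generated CSV files land; the directory path and the per-batch calculation are placeholders, and the real logic is in the attached latest5min.py, which may differ from this sketch.

from pyspark import SparkContext
from pyspark.streaming import StreamingContext

# Placeholder setup; the actual job is in latest5min.py
sc = SparkContext(appName="latest5min")
ssc = StreamingContext(sc, 30)  # 30-second batch interval, as described above

# Assumption: new CSV files (about 10,000 lines every 10 seconds) arrive in this directory
lines = ssc.textFileStream("/path/to/incoming")

# Illustrative per-batch calculation: split the CSV and count rows per first column
counts = (lines
          .map(lambda line: line.split(","))
          .map(lambda fields: (fields[0], 1))
          .reduceByKey(lambda a, b: a + b))

counts.pprint()

ssc.start()
ssc.awaitTermination()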
I wrote some Python code to do calculation on spark stream. The code works fine for about half an hour then the memory usage for the executor become very high. I assign 4GB in the submit command but it using 80% of my physical memory which is 16GB. I see this from top command. In this situation the code just hang there.. You may say the workload is too big so have memory issue. But it is not. My stream interval is 30 seconds. The workload is one source that generating a file with 10 000 lines every 10 seconds. So in one batch interval it is 30 000 lines of csv file. Only a few kb. So it can not be the workload. The cluster I use is spark stand alone cluster on only one node. The submit command I use is *./bin/spark-submit --master spark://ES01:7077 --executor-memory 4G --num-executors 1 --total-executor-cores 1 ./latest5min.py 1>a.log 2>b.log* The code is all in the file latest5min.py. The logic is very simple and the file contains less than 100 lines. I will attache the file here.. latest5min.py <http://apache-spark-user-list.1001560.n3.nabble.com/file/n26904/latest5min.py> I know it is not happy experience to ready other peoples code .. I will try to reduce my code to see where is the problem. But every time I need to wait half an hour or longer to hit the error. So it will take some time. Please help to check the current code first if possible. I will be very happy to answer any question Very appreciate for the help. This is a really headache problem. Totally no clue what is happening -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Why-I-have-memory-leaking-for-such-simple-spark-stream-code-tp26904.html Sent from the Apache Spark User List mailing list archive at Nabble.com. --------------------------------------------------------------------- To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org