Hi,

I wrote some Python code to do calculations on a Spark stream. The code works
fine for about half an hour, and then the memory usage of the executor becomes
very high. I assign 4 GB in the submit command, but the process is using about 80%
of my physical memory, which is 16 GB; I can see this with the top command. In this
situation the code just hangs there.


You may say the workload is too big, so there is a memory issue. But it is not. My
stream interval is 30 seconds. The workload is one source that generates a
file with 10,000 lines every 10 seconds. So one batch interval amounts to 30,000
lines of CSV, only a few KB. So it cannot be the workload.
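
To make the workload concrete, here is a stand-in for the source program (the real
generator is a separate script; the directory paths, file names, and CSV columns
below are placeholders, not my actual setup):

import os
import time
import uuid

INPUT_DIR = "/tmp/stream_input"    # placeholder: the directory the streaming job watches
STAGE_DIR = "/tmp/stream_staging"  # write here first, then move, so only complete files appear
os.makedirs(INPUT_DIR, exist_ok=True)
os.makedirs(STAGE_DIR, exist_ok=True)

while True:
    name = "batch_{}.csv".format(uuid.uuid4().hex)
    staged = os.path.join(STAGE_DIR, name)
    with open(staged, "w") as f:
        for i in range(10000):                          # 10,000 CSV lines per file
            f.write("{},{},value\n".format(int(time.time()), i))
    os.rename(staged, os.path.join(INPUT_DIR, name))    # move finished file into the watched directory
    time.sleep(10)                                      # one new file every 10 seconds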


The cluster I use is a Spark standalone cluster on a single node.


The submit command I use is:

./bin/spark-submit --master spark://ES01:7077 --executor-memory 4G --num-executors 1 --total-executor-cores 1 ./latest5min.py 1>a.log 2>b.log


The code is all in the file latest5min.py. The logic is very simple, and the
file contains fewer than 100 lines.


I have attached the file here: latest5min.py
<http://apache-spark-user-list.1001560.n3.nabble.com/file/n26904/latest5min.py>
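
In case it saves time before opening the attachment, here is a stripped-down sketch
of what such a job looks like. The input path, the assumption that the files are read
with textFileStream, and the toy aggregation are all placeholders; the real calculation
is in latest5min.py:

from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext(appName="latest5min")
ssc = StreamingContext(sc, 30)                    # 30-second batch interval

lines = ssc.textFileStream("/tmp/stream_input")   # placeholder: directory where the source drops CSV files

def parse(line):
    ts, seq, value = line.split(",")
    return (value, 1)

counts = lines.map(parse).reduceByKey(lambda a, b: a + b)   # simplified per-batch calculation
counts.pprint()

ssc.start()
ssc.awaitTermination()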
 


I know it is not a pleasant experience to read other people's code. I will try
to reduce my code to see where the problem is, but every time I need to wait half
an hour or longer to hit the error, so it will take some time.


Please help to check the current code first if possible. I will be very
happy to answer any questions.


I really appreciate the help. This is a real headache of a problem, and I have
no clue what is happening.






