Vinod KC created SPARK-51685:
--------------------------------

             Summary: Excessive INFO logging from RocksDB operations causing 
executor stderr files to grow too large
                 Key: SPARK-51685
                 URL: https://issues.apache.org/jira/browse/SPARK-51685
             Project: Spark
          Issue Type: Improvement
          Components: Structured Streaming
    Affects Versions: 4.1.0
            Reporter: Vinod KC


Long-running Structured Streaming applications that use the RocksDB state store 
fail after some time with a "No space left on device" error.
The executor logs turn out to be the cause: their volume is excessive, 
around 1 GB of data for 1 minute of run time. 
Each INFO log entry prints more than 4000 lines (the list of files involved in 
RocksDB operations).

Example:
{code:java}
25/03/19 10:38:32 INFO RocksDBFileManager 
[StateStoreId(opId=0,partId=734,name=default), 
queryRunId=30d232ab-0d2a-454d-a217-5b2debfe10e9, 
queryId=4c8ca990-9fd4-4b69-8f90-ce933b3f4ef3]: Saving checkpoint files for 
version 285264 - 2912 files 
/local_disk0/spark-fe801ff3-f023-4554-8237-8914d37c1485/executor-41cadbd8-1863-4fc9-986d-607fa1f6d797/spark-7990e75a-1c8e-48a6-b59f-b174342e7837/[StateStoreId(opId=0,partId=734,name=default),
 queryRunId=30d232ab-0d2a-454d-a217-5b2debfe10e9, 
queryId=4c8ca990-9fd4-4b69-8f90-ce933b3f4ef3]-506ec0a7-fd75-4a75-9b09-717a64ee3478/checkpoint-42a1f8e8-a467-4c6a-9581-401d352a46bf/005040.log
 - 3135 bytes 
/local_disk0/spark-fe801ff3-f023-4554-8237-8914d37c1485/executor-41cadbd8-1863-4fc9-986d-607fa1f6d797/spark-7990e75a-1c8e-48a6-b59f-b174342e7837/[StateStoreId(opId=0,partId=734,name=default),
 queryRunId=30d232ab-0d2a-454d-a217-5b2debfe10e9, 
queryId=4c8ca990-9fd4-4b69-8f90-ce933b3f4ef3]-506ec0a7-fd75-4a75-9b09-717a64ee3478/checkpoint-42a1f8e8-a467-4c6a-9581-401d352a46bf/013865.log
 - 1368 bytes  
 ....{code}
Here the INFO log prints every checkpoint file path. Such verbose INFO logs 
consume the streaming application's disk space, because most users keep INFO as 
the default log level.
It would be better to log these details at DEBUG level instead.
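
The proposal can be illustrated with a minimal, self-contained sketch. It is not Spark's actual code: the class CheckpointLogSketch and its methods are hypothetical, and java.util.logging stands in for Spark's logging layer, with Level.FINE playing the role of DEBUG. The idea is to keep a one-line summary at INFO and emit the per-file listing only when the finer level is enabled.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical illustration only; names do not come from the Spark code base.
public class CheckpointLogSketch {
    private static final Logger LOG =
        Logger.getLogger(CheckpointLogSketch.class.getName());

    // One-line summary: cheap enough to keep at INFO.
    public static String summary(long version, Map<String, Long> files) {
        return "Saving checkpoint files for version " + version
            + " - " + files.size() + " files";
    }

    // Per-file listing: one line per file, so it can run to thousands of lines.
    public static String details(Map<String, Long> files) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Long> e : files.entrySet()) {
            sb.append(e.getKey()).append(" - ")
              .append(e.getValue()).append(" bytes\n");
        }
        return sb.toString();
    }

    public static void logCheckpointFiles(long version, Map<String, Long> files) {
        LOG.info(summary(version, files));
        // Build and emit the listing only when the finer level is enabled,
        // so default INFO configurations never pay for it.
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine(details(files));
        }
    }

    public static void main(String[] args) {
        // File names and sizes taken from the log excerpt above.
        Map<String, Long> files = new LinkedHashMap<>();
        files.put("005040.log", 3135L);
        files.put("013865.log", 1368L);
        logCheckpointFiles(285264L, files);
    }
}
```

With the default java.util.logging configuration (INFO), running main logs only the summary line; the file listing appears only after the logger and its handler are set to FINE.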
