Hi,
My goal is to either
(1) chain streaming aggregations in a single query, or
(2) run multiple streaming aggregations and save the data in some meaningful
format, so that I can execute low-latency / failsafe OLAP queries on it.
My first choice is the Parquet format, but I could not get it to work.
I am using spark-streaming_2.11-2.1.1.
I am getting the following error:
org.apache.spark.sql.AnalysisException: Append output mode not supported
when there are streaming aggregations on streaming DataFrames/DataSets;
for the following code:
StreamingQuery streamingQry = tagBasicAgg.writeStream()
    .format("parquet")
    .trigger(ProcessingTime.create("10 seconds"))
    .queryName("tagAggSummary")
    .outputMode("append")
    .option("checkpointLocation", "/tmp/summary/checkpoints/")
    .option("path", "/data/summary/tags/")
    .start();
However, the parquet format doesn't support the 'complete' output mode either.
So is Parquet supported only for batch queries, and not for streaming queries?
Note that writing the same output to the console works fine.
Any help will be much appreciated.
Thanks
Kaniska