Re: Structured Streaming Parquet Sink

2016-07-31 Thread Tathagata Das
Yes, files do not support complete mode output yet. We are working on that, and it should be available in Spark 2.1. In the meantime, you can use aggregation with the memory sink (i.e. format("memory")) to store the results in an in-memory table, which can then be periodically written to a parquet table explicitly.
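A minimal sketch of that workaround (Spark 2.0-era API; it reuses streamingCountsDF and the "parq" path from Arun's example below, and the "counts" query name is a placeholder):

    // Run the aggregation into an in-memory table via the memory sink.
    val query = streamingCountsDF.writeStream
      .format("memory")
      .queryName("counts")        // the in-memory table is registered as "counts"
      .outputMode("complete")
      .start()

    // Then, on whatever schedule suits you, snapshot the table to parquet.
    spark.table("counts")
      .write
      .mode("overwrite")
      .parquet("parq")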

Re: Structured Streaming Parquet Sink

2016-07-30 Thread Jacek Laskowski
Hi Arun,

Regarding parquet and complete output mode, a relevant piece of the code to think about:

    if (outputMode != OutputMode.Append) {
      throw new IllegalArgumentException(
        s"Data source $className does not support $outputMode output mode")
    }

https://github

Re: Structured Streaming Parquet Sink

2016-07-30 Thread Arun Patel
Thanks for the response. However, I am not able to use any output mode. Does that mean the parquet sink cannot be used with aggregations at all?

    scala> val query = streamingCountsDF.writeStream.format("parquet").option("path","parq").option("checkpointLocation","chkpnt").outputMode("complete").start()
    java.lang.IllegalArgumentException: Data source parquet does not support Complete output mode
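For contrast, a sketch of what the Spark 2.0 file sink does accept: append mode on a query without aggregations (streamingDF here is a placeholder for such a non-aggregated streaming DataFrame):

    // The file sink only supports Append mode in Spark 2.0.
    val query = streamingDF.writeStream
      .format("parquet")
      .option("path", "parq")
      .option("checkpointLocation", "chkpnt")
      .outputMode("append")
      .start()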

Re: Structured Streaming Parquet Sink

2016-07-30 Thread Tathagata Das
Correction, the two options are:

- writeStream.format("parquet").option("path", "...").start()
- writeStream.parquet("...").start()

There is no start with a path parameter.

On Jul 30, 2016 11:22 AM, "Jacek Laskowski" wrote:
> Hi Arun,
>
> > As per documentation, parquet is the only available file sink.

Re: Structured Streaming Parquet Sink

2016-07-30 Thread Jacek Laskowski
Hi Arun,

> As per documentation, parquet is the only available file sink.

The following sinks are currently available in Spark:

* ConsoleSink for the console format.
* FileStreamSink for the parquet format.
* ForeachSink, used in the foreach operator.
* MemorySink for the memory format.

You can create your own sink as well.
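A short sketch exercising each built-in sink listed above (Spark 2.0-era API; streamingDF, the paths, and the query names are placeholders, not from the thread):

    import org.apache.spark.sql.{ForeachWriter, Row}

    // ConsoleSink: prints each batch to stdout.
    streamingDF.writeStream.format("console").start()

    // FileStreamSink: append-only parquet output.
    streamingDF.writeStream
      .format("parquet")
      .option("path", "/tmp/out")
      .option("checkpointLocation", "/tmp/chk")
      .start()

    // MemorySink: results queryable as an in-memory table.
    streamingDF.writeStream.format("memory").queryName("snapshot").start()

    // ForeachSink: arbitrary per-row processing via a ForeachWriter.
    streamingDF.writeStream.foreach(new ForeachWriter[Row] {
      def open(partitionId: Long, version: Long): Boolean = true
      def process(row: Row): Unit = println(row)
      def close(errorOrNull: Throwable): Unit = ()
    }).start()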