Hi Filip,

Care to share the code behind "The only thing I found so far involves using forEachBatch and manually updating my aggregates."?
I'm not completely sure I understand your use case and hope the code could shed more light on it. Thank you.

Best regards,
Jacek Laskowski
----
https://about.me/JacekLaskowski
"The Internals Of" Online Books <https://books.japila.pl/>
Follow me on https://twitter.com/jaceklaskowski <https://twitter.com/jaceklaskowski>

On Thu, Jan 21, 2021 at 5:05 PM Filip <filip.necul...@enghouse.com.invalid> wrote:

> Hi,
>
> I'm considering using Apache Spark for the development of an application.
> This would replace a legacy program which reads CSV files and does lots
> (tens/hundreds) of aggregations on them. The aggregations are fairly simple:
> counts, sums, etc., while applying some filtering conditions on some of the
> columns.
>
> I prefer using structured streaming for its simplicity and low latency. I'd
> also like to use full SQL queries (via createOrReplaceTempView). However,
> doing multiple queries means Spark will re-read the input files for each
> one of them. This seems very inefficient for my use case.
>
> Does anyone have any suggestions? The only thing I found so far involves
> using forEachBatch and manually updating my aggregates. But I think there
> should be a simpler solution for this use case.
>
> --
> Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
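For anyone following the thread: a minimal sketch of the foreachBatch approach Filip describes might look something like the code below (Scala). The input path, the schema, and the category/amount columns are made up for illustration; the point is that each micro-batch is cached once and every aggregation runs against the cached data, so the CSV files are not re-read per query.

import org.apache.spark.sql.{DataFrame, SparkSession}

val spark = SparkSession.builder()
  .appName("multi-aggregation-sketch")
  .getOrCreate()

// Streaming file sources need an explicit schema; this one is hypothetical.
val input = spark.readStream
  .schema("category STRING, amount DOUBLE")
  .option("header", "true")
  .csv("/path/to/input")            // hypothetical input directory

// Runs once per micro-batch; a named method with explicit types avoids the
// foreachBatch overload ambiguity seen with some Scala 2.12 builds.
def processBatch(batch: DataFrame, batchId: Long): Unit = {
  // Cache the micro-batch so the underlying files are read only once,
  // then run as many aggregations as needed against the cached data.
  batch.persist()
  batch.createOrReplaceTempView("events")

  val counts = batch.sparkSession.sql(
    "SELECT category, COUNT(*) AS cnt FROM events GROUP BY category")
  val sums = batch.sparkSession.sql(
    "SELECT category, SUM(amount) AS total FROM events WHERE amount > 0 GROUP BY category")

  // In a real application these per-batch results would be merged into
  // externally kept running aggregates; here they are only printed.
  counts.show()
  sums.show()

  batch.unpersist()
}

val query = input.writeStream
  .foreachBatch(processBatch _)
  .start()

query.awaitTermination()

The "manually updating my aggregates" part would replace the show() calls: each batch's partial results get merged into running totals kept in an external store (for example, an upsert into a database table keyed by category).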