I have a significant amount of data stored in Hadoop HDFS as Parquet files.
I am using Spark Streaming to receive queries interactively from a web
server, transforming each received query into SQL that I run against the
data with Spark SQL.
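
Roughly, my setup looks like this (a simplified sketch; the HDFS path,
view name, and query are placeholders):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("InteractiveParquetQueries")
      .getOrCreate()

    // Load the Parquet data from HDFS (placeholder path)
    val events = spark.read.parquet("hdfs:///data/events")

    // Register a view so incoming query strings can refer to it by name
    events.createOrReplaceTempView("events")

    // For each query string received from the web server:
    val result = spark.sql(
      "SELECT category, COUNT(*) AS cnt FROM events GROUP BY category")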

In this process I need to run several SQL queries and then return an
aggregate result obtained by merging or subtracting the results of the
individual queries.
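
Concretely, the merge/subtract step looks something like this (again just
a sketch; the two queries are made up for illustration):

    // Two individual query results over the same registered view
    val clicks    = spark.sql("SELECT user_id FROM events WHERE action = 'click'")
    val purchases = spark.sql("SELECT user_id FROM events WHERE action = 'purchase'")

    // "Merging" = set union of the results, "subtracting" = set difference
    val merged     = clicks.union(purchases).distinct()
    val subtracted = clicks.except(purchases)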

Are there any ways I could optimize and speed up this process, for
example by running queries against DataFrames that are already loaded in
memory rather than re-reading the whole dataset from disk each time?
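
For instance, I was wondering whether caching the registered table along
these lines is the right approach (a sketch of what I have in mind,
assuming the data fits in cluster memory):

    // Cache the view's data once so later queries read from memory
    // instead of re-scanning the Parquet files on HDFS
    spark.catalog.cacheTable("events")
    spark.sql("SELECT COUNT(*) FROM events").show()  // materializes the cache

    // Subsequent queries against "events" should now hit the cached copy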

Is there a better way to interactively query the Parquet-backed data and
return results?

Thank you!



Narek Galstyan
