There are several step-by-step guides you can find online:

https://spark.apache.org/docs/latest/sql-distributed-sql-engine.html
https://jaceklaskowski.gitbooks.io/mastering-spark-sql/content/spark-sql-thrift-server.html
https://medium.com/@saipeddy/setting-up-a-thrift-server-4eb0c55c11f0
https://docs.cloudera.com/HDPDocuments/HDP2/HDP-2.6.3/bk_spark-component-guide/content/config-sts.html
Have you tried any of those? Where are you getting stuck?

On 2/18/21, 2:44 PM, "Scott Ribe" <scott_r...@elevated-dev.com> wrote:

I need a little help figuring out how some pieces fit together. I have some tables in Parquet files, and I want to access them using SQL over JDBC. I gather that I need to run the Thrift server, but how do I configure it to load my files into datasets and expose views?

The context: we are trying to figure out whether we want to use Spark for historical data, and so far I have just been using spark-shell for some experiments:

- I have established that we can easily export to Parquet, and that it is very efficient at storing this data.
- Spark SQL queries the data with reasonable performance.

Now I am at the step of testing whether the client side we are considering can deal effectively with querying this volume of data, which is why I'm looking for the simplest setup. If the client integration works, then yes, we will move on to configuring a proper cluster. (And it is a real question; I've already had one potential client-side piece prove totally incompetent at handling a decent volume of data...)

(The environment I am working in is just the straight download of spark-3.0.1-bin-hadoop3.2.)

--
Scott Ribe
scott_r...@elevated-dev.com
https://www.linkedin.com/in/scottribe/
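In case it helps you get unstuck faster, here is a rough sketch of the minimal path with a plain spark-3.0.1-bin-hadoop3.2 download. With no Hive configured, the Thrift server falls back to an embedded Derby metastore (a metastore_db directory created where you launch it), so table and view definitions persist between restarts. The names below (trades, /path/to/parquet, trade_date) are placeholders for your own data:

    # from the Spark distribution root: start the Thrift server
    # (listens for JDBC on port 10000 by default)
    sbin/start-thriftserver.sh

    # connect with the bundled beeline JDBC client
    bin/beeline -u jdbc:hive2://localhost:10000

Then, inside beeline:

    -- register an existing Parquet directory as a table;
    -- Spark reads the files in place, nothing is copied
    CREATE TABLE trades USING parquet LOCATION '/path/to/parquet';

    -- optionally expose a subset through a view
    CREATE VIEW recent_trades AS
      SELECT * FROM trades WHERE trade_date >= '2021-01-01';

    SELECT COUNT(*) FROM trades;

Once that works in beeline, the client you are evaluating should be able to hit the same tables using the Hive JDBC driver (org.apache.hive.jdbc.HiveDriver) and the same jdbc:hive2://host:10000 URL.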