There are several step-by-step guides that you can find online by Googling, for example:

https://spark.apache.org/docs/latest/sql-distributed-sql-engine.html
https://jaceklaskowski.gitbooks.io/mastering-spark-sql/content/spark-sql-thrift-server.html
https://medium.com/@saipeddy/setting-up-a-thrift-server-4eb0c55c11f0
https://docs.cloudera.com/HDPDocuments/HDP2/HDP-2.6.3/bk_spark-component-guide/content/config-sts.html

Have you tried any of those? Where are you getting stuck?
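
If it helps, the shortest path I know of is roughly this (just a sketch of the default single-node setup; the table name and path below are placeholders, not anything from your environment). From spark-shell, run inside the unpacked spark-3.0.1-bin-hadoop3.2 directory, register the parquet directory as a table in the metastore:

  // register the parquet files as an external table in the metastore
  // (hypothetical table name and path -- substitute your own)
  spark.sql("CREATE TABLE IF NOT EXISTS my_table USING parquet LOCATION '/path/to/parquet'")

  // quick sanity check from the shell before going through JDBC
  spark.sql("SELECT count(*) FROM my_table").show()

Then exit spark-shell (the default embedded Derby metastore only allows one process at a time), start the Thrift server from that same directory with sbin/start-thriftserver.sh, and connect with bin/beeline -u jdbc:hive2://localhost:10000; the table should then be visible over JDBC. The guides above cover everything beyond that default setup.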


On 2/18/21, 2:44 PM, "Scott Ribe" <scott_r...@elevated-dev.com> wrote:

    I need a little help figuring out how some pieces fit together. I have some 
tables in parquet files, and I want to access them using SQL over JDBC. I 
gather that I need to run the thrift server, but how do I configure it to load 
my files into datasets and expose views?

    The context is this: we're trying to figure out whether we want to use Spark for 
historical data, and so far I've just been using spark-shell for some experiments:

    - I have established that we can easily export to Parquet and it is very 
efficient at storing this data
    - Spark SQL queries the data with reasonable performance

    Now I am at the step of testing whether the client-side tooling we are 
considering can deal effectively with querying this volume of data.

    Which is why I'm looking for the simplest setup. If the client integration 
works, then yes, we move on to configuring a proper cluster. (And it is a real 
question: I've already had one potential client-side piece prove totally 
incompetent at handling a decent volume of data...)

    (The environment I am working in is just the straight download of 
spark-3.0.1-bin-hadoop3.2)

    --
    Scott Ribe
    scott_r...@elevated-dev.com
    https://www.linkedin.com/in/scottribe/



