Running SparkSql against Hive tables

2015-06-05 Thread James Pirz
I am pretty new to Spark. Using Spark 1.3.1, I am trying to use 'Spark SQL' to run some SQL scripts on the cluster. I realized that for better performance, it is a good idea to use Parquet files. I have 2 questions regarding that: 1) If I want to use Spark SQL against *partitioned & bucketed
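
A minimal sketch of the programmatic route with the Spark 1.3-era API: create a HiveContext and filter on the partition column so Spark can prune partitions instead of scanning the whole table. The table and column names (lineitem, ship_year) are hypothetical placeholders.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object PartitionedScan {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("SparkSQLOnHive"))
    // HiveContext picks up hive-site.xml from the classpath ($SPARK_HOME/conf).
    val hiveCtx = new HiveContext(sc)

    // Hypothetical partitioned table; the predicate on the partition column
    // (ship_year) allows partition pruning instead of a full scan.
    val df = hiveCtx.sql(
      "SELECT l_orderkey, l_quantity FROM lineitem WHERE ship_year = 1995")
    df.show()
  }
}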

Re: Running SparkSql against Hive tables

2015-06-08 Thread James Pirz
o) - Any suggestion or hint on how I can do that would be highly appreciated. Thanks. On Sun, Jun 7, 2015 at 6:39 AM, Cheng Lian wrote: > > > On 6/6/15 9:06 AM, James Pirz wrote: > > I am pretty new to Spark, and using Spark 1.3.1, I am trying to use 'Spark > SQL' t

Re: Running SparkSql against Hive tables

2015-06-09 Thread James Pirz
t; > What you are currently doing is using beeline to connect to Hive, which should > work even without Spark. > > Best > Ayan > > On Tue, Jun 9, 2015 at 10:42 AM, James Pirz wrote: > >> Thanks for the help! >> I am actually trying Spark SQL to run queries against table

Re: Running SparkSql against Hive tables

2015-06-09 Thread James Pirz
ing a query file with the -f flag). Looking at the Spark SQL documentation, it seems that it is possible. Please correct me if I am wrong. On Mon, Jun 8, 2015 at 6:56 PM, Cheng Lian wrote: > > On 6/9/15 8:42 AM, James Pirz wrote: > > Thanks for the help! > I am actually trying Spark SQL to
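
As an alternative to the spark-sql shell's -f option, here is a hedged sketch that reads a SQL script file and runs each ';'-separated statement through a HiveContext; the file path handling and the naive splitting are illustrative only, not the thread's actual solution.

import scala.io.Source
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object RunSqlScript {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("RunSqlScript"))
    val hiveCtx = new HiveContext(sc)

    // args(0) is the path to a SQL script, e.g. queries.sql (hypothetical).
    val script = Source.fromFile(args(0)).mkString
    // Naive split on ';' -- fine for simple scripts without ';' inside string literals.
    script.split(";").map(_.trim).filter(_.nonEmpty).foreach { stmt =>
      hiveCtx.sql(stmt).show()
    }
  }
}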

spark-submit does not use hive-site.xml

2015-06-09 Thread James Pirz
I am using Spark (standalone) to run queries (from a remote client) against data in tables that are already defined/loaded in Hive. I have started the metastore service in Hive successfully, and by putting hive-site.xml, with the proper metastore.uri, in the $SPARK_HOME/conf directory, I tried to share its co

Re: spark-submit does not use hive-site.xml

2015-06-10 Thread James Pirz
to communicate with the Hive metastore. > > So your program needs to instantiate a > `org.apache.spark.sql.hive.HiveContext` instead. > > Cheng > > > On 6/10/15 10:19 AM, James Pirz wrote: > > I am using Spark (standalone) to run queries (from a remote client) > against d
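
A minimal sketch of the fix suggested above: instantiate a HiveContext rather than a plain SQLContext, so the job reads hive-site.xml (and hive.metastore.uris) from $SPARK_HOME/conf and talks to the shared metastore. The table name below is a hypothetical placeholder.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object HiveMetastoreClient {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("HiveMetastoreClient"))
    // A plain SQLContext only sees tables registered in the current session;
    // HiveContext reads hive.metastore.uris from hive-site.xml on the classpath
    // and can query tables already defined in the Hive metastore.
    val hiveCtx = new HiveContext(sc)

    hiveCtx.sql("SHOW TABLES").show()
    hiveCtx.sql("SELECT COUNT(*) FROM some_hive_table").show()  // hypothetical table
  }
}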

Setting executors per worker - Standalone

2015-09-28 Thread James Pirz
Hi, I am using Spark 1.5 (standalone mode) on a cluster with 10 nodes, where each machine has 12GB of RAM and 4 cores. On each machine I have one worker, which is running one executor that grabs all 4 cores. I am interested in checking the performance with "one worker but 4 executors per machine - each

Re: Setting executors per worker - Standalone

2015-09-28 Thread James Pirz
have 4 cores per worker > > > > On Tue, Sep 29, 2015 at 8:24 AM, James Pirz wrote: > >> Hi, >> >> I am using Spark 1.5 (standalone mode) on a cluster with 10 nodes while >> each machine has 12GB of RAM and 4 cores. On each machine I have one worker >&g

Re: Setting executors per worker - Standalone

2015-09-29 Thread James Pirz
28, 2015 at 8:46 PM, Jeff Zhang wrote: > >> use "--executor-cores 1" you will get 4 executors per worker since you >> have 4 cores per worker >> >> >> >> On Tue, Sep 29, 2015 at 8:24 AM, James Pirz wrote: >> >>> Hi, >>> >
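
The same setting from the reply, applied through SparkConf instead of the --executor-cores flag; a hedged sketch, with the memory value as a placeholder that must let 4 executors fit inside the worker's memory.

import org.apache.spark.{SparkConf, SparkContext}

object MultiExecutorPerWorker {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("MultiExecutorPerWorker")
      // One core per executor: with 4 cores and one worker per machine, the
      // standalone master can launch 4 single-core executors per worker.
      .set("spark.executor.cores", "1")
      .set("spark.executor.memory", "2g")  // placeholder; 4 executors must fit in worker memory
    val sc = new SparkContext(conf)
    // ... job code ...
    sc.stop()
  }
}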

worker and executor memory

2015-08-13 Thread James Pirz
Hi, I am using Spark 1.4 on a cluster (stand-alone mode), across 3 machines, for a workload similar to TPCH (analytical queries with multiple/multi-way large joins and aggregations). Each machine has 12GB of memory and 4 cores. My total data size is 150GB, stored in HDFS (as Hive tables), a
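
One hedged way to size this for 12GB / 4-core machines (all numbers illustrative, not from the thread): give the worker most of the RAM via SPARK_WORKER_MEMORY in conf/spark-env.sh, then request executors that fit inside it with headroom for the OS and HDFS daemons.

import org.apache.spark.{SparkConf, SparkContext}

object TpchLikeWorkload {
  def main(args: Array[String]): Unit = {
    // Illustrative sizing: with SPARK_WORKER_MEMORY set to e.g. 10g per node,
    // an 8g executor heap leaves room for the OS and HDFS daemons.
    val conf = new SparkConf()
      .setAppName("TpchLikeWorkload")
      .set("spark.executor.memory", "8g")  // heap per executor; placeholder value
    val sc = new SparkContext(conf)
    // ... HiveContext / query code ...
    sc.stop()
  }
}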

Re: worker and executor memory

2015-08-14 Thread James Pirz
scheduled that way, as it is a map-only job and reading can happen in parallel. On Thu, Aug 13, 2015 at 9:10 PM, James Pirz wrote: > Hi, > > I am using Spark 1.4 on a cluster (stand-alone mode), across 3 machines, > for a workload similar to TPCH (analytical queries with multiple/multi

Repartitioning external table in Spark sql

2015-08-18 Thread James Pirz
I am using Spark 1.4.1, in stand-alone mode, on a cluster of 3 nodes. Using Spark SQL and HiveContext, I am trying to run a simple scan query on an existing Hive table (which is an external table consisting of rows in text files stored in HDFS - it is NOT Parquet, ORC or any other richer format)
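
A hedged sketch of forcing more parallelism on such a scan with the Spark 1.4-era DataFrame API; the table name and the partition count are placeholders.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object RepartitionedScan {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("RepartitionedScan"))
    val hiveCtx = new HiveContext(sc)

    // Text-backed external tables often come back with one partition per
    // HDFS block/file; repartition() shuffles the rows to the requested
    // parallelism before the expensive part of the query runs.
    val raw = hiveCtx.table("my_external_table")   // placeholder table name
    val repartitioned = raw.repartition(48)        // e.g. a few x total cores
    repartitioned.registerTempTable("my_table_repart")

    hiveCtx.sql("SELECT COUNT(*) FROM my_table_repart").show()
  }
}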

[Spark SQL] dependencies to use test helpers

2019-07-24 Thread James Pirz
I have a Scala application in which I have added some extra rules to Catalyst. While adding some unit tests, I am trying to use some existing functions from Catalyst's test code: Specifically comparePlans() and normalizePlan() under PlanTestBase
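
One hedged way to get those helpers on the test classpath is to depend on Spark's published test JARs; an sbt sketch follows, with the version number as a placeholder to match whatever Spark the application builds against (PlanTestBase lives in the spark-catalyst test sources).

// build.sbt (sketch) -- pulls in the "tests" classifier artifacts so that
// org.apache.spark.sql.catalyst.plans.PlanTestBase and its comparePlans /
// normalizePlan helpers resolve in the project's test code.
val sparkVersion = "2.4.3"  // placeholder; match the application's Spark version

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-catalyst" % sparkVersion % Test classifier "tests",
  "org.apache.spark" %% "spark-sql"      % sparkVersion % Test classifier "tests",
  "org.apache.spark" %% "spark-core"     % sparkVersion % Test classifier "tests"
)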