Hi,
I am trying to execute a simple query that joins 3 tables. When I look at
the execution plan, it varies with the position of the tables in the "from"
clause. The plan looks more optimized when the table with predicates is
listed before any other table.
Original query:
selec
More details:
Execution plan for the original query:
select distinct pge.portfolio_code
from table1 pge join table2 p
on p.perm_group = pge.anc_port_group
join table3 uge
on p.user_group=uge.anc_user_group
where uge.user_name = 'user' and p.perm_type = 'TEST'
== Physical Plan ==
TungstenAggre
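For reference, a minimal sketch of how the two orderings can be compared in the shell, assuming a sqlContext val and the DataFrame API (Spark 1.3+) with the three tables already registered as temp tables:

val q = sqlContext.sql("""
  select distinct pge.portfolio_code
  from table1 pge
  join table2 p on p.perm_group = pge.anc_port_group
  join table3 uge on p.user_group = uge.anc_user_group
  where uge.user_name = 'user' and p.perm_type = 'TEST'""")
q.explain(true)   // prints the extended (logical + physical) plan for this table ordering
// repeat with the other table order in the from clause and diff the output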
Hi,
I have a table sourced from 2 parquet files, with a few extra columns in one
of the parquet files. Simple queries work fine, but queries with a predicate
on an extra column don't work and I get a "column not found" error.
Column resp_party_type exists in just one parquet file.
a) Query that works:
select
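In case it is relevant, a minimal sketch of reading the same data with Parquet schema merging turned on (this assumes the Spark 1.5+ DataFrameReader API and a hypothetical path); mergeSchema asks Spark to union the schemas of all part files instead of picking one of them:

val df = sqlContext.read
  .option("mergeSchema", "true")
  .parquet("/path/to/table")                 // hypothetical location of the two parquet files
df.printSchema()                             // resp_party_type should now show up in the schema
df.filter("resp_party_type = 'X'").show()    // 'X' is a placeholder predicate value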
Hi,
I have a set of data that I need to group by a specific key and then save as
parquet. Refer to the code snippet below. I am querying trade and then
grouping by date.
val df = sqlContext.sql("SELECT * FROM trade")
val dfSchema = df.schema
val partitionKeyIndex = dfSchema.fieldNames.seq.indexOf("da
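A minimal sketch of one way to write the grouped output, assuming Spark 1.4+ (DataFrameWriter.partitionBy); one parquet directory is produced per distinct value of the partition column:

val df = sqlContext.sql("SELECT * FROM trade")
df.write
  .partitionBy("date")            // "date" is assumed from the description above
  .parquet("/tmp/trade_by_date")  // hypothetical output path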
Please refer to the code snippet below. I get the following error:
/tmp/temp/trade.parquet/part-r-00036.parquet is not a Parquet file.
expected magic number at tail [80, 65, 82, 49] but found [20, -28, -93, 93]
at parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:418)
at
org
Hi,
What is the difference between SPARK_LOCAL_DIRS and SPARK_WORKER_DIR? Also,
does Spark clean these up after execution?
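For context, both are normally set in conf/spark-env.sh; a minimal sketch (the paths are placeholders):

# conf/spark-env.sh
export SPARK_LOCAL_DIRS=/data/spark/tmp    # scratch space: shuffle output and RDD blocks spilled to disk
export SPARK_WORKER_DIR=/data/spark/work   # standalone worker directory: application logs and jars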
Regards,
Gaurav
Hi,
I created some Hive tables and I am trying to list them through Beeline, but
I am not getting any results. I can list the tables through spark-sql.
When I connect with Beeline, it starts up with the following message:
Connecting to jdbc:hive2://tst001:10001
Enter username for jdbc:hive2://tst001:10001: <>
E
Hi,
I am trying to run a simple Hadoop job (that uses
CassandraHadoopInputOutputWriter) on Spark (v1.2, Hadoop v1.x) but I am
getting a NullPointerException in TaskSetManager:
WARN 2015-02-26 14:21:43,217 [task-result-getter-0] TaskSetManager - Lost
task 14.2 in stage 0.0 (TID 29, devntom003.dev.black
Hi,
I am converting a jsonRDD to parquet by saving it as a parquet file
(saveAsParquetFile):
cacheContext.jsonFile("file:///u1/sample.json").saveAsParquetFile("sample.parquet")
I am then reading the parquet file and registering it as a table:
val parquet = cacheContext.parquetFile("sample_trades.parquet")
par
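A rough sketch of the registration and query step (the temp table name and the query here are placeholders):

val parquet = cacheContext.parquetFile("sample_trades.parquet")
parquet.registerTempTable("sample_trades")                        // hypothetical table name
cacheContext.sql("select count(*) from sample_trades").collect()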
Hi tsingfu,
Thanks for your reply. I tried with other columns, but the problem is the
same with other Integer columns.
Regards,
Gaurav
Hi,
I am playing around with Spark SQL 1.3 and noticed that the "max" function
does not give the correct result, i.e. it doesn't return the maximum value.
The same query works fine in Spark SQL 1.2.
Is anyone aware of this issue?
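For illustration, the query I mean is of this shape (the table and column names here are placeholders, not the real ones):

sqlContext.sql("SELECT max(trade_price) FROM trades").collect()   // hypothetical table/column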
Regards,
Gaurav
Hi,
I am using jsonRDD in Spark SQL and having trouble iterating through an
array inside the JSON object. Please refer to the schema below:
 |-- Preferences: struct (nullable = true)
 |    |-- destinations: array (nullable = true)
 |-- user: string (nullable = true)
Sample Data:
-- Preferences: st
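In case it helps show what I am attempting, a rough sketch using explode through a HiveContext (the temp table name and the jsonData variable are mine):

jsonData.registerTempTable("prefs")          // jsonData = hiveContext.jsonRDD(...) on the JSON above
hiveContext.sql(
  "SELECT user, dest FROM prefs LATERAL VIEW explode(Preferences.destinations) d AS dest"
).collect()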
Thanks. I am not using a HiveContext. I am loading data from Cassandra,
converting it into JSON, and then querying it through a SQLContext. Can I
use a HiveContext to query a jsonRDD?
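For reference, what I am trying would look roughly like this (the RDD and helper names are placeholders):

import org.apache.spark.sql.hive.HiveContext
val hiveContext = new HiveContext(sc)
val jsonStrings = cassandraRdd.map(toJsonString)   // hypothetical RDD[String] of JSON documents
val table = hiveContext.jsonRDD(jsonStrings)       // HiveContext extends SQLContext, so jsonRDD is available
table.registerTempTable("json_data")
hiveContext.sql("SELECT count(*) FROM json_data").collect()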
Gaurav
Hi,
I am reading data from Cassandra through the DataStax spark-cassandra
connector, converting it into JSON, and then running Spark SQL on it. Refer
to the code snippet below:
step 1 > val o_rdd = sc.cassandraTable[CassandraRDDWrapper](
'', '')
step 2 > val tempObjectRDD = sc.parallelize(o_r
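The overall flow is roughly the following (keyspace, table, and field names are placeholders):

import com.datastax.spark.connector._
case class CassandraRDDWrapper(id: String, payload: String)        // hypothetical row fields
val o_rdd = sc.cassandraTable[CassandraRDDWrapper]("my_keyspace", "my_table")
val jsonRdd = o_rdd.map(r => s"""{"id": "${r.id}", "payload": "${r.payload}"}""")
val schemaRdd = sqlContext.jsonRDD(jsonRdd)
schemaRdd.registerTempTable("cassandra_json")
sqlContext.sql("SELECT count(*) FROM cassandra_json").collect()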
Hi,
Thanks, it worked; I really appreciate your help. I have also been trying to
do multiple LATERAL VIEWs, but it doesn't seem to be working.
Query:
hiveContext.sql("Select t2 from fav LATERAL VIEW explode(TABS) tabs1 as t1
LATERAL VIEW explode(t1) tabs2 as t2")
Exception
org.apache.spark.sql.c
Are you using Spark 1.1? If yes, you would have to update the DataStax
Cassandra connector code and remove the references to the log methods in
CassandraConnector.scala.
Regards,
Gaurav
My bad, please ignore, it works!
Hi,
I have been using the spark shell to execute all SQL. I am connecting to
Cassandra, converting the data to JSON, and then running queries on it. I
am using HiveContext (and not SQLContext) because of the "explode"
functionality in it.
I want to see how I can use the Spark SQL CLI for directly runnin
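For reference, the CLI itself is started as below (the master URL is a placeholder):

./bin/spark-sql --master spark://host:7077
spark-sql> show tables;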
Hi,
I have a bunch of XML files and I want to run Spark SQL on them; is there a
recommended approach? I am thinking of either converting the XML to JSON and
then using jsonRDD (a rough sketch of that idea is below).
Please let me know your thoughts.
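A rough sketch of that idea, assuming one XML document per file and hypothetical field names and paths:

import scala.xml.XML
val xmlFiles = sc.wholeTextFiles("file:///tmp/xml-dir")        // (path, content) pairs
val jsonStrings = xmlFiles.map { case (_, content) =>
  val doc = XML.loadString(content)
  // pull out the fields of interest and emit one JSON object per document
  s"""{"id": "${(doc \ "id").text}", "name": "${(doc \ "name").text}"}"""
}
val schemaRdd = sqlContext.jsonRDD(jsonStrings)
schemaRdd.registerTempTable("xml_docs")
sqlContext.sql("SELECT count(*) FROM xml_docs").collect()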
Regards,
Gaurav
Hi,
I have been working on Spark SQL and want to expose this functionality to
other applications. The idea is to let other applications send SQL to be
executed on the Spark cluster and get the results back. I looked at spark
job server (https://github.com/ooyala/spark-jobserver) but it provides a RESTf
Hi,
I am trying to read a CSV file in the following way:
val csvData = sc.textFile("file:///tmp/sample.csv")
csvData.collect().length
This works fine in spark-shell, but when I spark-submit the jar, I get the
following exception:
java.lang.IllegalStateException: unread block data
Does Spark guarantee to push the processing to the data? Before creating
tasks, does Spark always check the data location? For example, if I have 3
Spark nodes (Node1, Node2, Node3) and the data is local to just 2 nodes
(Node1 and Node2), will Spark always schedule tasks on the node for which the d
You can also look at Spark Job Server
https://github.com/spark-jobserver/spark-jobserver
- Gaurav
> On Jan 9, 2015, at 10:25 PM, Corey Nolet wrote:
>
> Cui Lin,
>
> The solution largely depends on how you want your services deployed (Java web
> container, Spray framework, etc...) and if you