Using a UDF (defined in Java) in Scala through spark-shell

2015-09-29 Thread ogoh
Hello, I have a UDF declared in Java, but I'd like to call it from spark-shell, which only supports Scala. Since I am new to Scala, I couldn't figure out how to register the Java UDF using sqlContext.udf.register in Scala. Below is how I tried. I appreciate any help. Thanks, = my UDF in jav
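A minimal sketch of the wrapping pattern asked about here, assuming a Spark 1.4-era spark-shell: `sqlContext.udf.register` expects a Scala function, so a Java UDF object (the hypothetical `MyJavaUdf` below stands in for a class implementing `org.apache.spark.sql.api.java.UDF1`) can be wrapped in a closure. Only the Spark-independent core is runnable here; the registration line is commented because it needs a live SQLContext:

```scala
// Hypothetical stand-in for a UDF class written in Java
// (e.g. one implementing org.apache.spark.sql.api.java.UDF1[String, Int]).
class MyJavaUdf {
  def call(s: String): Int = s.length
}

// Wrap the Java object's call method in a Scala function value.
val wrapped: String => Int = s => new MyJavaUdf().call(s)

// In spark-shell (assumption: Spark 1.4-era API):
//   sqlContext.udf.register("strLen", wrapped)
//   sqlContext.sql("SELECT strLen(name) FROM people")

println(wrapped("hello"))  // 5
```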

SparkSQL 1.4 can't accept registration of a UDF?

2015-07-14 Thread ogoh
Hello, I am using SparkSQL along with the ThriftServer so that we can access it using Hive queries. With Spark 1.3.1, I can register a UDF. But Spark 1.4.0 doesn't accept the registration. The jar of the UDF is the same. Below are the logs; I appreciate any advice. == With Spark 1.4 Beeline version 1.4.0 by Apache
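For the ThriftServer/Beeline path, the usual registration is plain HiveQL rather than `sqlContext.udf.register`; a hedged sketch with a placeholder jar path and class name:

```sql
-- Hypothetical jar path and class name (Spark 1.3/1.4-era Hive support).
ADD JAR /path/to/my-udfs.jar;
CREATE TEMPORARY FUNCTION my_udf AS 'com.example.MyUdf';
SELECT my_udf(some_col) FROM some_table LIMIT 10;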

Re: Error when connecting to Spark SQL via Hive JDBC driver

2015-06-18 Thread ogoh
Hello, I am not sure what is wrong, but in my case I followed the instructions from http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/HiveJDBCDriver.html. It worked fine with SQuirreL SQL Client (http://squirrel-sql.sourceforge.net/) and SQL Workbench/J (http://www.sql-workbenc

SparkSQL: using Hive UDF returning Map throws "Error: scala.MatchError: interface java.util.Map (of class java.lang.Class) (state=,code=0)"

2015-06-04 Thread ogoh
Hello, I tested some custom UDFs on SparkSQL's ThriftServer & Beeline (Spark 1.3.1). Some UDFs work fine (accessing array parameters and returning int or string types). But my UDF returning a map type throws an error: "Error: scala.MatchError: interface java.util.Map (of class java.lang.Class) (state=,co
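A common workaround for this class of MatchError, sketched under the assumption that Spark 1.3's Scala type reflection handles `scala.collection.Map` but not `java.util.Map`: convert the Java map before it reaches Spark. The conversion itself is plain Scala and runnable:

```scala
import scala.collection.JavaConverters._

// Convert a java.util.Map (as returned by a Java/Hive UDF) into a Scala Map,
// which Spark SQL's reflection can map to a MapType (assumption).
def toScalaMap(jmap: java.util.Map[String, Integer]): Map[String, Int] =
  jmap.asScala.map { case (k, v) => (k, v.intValue) }.toMap

val j = new java.util.HashMap[String, Integer]()
j.put("a", 1)
println(toScalaMap(j))  // Map(a -> 1)
```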

SparkSQL's performance degrades with the number of partitions of Hive tables. Is this normal?

2015-06-01 Thread ogoh
} 2015-05-25 16:37:44 DEBUG Client:424 - The ping interval is 6 ms. 2015-05-25 16:37:44 DEBUG Client:693 - Connecting to /10.128.193.211:9000 2015-05-25 16:37:44 DEBUG Client:1007 - IPC Client (2100771791) connection to /10.128.193.211:9000 from ogoh sending #151 2015-05-25 16:37:44 DEBUG

Re: Spark 1.3.0 -> 1.3.1 produces java.lang.NoSuchFieldError: NO_FILTER

2015-05-30 Thread ogoh
I had the same issue on AWS EMR with Spark 1.3.1.e (the AWS build) when passing the '-h' parameter (a bootstrap-action parameter for Spark). I don't see the problem with Spark 1.3.1.e when not passing the parameter. I am not sure about your env. Thanks, -- View this message in context: http://apache-s

SparkSQL's performance: contacting the namenode and datanodes to unnecessarily check all partitions for a query of specific partitions

2015-05-25 Thread ogoh
5-25 16:37:44 DEBUG Client:1007 - IPC Client (2100771791) connection to /10.128.193.211:9000 from ogoh sending #151 2015-05-25 16:37:44 DEBUG Client:944 - IPC Client (2100771791) connection to /10.128.193.211:9000 from ogoh: starting, having connections 2 2015-05-25 16:37:44 DEBUG Client:1064 -

SparkSQL can't read S3 path for hive external table

2015-05-23 Thread ogoh
Hello, I am using Spark 1.3 in AWS. SparkSQL can't recognize a Hive external table on S3. The following is the error message. I appreciate any help. Thanks, Okehee -- 15/05/24 01:02:18 ERROR thriftserver.SparkSQLDriver: Failed in [select count(*) from api_search where pdate='2015-05-08'] java

SparkSQL failing while writing into S3 for 'insert into table'

2015-05-22 Thread ogoh
Hello, I am using Spark 1.3 & Hive 0.13.1 in AWS. From Spark SQL, when running a Hive query to export the query result into AWS S3, it failed with the following message: == org.apache.hadoop.hive.ql.metadata.HiveException: checkPaths: s3://test-dev/tmp/hive-hadoop/hive_2015-05-23_00-33-06_943_459

The beeline that comes with Spark 1.3.0 doesn't work with "--hiveconf" or "--hivevar", which substitute variables in Hive scripts.

2015-04-22 Thread ogoh
Hello, I am using Spark 1.3 for SparkSQL (Hive) with the ThriftServer and Beeline. Beeline doesn't work with "--hiveconf" or "--hivevar", which substitute variables in Hive scripts. I found the following JIRAs saying that Hive 0.13 resolved this issue. I wonder if this is a well-known issue? https://i
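A sketch of the invocation being attempted, with a hypothetical JDBC URL and script name; the `${hivevar:...}` substitution is what fails in the Spark 1.3.0 Beeline:

```shell
# Hypothetical JDBC URL and script name.
beeline -u jdbc:hive2://localhost:10000 \
  --hivevar pdate=2015-05-08 \
  -f query.hql
# query.hql would reference the variable as:
#   SELECT count(*) FROM api_search WHERE pdate='${hivevar:pdate}';
```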

Generating a schema in Spark 1.3 failed while using DataTypes.

2015-04-02 Thread ogoh
Hello, My ETL uses SparkSQL to generate Parquet files, which are served through the ThriftServer using Hive QL. In particular, it defines a schema programmatically, since the schema is only known at runtime. With Spark 1.2.1 it worked fine (following https://spark.apache.org/docs/latest/sql-programming
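A sketch of building a runtime-only schema. The column-list parsing is runnable plain Scala; the StructType construction is commented because it assumes the Spark 1.3 API, whose type classes moved to `org.apache.spark.sql.types` (a frequent source of breakage for code written against 1.2's Java-side DataTypes):

```scala
// Columns known only at runtime, e.g. parsed from a header line.
val header = "name:string,score:int"
val fields = header.split(",").toList.map { col =>
  val Array(n, t) = col.split(":")
  (n, t)
}
println(fields)  // List((name,string), (score,int))

// With Spark 1.3 (assumption -- sketch of the post-1.2 import locations):
//   import org.apache.spark.sql.types._
//   val schema = StructType(fields.map {
//     case (n, "int") => StructField(n, IntegerType, nullable = true)
//     case (n, _)     => StructField(n, StringType, nullable = true)
//   })
```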

SparkSQL supports hive "insert overwrite directory"?

2015-03-06 Thread ogoh
TOK_QUERY TOK_FROM TOK_TABREF TOK_TABNAME temptable TOK_INSERT TOK_DESTINATION TOK_DIR '/user/ogoh/table' TOK_SELECT TOK_SELEXPR TOK_ALLCOLREF scala.NotImplementedError: No parse rules for: TOK_DESTINATION TOK_DIR '/user/
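Since Spark 1.3's HiveQL parser has no rule for `TOK_DESTINATION TOK_DIR`, a hedged workaround sketch is to write the query result out through the RDD API instead of `INSERT OVERWRITE DIRECTORY`. The Spark calls are commented because they need a live context; the row-to-text rendering is the runnable core:

```scala
// Workaround sketch (assumption: Spark 1.3-era API):
//   sqlContext.sql("SELECT * FROM temptable").rdd
//     .map(row => row.mkString("\t"))
//     .saveAsTextFile("/user/ogoh/table")

// Runnable core: render one row's values as tab-separated text.
val row = Seq("a", 1, 2.5)
println(row.mkString("\t"))  // "a\t1\t2.5"
```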

Hive on Spark vs. SparkSQL using Hive ?

2015-01-28 Thread ogoh
ge of SparkSQL (http://spark.apache.org/docs/latest/sql-programming-guide.html)? Also, is there any update about SparkSQL's next release (current one is still alpha)? Thanks, OGoh -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Hive-on-Spark-vs-SparkSQ