Re: Which OutputCommitter to use for S3?

2015-03-25 Thread Pei-Lun Lee
I updated the PR for SPARK-6352 to be more like SPARK-3595. I added a new setting "spark.sql.parquet.output.committer.class" in the Hadoop configuration to allow a custom implementation of ParquetOutputCommitter. Can someone take a look at the PR? On Mon, Mar 16, 2015 at 5:23 PM, Pei-Lun
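A minimal sketch of how this setting would be used once the PR lands, assuming a spark-shell session; the committer class name, table name, and output path below are hypothetical placeholders, not part of the PR:

import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc) // sc comes from the spark-shell

// Point the new setting at a custom subclass of ParquetOutputCommitter.
// "com.example.DirectParquetOutputCommitter" is a placeholder class name.
sc.hadoopConfiguration.set(
  "spark.sql.parquet.output.committer.class",
  "com.example.DirectParquetOutputCommitter")

// Subsequent Parquet writes would then go through the custom committer.
sqlContext.table("events").saveAsParquetFile("s3n://bucket/output")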

Re: SparkSQL 1.3.0 JDBC data source issues

2015-03-19 Thread Pei-Lun Lee
JIRA and PR for the first issue: https://issues.apache.org/jira/browse/SPARK-6408 https://github.com/apache/spark/pull/5087 On Thu, Mar 19, 2015 at 12:20 PM, Pei-Lun Lee wrote: > Hi, > > I am trying the JDBC data source in Spark SQL 1.3.0 and found some issues. > > First, the syntax

SparkSQL 1.3.0 JDBC data source issues

2015-03-18 Thread Pei-Lun Lee
Hi, I am trying the JDBC data source in Spark SQL 1.3.0 and found some issues. First, the syntax "where str_col='value'" gives an error for both PostgreSQL and MySQL: psql> create table foo(id int primary key,name text,age int); bash> SPARK_CLASSPATH=postgresql-9.4-1201-jdbc41.jar spark/bin/spark-s
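A minimal sketch of the failing case from the spark-shell (where sqlContext is predefined), assuming the PostgreSQL table created above; the connection URL is a placeholder:

val df = sqlContext.load("jdbc", Map(
  "url"     -> "jdbc:postgresql://localhost/testdb", // placeholder URL
  "dbtable" -> "foo"))

// The string-equality predicate below is the shape of filter that fails;
// this is the issue later filed as SPARK-6408.
df.filter("name = 'bob'").collect()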

Re: SparkSQL 1.3.0 cannot read parquet files from different file system

2015-03-16 Thread Pei-Lun Lee
> path contain an actual comma in it. In your case, you may do something like this: > > val s3nDF = parquetFile("s3n://...") > val hdfsDF = parquetFile("hdfs://...") > val finalDF = s3nDF.union(hdfsDF) > > Cheng > > On 3/16/15 4:03 PM, Pei-Lun Lee wr

Re: Which OutputCommitter to use for S3?

2015-03-16 Thread Pei-Lun Lee
ect dependency makes this injection much more > difficult for saveAsParquetFile. > > On Thu, Mar 5, 2015 at 12:28 AM, Pei-Lun Lee wrote: > >> Thanks for the DirectOutputCommitter example. >> However I found it only works for saveAsHadoopFile. What about >> saveAsParquetFile?

SparkSQL 1.3.0 cannot read parquet files from different file system

2015-03-16 Thread Pei-Lun Lee
Hi, I am using Spark 1.3.0, where I cannot load parquet files from more than one file system, say one s3n://... and another hdfs://... This worked in older versions, and still works in 1.3 if I set spark.sql.parquet.useDataSourceApi=false. One way to fix this is, instead of getting a single FileSystem from the defaul
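A sketch of that fix direction, assuming it means resolving a FileSystem per input path rather than once from the default configuration; the paths are placeholders:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

val conf = new Configuration()
val paths = Seq("s3n://bucket/data.parquet", "hdfs://namenode/data.parquet")

// Ask each Path for its own FileSystem instead of FileSystem.get(conf),
// so s3n:// and hdfs:// inputs can be mixed in a single load.
val statuses = paths.map { p =>
  val path = new Path(p)
  path.getFileSystem(conf).getFileStatus(path)
}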

Re: SparkSQL 1.3.0 (RC3) failed to read parquet file generated by 1.1.1

2015-03-15 Thread Pei-Lun Lee
on work >> when we want to upgrade to Spark 1.3. >> >> Is there anyone who can help me? >> >> Thanks >> >> Wisely Chen >> >> On Tue, Mar 10, 2015 at 5:06

SparkSQL 1.3.0 (RC3) failed to read parquet file generated by 1.1.1

2015-03-10 Thread Pei-Lun Lee
Hi, I found that if I try to read a parquet file generated by Spark 1.1.1 using 1.3.0-rc3 with default settings, I get this error: com.fasterxml.jackson.core.JsonParseException: Unrecognized token 'StructType': was expecting ('true', 'false' or 'null') at [Source: StructType(List(StructField(a,Integ
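For context, a sketch of the two schema encodings involved, using the public 1.3 types API; the legacy string form is taken from the error message above:

import org.apache.spark.sql.types._

// What 1.3 expects to find in the Parquet footer metadata: a JSON schema.
val schema = StructType(Seq(StructField("a", IntegerType, nullable = true)))
val json = schema.json  // {"type":"struct","fields":[{"name":"a","type":"integer",...}]}
DataType.fromJson(json) // parses fine

// What 1.1.1 wrote instead is the case-class string seen in the error,
// e.g. StructType(List(StructField(a,IntegerType,true))), which is not
// JSON, hence the JsonParseException on the token 'StructType'.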

Re: Which OutputCommitter to use for S3?

2015-03-05 Thread Pei-Lun Lee
Thanks for the DirectOutputCommitter example. However, I found it only works for saveAsHadoopFile. What about saveAsParquetFile? It looks like SparkSQL is using ParquetOutputCommitter, which is a subclass of FileOutputCommitter. On Fri, Feb 27, 2015 at 1:52 AM, Thomas Demoor wrote: > FYI. We're cur
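For reference, a hedged sketch of the injection that does work for saveAsHadoopFile, going through the old mapred JobConf; the committer class name and output path are placeholders:

import org.apache.hadoop.mapred.{JobConf, TextOutputFormat}

// The old mapred API reads its committer from the job configuration,
// which is why this works for saveAsHadoopFile. saveAsParquetFile ignores
// it because ParquetOutputCommitter is hard-wired there.
// "com.example.DirectOutputCommitter" is a placeholder class name.
val jobConf = new JobConf(sc.hadoopConfiguration)
jobConf.set("mapred.output.committer.class", "com.example.DirectOutputCommitter")

sc.parallelize(Seq(("k", "v")))
  .saveAsHadoopFile(
    "s3n://bucket/out",
    classOf[String], classOf[String],
    classOf[TextOutputFormat[String, String]],
    jobConf)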

Re: Spark SQL 1.0.1 error on reading fixed length byte array

2014-08-03 Thread Pei-Lun Lee
Hi, We have a PR to support fixed-length byte arrays in parquet files. https://github.com/apache/spark/pull/1737 Can someone help verify it? Thanks. 2014-07-15 19:23 GMT+08:00 Pei-Lun Lee : > Sorry, should be SPARK-2489 > > > 2014-07-15 19:22 GMT+08:00 Pei-Lun Lee : > >