Re:Re: Low throughput and effect of GC in SparkSql GROUP BY

2015-05-21 Thread zhangxiongfei
Hi Pramod, Is your data compressed? I encountered a similar problem; however, after turning codegen on, the GC time was still very long. The input data for my map task is an lzo file of about 100 MB. My query is "select ip, count(*) as c from stage_bitauto_adclick_d group by ip sort by c limit 10"
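For context, a minimal Spark-shell sketch of running such a query with codegen enabled might look like the following. This assumes the Spark 1.x API from the thread's era and that `stage_bitauto_adclick_d` is already registered; the `HiveContext` is an assumption (the thread's `SORT BY` syntax comes from HiveQL):

```scala
// Spark 1.x shell sketch; assumes `sc` (a SparkContext) is available
// and that stage_bitauto_adclick_d is a registered table.
import org.apache.spark.sql.hive.HiveContext

val sqlContext = new HiveContext(sc)
// Codegen compiles expression evaluation to JVM bytecode at runtime,
// which can reduce per-row allocation (and hence GC pressure) in
// aggregations like this GROUP BY.
sqlContext.setConf("spark.sql.codegen", "true")

val top10 = sqlContext.sql(
  """SELECT ip, count(*) AS c
    |FROM stage_bitauto_adclick_d
    |GROUP BY ip
    |SORT BY c
    |LIMIT 10""".stripMargin)
top10.collect().foreach(println)
```

Note that `spark.sql.codegen` was an experimental flag in Spark 1.x; it was removed in later versions once codegen became the default.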

Hive can not get the schema of an external table created by Spark SQL API "createExternalTable"

2015-05-07 Thread zhangxiongfei
Hi, I was trying to create an external table named "adclicktable" with the API "def createExternalTable(tableName: String, path: String)". I could then get the schema of this table successfully, as below, and the table could be queried normally. The data files are all Parquet files. sqlContext.sql("describ
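The API in question can be sketched as below. This is a hedged illustration: the path is hypothetical, and the `DESCRIBE` call only completes the truncated command for readability, without claiming it matches the original message:

```scala
// Spark 1.3 sketch; assumes a HiveContext bound to `sqlContext`
// in the Spark shell, as in the thread.
// createExternalTable(tableName, path) infers the schema from the
// Parquet files at `path` and registers the table in the metastore.
val df = sqlContext.createExternalTable(
  "adclicktable",
  "hdfs:///path/to/parquet")  // hypothetical path for illustration

// Spark SQL itself reads the schema back fine:
sqlContext.sql("DESCRIBE adclicktable").show()
```

The symptom described (Spark sees the schema, Hive does not) is consistent with how Spark SQL data-source tables store their schema in table properties that plain Hive does not interpret, so Hive shows the table without usable columns.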

Re:Re: Spark SQL 1.3.1 "saveAsParquetFile" will output tachyon file with different block size

2015-04-28 Thread zhangxiongfei
Hi, Actually I did not use Tachyon 0.6.3; I just compiled Spark against Tachyon 0.5.0 with make-distribution.sh. When I pulled the Spark code from GitHub, the Tachyon version was still 0.5.0 in pom.xml. Regards, Zhang At 2015-04-29 04:19:20, "sara mustafa" wrote: >Hi Zhang, > >How did you compile Spark 1.3.1 with
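For reference, building a Spark 1.3.x distribution after changing the `tachyon-client` version in the root `pom.xml` went through `make-distribution.sh`; the profiles and Hadoop version below are illustrative, not taken from the thread:

```shell
# Sketch: rebuild Spark 1.3.x after editing the tachyon-client
# version in pom.xml (profiles/versions are illustrative).
./make-distribution.sh --tgz \
  -Phadoop-2.4 -Dhadoop.version=2.4.0 \
  -Phive -Phive-thriftserver
```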

Why does the HDFS parquet file generated by Spark SQL have different size with those on Tachyon?

2015-04-17 Thread zhangxiongfei
Hi, I did some tests on Parquet files with the Spark SQL DataFrame API. I generated 36 gzip-compressed Parquet files with Spark SQL and stored them on Tachyon; the size of each file is about 222 MB. Then I read them with the code below. val tfs = sqlContext.parquetFile("tachyon://datanode8.bitauto.dmp:19998/apps
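The read path in the message can be sketched as follows; only the path prefix up to `/apps` appears in the thread, so the rest of the path is left as a placeholder:

```scala
// Sketch (Spark 1.3 API). The Tachyon path beyond /apps is a
// placeholder; the full path is truncated in the original message.
val tfs = sqlContext.parquetFile(
  "tachyon://datanode8.bitauto.dmp:19998/apps/...")
tfs.registerTempTable("adclick")

// Parquet file size depends on row-group layout and footer metadata,
// not just the data, so identical data written against different
// filesystems (HDFS vs. Tachyon) can produce files of different sizes.
println(tfs.schema)
```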

Re:Re: Spark SQL 1.3.1 "saveAsParquetFile" will output tachyon file with different block size

2015-04-14 Thread zhangxiongfei
JIRA opened: https://issues.apache.org/jira/browse/SPARK-6921 At 2015-04-15 00:57:24, "Cheng Lian" wrote: >Would you mind to open a JIRA for this? > >I think your suspicion makes sense. Will have a look at this tomorrow. >Thanks for reporting! > >Cheng

Spark SQL 1.3.1 "saveAsParquetFile" will output tachyon file with different block size

2015-04-13 Thread zhangxiongfei
Hi experts, I ran the code below in the Spark shell to access Parquet files in Tachyon. 1. First, created a DataFrame by loading a bunch of Parquet files in Tachyon: val ta3 = sqlContext.parquetFile("tachyon://tachyonserver:19998/apps/tachyon/zhangxf/parquetAdClick-6p-256m"); 2. Second, set the "fs.local.block
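The two steps above can be sketched as follows. The property name is truncated after "fs.local.block" in the message, so completing it to `fs.local.block.size` is an assumption, as is the output path:

```scala
// Sketch (Spark 1.3 shell). Assumes the truncated property is
// fs.local.block.size; the output path is hypothetical.
val ta3 = sqlContext.parquetFile(
  "tachyon://tachyonserver:19998/apps/tachyon/zhangxf/parquetAdClick-6p-256m")

// Set the block size (256 MB here) on the Hadoop configuration the
// writer will see; this is what controls the block size of the
// files that saveAsParquetFile produces.
sc.hadoopConfiguration.setLong("fs.local.block.size", 256L * 1024 * 1024)

ta3.saveAsParquetFile(
  "tachyon://tachyonserver:19998/apps/tachyon/zhangxf/parquetOut")
```

The thread (and SPARK-6921, opened above) is about this setting not taking effect as expected when the destination is Tachyon.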