Error when cache partitioned Parquet table

2015-01-26 Thread ZHENG, Xu-dong
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)
-- 郑旭东 ZHENG, Xu-dong

Re: spark sql - save to Parquet file - Unsupported datatype TimestampType

2014-12-08 Thread ZHENG, Xu-dong
http://apache-spark-user-list.1001560.n3.nabble.com/spark-sql-save-to-Parquet-file-Unsupported-datatype-TimestampType-tp18691.html
-- 郑旭东 ZHENG, Xu-dong

Re: Is there any way to control the parallelism in LogisticRegression

2014-08-21 Thread ZHENG, Xu-dong
aper than calling RDD.repartition().
>
> For coalesce without shuffle, I don't know how to set the right number
> of partitions either ...
>
> -Xiangrui
>
> On Tue, Aug 12, 2014 at 6:16 AM, ZHENG, Xu-dong wrote:
> > Hi Xiangrui,
> >
> > Thanks for

Re: Spark SQL JDBC

2014-08-12 Thread ZHENG, Xu-dong
ld set it explicitly.
On Wed, Aug 13, 2014 at 1:10 PM, ZHENG, Xu-dong wrote:
> Hi Cheng,
>
> I also ran into some issues when trying to start ThriftServer from a build
> of the master branch (I could run it successfully from the branch-1.0-jdbc
> branch). Below is my build command:
>

Re: Spark SQL JDBC

2014-08-12 Thread ZHENG, Xu-dong
java:358)
>> at java.lang.Class.forName0(Native Method)
>> at java.lang.Class.forName(Class.java:270)
>> at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:311)
>> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:73)
>> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
-- 郑旭东 ZHENG, Xu-dong

Re: Is there any way to control the parallelism in LogisticRegression

2014-08-12 Thread ZHENG, Xu-dong
rtitions. -Xiangrui
>
> On Mon, Aug 11, 2014 at 10:39 PM, ZHENG, Xu-dong wrote:
> > I think this has the same effect and the same issue as #1, right?
> >
> > On Tue, Aug 12, 2014 at 1:08 PM, Jiusheng Chen wrote:
> >>
> >> How about inc

Re: Is there any way to control the parallelism in LogisticRegression

2014-08-11 Thread ZHENG, Xu-dong
I think this has the same effect and the same issue as #1, right?
On Tue, Aug 12, 2014 at 1:08 PM, Jiusheng Chen wrote:
> How about increasing the HDFS file extent size? For example, if the current
> value is 128M, we could make it 512M or bigger.
>
> On Tue, Aug 12, 2014 at 11:46 AM, ZHENG, Xu-dong

Is there any way to control the parallelism in LogisticRegression

2014-08-11 Thread ZHENG, Xu-dong
. I see a lot of 'ANY' tasks, which means those tasks read data from other nodes and run slower than tasks that read data from local memory. I think the best approach would be something like #3, but leveraging locality as much as possible. Is there any way to do that? Any suggestions? Thanks! -- ZHENG, Xu-dong
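
The locality concern raised in this question can be illustrated with a small toy model. The sketch below is plain Python, not Spark code: `coalesce_by_host`, the `(partition_id, host)` layout, and the `per_host` parameter are all hypothetical names introduced for illustration. It merges each partition only with other partitions held on the same host, so no merged group has to read data from another node. In Spark itself, the closest existing call is probably `RDD.coalesce(numPartitions, shuffle=False)`; its actual placement decisions may differ from this simplification.

```python
from collections import defaultdict


def coalesce_by_host(partitions, per_host):
    """Merge partitions into groups that never span more than one host.

    partitions: list of (partition_id, host) pairs (hypothetical layout).
    per_host:   maximum number of merged groups to create on each host.
    Returns a list of (host, [partition_id, ...]) tuples.
    """
    # Collect the partition ids that live on each host.
    by_host = defaultdict(list)
    for pid, host in partitions:
        by_host[host].append(pid)

    groups = []
    for host in sorted(by_host):
        pids = by_host[host]
        # Never create more groups than there are partitions on the host.
        n = min(per_host, len(pids))
        # Deal the partition ids round-robin into n buckets on this host.
        for i in range(n):
            groups.append((host, pids[i::n]))
    return groups


# Six cached partitions spread unevenly over two hosts.
layout = [(0, "a"), (1, "b"), (2, "a"), (3, "b"), (4, "a"), (5, "a")]
merged = coalesce_by_host(layout, per_host=2)
for host, pids in merged:
    print(host, pids)
```

The trade-off in this toy model is that the number of resulting groups depends on how the data is spread across hosts, so parallelism is bounded per host rather than set as a single global number.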