Would "alter table add column" be supported in the future?

2016-11-09 Thread
Hi, I notice that the “alter table add column” command is banned in Spark 2.0. Are there any plans to support it in the future? (After all, it was supported in Spark 1.6.x.) Thanks.
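For context, this is the kind of statement the question is about. The sketch below runs it against SQLite purely to illustrate the semantics (existing rows get NULL for the newly added column); whether Spark 2.0's parser accepts it is exactly what the thread is asking:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (a int)")
conn.execute("insert into t values (1)")
# The statement in question; SQLite is used here only to show the
# semantics -- rows inserted before the ALTER see NULL for column b.
conn.execute("alter table t add column b int")
print(conn.execute("select a, b from t").fetchall())  # [(1, None)]
```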

Re: shuffle files not deleted after executor restarted

2016-09-02 Thread
> On Sep 2, 2016, at 5:58 PM, 汪洋 wrote:
>
> Yeah, using the external shuffle service is a reasonable choice, but I think we will still face the same problems. We use SSDs to store shuffle files for performance reasons. If the shuffle files are not going to be used anymore,
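For reference, the external shuffle service discussed here is switched on with settings along these lines (a sketch of the standard Spark configuration keys; check the docs for your version's defaults):

```properties
spark.shuffle.service.enabled   true
# Port the node-local shuffle service listens on (7337 is the default).
spark.shuffle.service.port      7337
# Dynamic allocation requires the external shuffle service, because
# executors may be removed while their shuffle files are still needed.
spark.dynamicAllocation.enabled true
```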

Re: shuffle files not deleted after executor restarted

2016-09-02 Thread
> It is described here:
> https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-ExternalShuffleService.html
>
> ---
> Artur

Re: shuffle files not deleted after executor restarted

2016-09-02 Thread
Unless they are brutally killed.

You can safely delete the directories when you are sure that the Spark applications related to them have finished. A crontab task may be used for automatic cleanup.

> On Sep 2, 2016, at 12:18, 汪洋 wrote:
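A minimal sketch of the crontab-style cleanup suggested above, assuming the shuffle directories live under a hypothetical `/mnt/ssd/spark` (point `SPARK_LOCAL_ROOT` at whatever `spark.local.dir` is on your machines) and that anything untouched for a week belongs to a finished application:

```shell
#!/bin/sh
# Sketch: remove Spark scratch directories not modified for 7+ days.
# SPARK_LOCAL_ROOT is a hypothetical path -- adjust to spark.local.dir.
SPARK_LOCAL_ROOT="${SPARK_LOCAL_ROOT:-/mnt/ssd/spark}"
[ -d "$SPARK_LOCAL_ROOT" ] || exit 0
# -mtime +7: last modified more than 7 days ago
find "$SPARK_LOCAL_ROOT" -mindepth 1 -maxdepth 1 -type d -mtime +7 \
  -exec rm -rf {} +
```

Only delete this way once you are sure no running application still maps onto those directories, per the caveat in the message above.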

Re: rdd.distinct with Partitioner

2016-06-08 Thread
> On Jun 9, 2016, at 12:51 PM, Alexander Pivovarov wrote:
>
> reduceByKey(randomPartitioner, (a, b) => a + b) also gives an incorrect result.
>
> Why does reduceByKey with a Partitioner exist then?
>
> On Wed, Jun 8, 2016 at 9:22 PM, 汪洋 <tiandiwo...@icloud.com> wrote:
> Hi

Re: rdd.distinct with Partitioner

2016-06-08 Thread
Hi Alexander, I think the result is not guaranteed to be correct if an arbitrary Partitioner is passed in. I have created a notebook so you can check it out. (https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/7973071962862063/2110745399505739/58107563000366
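A small sketch (in plain Python, not Spark) of why an arbitrary Partitioner breaks distinct-style operators: deduplication happens inside each partition, so it only works if the partitioner sends equal keys to the same partition, as a hash partitioner does. The helper and the round-robin partitioner below are illustrative inventions:

```python
def distinct_with_partitioner(items, num_partitions, partitioner):
    """Simulate RDD.distinct(partitioner): shuffle every element to the
    partition chosen by the partitioner, then dedupe within each partition."""
    partitions = [set() for _ in range(num_partitions)]
    for x in items:
        partitions[partitioner(x)].add(x)
    return [x for part in partitions for x in part]

data = [1, 1, 2, 2, 3, 3]

# A proper partitioner routes equal keys to the same partition,
# so the per-partition dedupe sees both copies of each key.
good = distinct_with_partitioner(data, 2, lambda x: hash(x) % 2)
print(sorted(good))  # [1, 2, 3]

# An adversarial partitioner that ignores the key: the two copies of
# each value land in different partitions and both survive the dedupe.
counter = {"i": 0}
def round_robin(x):
    counter["i"] += 1
    return counter["i"] % 2

bad = distinct_with_partitioner(data, 2, round_robin)
print(sorted(bad))  # [1, 1, 2, 2, 3, 3]
```

The same argument applies to reduceByKey with a random partitioner: partial aggregates for the same key end up in different partitions and are never merged.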

HiveContext.refreshTable() missing in spark 2.0

2016-05-13 Thread
Hi all, I notice that HiveContext used to have a refreshTable() method, but it doesn’t in branch-2.0. Was it dropped intentionally? If so, how do we achieve similar functionality? Thanks. Yang

TakeOrderedAndProject operator may causes an OOM

2016-02-03 Thread
Hi, currently the TakeOrderedAndProject operator in Spark SQL uses RDD’s takeOrdered method. When we pass a large limit to the operator, however, it returns partitionNum * limit records to the driver, which may cause an OOM. Are there any plans in the community to deal with this problem?
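The partitionNum * limit blow-up described above can be sketched in plain Python: every partition ships its local top-`limit` elements to the driver, and the driver merges them, so driver memory scales with the number of partitions. The helper below is an illustrative simulation, not Spark's actual code:

```python
import heapq

def take_ordered(partitions, limit):
    """Simulate RDD.takeOrdered(limit): each partition contributes its
    local smallest `limit` elements; the driver merges all of them."""
    per_partition_tops = [heapq.nsmallest(limit, p) for p in partitions]
    # Everything below is "on the driver" -- up to len(partitions) * limit
    # records are held in memory before the final merge.
    shipped_to_driver = sum(len(t) for t in per_partition_tops)
    merged = heapq.nsmallest(limit, [x for t in per_partition_tops for x in t])
    return merged, shipped_to_driver

# 4 partitions of 250 values each
partitions = [list(range(i, 1000, 4)) for i in range(4)]
result, shipped = take_ordered(partitions, limit=100)
print(result[:5])  # [0, 1, 2, 3, 4]
print(shipped)     # 400 -- numPartitions * limit records on the driver
```

With a limit in the millions and hundreds of partitions, that intermediate collection is what triggers the OOM the message describes.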

Re: Using distinct count in over clause

2016-01-22 Thread
I don’t think that can be right.

> On Jan 22, 2016, at 4:53 PM, 汪洋 wrote:
>
> Hi,
>
> Do we support distinct count in the over clause in Spark SQL?
>
> I ran a SQL query like this:
>
> select a, count(distinct b) over (order by a rows between unbounded preceding and cu

Using distinct count in over clause

2016-01-22 Thread
Hi, do we support distinct count in the over clause in Spark SQL? I ran a SQL query like this: select a, count(distinct b) over (order by a rows between unbounded preceding and current row) from table limit 10. Currently, it returns an error saying: expression ‘a' is neither present in the group by,
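For clarity, the semantics the query is asking for can be sketched in plain Python: with a frame of "rows between unbounded preceding and current row" ordered by `a`, each row should see the number of distinct `b` values among all rows up to and including itself. This is an illustrative simulation of the intended result, not of any Spark internals:

```python
def running_distinct_count(rows, key):
    """For ordered rows, emit (row, count of distinct key values so far),
    i.e. count(distinct b) over (order by a rows between
    unbounded preceding and current row)."""
    seen = set()
    out = []
    for row in rows:
        seen.add(key(row))
        out.append((row, len(seen)))
    return out

# rows as (a, b), already ordered by a
rows = [(1, "x"), (2, "y"), (3, "x"), (4, "z")]
print(running_distinct_count(rows, key=lambda r: r[1]))
# [((1, 'x'), 1), ((2, 'y'), 2), ((3, 'x'), 2), ((4, 'z'), 3)]
```

Note the third row: "x" was already seen, so the running distinct count stays at 2.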

Re: problem with reading source code - pull out nondeterministic expressions

2015-12-31 Thread
I get it, thanks!

> On Dec 31, 2015, at 3:00 AM, Michael Armbrust wrote:
>
> The goal here is to ensure that the non-deterministic value is evaluated only once, so the result won't change for a given row (i.e. when sorting).
>
> On Tue, Dec 29, 2015 at 10:57 PM, 汪洋 <mai

problem with reading source code - pull out nondeterministic expressions

2015-12-29 Thread
Hi fellas, I am new to Spark and I have a newbie question. I am currently reading the source code of the Spark SQL Catalyst analyzer, and I don’t quite understand the partial function in PullOutNondeterministic. What is meant by “pulling out”? Why do we have to do the “pulling out”? I would really appre
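As Michael's reply above explains, the point of the pull-out is to evaluate a non-deterministic expression once per row and materialize the result, instead of re-evaluating it every time an operator (such as a Sort) touches the row. A rough plain-Python analogy of the idea, with rand() as the non-deterministic expression (this is an illustration of the concept, not Catalyst's actual rewrite):

```python
import random

rows = list(range(5))

# Without pulling out: the sort key is recomputed on every access, so a
# row's key can differ between comparisons -- the ordering is unstable.
def unstable_key(row):
    return random.random()  # a fresh value on every call

# "Pulling out": first evaluate rand() once per row in a projection
# below the sort, materializing (row, key) pairs...
random.seed(42)
projected = [(row, random.random()) for row in rows]
# ...then sort on the stored key, which is now fixed for each row.
ordered = sorted(projected, key=lambda pair: pair[1])

# Re-sorting the materialized pairs always gives the same order:
print(sorted(projected, key=lambda p: p[1]) == ordered)  # True
```

Because the random value is captured in the projected pair, the result for a given row no longer changes, which is exactly the guarantee the analyzer rule is after.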