Would "alter table add column" be supported in the future?

2016-11-09 Thread
Hi, I notice that the “alter table add column” command is banned in Spark 2.0. Are there any plans to support it in the future? (After all, it was supported in Spark 1.6.x.) Thanks.
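For context, this is the kind of statement the question is about. The sketch below runs it against SQLite purely to illustrate the semantics (existing rows get NULL for the newly added column); whether Spark 2.0's parser accepts it is exactly what the thread is asking:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (a int)")
conn.execute("insert into t values (1)")
# The statement in question; SQLite is used here only to show the
# semantics -- rows inserted before the ALTER see NULL for column b.
conn.execute("alter table t add column b int")
print(conn.execute("select a, b from t").fetchall())  # [(1, None)]
```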

Re: shuffle files not deleted after executor restarted

2016-09-02 Thread
> On Sep 2, 2016, at 5:58 PM, 汪洋 wrote:
>
> Yeah, using the external shuffle service is a reasonable choice, but I think we will still face the same problems. We use SSDs to store shuffle files for performance reasons. If the shuffle files are not going to be used anymore,
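For reference, the external shuffle service discussed here is switched on with settings along these lines (a sketch of the standard Spark configuration keys; check the docs for your version's defaults):

```properties
spark.shuffle.service.enabled   true
# Port the node-local shuffle service listens on (7337 is the default).
spark.shuffle.service.port      7337
# Dynamic allocation requires the external shuffle service, because
# executors may be removed while their shuffle files are still needed.
spark.dynamicAllocation.enabled true
```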

Re: shuffle files not deleted after executor restarted

2016-09-02 Thread
> It is described here:
> https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-ExternalShuffleService.html
>
> ---
> Artur

Re: shuffle files not deleted after executor restarted

2016-09-02 Thread
Unless they are brutally killed.

You can safely delete the directories when you are sure that the Spark applications related to them have finished. A crontab task may be used for automatic cleanup.

> On Sep 2, 2016, at 12:18, 汪洋 wrote:
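A minimal sketch of the crontab-style cleanup suggested above, assuming the shuffle directories live under a hypothetical `/mnt/ssd/spark` (point `SPARK_LOCAL_ROOT` at whatever `spark.local.dir` is on your machines) and that anything untouched for a week belongs to a finished application:

```shell
#!/bin/sh
# Sketch: remove Spark scratch directories not modified for 7+ days.
# SPARK_LOCAL_ROOT is a hypothetical path -- adjust to spark.local.dir.
SPARK_LOCAL_ROOT="${SPARK_LOCAL_ROOT:-/mnt/ssd/spark}"
[ -d "$SPARK_LOCAL_ROOT" ] || exit 0
# -mtime +7: last modified more than 7 days ago
find "$SPARK_LOCAL_ROOT" -mindepth 1 -maxdepth 1 -type d -mtime +7 \
  -exec rm -rf {} +
```

Only delete this way once you are sure no running application still maps onto those directories, per the caveat in the message above.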

Re: rdd.distinct with Partitioner

2016-06-08 Thread
> On Jun 9, 2016, at 12:51 PM, Alexander Pivovarov wrote:
>
> reduceByKey(randomPartitioner, (a, b) => a + b) also gives an incorrect result.
>
> Why does reduceByKey with a Partitioner exist then?
>
> On Wed, Jun 8, 2016 at 9:22 PM, 汪洋 <tiandiwo...@icloud.com> wrote:
> Hi

Re: rdd.distinct with Partitioner

2016-06-08 Thread
Hi Alexander, I think the result is not guaranteed to be correct if an arbitrary Partitioner is passed in. I have created a notebook so you can check it out. (https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/7973071962862063/2110745399505739/58107563000366
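A small sketch (in plain Python, not Spark) of why an arbitrary Partitioner breaks distinct-style operators: deduplication happens inside each partition, so it only works if the partitioner sends equal keys to the same partition, as a hash partitioner does. The helper and the round-robin partitioner below are illustrative inventions:

```python
def distinct_with_partitioner(items, num_partitions, partitioner):
    """Simulate RDD.distinct(partitioner): shuffle every element to the
    partition chosen by the partitioner, then dedupe within each partition."""
    partitions = [set() for _ in range(num_partitions)]
    for x in items:
        partitions[partitioner(x)].add(x)
    return [x for part in partitions for x in part]

data = [1, 1, 2, 2, 3, 3]

# A proper partitioner routes equal keys to the same partition,
# so the per-partition dedupe sees both copies of each key.
good = distinct_with_partitioner(data, 2, lambda x: hash(x) % 2)
print(sorted(good))  # [1, 2, 3]

# An adversarial partitioner that ignores the key: the two copies of
# each value land in different partitions and both survive the dedupe.
counter = {"i": 0}
def round_robin(x):
    counter["i"] += 1
    return counter["i"] % 2

bad = distinct_with_partitioner(data, 2, round_robin)
print(sorted(bad))  # [1, 1, 2, 2, 3, 3]
```

The same argument applies to reduceByKey with a random partitioner: partial aggregates for the same key end up in different partitions and are never merged.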

HiveContext.refreshTable() missing in spark 2.0

2016-05-13 Thread
Hi all, I notice that HiveContext used to have a refreshTable() method, but it doesn’t in branch-2.0. Was it dropped intentionally? If so, how do we achieve similar functionality? Thanks. Yang

TakeOrderedAndProject operator may causes an OOM

2016-02-03 Thread
Hi, currently the TakeOrderedAndProject operator in Spark SQL uses RDD’s takeOrdered method. When we pass a large limit to the operator, however, it returns partitionNum * limit records to the driver, which may cause an OOM. Are there any plans in the community to deal with this problem?
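The partitionNum * limit blow-up described above can be sketched in plain Python: every partition ships its local top-`limit` elements to the driver, and the driver merges them, so driver memory scales with the number of partitions. The helper below is an illustrative simulation, not Spark's actual code:

```python
import heapq

def take_ordered(partitions, limit):
    """Simulate RDD.takeOrdered(limit): each partition contributes its
    local smallest `limit` elements; the driver merges all of them."""
    per_partition_tops = [heapq.nsmallest(limit, p) for p in partitions]
    # Everything below is "on the driver" -- up to len(partitions) * limit
    # records are held in memory before the final merge.
    shipped_to_driver = sum(len(t) for t in per_partition_tops)
    merged = heapq.nsmallest(limit, [x for t in per_partition_tops for x in t])
    return merged, shipped_to_driver

# 4 partitions of 250 values each
partitions = [list(range(i, 1000, 4)) for i in range(4)]
result, shipped = take_ordered(partitions, limit=100)
print(result[:5])  # [0, 1, 2, 3, 4]
print(shipped)     # 400 -- numPartitions * limit records on the driver
```

With a limit in the millions and hundreds of partitions, that intermediate collection is what triggers the OOM the message describes.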

Re: Using distinct count in over clause

2016-01-22 Thread
I don’t think that can be right.

> On Jan 22, 2016, at 4:53 PM, 汪洋 wrote:
>
> Hi,
>
> Do we support distinct count in the over clause in Spark SQL?
>
> I ran a SQL query like this:
>
> select a, count(distinct b) over (order by a rows between unbounded preceding and cu

Using distinct count in over clause

2016-01-22 Thread
Hi, do we support distinct count in the over clause in Spark SQL? I ran a SQL query like this: select a, count(distinct b) over (order by a rows between unbounded preceding and current row) from table limit 10. Currently, it returns an error saying: expression ‘a' is neither present in the group by,
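For clarity, the semantics the query is asking for can be sketched in plain Python: with a frame of "rows between unbounded preceding and current row" ordered by `a`, each row should see the number of distinct `b` values among all rows up to and including itself. This is an illustrative simulation of the intended result, not of any Spark internals:

```python
def running_distinct_count(rows, key):
    """For ordered rows, emit (row, count of distinct key values so far),
    i.e. count(distinct b) over (order by a rows between
    unbounded preceding and current row)."""
    seen = set()
    out = []
    for row in rows:
        seen.add(key(row))
        out.append((row, len(seen)))
    return out

# rows as (a, b), already ordered by a
rows = [(1, "x"), (2, "y"), (3, "x"), (4, "z")]
print(running_distinct_count(rows, key=lambda r: r[1]))
# [((1, 'x'), 1), ((2, 'y'), 2), ((3, 'x'), 2), ((4, 'z'), 3)]
```

Note the third row: "x" was already seen, so the running distinct count stays at 2.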

Re: problem with reading source code - pull out nondeterministic expressions

2015-12-31 Thread
I get it, thanks!

> On Dec 31, 2015, at 3:00 AM, Michael Armbrust wrote:
>
> The goal here is to ensure that the non-deterministic value is evaluated only once, so the result won't change for a given row (i.e. when sorting).
>
> On Tue, Dec 29, 2015 at 10:57 PM, 汪洋 <mai

problem with reading source code - pull out nondeterministic expressions

2015-12-29 Thread
Hi fellas, I am new to Spark and I have a newbie question. I am currently reading the source code of the Spark SQL Catalyst analyzer, and I don’t quite understand the partial function in PullOutNondeterministic. What is meant by “pulling out”? Why do we have to do the “pulling out”? I would really appre
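As Michael's reply above explains, the point of the pull-out is to evaluate a non-deterministic expression once per row and materialize the result, instead of re-evaluating it every time an operator (such as a Sort) touches the row. A rough plain-Python analogy of the idea, with rand() as the non-deterministic expression (this is an illustration of the concept, not Catalyst's actual rewrite):

```python
import random

rows = list(range(5))

# Without pulling out: the sort key is recomputed on every access, so a
# row's key can differ between comparisons -- the ordering is unstable.
def unstable_key(row):
    return random.random()  # a fresh value on every call

# "Pulling out": first evaluate rand() once per row in a projection
# below the sort, materializing (row, key) pairs...
random.seed(42)
projected = [(row, random.random()) for row in rows]
# ...then sort on the stored key, which is now fixed for each row.
ordered = sorted(projected, key=lambda pair: pair[1])

# Re-sorting the materialized pairs always gives the same order:
print(sorted(projected, key=lambda p: p[1]) == ordered)  # True
```

Because the random value is captured in the projected pair, the result for a given row no longer changes, which is exactly the guarantee the analyzer rule is after.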