Hi Spark devs,
I am wondering whether it is possible to apply AQE to only part of the physical plan, e.g. to apply coalesce partitions only on a particular ShuffleQueryStageExec.
I didn't find a straightforward way to achieve this, but is there a way to work around the current limitation?
Thanks
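P.S. For reference, the only knobs I could find are query-wide rather than per-stage; a minimal sketch of enabling coalescing globally (assuming Spark 3.x configuration names, local mode just for illustration):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("aqe-coalesce-sketch")
      .master("local[*]")
      // AQE and its partition coalescing are query-wide switches; I found
      // no per-ShuffleQueryStageExec control.
      .config("spark.sql.adaptive.enabled", "true")
      .config("spark.sql.adaptive.coalescePartitions.enabled", "true")
      // Advisory target size for post-shuffle partitions after coalescing.
      .config("spark.sql.adaptive.advisoryPartitionSizeInBytes", "64m")
      .getOrCreate()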
Github user CodingCat commented on the pull request:
https://github.com/apache/spark/pull/135#issuecomment-37704961
@mateiz, done~
Github user CodingCat commented on the pull request:
https://github.com/apache/spark/pull/135#issuecomment-37680371
sure, will do that this evening~
Github user CodingCat commented on the pull request:
https://github.com/apache/spark/pull/135#issuecomment-37603588
Ah, good, thank you very much for the comments @rxin @mengxr
Github user CodingCat commented on a diff in the pull request:
https://github.com/apache/spark/pull/135#discussion_r10580163
--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
@@ -310,6 +310,9 @@ abstract class RDD[T: ClassTag](
* Return a sampled subset of
GitHub user CodingCat opened a pull request:
https://github.com/apache/spark/pull/135
SPARK-1240: handle the case with empty RDD when take sample
https://spark-project.atlassian.net/browse/SPARK-1240
It seems that the current implementation does not handle the empty RDD
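A minimal sketch of the behavior the patch aims for (a local-mode check; the expected semantics are my assumption from the JIRA title):

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(
      new SparkConf().setAppName("takesample-empty").setMaster("local[*]"))

    // Sampling an empty RDD should return an empty array rather than fail.
    val empty = sc.parallelize(Seq.empty[Int])
    assert(empty.takeSample(withReplacement = false, num = 5, seed = 42L).isEmpty)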
Github user CodingCat commented on a diff in the pull request:
https://github.com/apache/spark/pull/101#discussion_r10576332
--- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala ---
@@ -222,4 +232,19 @@ private[spark] object HadoopRDD {
def
Github user CodingCat commented on the pull request:
https://github.com/apache/spark/pull/101#issuecomment-37558032
Hi @aarondav, thank you very much for the comments. I think it's ready
for further review.
Github user CodingCat commented on the pull request:
https://github.com/apache/spark/pull/101#issuecomment-37498220
Hi @kayousterhout and @aarondav, thank you for your comments; I addressed
them.
One potential issue is that, to call the function in HadoopRDD, I moved
Github user CodingCat commented on the pull request:
https://github.com/apache/spark/pull/133#issuecomment-37494644
Hi @pwendell and @aarondav, does this look good?
GitHub user CodingCat opened a pull request:
https://github.com/apache/spark/pull/133
hot fix for PR105 - change to Java annotation
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/CodingCat/spark SPARK-1160-2
Alternatively you
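Presumably the hot fix replaces Scala's @deprecated with the Java @Deprecated annotation where Java callers need to see the deprecation; a sketch under that assumption (trait and member names are illustrative, not the actual Spark code):

    import java.util.{List => JList}

    trait JavaRDDLikeSketch[T] {
      def collect(): JList[T]

      // Scala's @deprecated is invisible to javac; the Java annotation is
      // what Java tooling actually reports to callers.
      @Deprecated
      def toArray(): JList[T] = collect()
    }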
Github user CodingCat commented on a diff in the pull request:
https://github.com/apache/spark/pull/101#discussion_r10548392
--- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala ---
@@ -165,12 +174,29 @@ class HadoopRDD[K, V](
override def compute
Github user CodingCat commented on the pull request:
https://github.com/apache/spark/pull/101#issuecomment-37485195
@aarondav Thank you for your comments, I will address them
Github user CodingCat commented on a diff in the pull request:
https://github.com/apache/spark/pull/101#discussion_r10548369
--- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala ---
@@ -165,12 +174,29 @@ class HadoopRDD[K, V](
override def compute
Github user CodingCat commented on a diff in the pull request:
https://github.com/apache/spark/pull/101#discussion_r10548362
--- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala ---
@@ -165,12 +174,29 @@ class HadoopRDD[K, V](
override def compute
Github user CodingCat commented on a diff in the pull request:
https://github.com/apache/spark/pull/101#discussion_r10548255
--- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala ---
@@ -165,12 +174,29 @@ class HadoopRDD[K, V](
override def compute
Github user CodingCat commented on the pull request:
https://github.com/apache/spark/pull/35#issuecomment-37452003
ping
Github user CodingCat commented on the pull request:
https://github.com/apache/spark/pull/105#issuecomment-37451918
ping
Github user CodingCat commented on the pull request:
https://github.com/apache/spark/pull/101#issuecomment-37451940
ping
Github user CodingCat commented on the pull request:
https://github.com/apache/spark/pull/35#issuecomment-37289205
Hi, @pwendell, do you have time to take a look at this?
Github user CodingCat commented on the pull request:
https://github.com/apache/spark/pull/12#issuecomment-37231038
No problem, thanks
Github user CodingCat commented on the pull request:
https://github.com/apache/spark/pull/101#issuecomment-37150695
Would anyone like to review this?
Github user CodingCat commented on the pull request:
https://github.com/apache/spark/pull/35#issuecomment-37150704
ping
Github user CodingCat commented on the pull request:
https://github.com/apache/spark/pull/12#issuecomment-37150712
ping
Github user CodingCat commented on the pull request:
https://github.com/apache/spark/pull/12#issuecomment-37116374
@mateiz I have rebased the code; any further comments?
Github user CodingCat commented on the pull request:
https://github.com/apache/spark/pull/105#issuecomment-37115130
Hi @pwendell, thank you for the comments.
I just fixed that.
GitHub user CodingCat opened a pull request:
https://github.com/apache/spark/pull/105
SPARK-1160: Deprecate toArray in RDD
https://spark-project.atlassian.net/browse/SPARK-1160
reported by @mateiz: "It's redundant with collect() and the name doesn't
mak
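The change is presumably along these lines; a sketch with the deprecation message and version string assumed:

    abstract class RDDSketch[T] {
      def collect(): Array[T]

      // toArray is a redundant alias for collect(), kept only for
      // compatibility (message/version assumed).
      @deprecated("use collect()", "1.0.0")
      def toArray(): Array[T] = collect()
    }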
GitHub user CodingCat opened a pull request:
https://github.com/apache/spark/pull/101
SPARK-1128: set hadoop task properties when constructing HadoopRDD
The task properties are set when constructing HadoopRDD in the current
implementation; this may limit implementations based on
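A sketch of the direction: set the per-task Hadoop properties on the JobConf inside compute(), once the split and attempt are known, rather than at RDD construction time. The helper name and property keys are my assumptions:

    import org.apache.hadoop.mapred.{JobConf, JobID, TaskAttemptID, TaskID}
    import org.apache.hadoop.mapreduce.TaskType

    def addLocalConfiguration(jobTrackerId: String, jobId: Int, splitId: Int,
                              attemptId: Int, conf: JobConf): Unit = {
      val jobID = new JobID(jobTrackerId, jobId)
      val taskId = new TaskID(jobID, TaskType.MAP, splitId)
      val attempt = new TaskAttemptID(taskId, attemptId)
      // Standard Hadoop task properties, now filled in per task attempt.
      conf.set("mapred.tip.id", taskId.toString)
      conf.set("mapred.task.id", attempt.toString)
      conf.setBoolean("mapred.task.is.map", true)
      conf.setInt("mapred.task.partition", splitId)
      conf.set("mapred.job.id", jobID.toString)
    }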
Github user CodingCat commented on the pull request:
https://github.com/apache/spark/pull/12#issuecomment-37057689
ping
Github user CodingCat commented on a diff in the pull request:
https://github.com/apache/spark/pull/96#discussion_r10388297
--- Diff: core/src/main/scala/org/apache/spark/storage/MemoryStore.scala ---
@@ -236,13 +236,23 @@ private class MemoryStore(blockManager: BlockManager
Github user CodingCat commented on a diff in the pull request:
https://github.com/apache/spark/pull/98#discussion_r10387879
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -278,6 +278,10 @@ private[spark] object Utils extends Logging {
uc = new
Github user CodingCat commented on a diff in the pull request:
https://github.com/apache/spark/pull/96#discussion_r10387705
--- Diff: core/src/main/scala/org/apache/spark/storage/MemoryStore.scala ---
@@ -236,13 +236,23 @@ private class MemoryStore(blockManager: BlockManager
Github user CodingCat commented on a diff in the pull request:
https://github.com/apache/spark/pull/96#discussion_r10387464
--- Diff: core/src/main/scala/org/apache/spark/storage/MemoryStore.scala ---
@@ -236,13 +236,23 @@ private class MemoryStore(blockManager: BlockManager
Github user CodingCat commented on a diff in the pull request:
https://github.com/apache/spark/pull/44#discussion_r10367306
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -847,6 +847,8 @@ class SparkContext(
partitions: Seq[Int
Github user CodingCat commented on a diff in the pull request:
https://github.com/apache/spark/pull/85#discussion_r10355799
--- Diff: core/src/main/scala/org/apache/spark/util/AkkaUtils.scala ---
@@ -108,6 +108,6 @@ private[spark] object AkkaUtils {
/** Returns the
Github user CodingCat commented on the pull request:
https://github.com/apache/spark/pull/58#issuecomment-36825482
any further comments?
Github user CodingCat commented on the pull request:
https://github.com/apache/spark/pull/12#issuecomment-36825497
any further comments?
Github user CodingCat commented on the pull request:
https://github.com/apache/spark/pull/85#issuecomment-36818916
What? How should I interpret the output? There is no failure, but there are
some exceptions in the console output.
Can an Accumulator not be accessed inside a task?
Github user CodingCat commented on the pull request:
https://github.com/apache/spark/pull/85#issuecomment-36814645
@pwendell, I removed some redundant parameters, but I'm wondering which
option is more convenient for the user; different pages contain different
sets of param
Github user CodingCat commented on the pull request:
https://github.com/apache/spark/pull/85#issuecomment-36813427
@pwendell, yep, I asked on the mailing list but didn't get a response, so I
decided to put things here first and revise them (e.g. remove the
unnecessary ones) based o
Github user CodingCat commented on the pull request:
https://github.com/apache/spark/pull/85#issuecomment-36812678
I said it is WIP because of two things:
1. The document surely needs to be revised; I'm not sure I understand
all the details correctly, though I spent n
GitHub user CodingCat opened a pull request:
https://github.com/apache/spark/pull/85
SPARK-1192: The document for most of the parameters used in core component
I grepped the code in the core component and found that around 30
parameters in the implementation are actually used but
Github user CodingCat commented on the pull request:
https://github.com/apache/spark/pull/63#issuecomment-36800228
@kayousterhout Thank you very much!
Github user CodingCat commented on the pull request:
https://github.com/apache/spark/pull/12#issuecomment-36702525
Is this ready to merge?
Github user CodingCat commented on the pull request:
https://github.com/apache/spark/pull/63#issuecomment-36702509
How about this?
Github user CodingCat commented on a diff in the pull request:
https://github.com/apache/spark/pull/44#discussion_r10283513
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -847,6 +847,8 @@ class SparkContext(
partitions: Seq[Int
Github user CodingCat commented on the pull request:
https://github.com/apache/spark/pull/35#issuecomment-36698403
Personally, I feel that
https://spark-project.atlassian.net/browse/SPARK-1175 is also related to this
issue.
Github user CodingCat commented on the pull request:
https://github.com/apache/spark/pull/58#issuecomment-36698085
Fixed that line as well as others with the same issue.
Github user CodingCat commented on a diff in the pull request:
https://github.com/apache/spark/pull/63#discussion_r10263675
--- Diff: core/src/main/scala/org/apache/spark/scheduler/WorkerOffer.scala
---
@@ -21,4 +21,4 @@ package org.apache.spark.scheduler
* Represents free
GitHub user CodingCat opened a pull request:
https://github.com/apache/spark/pull/74
spark-1178: missing document of spark.scheduler.revive.interval
https://spark-project.atlassian.net/browse/SPARK-1178
The spark.scheduler.revive.interval configuration is undocumented
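A sketch of setting it explicitly (the value shown is the default I believe it has, in milliseconds; both assumptions are worth double-checking):

    import org.apache.spark.SparkConf

    // How often the driver revives offers so pending tasks get resources.
    val conf = new SparkConf()
      .setAppName("revive-interval-sketch")
      .set("spark.scheduler.revive.interval", "1000")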
Github user CodingCat commented on the pull request:
https://github.com/apache/spark/pull/63#issuecomment-36593562
@kayousterhout @markhamstra @andrewor14 Thank you for your comments.
I updated the code; how about this?
Github user CodingCat commented on a diff in the pull request:
https://github.com/apache/spark/pull/44#discussion_r10243399
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -847,6 +847,7 @@ class SparkContext(
partitions: Seq[Int
Github user CodingCat commented on a diff in the pull request:
https://github.com/apache/spark/pull/44#discussion_r10243179
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -847,6 +847,7 @@ class SparkContext(
partitions: Seq[Int
Github user CodingCat commented on a diff in the pull request:
https://github.com/apache/spark/pull/44#discussion_r10242829
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -847,6 +847,7 @@ class SparkContext(
partitions: Seq[Int
Github user CodingCat commented on a diff in the pull request:
https://github.com/apache/spark/pull/44#discussion_r10242393
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -847,6 +847,8 @@ class SparkContext(
partitions: Seq[Int
Github user CodingCat commented on a diff in the pull request:
https://github.com/apache/spark/pull/44#discussion_r10242213
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -847,6 +847,8 @@ class SparkContext(
partitions: Seq[Int
Github user CodingCat commented on the pull request:
https://github.com/apache/spark/pull/58#issuecomment-36590068
I updated the code and tested the functionality; everything works well.
Github user CodingCat commented on a diff in the pull request:
https://github.com/apache/spark/pull/44#discussion_r10241868
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -847,6 +847,8 @@ class SparkContext(
partitions: Seq[Int
Github user CodingCat commented on the pull request:
https://github.com/apache/spark/pull/58#issuecomment-36481890
I think the better way to fix this is not to prevent the user from starting
a cluster without slaves, but to allow them to log in to a cluster that has
lost all of its slaves?
GitHub user CodingCat opened a pull request:
https://github.com/apache/spark/pull/63
simplify the implementation of CoarseGrainedSchedulerBackend
There are 5 main data structures in the class; after reading the source
code, I found that some of them are actually not used, and some of
Github user CodingCat closed the pull request at:
https://github.com/apache/spark/pull/59
Github user CodingCat commented on the pull request:
https://github.com/apache/spark/pull/58#issuecomment-36478687
With only a master, hmm... no service is actually working (in a distributed
fashion), but this patch just allows the user to log in to a master-only
cluster.
GitHub user CodingCat opened a pull request:
https://github.com/apache/spark/pull/59
SPARK-1166: clean vpc_id if the group was just now created
Reported in https://spark-project.atlassian.net/browse/SPARK-1166
In a very unusual situation (when the newly created group
Github user CodingCat commented on the pull request:
https://github.com/apache/spark/pull/58#issuecomment-36477517
Oh, fixed.
GitHub user CodingCat opened a pull request:
https://github.com/apache/spark/pull/58
SPARK-1156: allow user to login into a cluster without slaves
Reported in https://spark-project.atlassian.net/browse/SPARK-1159
The current spark-ec2 script doesn't allow the user to log in
Github user CodingCat commented on the pull request:
https://github.com/apache/spark/pull/12#issuecomment-36471736
Exceeded by 5 chars... sorry, fixed.
Github user CodingCat commented on a diff in the pull request:
https://github.com/apache/spark/pull/44#discussion_r10194239
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -847,6 +847,8 @@ class SparkContext(
partitions: Seq[Int
Github user CodingCat commented on a diff in the pull request:
https://github.com/apache/spark/pull/44#discussion_r10194233
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -847,6 +847,8 @@ class SparkContext(
partitions: Seq[Int
Github user CodingCat commented on the pull request:
https://github.com/apache/spark/pull/12#issuecomment-36445655
I rebased the code after https://github.com/apache/spark/pull/11 was
merged and tested it locally; I think it is ready for further
review/testing.
GitHub user CodingCat opened a pull request:
https://github.com/apache/spark/pull/52
[SPARK-1150] fix repo location in create script (re-open)
reopen for https://spark-project.atlassian.net/browse/SPARK-1150
You can merge this pull request into a Git repository by running
Github user CodingCat commented on the pull request:
https://github.com/apache/spark/pull/48#issuecomment-36442789
Sure, just reopened: https://github.com/apache/spark/pull/52
Github user CodingCat commented on the pull request:
https://github.com/apache/spark/pull/11#issuecomment-36440931
@pwendell done
Github user CodingCat commented on the pull request:
https://github.com/apache/spark/pull/11#issuecomment-36437295
@pwendell Thank you again! Just updated the code
Github user CodingCat commented on the pull request:
https://github.com/apache/spark/pull/7#issuecomment-36433922
close it
Github user CodingCat closed the pull request at:
https://github.com/apache/spark/pull/7
Github user CodingCat commented on the pull request:
https://github.com/apache/spark/pull/11#issuecomment-36414832
@pwendell, I just updated the code.
In the latest update, I made the check only applicable to
FileOutputFormat; the difference from your suggestion is that I
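A minimal sketch of the restriction described above (helper name assumed; only FileOutputFormat subclasses get the output-spec validation):

    import org.apache.hadoop.mapred.{FileOutputFormat, JobConf}

    def shouldValidateOutputSpecs(conf: JobConf): Boolean =
      // Other OutputFormats (e.g. database-backed ones) have no output
      // directory to check, so they are skipped.
      conf.getOutputFormat.isInstanceOf[FileOutputFormat[_, _]]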
Github user CodingCat commented on a diff in the pull request:
https://github.com/apache/spark/pull/11#discussion_r10188629
--- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala
---
@@ -712,6 +713,10 @@ class PairRDDFunctions[K: ClassTag, V: ClassTag](self
Github user CodingCat commented on the pull request:
https://github.com/apache/spark/pull/11#issuecomment-36406325
I changed the code and tested it locally; mind reviewing it again?
@pwendell
GitHub user CodingCat opened a pull request:
https://github.com/apache/spark/pull/48
fix repo location in create script
fix the repo location in create_release script
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/CodingCat
Github user CodingCat commented on a diff in the pull request:
https://github.com/apache/spark/pull/44#discussion_r10166948
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -847,6 +847,8 @@ class SparkContext(
partitions: Seq[Int
Github user CodingCat commented on the pull request:
https://github.com/apache/spark/pull/12#issuecomment-36353844
I changed the parameter type of the new method back to Configuration to
keep it consistent with other APIs; whether Job should be the parameter
type is still under
Github user CodingCat commented on a diff in the pull request:
https://github.com/apache/spark/pull/44#discussion_r10164732
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -847,6 +847,8 @@ class SparkContext(
partitions: Seq[Int
Github user CodingCat commented on the pull request:
https://github.com/apache/spark/pull/12#issuecomment-36321849
This is a re-opened PR; in the old PR,
https://github.com/apache/incubator-spark/pull/636, all test cases passed.
Can anyone verify that and make further
Github user CodingCat commented on the pull request:
https://github.com/apache/spark/pull/27#issuecomment-36275312
@shivaram I understand your caution, and I agree with Kay that we should
be careful when adding complexity to the already-complex code base.
So, I don't
GitHub user CodingCat opened a pull request:
https://github.com/apache/spark/pull/35
[SPARK-1104] kill Process in workerThread
As reported in https://spark-project.atlassian.net/browse/SPARK-1104
By @pwendell: "Sometimes due to large shuffles executors will take a
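A sketch of the idea, not Spark's actual ExecutorRunner code (all names assumed): the worker thread that launched the child process also owns destroying it, so a hung executor can be killed promptly.

    object WorkerProcessSketch {
      @volatile private var process: Process = _

      val workerThread = new Thread("worker-thread") {
        override def run(): Unit = {
          // Launch and babysit the child process.
          process = new ProcessBuilder("sleep", "3600").start()
          process.waitFor()
        }
      }

      def kill(): Unit = {
        // Interrupting the thread alone may leave the child alive;
        // destroy the process explicitly.
        if (process != null) process.destroy()
        workerThread.interrupt()
      }
    }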
Github user CodingCat commented on the pull request:
https://github.com/apache/spark/pull/27#issuecomment-36238278
Hmm... it's much simpler... but randomization only mitigates the issue with
some probability?
Github user CodingCat commented on a diff in the pull request:
https://github.com/apache/spark/pull/11#discussion_r10121533
--- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala
---
@@ -618,10 +619,6 @@ class PairRDDFunctions[K: ClassTag, V: ClassTag](self