Oh, looks nice. Thanks for sharing, Dongjoon.
Bests,
Takeshi
On Sat, Dec 7, 2019 at 3:35 AM Dongjoon Hyun wrote:
> Hi, All.
>
> I want to share the following change to the community.
>
> SPARK-30098 Use default datasource as provider for CREATE TABLE syntax
>
> This is merged today and n
lol how did you know I was going to read this email, Sean?
When I manually identified the stale PRs, I used these conditions:
1. Author's inactivity for over a year. If a PR was simply waiting for a
review, I excluded it from the stale PR list.
2. Ping once and see if there are any updates with
Hi Deepak,
Following your suggestion, I put an exclusion of guava in the topmost POM
(directly under the Spark home) as follows.
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-common</artifactId>
  <version>3.2.1</version>
  <exclusions>
    <exclusion>
      <groupId>com.google.guava</groupId>
      ...
Hi, All.
I want to share the following change to the community.
SPARK-30098 Use default datasource as provider for CREATE TABLE syntax
This is merged today, and now Spark's `CREATE TABLE` uses Spark's default
data source instead of the `hive` provider. This is a good and big
improvement for
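For example (just a sketch to illustrate the behavior; the table names are
arbitrary), a plain CREATE TABLE now picks up the default data source
instead of the Hive serde:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().enableHiveSupport().getOrCreate()

// Previously this created a Hive text-serde table; after SPARK-30098 it
// creates a table with the default data source (spark.sql.sources.default,
// i.e. parquet unless you changed it).
spark.sql("CREATE TABLE t(a INT, b STRING)")

// Hive-format tables can still be created explicitly with Hive syntax.
spark.sql("CREATE TABLE t_text(a INT, b STRING) STORED AS TEXTFILE")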
We used to not be able to close PRs directly, but now we can, so I assume
this is as fine a way of doing so as any, if we want to. I don't think
there's a policy against it or anything.
Hyukjin, how have you managed this one in the past?
I don't mind it being automated if the idle time is long and it posts
That's true, we do use Actions today. I wonder if Apache Infra allows
Actions to close PRs vs. just updating commit statuses. I only ask because
I remember permissions were an issue in the past when discussing tooling
like this.
In any case, I'd be happy to submit a PR adding this in if there are
I think we can add Actions, right? They're used for the newer tests in
GitHub?
I'm OK closing PRs inactive for a 'long time', where that's maybe 6-12
months or something. It's standard practice and doesn't mean they can't be
reopened.
Often the related JIRA should be closed as well, but we have done t
Agree with Bo's idea that the MapStatus could be a more generalized
concept, not necessarily bound to BlockManager/Executor.
As I understand it, MapStatus is used to track/record the output data
location of a map task: it is created by the shuffle writer and used by the
shuffle reader for finding an
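To sketch the idea (the names below, like MapOutputLocation, are made up
for this mail and are not Spark's real internal classes), a more
generalized MapStatus would expose an abstract output location rather than
a BlockManagerId:

// Sketch only: illustrating how MapStatus could be decoupled from
// BlockManager/Executor.
trait MapOutputLocation                          // where a map task's output lives
case class ExecutorLocation(executorId: String, host: String, port: Int)
    extends MapOutputLocation                    // today's case: an executor
case class RemoteStoreLocation(uri: String)
    extends MapOutputLocation                    // e.g. an external shuffle store

trait GeneralizedMapStatus {
  def location: MapOutputLocation                // recorded by the shuffle writer
  def getSizeForBlock(reduceId: Int): Long       // consulted by the shuffle reader
}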
Yeah, they are very tricky and have to be integrated with Spark's checkpoint
mechanism as well - I guess that's why this mail thread went quiet after
some time.
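Just to make the lifecycle concrete (a rough sketch; the trait and method
names are made up, not an existing Spark API), a 2PC-style sink roughly has
to stage a batch's output before the offsets are checkpointed and only
publish it afterwards:

// Sketch only: a made-up interface to show where 2PC would have to hook
// into the batch/checkpoint lifecycle.
trait TwoPhaseCommitSink {
  // Phase 1: stage the batch's output durably, keyed by the batch id,
  // before the driver checkpoints the offsets for that batch.
  def prepare(batchId: Long): Unit

  // Phase 2: publish the staged output after the checkpoint succeeds.
  // Must be idempotent: a failure between checkpoint and commit means
  // this gets retried on recovery.
  def commit(batchId: Long): Unit

  // Discard staged output for a batch that will be replayed.
  def abort(batchId: Long): Unit
}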
Along with these questions, there might also be some edge cases which we
have to deal with in the 2PC approach: suppose a batch got into comm