Hi Joseph,
Thanks for your explanation. It makes a lot of sense, and I found
http://spark.apache.org/docs/latest/sql-programming-guide.html#jdbc-to-other-databases
gives more detail.
With that, and after I reviewed the code, the customSchema option is simply
there to override the data types of the fields in a relation
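For reference, the option looks roughly like this on a JDBC read (a minimal sketch, assuming an existing SparkSession and a reachable database; the JDBC URL, table, and column names below are made up for illustration):

```python
# Minimal sketch of the customSchema option on a JDBC read.
# Assumes a reachable database; URL, table, and columns are illustrative.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("customSchema-sketch").getOrCreate()

df = (spark.read
      .format("jdbc")
      .option("url", "jdbc:postgresql://dbhost:5432/mydb")
      .option("dbtable", "public.people")
      # Override only the types of the listed fields; any field not
      # named here keeps the type inferred from the database metadata.
      .option("customSchema", "id DECIMAL(38, 0), name STRING")
      .load())
```

Note the override is partial: it changes the Spark-side type of the named columns rather than replacing the whole schema.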
Heads up: tomorrow's Friday review is going to be at 8:30 am instead of 9:30
am, because I had to move some flights around.
On Fri, Jul 13, 2018 at 12:03 PM, Holden Karau wrote:
> This afternoon @ 3pm pacific I'll be looking at review tooling for Spark &
> Beam https://www.youtube.com/watch?v=ff8_j
A work-in-progress PR: https://github.com/apache/spark/pull/21822
The PR also adds the infrastructure to throw exceptions in test mode when
the various transform methods are used as part of analysis. Unfortunately
there are a couple of edge cases that do need that, and as a result there is
this ugly by
Sure, I can wait for this and create another RC then.
Thanks,
Saisai
Xiao Li wrote on Fri, Jul 20, 2018 at 9:11 AM:
> Yes. https://issues.apache.org/jira/browse/SPARK-24867 is the one I
> created. The PR has been created. Since this is not rare, let us merge it
> to 2.3.2?
>
> Reynold's PR is to get rid of Ana
Yes. https://issues.apache.org/jira/browse/SPARK-24867 is the one I
created. The PR has been created. Since this is not rare, let us merge it
to 2.3.2?
Reynold's PR is to get rid of AnalysisBarrier. That is better than the
multiple patches we added for AnalysisBarrier after the 2.3.0 release. We can target
I see, thanks Reynold.
Reynold Xin wrote on Fri, Jul 20, 2018 at 8:46 AM:
> Looking at the list of pull requests it looks like this is the ticket:
> https://issues.apache.org/jira/browse/SPARK-24867
>
>
>
> On Thu, Jul 19, 2018 at 5:25 PM Reynold Xin wrote:
>
>> I don't think my ticket should block this relea
Looking at the list of pull requests it looks like this is the ticket:
https://issues.apache.org/jira/browse/SPARK-24867
On Thu, Jul 19, 2018 at 5:25 PM Reynold Xin wrote:
> I don't think my ticket should block this release. It's a big general
> refactoring.
>
> Xiao do you have a ticket for t
I don't think my ticket should block this release. It's a big general
refactoring.
Xiao do you have a ticket for the bug you found?
On Thu, Jul 19, 2018 at 5:24 PM Saisai Shao wrote:
> Hi Xiao,
>
> Are you referring to this JIRA (
> https://issues.apache.org/jira/browse/SPARK-24865)?
>
> Xiao
Hi Xiao,
Are you referring to this JIRA (
https://issues.apache.org/jira/browse/SPARK-24865)?
Xiao Li wrote on Fri, Jul 20, 2018 at 2:41 AM:
> dfWithUDF.cache()
> dfWithUDF.write.saveAsTable("t")
> dfWithUDF.write.saveAsTable("t1")
>
>
> Cached data is not being used. It causes a big performance regression.
>
We have had multiple bugs introduced by AnalysisBarrier. In hindsight, I
think the original design, before the analysis barrier, was much simpler and
required less developer knowledge of the infrastructure.
As long as the analysis barrier is there, developers writing various code in
the analyzer will have to be
Yeah, I was mostly thinking that, if the normal Spark PR tests were set up
to check the sigs (every time? some of the time?), then this could serve as
an automatic check that nothing funny has been done to the archives. There
shouldn't be any difference between the cache and the archive; but if ther
Yeah, if the test code keeps around the archive and/or a digest of what it
unpacked. A release should never be modified though, so that would be highly rare.
If the worry is hacked mirrors then we might have bigger problems, but
there the issue is verifying the download sigs in the first place. Those
would have to
Is there or should there be some checking of digests just to make sure that
we are really testing against the same thing in /tmp/test-spark that we are
distributing from the archive?
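For what it's worth, a digest check like that only takes a few lines. This is a hypothetical helper, not something already in the test harness; the idea is to compare the cached archive in /tmp/test-spark against the .sha512 digest published alongside the release:

```python
# Hypothetical helper: verify a locally cached release archive against
# the published SHA-512 digest (e.g. the .sha512 file on
# archive.apache.org). Paths and names are illustrative.
import hashlib

def sha512_of(path, chunk_size=1 << 20):
    """Stream the file in chunks so large archives need not fit in memory."""
    h = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def matches_published_digest(archive_path, published_hex):
    # Compare case-insensitively; published digest files vary in casing.
    return sha512_of(archive_path) == published_hex.strip().lower()
```

A test run could then fail fast if the cached archive ever drifts from what the archive actually serves.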
On Thu, Jul 19, 2018 at 11:15 AM Sean Owen wrote:
> Ideally, that list is updated with each release, yes. Non-cur
dfWithUDF.cache()
dfWithUDF.write.saveAsTable("t")
dfWithUDF.write.saveAsTable("t1")
Cached data is not being used. It causes a big performance regression.
2018-07-19 11:32 GMT-07:00 Sean Owen:
> What regression are you referring to here? A -1 vote really needs a
> rationale.
>
> On Thu, Ju
What regression are you referring to here? A -1 vote really needs a
rationale.
On Thu, Jul 19, 2018 at 1:27 PM Xiao Li wrote:
> I would first vote -1.
>
> I might find another regression caused by the analysis barrier. Will keep
> you posted.
>
>
I would first vote -1.
I might find another regression caused by the analysis barrier. Will keep
you posted.
Xiao
2018-07-18 18:05 GMT-07:00 Takeshi Yamamuro:
> +1 (non-binding)
>
> I ran tests on an EC2 m4.2xlarge instance:
> [ec2-user]$ java -version
> openjdk version "1.8.0_171"
> OpenJDK Ru
Ideally, that list is updated with each release, yes. Non-current releases
will now always download from archive.apache.org though. But we run into
rate-limiting problems if that gets pinged too much. So yes good to keep
the list only to current branches.
It looks like the download is cached in /t
Hi Team - Is there a good calculator/Excel sheet to estimate the compute and
storage requirements for new Spark jobs to be developed?
Capacity planning based on: job, data type, etc.
Thanks,
Deepu Raj
+1 this has been problematic.
Also, this list needs to be updated every time we make a new release?
Plus, can we cache them on Jenkins? Maybe we can avoid downloading the same
thing from the Apache archive on every test run.
From: Marco Gaido
Sent: Monday, July 16, 20
I use the stateful function flatMapGroupsWithState to track the state of a
Kafka stream. To further customize the state function, I'd like to use a static
datasource (JDBC) in the state function. This datasource contains data I'd like
to join with the stream (as an Iterator) within flatMapGroupsWithState.
Whe
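One common pattern for this (a sketch, not something confirmed in this thread): load the static JDBC table once up front, collect it into an ordinary in-memory lookup, and reference that lookup from the closure passed to flatMapGroupsWithState. The enrichment step itself is then a plain function over the group's row iterator, sketched here in Python with made-up field names:

```python
# Sketch of the join step that would run inside the stateful function.
# `static_lookup` stands in for a small JDBC table collected once at
# startup (e.g. {key: attributes}); `rows` is the per-group iterator the
# state function receives. All field names are illustrative.

def enrich(rows, static_lookup):
    """Yield each row merged with its static attributes, dropping misses."""
    for row in rows:
        extra = static_lookup.get(row["key"])
        if extra is not None:
            yield {**row, **extra}

# Inside the stateful update function, iterate enrich(values, static_lookup)
# instead of values directly; the lookup is captured by the closure.
```

This only works when the static table is small enough to hold in memory on each executor; for larger tables a broadcast variable or a stream-static join before the stateful operator would be the usual alternatives.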