Sweet! It's here:
https://issues.apache.org/jira/browse/SPARK-9141?focusedCommentId=14649437&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14649437
On Tue, Jul 28, 2015 at 11:21 PM, Michael Armbrust wrote:
Can you add your description of the problem as a comment to that ticket and
we'll make sure to test both cases and break it out if the root cause ends
up being different.
On Tue, Jul 28, 2015 at 2:48 PM, Justin Uang wrote:
Sweet! Does this cover DataFrame#rdd also using the cached query from
DataFrame#cache? I think the ticket 9141 is mainly concerned with whether a
derived DataFrame (B) of a cached DataFrame (A) uses the cached query of A,
not whether the rdd from A.rdd or B.rdd uses the cached query of A.
On Tue, J
Thanks for bringing this up! I talked with Michael Armbrust, and it sounds
like this is from a bug in DataFrame caching:
https://issues.apache.org/jira/browse/SPARK-9141
It's marked as a blocker for 1.5.
Joseph
On Tue, Jul 28, 2015 at 2:36 AM, Justin Uang wrote:
Hey guys,
I'm running into some pretty bad performance issues when it comes to using
a CrossValidator, because of caching behavior of DataFrames.
The root of the problem is that while I have cached my DataFrame
representing the features and labels, it is caching at the DataFrame level,
while Cros