Hi,
When I cache the dataframe and run the query,
val df = sqlContext.sql("select name,age from TBL_STUDENT where age = 37")
df.cache()
df.show
println(df.queryExecution)
I got the following execution plan,from the optimized logical plan,I can see
the whole analyzed logical plan is totally replaced with the InMemoryRelation
logical plan. But when I look into the Optimizer, I didn't see any optimizer
that relates to the InMemoryRelation.
Could you please explain how the optimization works?
== Parsed Logical Plan ==
'<Project>, argString:< [unresolvedalias(UnresolvedAttribute:
'name),unresolvedalias(UnresolvedAttribute: 'age)]>
'<Filter>, argString:< (UnresolvedAttribute: 'age = 37)>
'<UnresolvedRelation>, argString:< [TBL_STUDENT], None>
== Analyzed Logical Plan ==
name: string, age: int
<Project>, argString:< [AttributeReference:name#1,AttributeReference:age#3]>
<Filter>, argString:< (AttributeReference:age#3 = 37)>
<Subquery>, argString:< TBL_STUDENT>
<LogicalRDD>, argString:<
[AttributeReference:id#0,AttributeReference:name#1,AttributeReference:classId#2,AttributeReference:age#3],
MapPartitionsRDD[4] at main at NativeMethodAccessorImpl.java:-2>
== Optimized Logical Plan ==
<InMemoryRelation>, argString:<
[AttributeReference:name#1,AttributeReference:age#3], true, 10000,
StorageLevel(true, true, false, true, 1), (<TungstenProject>, argString:<
[AttributeReference:name#1,AttributeReference:age#3]>), None>
== Physical Plan ==
<InMemoryColumnarTableScan>, argString:<
[AttributeReference:name#1,AttributeReference:age#3], (<InMemoryRelation>,
argString:< [AttributeReference:name#1,AttributeReference:age#3], true, 10000,
StorageLevel(true, true, false, true, 1), (<TungstenProject>, argString:<
[AttributeReference:name#1,AttributeReference:age#3]>), None>)>