[ https://issues.apache.org/jira/browse/SPARK-12352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Herman van Hovell closed SPARK-12352.
-------------------------------------
Resolution: Duplicate
> Reuse the result of split in SQL
> --------------------------------
>
> Key: SPARK-12352
> URL: https://issues.apache.org/jira/browse/SPARK-12352
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 1.5.2
> Reporter: Yadong Qi
> Priority: Critical
>
> When split is used in SQL and several values are read by index from the
> same array, the split is re-evaluated for each index on every row. Split in
> Java also performs poorly.
> {code}
> spark-sql> explain extended select array[0] as a, array[1] as b, array[2] as c from (select split(value, ',') as array from src_split) t;
> == Parsed Logical Plan ==
> 'Project [unresolvedalias('array[0] AS a#16),unresolvedalias('array[1] AS b#17),unresolvedalias('array[2] AS c#18)]
> 'Subquery t
> 'Project [unresolvedalias('split('value,,) AS array#15)]
> 'UnresolvedRelation [src_split], None
> == Analyzed Logical Plan ==
> a: string, b: string, c: string
> Project [array#15[0] AS a#16,array#15[1] AS b#17,array#15[2] AS c#18]
> Subquery t
> Project [split(value#20,,) AS array#15]
> MetastoreRelation default, src_split, None
> == Optimized Logical Plan ==
> Project [split(value#20,,)[0] AS a#16,split(value#20,,)[1] AS b#17,split(value#20,,)[2] AS c#18]
> MetastoreRelation default, src_split, None
> == Physical Plan ==
> Project [split(value#20,,)[0] AS a#16,split(value#20,,)[1] AS b#17,split(value#20,,)[2] AS c#18]
> HiveTableScan [value#20], (MetastoreRelation default, src_split, None)
> {code}
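
The cost described in the report can be illustrated with a small plain-Python model. This is only a sketch, not Spark code: the names counted_split, project_resplit, and project_reuse are hypothetical, and Python's str.split stands in for the SQL split function. It counts how many splits happen when each output column re-evaluates the split, as in the optimized plan above, versus when the array is computed once per row and reused.

```python
# Hypothetical model of the two evaluation strategies; this is not Spark code.
calls = {"n": 0}  # counts how many times a split is performed


def counted_split(value, sep):
    """Split `value` on `sep` and record that one split was performed."""
    calls["n"] += 1
    return value.split(sep)


def project_resplit(value):
    # Mirrors the optimized plan in the report: every output column
    # evaluates split(value, ',') again before indexing.
    return (
        counted_split(value, ",")[0],
        counted_split(value, ",")[1],
        counted_split(value, ",")[2],
    )


def project_reuse(value):
    # Splits once per row and indexes the same array for every column.
    parts = counted_split(value, ",")
    return (parts[0], parts[1], parts[2])


rows = ["a,b,c", "d,e,f"]

calls["n"] = 0
resplit_result = [project_resplit(r) for r in rows]
resplit_calls = calls["n"]

calls["n"] = 0
reuse_result = [project_reuse(r) for r in rows]
reuse_calls = calls["n"]
```

Both projections return the same columns; in this model the re-splitting version performs three splits per row, while the reusing version performs one.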
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)