It depends on the use case. If you would otherwise have to join, having the
data already in an array saves you a join and a shuffle.
If you explode, at least sort within partitions so that you get predicate
pushdown when you read the data next time.
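
To make the sort-within-partitions point a bit more concrete, here is a rough plain-Python model of the idea. It is not Spark code: the sample rows, the chunk size, and the helper names (`explode`, `chunk_stats`, `chunks_to_read`) are all made up for illustration, and the chunks only loosely stand in for the min/max statistics a columnar file format keeps. In Spark itself the corresponding calls would be `explode` and `sortWithinPartitions`.

```python
# Plain-Python model (not Spark code) of exploding an array column and
# sorting within one partition so min/max statistics can skip chunks.

# Hypothetical sample rows, loosely shaped like the schema in this thread.
users = [
    {"user": "a", "orders": [{"id": "o3", "amount": 80.0}, {"id": "o1", "amount": 5.0}]},
    {"user": "b", "orders": [{"id": "o4", "amount": 60.0}, {"id": "o2", "amount": 12.0}]},
]


def explode(rows):
    """Emit one flat row per array element, similar in spirit to explode()."""
    return [
        {"user": r["user"], "id": o["id"], "amount": o["amount"]}
        for r in rows
        for o in r["orders"]
    ]


def chunk_stats(rows, size, key):
    """Split a partition into fixed-size chunks and record min/max of key."""
    chunks = [rows[i:i + size] for i in range(0, len(rows), size)]
    return [(min(r[key] for r in c), max(r[key] for r in c)) for c in chunks]


def chunks_to_read(stats, threshold):
    """Count chunks whose max says they may hold rows with key > threshold."""
    return sum(1 for _lo, hi in stats if hi > threshold)


exploded = explode(users)
unsorted_stats = chunk_stats(exploded, size=2, key="amount")
sorted_rows = sorted(exploded, key=lambda r: r["amount"])
sorted_stats = chunk_stats(sorted_rows, size=2, key="amount")

# Filter "amount > 50": how many chunks must actually be scanned?
print(chunks_to_read(unsorted_stats, 50.0))  # 2, every chunk overlaps the filter
print(chunks_to_read(sorted_stats, 50.0))    # 1, the low-amount chunk is skipped
```

Without the sort, every chunk has a wide min/max range, so the filter cannot rule any of them out. After sorting, the ranges are tight and one chunk can be skipped entirely.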
On Sun, 19 Jan 2020, 1:19 pm Jörn Franke, wrote:
Why not two tables that you can then join? This would be the standard way.
It depends on what your full use case is, what volumes / orders you expect on
average, and what the aggregations and filters look like. The example below
states that you do a select all on the table.
> Am 19.01.2020 um 01:50 sch
I think it does mean more memory usage but consider how big your arrays
are. Think about your use case requirements and whether it makes sense to
use arrays. Also it may be preferable to explode if the arrays are very
large. I'd say exploding arrays will make the data more splittable, having
the ar
I am using a DataFrame that has a structure like this:
root
 |-- orders: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- amount: double (nullable = true)
 |    |    |-- id: string (nullable = true)
 |-- user: string (nullable = true)
 |-- language: string (nul