I don't think Spark supports this model, where N inputs depending on a
parent are computed once, at the same time. You can of course cache the
parent and filter it N times, which does the same amount of work. One
problem is where the N inputs would live: they'd have to be stored
somewhere if not used immediately.
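
For concreteness, here is a minimal sketch of the cache-and-filter
approach in Scala; the data, the split key (id % 3), and N = 3 are
illustrative assumptions, not anything from the original question:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("demux-sketch").getOrCreate()
    import spark.implicits._

    // The parent is computed once and cached; each filtered child then
    // reads the cached data instead of recomputing the full lineage.
    val parent = spark.range(0, 1000000).toDF("id").cache()

    // N = 3 children, split on a hypothetical key (id % 3)
    val children = (0 until 3).map(k => parent.filter($"id" % 3 === k))

    children.zipWithIndex.foreach { case (child, k) =>
      println(s"child $k: ${child.count()} rows")
    }

Note the children are still materialized one at a time as each action
runs; caching only avoids recomputing the parent.
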
If the goal is to split the output, then `DataFrameWriter.partitionBy`
should do what you need, and no additional methods are required. If not,
you can also look at Silex's muxPartitions implementation (see
https://stackoverflow.com/a/37956034), but its applications are rather
limited due to high resource usage.
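
As a sketch of the partitionBy route (the DataFrame, the "key" column,
and the output path below are made up for illustration):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("split-sketch").getOrCreate()
    import spark.implicits._

    // Hypothetical DataFrame with a column to split the output on
    val df = Seq((1, "a"), (2, "b"), (3, "a")).toDF("id", "key")

    // Writes one directory per distinct key value:
    // /tmp/split-output/key=a/ and /tmp/split-output/key=b/
    df.write.mode("overwrite").partitionBy("key").parquet("/tmp/split-output")

This splits the data in a single pass at write time, rather than
producing N in-memory datasets.
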
Hi
I'm a Python developer (and data scientist) and I contributed to
Debian[1][2] last year as part of Google Summer of Code[3]. Having used
Lucene, Kafka, and Spark in the past, I wanted to work on at least one of
them this summer. Since Spark, unlike the others, has a Python API[4], I
felt I could get started contributing to it more easily.
Hi everyone,
This is a follow-up to the "Identifiers with multi-catalog support"
discussion thread. I've taken the proposal I posted to that thread and
written it up as an official SPIP for how to identify tables and other
catalog objects when working with multiple catalogs.
The doc is available
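
To make the discussion concrete, here is a hypothetical sketch of what a
catalog-qualified, multi-part identifier could look like; the catalog,
namespace, and table names are invented, and the exact syntax and
resolution rules are what the SPIP proposes to define:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().getOrCreate()

    // A table referenced through an explicit catalog prefix, as in
    // catalog.namespace.table (names here are made up):
    spark.sql("SELECT * FROM my_catalog.sales.orders").show()

    // The same identifier through the DataFrame API:
    spark.table("my_catalog.sales.orders").show()
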