I think the __getattr__ method should be removed from the DataFrame API
in pyspark.
May I draw the Python folks' attention to the issue
https://issues.apache.org/jira/browse/SPARK-7035 and invite comments?
Thank you!
42 AM, Karlson wrote:
Hi all,
passing a functools.partial object as a UserDefinedFunction to
DataFrame.select raises an AttributeError, because functools.partial
objects do not have the attribute __name__. Is there any alternative to
relying on __name__ in pyspark/sql/functions.py:126?
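
For reference, here is a minimal sketch of the problem outside of Spark, together with two possible workarounds. The names add, add_one, named_partial and add_one_func are made up for illustration, and whether the resulting callables are accepted by the UDF wrapper has not been checked against pyspark here.

```python
import functools


def add(x, y):
    """Hypothetical function standing in for the real UDF body."""
    return x + y


# A partial object does not carry the wrapped function's metadata,
# so accessing add_one.__name__ raises AttributeError.
add_one = functools.partial(add, 1)
print(hasattr(add_one, "__name__"))

# Possible workaround 1: copy __name__, __doc__ etc. from the
# wrapped function onto a partial object.
named_partial = functools.partial(add, 1)
functools.update_wrapper(named_partial, add)
print(named_partial.__name__)


# Possible workaround 2: wrap the partial in an ordinary function,
# which always has a __name__ of its own.
def add_one_func(y):
    return add_one(y)


print(add_one_func.__name__)
```

Either callable could then be tried in place of the bare partial. A lambda would also have a __name__, although it is only the generic "<lambda>".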
Hi all,
where is the data stored that is passed to sc.parallelize? Or put
differently, where is the data for the base RDD fetched from when the
DAG is executed, if the base RDD is constructed via sc.parallelize?
I am reading a csv file via the Python csv module and am feeding the
parsed dat