I've described this problem in detail, with code snippets and logs, in this
StackOverflow question:
https://stackoverflow.com/questions/45308406/how-does-spark-handle-timestamp-types-during-pandas-dataframe-conversion/.
I'm looking for efficient solutions to this.
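For reference, a minimal sketch of the conversion in question, assuming a
plain datetime64[ns] column; the column names and values here are
hypothetical:

import pandas as pd
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical frame with a datetime64[ns] column.
pdf = pd.DataFrame({
    "event_time": pd.to_datetime(["2017-07-25 10:00:00"]),
    "value": [1],
})

sdf = spark.createDataFrame(pdf)
sdf.printSchema()  # event_time is inferred as TimestampType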
Hi everyone,
My environment is PySpark with Spark 2.0.0.
I'm using Spark to load data from a large number of files into a Spark
dataframe with fields, say, field1 to field10. While loading my data I have
ensured that records are partitioned by field1 and field2 (without using
partitionBy).
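For context, here is a minimal sketch of that kind of column-based
partitioning, assuming repartition() is the mechanism meant by "without
using partitionBy"; the source path and format are hypothetical:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = spark.read.json("/data/input/*.json")  # hypothetical source

# repartition() shuffles so that rows sharing (field1, field2) land in
# the same partition, without writing them out with partitionBy().
df = df.repartition("field1", "field2")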
Hi,
I have a PySpark app which, when given a huge amount of data as input,
sometimes throws the error explained here:
https://stackoverflow.com/questions/32340639/unable-to-understand-error-sparklistenerbus-has-already-stopped-dropping-event.
All my code is running inside the main function.
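For what it's worth, a hedged sketch of the usual mitigation, assuming the
message appears because the driver exits before Spark shuts down cleanly;
the app name is hypothetical:

from pyspark import SparkContext

def main():
    sc = SparkContext(appName="my-app")  # hypothetical name
    try:
        pass  # job logic goes here
    finally:
        sc.stop()  # stop the context (and its listener bus) explicitly

if __name__ == "__main__":
    main()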
Hi,
I have a file-reading function called foo which reads a file's contents
either into a list of lists, or into a generator of lists of lists
representing the same file.
When reading a file as a complete chunk (one record array) I do something like:
rdd = file_paths_rdd.map(lambda x: foo(x, "wholeFile")).flatMap(lambda rows: rows)
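For comparison, a hedged sketch of the generator variant; the "byRecord"
mode name is hypothetical. PySpark's flatMap accepts a function that
returns any iterable, including a generator, so records can be streamed
one at a time instead of materializing the whole file:

rdd = file_paths_rdd.flatMap(lambda x: foo(x, "byRecord"))  # hypothetical mode name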
Hi,
I've downloaded and kept the same set of data files on all my cluster nodes,
in the same absolute path, say /home/xyzuser/data/*. I am now trying to
perform an operation (say open(filename).read()) on all these files in Spark,
but by passing local file paths. I was under the assumption that, as long as
the same files exist at the same path on every node, each task could open
them locally.
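A minimal sketch of that pattern, assuming /home/xyzuser/data exists on
every worker; everything apart from that path is hypothetical. Each open()
runs on whichever executor receives the task, so the file must be present
locally on that node:

import glob
from pyspark import SparkContext

sc = SparkContext(appName="local-files")   # hypothetical name

paths = glob.glob("/home/xyzuser/data/*")  # enumerated on the driver
contents = sc.parallelize(paths).map(lambda p: open(p).read())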
Hi,
I am iteratively receiving a file which can only be opened as a Pandas
dataframe. For the first such file I receive, I convert it to a Spark
dataframe using the createDataFrame utility function. From the next file
onward, I convert each one and union it into that first Spark dataframe.
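A hedged sketch of that loop; load_next_pandas_frame() is a hypothetical
stand-in for however each file arrives as a Pandas frame:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

combined = None
for pdf in load_next_pandas_frame():  # hypothetical iterator
    sdf = spark.createDataFrame(pdf)
    combined = sdf if combined is None else combined.union(sdf)

Note that each union() extends the logical plan, so with many files it can
help to union in batches rather than one frame at a time.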
Hi,
I'm trying to convert a Pandas dataframe to a Spark dataframe. One of my
columns has the Pandas Category dtype, but there does not seem to be an
equivalent type in Spark. What is the best alternative?
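A minimal sketch of one common workaround, assuming the category values are
strings: cast the column back to its underlying values before the
conversion and let Spark infer StringType. The column name is hypothetical:

import pandas as pd
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

pdf = pd.DataFrame({"color": pd.Categorical(["red", "green", "red"])})
pdf["color"] = pdf["color"].astype(str)  # Spark infers StringType
sdf = spark.createDataFrame(pdf)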