Hi,
I'm trying to convert a Pandas dataframe to a Spark dataframe. One of my columns
is of the Category type in Pandas, but there does not seem to be support for
this type in Spark. What is the best alternative?
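Since Spark has no categorical column type, a common workaround is to cast the
column before calling createDataFrame: either to strings (readable values) or to
the integer category codes (compact, but the code-to-value mapping must be kept
separately). A minimal sketch, assuming a small pandas DataFrame with a
hypothetical categorical column named "label":

import pandas as pd
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical pandas DataFrame with a Category column named "label".
pdf = pd.DataFrame({"id": [1, 2, 3],
                    "label": pd.Categorical(["a", "b", "a"])})

# Option 1: keep the readable values -> becomes a string column in Spark.
pdf_str = pdf.copy()
pdf_str["label"] = pdf_str["label"].astype(str)
sdf_str = spark.createDataFrame(pdf_str)

# Option 2: keep the integer category codes -> becomes an integer column
# (the code -> value mapping has to be stored separately).
pdf_codes = pdf.copy()
pdf_codes["label"] = pdf_codes["label"].cat.codes
sdf_codes = spark.createDataFrame(pdf_codes)

sdf_str.printSchema()
sdf_codes.printSchema()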
Hi,
I am iteratively receiving files which can only be opened as Pandas dataframes.
For the first such file I receive, I convert it to a Spark dataframe using the
'createDataFrame' utility function. From the next file onward, I convert each
one and union it into the first Spark dataframe.
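For reference, a minimal sketch of that loop, assuming a hypothetical
load_as_pandas(path) helper that opens each incoming file as a pandas DataFrame:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

combined = None
for path in incoming_paths:            # hypothetical iterable of received files
    pdf = load_as_pandas(path)         # hypothetical: the file only opens as pandas
    sdf = spark.createDataFrame(pdf)   # convert the pandas DataFrame to Spark
    # The first file becomes the base DataFrame; later files are unioned in.
    combined = sdf if combined is None else combined.union(sdf)

Note that each union extends the lineage of the result, so for a long-running
loop it may be worth persisting or checkpointing the combined DataFrame
periodically.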
Hi,
I've downloaded and kept the same set of data files on all my cluster nodes,
in the same absolute path - say /home/xyzuser/data/*. I am now trying to
perform an operation (say open(filename).read()) on all these files in Spark,
but by passing local file paths. I was under the assumption that as
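A minimal sketch of that setup, assuming every node really does hold the same
files under /home/xyzuser/data/ (the open() call runs on whichever executor
receives the task, not on the driver):

import glob
from pyspark import SparkContext

sc = SparkContext.getOrCreate()

# Build the list of local paths on the driver; the same absolute paths
# must exist on every worker node for the map below to succeed.
paths = glob.glob("/home/xyzuser/data/*")

def read_local(path):
    # Runs on an executor and opens that node's local copy of the file.
    with open(path) as f:
        return path, f.read()

contents = sc.parallelize(paths).map(read_local).collect()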
Hi,
I have a file-reading function called /foo/ which reads a file's contents either
into a list of lists or into a generator of lists of lists representing the same
file.
When reading as a complete chunk (1 record array) I do something like:
rdd = file_paths_rdd.map(lambda x: foo(x,"wholeFile")).flatMap(lam
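A hedged reconstruction of the truncated snippet, assuming foo(path, "wholeFile")
returns a list of lists (one inner list per record) and that a second, assumed
mode name returns a generator instead:

# Whole-file mode: one list-of-lists per file, flattened to one record per element.
rdd = (file_paths_rdd
       .map(lambda path: foo(path, "wholeFile"))
       .flatMap(lambda records: records))

# Generator variant (mode name assumed): flatMap consumes the generator lazily,
# so the whole file is not materialised at once.
rdd_lazy = file_paths_rdd.flatMap(lambda path: foo(path, "generator"))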
Hi,
I have a PySpark app which, when provided a huge amount of data as input,
sometimes throws the error explained here:
https://stackoverflow.com/questions/32340639/unable-to-understand-error-sparklistenerbus-has-already-stopped-dropping-event.
All my code is running inside the main function, and t
Hi everyone,
My environment is PySpark with Spark 2.0.0.
I'm using Spark to load data from a large number of files into a Spark
dataframe with fields, say, field1 to field10. While loading my data I have
ensured that records are partitioned by field1 and field2 (without using
partitionBy). This was
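If the goal is just to co-locate rows that share field1 and field2 without
writing the data back out with DataFrameWriter.partitionBy, one way this can be
expressed is a repartition on those columns (the input format and path below are
placeholders):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Placeholder input; substitute the actual source format and path.
df = spark.read.parquet("/path/to/many/files")

# Rows sharing field1/field2 end up in the same partition,
# without using DataFrameWriter.partitionBy on write.
df = df.repartition("field1", "field2")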
I've summarized this question in detail in this StackOverflow question with
code snippets and logs:
https://stackoverflow.com/questions/45308406/how-does-spark-handle-timestamp-types-during-pandas-dataframe-conversion/.
I'm looking for efficient solutions to this.
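One way to make the timestamp handling explicit during the conversion is to pass
a schema to createDataFrame; a minimal sketch, assuming a pandas datetime64[ns]
column (column names here are placeholders):

import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, LongType, TimestampType

spark = SparkSession.builder.getOrCreate()

# Placeholder pandas DataFrame with a datetime64[ns] column named "ts".
pdf = pd.DataFrame({"id": [1, 2],
                    "ts": pd.to_datetime(["2017-07-25 10:00:00",
                                          "2017-07-25 11:30:00"])})

# An explicit schema avoids relying on type inference for the timestamp column.
schema = StructType([StructField("id", LongType()),
                     StructField("ts", TimestampType())])

sdf = spark.createDataFrame(pdf, schema=schema)
sdf.printSchema()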