Definitely not a Spark task. Moving files within the same filesystem is
merely a linking exercise; you don't have to actually move any data. Write
a shell script that creates hard links in the new location, and once you're
satisfied, remove the old links. Profit.
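The post calls for a shell script; here is the same idea as a Python sketch
(to match the thread's other snippets), with placeholder paths:

import os
from pathlib import Path

src_root = Path("/data/old_location")  # placeholder source directory
dst_root = Path("/data/new_location")  # placeholder destination directory

# recreate the tree as hard links: same inodes, no data copied,
# which is why this only works within a single filesystem
for src in src_root.rglob("*"):
    if src.is_file():
        dst = dst_root / src.relative_to(src_root)
        dst.parent.mkdir(parents=True, exist_ok=True)
        os.link(src, dst)

# once you're satisfied the new tree is complete, remove the old links:
# for src in (p for p in src_root.rglob("*") if p.is_file()):
#     src.unlink()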
---
You can specify the line separator to make Spark split your records into
separate rows:

df = spark.read.option("lineSep", "^^^").text("path")
Then you need to split the column into an array and map over it with
getItem to create a column for each property, as in the sketch below. Note
that split() takes a Java regex, so the asterisks must be escaped, and in
PySpark the method is alias(), not Scala's as():

df = df.select(split("value", r"\*\*\*").alias("arrayColumn"))
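Putting the whole read-and-split pipeline together, a minimal sketch (the
three fields and the field_N column names are assumptions for illustration,
not from the original post):

from pyspark.sql import SparkSession
from pyspark.sql.functions import split, col

spark = SparkSession.builder.getOrCreate()

# records end with ^^^, fields within a record are separated by ***
df = spark.read.option("lineSep", "^^^").text("path")
arr = df.select(split(col("value"), r"\*\*\*").alias("arrayColumn"))

# one output column per array element
parsed = arr.select(
    col("arrayColumn").getItem(0).alias("field_0"),
    col("arrayColumn").getItem(1).alias("field_1"),
    col("arrayColumn").getItem(2).alias("field_2"),
)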
---
I would say the pros and cons of Python vs Scala come down to Spark itself,
the languages themselves, and what kind of data engineer you will get when
you try to hire for each.
With PySpark you get less functionality and increased complexity from the
py4j Java interop compared to Scala. [...] performance than anything else.
The memory overhead of JVM objects and GC runs might be brutal on your
performance and memory usage, depending on your dataset and use case.
br,
molotch
---