Hi,
Could not agree more with Molotch :)
Regards,
Gourav Sengupta
On Thu, May 27, 2021 at 7:08 PM Molotch wrote:
You can specify the line separator to make Spark split your records into
separate rows.

df = spark.read.option("lineSep", "^^^").text("path")

Then you need to df.select(split("value", r"\*\*\*").alias("arrayColumn")) to
turn the column into an array (the pattern argument is a regex, so the
asterisks need escaping), and map over it with getItem to create a column for
each field.
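Putting the two separators together, the transformation above amounts to: split the stream on '^^^' to get records (what lineSep does), then split each record on '***' to get fields (what split + getItem do). Here is a plain-Python sketch of that logic on a tiny in-memory sample; the parse_records helper is hypothetical, just for illustration, and the PySpark equivalent is shown in the comments (the column count of 3 there is an assumption):

```python
# Plain-Python illustration of the two-level split; parse_records is a
# hypothetical helper, not a Spark API.
def parse_records(text, record_sep="^^^", field_sep="***"):
    """Split raw text into records, then each record into fields."""
    records = [r for r in text.split(record_sep) if r]  # drop trailing empties
    return [r.split(field_sep) for r in records]

sample = "a***b***c^^^d***e***f^^^"
print(parse_records(sample))

# The equivalent in PySpark, per the reply above (assuming 3 fields per record):
#   from pyspark.sql.functions import split, col
#   df = spark.read.option("lineSep", "^^^").text("path")
#   arr = df.select(split("value", r"\*\*\*").alias("arrayColumn"))
#   out = arr.select(*[col("arrayColumn").getItem(i).alias(f"c{i}")
#                      for i in range(3)])
```

The same pattern scales: Spark distributes the lineSep split across the cluster, so the 1 TB file never has to fit on one machine.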
Hi,
What is the best way to read a large text file in PySpark (> 1 TB)? The
file is generated by a source system on which we can't make any changes, and
the file has a custom column separator ('***') and record delimiter ('^^^').
Reading this into a PySpark DataFrame directly is not possible (as reading t