Re: Spark Write BinaryType Column as continuous file to S3

2022-04-09 Thread Bjørn Jørgensen
Hi Philipp. I found this paper: "Sideloading – Ingestion of Large Point Clouds into the Apache Spark Big Data Engine". GeoTrellis does use PDAL in geotrellis-pointcloud.

Fwd: Spark Write BinaryType Column as continuous file to S3

2022-04-08 Thread Philipp Kraus
Hello, > On 08.04.2022 at 17:34, Lalwani, Jayesh wrote: > > What format are you writing the file to? Are you planning on your own custom > format, or are you planning to use standard formats like parquet? I’m dealing with geo-spatial data (Apache Sedona), so I have got a data frame with such
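
A minimal, hedged sketch of how such a geo-spatial frame can end up with a BinaryType column in Apache Sedona, using ST_AsBinary to obtain well-known binary (WKB) bytes. The column name, the WKT literal, and the use of ST_AsBinary here are assumptions, since the message above is cut off before the schema is shown:

    import org.apache.sedona.sql.utils.SedonaSQLRegistrator;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class SedonaWkbSketch {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .appName("sedona-wkb-sketch")
                    .master("local[*]")          // local master only for this sketch
                    .getOrCreate();

            // Register Sedona's SQL functions (ST_GeomFromText, ST_AsBinary, ...)
            SedonaSQLRegistrator.registerAll(spark);

            // Hypothetical geometry column built from a WKT literal
            Dataset<Row> points = spark.sql(
                    "SELECT ST_GeomFromText('POINT(11.57 48.14)') AS geom");

            // ST_AsBinary yields a BinaryType column (WKB), one way to arrive at
            // the byte arrays discussed in this thread
            Dataset<Row> wkb = points.selectExpr("ST_AsBinary(geom) AS geom_wkb");
            wkb.printSchema();

            spark.stop();
        }
    }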

Fwd: Spark Write BinaryType Column as continuous file to S3

2022-04-08 Thread Philipp Kraus
Hello, this sounds great for the first step. > On 08.04.2022 at 17:25, Sean Owen wrote: > > You can certainly write that UDF. You get a column in a DataFrame of > array type and you can write that to any appropriate format. > What do you mean by continuous byte stream

Re: Spark Write BinaryType Column as continuous file to S3

2022-04-08 Thread Lalwani, Jayesh
What format are you writing the file to? Are you planning on your own custom format, or are you planning to use standard formats like parquet? Note that Spark can write numeric data in most standard formats. If you use a custom format instead, whoever consumes the data needs to parse your data.
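
To illustrate the standard-format route: once the frame carries a BinaryType column, Spark's built-in writers can persist it directly, for example as Parquet on S3, and any Parquet reader can consume it without a custom parser. A short hedged fragment; withBytes (a DataFrame with a binary column named "bytes", as in the UDF sketch further down the thread), the bucket, and the path are placeholders:

    // Hedged fragment: assumes `withBytes` holds a BinaryType column "bytes";
    // the s3a path is a placeholder.
    withBytes.select("bytes")
             .write()
             .mode("overwrite")
             .parquet("s3a://my-bucket/binary-column/");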

Re: Spark Write BinaryType Column as continuous file to S3

2022-04-08 Thread Sean Owen
That's for strings, but it still doesn't address what is desired w.r.t. writing a binary column. On Fri, Apr 8, 2022 at 10:31 AM Bjørn Jørgensen wrote: > In the new Spark 3.3 there will be an SQL function > https://github.com/apache/spark/commit/25dd4254fed71923731fd59838875c0dd1ff665a > hope this can help you.

Re: Spark Write BinaryType Column as continuous file to S3

2022-04-08 Thread Bjørn Jørgensen
In the new Spark 3.3 there will be an SQL function, https://github.com/apache/spark/commit/25dd4254fed71923731fd59838875c0dd1ff665a; hope this can help you. On Fri, 8 Apr 2022, 17:14 Philipp Kraus <philipp.kraus.flashp...@gmail.com> wrote: > Hello, > > I have got a data frame with numerical data in
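
The linked commit is not named in the snippet; if the function in question is to_binary, which is part of Spark 3.3's SQL functions, usage would look roughly like the fragment below. As Sean notes above, it decodes string input (hex, base64, UTF-8) into binary, so it does not by itself turn numeric columns into bytes. Both the hex literal and the guess that the commit refers to to_binary are assumptions:

    // Hedged fragment, assuming a SparkSession `spark` on Spark 3.3+.
    // to_binary decodes a *string* into a BinaryType column.
    Dataset<Row> decoded = spark.sql(
            "SELECT to_binary('537061726b', 'hex') AS bytes");  // '537061726b' is hex for "Spark"
    decoded.printSchema();  // bytes: binary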

Re: Spark Write BinaryType Column as continuous file to S3

2022-04-08 Thread Sean Owen
You can certainly write that UDF. You get a column in a DataFrame of array type and you can write that to any appropriate format. What do you mean by continuous byte stream? Something besides, say, parquet files holding the byte arrays? On Fri, Apr 8, 2022 at 10:14 AM Philipp Kraus <philipp.kraus
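
If a continuous byte stream means one single object on S3 rather than a directory of Parquet part files, one hedged option is to pull the byte arrays back through the driver and append them to a single object via Hadoop's FileSystem API. This is a sketch under the assumption that the data is small enough to stream through the driver; the path and the column name "bytes" are placeholders:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;

    import java.io.IOException;
    import java.util.Iterator;

    public class ContinuousBinaryWriter {

        // Concatenates every row of a BinaryType column "bytes" into one S3 object.
        // All bytes pass through the driver, so this only suits modest data volumes.
        public static void writeAsSingleObject(Dataset<Row> withBytes, Configuration hadoopConf)
                throws IOException {
            Path target = new Path("s3a://my-bucket/output/data.bin");   // placeholder path
            FileSystem fs = target.getFileSystem(hadoopConf);

            try (FSDataOutputStream out = fs.create(target, true)) {     // true = overwrite
                Iterator<Row> rows = withBytes.select("bytes").toLocalIterator();
                while (rows.hasNext()) {
                    byte[] chunk = (byte[]) rows.next().get(0);
                    out.write(chunk);
                }
            }
        }
    }

The Hadoop configuration can be taken from spark.sparkContext().hadoopConfiguration() so the writer reuses the session's existing s3a credentials; for large frames a distributed write (e.g. the Parquet route above) is the safer choice.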

Spark Write BinaryType Column as continuous file to S3

2022-04-08 Thread Philipp Kraus
Hello, I have got a data frame with numerical data in Spark 3.1.1 (Java) which should be converted to a binary file. My idea is to create a UDF that generates a byte array based on the numerical values, so I can apply this function to each row of the data frame and then get a new
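
A minimal sketch of such a UDF in Java against Spark 3.1, assuming the numeric column is a double named "value" and that an 8-byte little-endian IEEE-754 encoding is the desired binary layout; both are assumptions, since the message does not specify them:

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Encoders;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;
    import org.apache.spark.sql.api.java.UDF1;
    import org.apache.spark.sql.expressions.UserDefinedFunction;
    import org.apache.spark.sql.types.DataTypes;

    import java.nio.ByteBuffer;
    import java.nio.ByteOrder;
    import java.util.Arrays;

    import static org.apache.spark.sql.functions.col;
    import static org.apache.spark.sql.functions.udf;

    public class DoubleToBytesExample {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .appName("binary-udf-sketch")
                    .master("local[*]")              // local master only for this sketch
                    .getOrCreate();

            // Hypothetical input: a single numeric column named "value"
            Dataset<Row> df = spark
                    .createDataset(Arrays.asList(1.0, 2.5, 3.75), Encoders.DOUBLE())
                    .toDF("value");

            // UDF: encode each double as an 8-byte little-endian array.
            // Endianness is an assumption; use whatever the downstream consumer expects.
            UserDefinedFunction doubleToBytes = udf(
                    (UDF1<Double, byte[]>) v -> {
                        ByteBuffer buf = ByteBuffer.allocate(Double.BYTES)
                                                   .order(ByteOrder.LITTLE_ENDIAN);
                        buf.putDouble(v);
                        return buf.array();
                    },
                    DataTypes.BinaryType);

            Dataset<Row> withBytes = df.withColumn("bytes", doubleToBytes.apply(col("value")));
            withBytes.printSchema();   // "bytes" shows up as binary
            withBytes.show();

            spark.stop();
        }
    }

The resulting "bytes" column is an ordinary BinaryType column, so it can be handed to any of the writers discussed earlier in the thread.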