Documentation on org.apache.spark.sql.functions backend.

2019-09-15 Thread Vipul Rajan
I am trying to create a function that reads data from Kafka, communicates with the Confluent schema registry, and decodes Avro data with evolving schemas. I am trying not to create hackish patches but to write proper code that I could maybe even turn into pull requests. Looking at the code, I have been …
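A minimal sketch of the usual shape of this: read the raw bytes from Kafka, fetch the writer schema from the registry, strip the Confluent wire-format header, and decode with from_avro. This assumes Spark 2.4 (where from_avro lives in org.apache.spark.sql.avro) and Confluent's CachedSchemaRegistryClient; the registry URL, broker, topic, and subject names are placeholders.

import io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.avro.from_avro
import org.apache.spark.sql.functions.expr

val spark = SparkSession.builder.appName("avro-decode").getOrCreate()
import spark.implicits._

// Fetch the latest writer schema registered for the topic's value subject.
val registry = new CachedSchemaRegistryClient("http://schema-registry:8081", 128)
val schemaJson = registry.getLatestSchemaMetadata("my-topic-value").getSchema

val raw = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092")
  .option("subscribe", "my-topic")
  .load()

// Confluent framing prefixes each message with 1 magic byte plus a 4-byte
// schema id, so drop the first 5 bytes before decoding the Avro body.
val decoded = raw
  .select(expr("substring(value, 6, length(value) - 5)").as("payload"))
  .select(from_avro($"payload", schemaJson).as("record"))

One caveat: from_avro binds a single schema at plan time, so truly evolving schemas (records written under different schema ids) need a per-record deserializer or a library such as ABRiS rather than a fixed latest-schema lookup.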

Re: Documentation on org.apache.spark.sql.functions backend.

2019-09-16 Thread Vipul Rajan
On Mon, Sep 16, 2019 at 08:27 Vipul Rajan <vipul.s.p...@gmail.com> wrote:

> I am trying to create a function that reads data from Kafka, communicates
> with the Confluent schema registry and decodes Avro data with evolving
> schemas. I am tryi…

Re: \r\n in csv output

2020-03-23 Thread Vipul Rajan
You can use newAPIHadoopFile:

import org.apache.hadoop.io.LongWritable
import org.apache.hadoop.io.Text
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat

val conf = new Configuration
conf.set("textinputformat.record.delimiter", "\r\n")
val df …
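A hedged completion of the snippet above, which the archive cuts off at "val df": newAPIHadoopFile takes the input path, the input format class, the key and value classes, and the Configuration carrying the custom record delimiter. It continues from the imports and conf already defined; the path and column name are placeholders.

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("crlf-records").getOrCreate()
import spark.implicits._

// Split records on \r\n instead of the default \n, using conf from above.
val rdd = spark.sparkContext.newAPIHadoopFile(
  "/path/to/input.csv",
  classOf[TextInputFormat],
  classOf[LongWritable],
  classOf[Text],
  conf)

// Hadoop reuses the Text instance, so materialize each record as a String.
val df = rdd.map { case (_, text) => text.toString }.toDF("line")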

Spark behavior with changing data source

2020-05-22 Thread Vipul Rajan
I have a use case where I am joining a streaming DataFrame with a static DataFrame. The static DataFrame is read from a parquet table (a directory containing parquet files). This parquet data is updated by another process once a day. I am using Structured Streaming for the streaming DataFrame. My q…
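A minimal sketch of the setup described above, assuming a Kafka source and a shared join key named id; the path, broker, and topic are placeholders. Whether later micro-batches pick up the daily rewrite of the parquet directory is exactly what the thread goes on to ask.

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("stream-static-join").getOrCreate()

// Static side: a parquet directory rewritten once a day by another process.
val staticDf = spark.read.parquet("/data/daily_snapshot")

// Streaming side: Structured Streaming over Kafka.
val streamDf = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092")
  .option("subscribe", "events")
  .load()
  .selectExpr("CAST(key AS STRING) AS id", "CAST(value AS STRING) AS payload")

// Stream-static inner join on the shared key.
val joined = streamDf.join(staticDf, Seq("id"))

val query = joined.writeStream
  .format("console")
  .outputMode("append")
  .start()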