Re: [Spark SQL] does pyspark udf support spark.sql inside def

2020-09-30 Thread Amit Joshi
Can you please post the schema of both tables? On Wednesday, September 30, 2020, Lakshmi Nivedita wrote: > Thank you for the clarification. I would like to know how I can proceed with > this kind of scenario in pyspark. > > I have a scenario of subtracting the number of holidays from the total number of days

[Spark SQL] pyspark to count total number of days - no. of holidays by using sql

2020-09-30 Thread Lakshmi Nivedita
I have one table with dates date1 and date2, and the number of holidays in another table.

df1 = select date1, date2, ctry, unix_timestamp(date2 - date1) totalnumberofdays - df2.holidays from table A;
df2 = select count(holidays) from table B where holidate >= 'date1' (table A) and holidate <= 'date2'
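Since spark.sql cannot be called inside a function that runs on executors (see Sean Owen's reply in this thread), one way to express this with plain DataFrame operations is a range join plus an aggregation. A minimal sketch, assuming tables A and B with the column names from the message (date1, date2, ctry, holidate):

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("days-minus-holidays").getOrCreate()

# Assumed inputs: table A has date1, date2, ctry; table B has holidate.
dfA = spark.table("A")
dfB = spark.table("B")

# Range join: collect the holidays falling between date1 and date2 for each
# row of A, count them, and subtract the count from the total number of days.
joined = dfA.join(
    dfB,
    (dfB["holidate"] >= dfA["date1"]) & (dfB["holidate"] <= dfA["date2"]),
    "left",
)

result = (
    joined.groupBy(dfA["date1"], dfA["date2"], dfA["ctry"])
          .agg(F.count(dfB["holidate"]).alias("num_holidays"))
          .withColumn(
              "days_minus_holidays",
              F.datediff(F.col("date2"), F.col("date1")) - F.col("num_holidays"),
          )
)
result.show()
```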

Re: Spark JDBC- OAUTH example

2020-09-30 Thread Artemis User
I'm just curious what this JDBC connection provider does. If you just read data from a database into Spark, wouldn't you just use the existing JDBC data source? http://spark.apache.org/docs/latest/sql-data-sources-jdbc.html BTW, OAuth is a web authentication protocol, not part of t
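For reference, reading through that existing JDBC data source from PySpark looks roughly like this; the URL, table name, and credentials are placeholders:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("jdbc-read").getOrCreate()

# Plain JDBC read; the database's JDBC driver jar must be on the classpath.
df = (
    spark.read.format("jdbc")
         .option("url", "jdbc:postgresql://dbhost:5432/mydb")
         .option("dbtable", "myschema.mytable")
         .option("user", "username")
         .option("password", "password")
         .load()
)
df.show()
```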

Re: Spark JDBC- OAUTH example

2020-09-30 Thread Gabor Somogyi
Not sure there is already a way. I'm just implementing the JDBC connection provider API, which will make this available in 3.1. Just to be clear, when the API is available a custom connection provider must be implemented. With the current Spark one can try to write a driver wrapper which does the authentication. G
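Until that connection provider API is available, one possible workaround is to fetch the OAuth token outside Spark and pass it to the JDBC driver as a connection property, which only works if the driver itself understands token authentication. A rough sketch; the token endpoint, the JDBC URL, and the accessToken option name (used here as with Microsoft's SQL Server driver) are assumptions that depend on your database:

```python
import requests
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("jdbc-oauth-workaround").getOrCreate()

# Obtain an OAuth access token outside of Spark (endpoint and payload are
# assumptions for illustration only).
token = requests.post(
    "https://auth.example.com/oauth2/token",
    data={
        "grant_type": "client_credentials",
        "client_id": "my-client",
        "client_secret": "my-secret",
    },
).json()["access_token"]

# Extra options are handed to the JDBC driver as connection properties.
# Whether a token is accepted, and under which property name, depends on
# the driver; "accessToken" is what Microsoft's SQL Server driver expects.
df = (
    spark.read.format("jdbc")
         .option("url", "jdbc:sqlserver://dbhost:1433;databaseName=mydb")
         .option("dbtable", "myschema.mytable")
         .option("accessToken", token)
         .load()
)
df.show()
```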

Spark JDBC- OAUTH example

2020-09-30 Thread KhajaAsmath Mohammed
Hi, I am looking for some information on how to read a database that uses OAuth authentication with Spark JDBC. Any links that point to this approach would be really helpful. Thanks, Asmath

Re: [Spark SQL] does pyspark udf support spark.sql inside def

2020-09-30 Thread Lakshmi Nivedita
Thank you for the clarification. I would like to know how I can proceed with this kind of scenario in pyspark. I have a scenario of subtracting the number of holidays from the total number of days in pyspark by using dataframes. I have a table with dates date1 and date2 in one table and the number of holidays in another table.

Re: Offset Management in Spark

2020-09-30 Thread Gabor Somogyi
Hi, Structured Streaming stores offsets only in HDFS-compatible filesystems; Kafka and S3 are not such. Custom offset storage was only an option in DStreams. G On Wed, Sep 30, 2020 at 9:45 AM Siva Samraj wrote: > Hi all, > > I am using Spark Structured Streaming (Version 2.3.2). I need to rea
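In other words, the offsets are tracked through the query's checkpoint location, which has to live on an HDFS-compatible filesystem. A minimal sketch of a Kafka-to-Kafka query with such a checkpoint; brokers, topics, and the path are placeholders, and the Kerberos/JAAS options a kerberized sink needs are omitted:

```python
from pyspark.sql import SparkSession

# Requires the spark-sql-kafka package on the classpath.
spark = SparkSession.builder.appName("kafka-to-kafka").getOrCreate()

source = (
    spark.readStream.format("kafka")
         .option("kafka.bootstrap.servers", "source-broker:9092")
         .option("subscribe", "input-topic")
         .load()
)

query = (
    source.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
          .writeStream.format("kafka")
          .option("kafka.bootstrap.servers", "sink-broker:9092")
          .option("topic", "output-topic")
          # Offsets and sink progress are stored here, not in Kafka.
          .option("checkpointLocation", "hdfs:///checkpoints/kafka-to-kafka")
          .start()
)
query.awaitTermination()
```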

Custom Metrics Source -> Sink routing

2020-09-30 Thread Dávid Szakállas
Is there a way to customize which metrics sources are routed to which sinks? If I understood the docs correctly, there are some global switches for enabling sources, e.g. spark.metrics.staticSources.enabled, spark.metrics.executorMetricsSource.enabled.
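As far as the built-in metrics system goes, sinks are wired up per instance (driver, executor, master, ...) rather than per source, either in metrics.properties or through spark.metrics.conf.* keys. A rough sketch of the latter; the sink choices and the CSV directory are assumptions:

```python
from pyspark.sql import SparkSession

# Metrics routing is configured per instance, not per source: each
# "spark.metrics.conf.<instance>.sink.<name>.*" key attaches a sink to the
# driver, executors, etc. The sink classes below ship with Spark; the CSV
# output directory is an assumption.
spark = (
    SparkSession.builder.appName("metrics-routing")
    .config("spark.metrics.conf.driver.sink.csv.class",
            "org.apache.spark.metrics.sink.CsvSink")
    .config("spark.metrics.conf.driver.sink.csv.directory", "/tmp/spark-metrics")
    .config("spark.metrics.conf.executor.sink.console.class",
            "org.apache.spark.metrics.sink.ConsoleSink")
    .getOrCreate()
)
```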

Re: Apache Spark Bogotá Meetup

2020-09-30 Thread Miguel Angel Díaz Rodríguez
Cool, here is my PR 🤞 https://github.com/apache/spark-website/pull/291 On Wed, 30 Sep 2020 at 07:34, Sean Owen wrote: > Sure, we just ask people to open a pull request against > https://github.com/apache/spark-website to update the page and we can > merge it. > > On Wed, Sep 30, 2020 at 7:30 AM

Re: Apache Spark Bogotá Meetup

2020-09-30 Thread Sean Owen
Sure, we just ask people to open a pull request against https://github.com/apache/spark-website to update the page and we can merge it. On Wed, Sep 30, 2020 at 7:30 AM Miguel Angel Díaz Rodríguez < madiaz...@gmail.com> wrote: > Hello > > I am Co-organizer of Apache Spark Bogotá Meetup from Colomb

Re: [Spark SQL] does pyspark udf support spark.sql inside def

2020-09-30 Thread Sean Owen
No, you can't use the SparkSession from within a function executed by Spark tasks. On Wed, Sep 30, 2020 at 7:29 AM Lakshmi Nivedita wrote: > Here is a spark udf structure as an example > > Def sampl_fn(x): >Spark.sql(“select count(Id) from sample Where Id = x ”) > > > Spark.udf.regis

Spark and Twistlock

2020-09-30 Thread Khurram Qureshi
Is there a version of Spark available that has passed Twistlock scans?

[Spark SQL] does pyspark udf support spark.sql inside def

2020-09-30 Thread Lakshmi Nivedita
Here is a spark udf structure as an example:

def sample_fn(x):
    spark.sql("select count(Id) from sample where Id = x")

spark.udf.register("sample_fn", sample_fn)
spark.sql("select id, sample_fn(Id) from example")

Thanks in advance for the help -- k. Lakshmi Nivedita
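As Sean Owen notes in his reply, the SparkSession (and therefore spark.sql) cannot be used inside a UDF running on executors. A sketch of one way to get the same result with a pre-aggregation and a join instead, assuming the table and column names from the example (sample.Id, example.id):

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("udf-alternative").getOrCreate()

# Assumed tables from the example: "sample" with an Id column, "example" with id.
example_df = spark.table("example")
counts = spark.table("sample").groupBy("Id").agg(F.count("*").alias("cnt"))

# Compute the per-Id counts once, then join instead of querying per row.
result = (
    example_df.join(counts, example_df["id"] == counts["Id"], "left")
              .select(example_df["id"], "cnt")
)
result.show()
```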

Apache Spark Bogotá Meetup

2020-09-30 Thread Miguel Angel Díaz Rodríguez
Hello, I am a co-organizer of the Apache Spark Bogotá Meetup from Colombia https://www.meetup.com/es/Apache-Spark-Bogota/ and would like our group to be included on the following web page: https://spark.apache.org/community.html Looking forward to meeting you. Miguel.

Offset Management in Spark

2020-09-30 Thread Siva Samraj
Hi all, I am using Spark Structured Streaming (version 2.3.2). I need to read from a Kafka cluster and write into a kerberized Kafka. Here I want to use Kafka for offset checkpointing after the record is written into the kerberized Kafka. Questions: 1. Can we use Kafka for checkpointing to manage offsets