Re: Profiling data quality with Spark

2022-12-28 Thread infa elance
You can also look at Informatica Data Quality, which runs on Spark. Of course it's not free, but you can sign up for a 30-day free trial. They have both profiling and prebuilt data quality rules and accelerators. Sent from my iPhone. On Dec 28, 2022, at 10:02 PM, vaquar khan wrote: @ Gourav Sengupta wh

spark 3.2 release date

2021-08-30 Thread infa elance
What is the expected ballpark release date of Spark 3.2? Thanks and Regards, Ajay.

Re: [ANNOUNCE] Announcing Apache Spark 3.0.0-preview2

2019-12-26 Thread infa elance
Great job everyone!! Do we have any tentative GA dates yet? Thanks and Regards, Ajay. On Tue, Dec 24, 2019 at 5:11 PM Star wrote: > Awesome work. Thanks and happy holidays~! > > > On 2019-12-25 04:52, Yuming Wang wrote: > > Hi all, > > > > To enable wide-scale community testing of the upcoming

Re: Spark Newbie question

2019-07-11 Thread infa elance
> data store without reading it in first. > > Jerry > > On Thu, Jul 11, 2019 at 1:27 PM infa elance wrote: > >> Sorry, I guess I hit the send button too soon >> >> This question is regarding a Spark standalone cluster. My understanding >> is Spark is a

Re: Spark Newbie question

2019-07-11 Thread infa elance
(df/rdd) as a Hive or Delta Lake table? Spark version with Hadoop: spark-2.0.2-bin-hadoop2.7. Thanks, and I appreciate your help!! Ajay. On Thu, Jul 11, 2019 at 12:19 PM infa elance wrote: > This is a standalone Spark cluster. My understanding is Spark is an > execution engine and not a storage

Spark Newbie question

2019-07-11 Thread infa elance
This is a standalone Spark cluster. My understanding is Spark is an execution engine and not a storage layer. Spark processes data in memory, but when someone refers to a Spark table created through Spark SQL (df/rdd), what exactly are they referring to? Could it be a Hive table? If yes, is it the same

PySpark row_number Question

2017-04-14 Thread infa elance
Hi All, I'm trying to understand how row_number is applied. In the code below, does Spark store data in a DataFrame and then perform the row_number function, or does it apply it while reading from Hive? from pyspark.sql import HiveContext hiveContext = HiveContext(sc) hiveContext.sql(" ( SELECT colunm1 ,c

Spark and Hive connection

2017-04-05 Thread infa elance
Hi all, when using spark-shell my understanding is Spark connects to Hive through the metastore. The question I have is how Spark connects to the metastore: is it via JDBC? Thanks and Regards, Ajay.
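For what it's worth, Spark talks to the Hive metastore *service* over Thrift (the standard hive.metastore.uris setting), not JDBC; JDBC is what the metastore service itself uses to reach its backing database. A minimal configuration sketch with a hypothetical metastore host:

```python
from pyspark.sql import SparkSession

# Hypothetical host; 9083 is the conventional metastore Thrift port.
# Spark reaches the metastore service over Thrift; the service, in
# turn, uses JDBC to talk to its backing database.
spark = (SparkSession.builder
         .appName("hive-demo")
         .config("hive.metastore.uris", "thrift://metastore-host:9083")
         .enableHiveSupport()
         .getOrCreate())
```

When no remote metastore is configured, a Hive-enabled session falls back to an embedded local metastore instead.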