Hi Mohit,
"Seems like the limit on parent is executed twice and return different
records each time. Not sure why it is executed twice when I mentioned only
once"
That is to be expected. Spark follows lazy evaluation, which means that
execution only happens when you call an action; every action re-runs the
whole plan, so the limit executes once per action. And since limit on an
unordered DataFrame makes no guarantee about which rows it keeps, each
run can return different records. Caching the limited DataFrame pins
down a single result.
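To make that concrete, here is a minimal sketch (the parquet path and the
limit of 5 are made up for illustration):

// Each action below re-executes the full plan, including the limit.
val limited = spark.read.parquet("/tmp/parent").limit(5)
limited.count() // action 1: runs the plan, limit picks some 5 rows
limited.show()  // action 2: runs the plan again, may pick a different 5

// To evaluate the limit only once, cache and materialize it:
val stable = limited.cache()
stable.count()  // materializes the cached rows
stable.show()   // reuses the same 5 rows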
Hi Ayan,
You don't have to bother with a conversion at all. Functions that work on
numeric columns will still work as long as all the values in the column
are numbers, because Spark casts the strings implicitly:
scala> df2.printSchema
root
|-- id: string (nullable = false)
|-- id2: string (nullable = false)
scala> df2.show
+---+---
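For example — a minimal sketch against the df2 above, assuming the default
(non-ANSI) cast behaviour — numeric aggregates and arithmetic work directly
on the string columns:

import org.apache.spark.sql.functions.{col, sum}

// "id" has string type, but sum casts it to a numeric type on the fly:
df2.select(sum(col("id"))).show()
// Arithmetic triggers the same implicit cast:
df2.withColumn("twice", col("id") * 2).show()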
I was looking specifically for the documents Spark committers use for reference.
>
> Currently I’ve put custom logs in the spark-core sources, then built and
> run jobs on it.
>
> From the printed logs I try to understand the execution flows.
>
> *From:* Vipul Rajan
https://github.com/JerryLead/SparkInternals/blob/master/EnglishVersion/2-JobLogicalPlan.md
This is pretty old, but it might help a little bit. I myself am going
through the source code and trying to reverse engineer stuff. Let me know
if you'd like to pool resources sometime.
Regards
On Thu, Sep
Hi Rishi,
TL;DR Using Scala, this would work:

import org.apache.spark.sql.functions.{col, lit}
df.withColumn("derived1", lit("something"))
  .withColumn("derived2", col("derived1") === "something")

Just note that I used three equals signs instead of two. That should be
enough; if you want to understand why, read further.

Plain "==" is Scala object equality: it compares the Column object itself
to the string and gives a Boolean on the driver, whereas "===" is a Column
method that builds a new Column holding the comparison, evaluated per row,
which is what withColumn expects.
Please look into arbitrary stateful aggregation. I don't completely
understand your problem, though; if you could give me an example, I'd be
happy to help.
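If it helps, here is a minimal sketch of arbitrary stateful aggregation
with mapGroupsWithState (Spark 2.2+). The Event case class, the socket
source, and the running count per user are all illustrative assumptions:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.{GroupState, GroupStateTimeout, OutputMode}

case class Event(user: String)

val spark = SparkSession.builder().master("local[*]").appName("stateful").getOrCreate()
import spark.implicits._

val events = spark.readStream
  .format("socket").option("host", "localhost").option("port", 9999)
  .load()
  .as[String]
  .map(Event(_))

// Spark keeps one Long of state (a running count) per user key across
// micro-batches; we update it with each incremental batch of events.
val counts = events
  .groupByKey(_.user)
  .mapGroupsWithState[Long, (String, Long)](GroupStateTimeout.NoTimeout) {
    (user, batch, state) =>
      val newCount = state.getOption.getOrElse(0L) + batch.size
      state.update(newCount) // persist the new count for the next batch
      (user, newCount)       // emit the updated aggregate
  }

counts.writeStream
  .outputMode(OutputMode.Update())
  .format("console")
  .start()
  .awaitTermination()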
On Mon, 22 Apr 2019, 15:31 shicheng31...@gmail.com wrote:
> Hi, all:
> As we all know, structured streaming is used to handle incremen