Re: learning Spark

2017-12-04 Thread Elior Malul
Also, our community is responsive on Stack Overflow, and I will be happy to help whenever I can.

> On Dec 5, 2017, at 9:14 AM, yohann jardin wrote:
>
> Plenty of documentation is available on Spark website itself:
> http://spark.apache.org/docs/latest/#where-to-go-from-here

Re: spark partitionBy with partitioned column in json output

2018-06-04 Thread Elior Malul
I had the same issue myself. I was surprised at first as well, but I found it useful: the amount of data saved for each partition decreases, because the partition column is no longer stored in every record. When I load the data from each partition, I add the partition columns with the lit function before I merge the frames from the different partitions. On Tue
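The behavior described in this reply can be sketched without a Spark cluster. The following is a minimal plain-Python imitation, not Spark code: the helper names `write_partitioned` and `read_partition`, the file name `part-00000.json`, and the sample rows are all hypothetical. It only mimics the `column=value` directory layout, drops the partition column from each stored record, and re-attaches it on load, which is the role the reply gives to the lit function.

```python
import json
import os
import tempfile


def write_partitioned(rows, root, column):
    """Write rows as JSON lines under root/<column>=<value>/.

    The partition column is dropped from each stored record, because its
    value is already encoded in the directory name.
    """
    for row in rows:
        part_dir = os.path.join(root, f"{column}={row[column]}")
        os.makedirs(part_dir, exist_ok=True)
        record = {k: v for k, v in row.items() if k != column}
        # Hypothetical file name; real output files are named differently.
        with open(os.path.join(part_dir, "part-00000.json"), "a") as fh:
            fh.write(json.dumps(record) + "\n")


def read_partition(root, column, value):
    """Read one partition directory and re-attach the partition column."""
    part_dir = os.path.join(root, f"{column}={value}")
    records = []
    with open(os.path.join(part_dir, "part-00000.json")) as fh:
        for line in fh:
            record = json.loads(line)
            # Constant column for the whole partition (the "lit" step).
            record[column] = value
            records.append(record)
    return records


# Hypothetical sample data.
rows = [
    {"id": 1, "country": "US", "clicks": 10},
    {"id": 2, "country": "FR", "clicks": 7},
    {"id": 3, "country": "US", "clicks": 3},
]

root = tempfile.mkdtemp()
write_partitioned(rows, root, "country")

# Load each partition separately, restore the column, then merge.
merged = read_partition(root, "country", "US") + read_partition(root, "country", "FR")
```

After the merge, every record carries its `country` value again, although no file on disk stores it.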

Re: RepartitionByKey Behavior

2018-06-21 Thread Elior Malul
Hi Chawla, There is nothing wrong with your code, nor with Spark. The situation in which two different keys are mapped to the same partition is perfectly valid, since they are mapped to the same 'bucket'. The promise is that all records with the same key 'k' will be mapped to the same partition.
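A small sketch may make the 'bucket' point concrete. It is plain Python in the spirit of hash partitioning, not Spark's own partitioner: `partition_for` is a hypothetical helper, and the keys and partition count are made up for illustration.

```python
def partition_for(key, num_partitions):
    """Map a key to a partition index with a hash followed by a modulo."""
    # Python's % yields a non-negative result for a positive modulus.
    return hash(key) % num_partitions


num_partitions = 4
keys = [1, 5, 2, 1, 5]  # hypothetical integer keys

# Group the keys by the partition they land in.
assignment = {}
for key in keys:
    assignment.setdefault(partition_for(key, num_partitions), []).append(key)

for partition, members in sorted(assignment.items()):
    print(partition, members)
```

Keys 1 and 5 share a partition here because both leave remainder 1 when divided by 4, while every occurrence of a given key still lands in the same partition.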

Bug in Window Function

2018-07-25 Thread Elior Malul
Exception in thread "main" org.apache.spark.sql.AnalysisException: collect_set(named_struct(value, country#123 AS value#346, count, (cast(count(country#123) windowspecdefinition(campaign_id#104, app_id#93, country#123, ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as double) /