Hi Burak, Hurray!!!! So you finally made Delta open source :) I always meant to ask TD: is there any chance we could get the streaming graphs back in the Spark UI? It would just be wonderful.
Hi Shubham,

There is always an easier way and a super fancy way to solve a problem; filtering the data before persisting is the simple way. Similarly, a simple way to handle data skew would be to use the monotonically_increasing_id function in Spark together with the modulus operator (a rough sketch is appended after the quoted thread below). As for the fancy way, I am sure someone out there is working on it for mere mortals like me :)

Regards,
Gourav Sengupta

On Wed, May 8, 2019 at 1:41 PM Shubham Chaurasia <shubh.chaura...@gmail.com> wrote:

> Thanks
>
> On Wed, May 8, 2019 at 10:36 AM Felix Cheung <felixcheun...@hotmail.com> wrote:
>
>> You could
>>
>> df.filter(col("c") === "c1").write().partitionBy("c").save
>>
>> It could hit some data skew problems but might work for you.
>>
>> ------------------------------
>> *From:* Burak Yavuz <brk...@gmail.com>
>> *Sent:* Tuesday, May 7, 2019 9:35:10 AM
>> *To:* Shubham Chaurasia
>> *Cc:* dev; user@spark.apache.org
>> *Subject:* Re: Static partitioning in partitionBy()
>>
>> It depends on the data source. Delta Lake (https://delta.io) allows you
>> to do it with .option("replaceWhere", "c = c1"). With other file
>> formats, you can write directly into the partition directory
>> (tablePath/c=c1), but you lose atomicity.
>>
>> On Tue, May 7, 2019, 6:36 AM Shubham Chaurasia <shubh.chaura...@gmail.com>
>> wrote:
>>
>>> Hi All,
>>>
>>> Is there a way I can provide static partitions in partitionBy()?
>>>
>>> Like:
>>>
>>> df.write.mode("overwrite").format("MyDataSource").partitionBy("c=c1").save
>>>
>>> The above code gives the following error, as it tries to find a column `c=c1` in df:
>>>
>>> org.apache.spark.sql.AnalysisException: Partition column `c=c1` not
>>> found in schema struct<a:string,b:string,c:string>;
>>>
>>> Thanks,
>>> Shubham
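
For the archives, here is a minimal sketch of the salting idea described above, assuming a Scala Spark session (`spark`). The input/output paths, the column names a/b/c, and the bucket count of 8 are all illustrative, not taken from the original thread:

import org.apache.spark.sql.functions.{col, lit, monotonically_increasing_id, pmod}

// Hypothetical input; column names a, b, c are illustrative.
val df = spark.read.parquet("/path/to/input")

// Keep only the partition value of interest, then derive a salt from
// monotonically_increasing_id so that one hot value of c is spread
// over several buckets instead of a single skewed task/file.
val salted = df
  .filter(col("c") === "c1")
  .withColumn("salt", pmod(monotonically_increasing_id(), lit(8)))

salted
  .repartition(col("c"), col("salt"))  // shuffle the skewed value into 8 buckets
  .drop("salt")
  .write
  .mode("overwrite")
  .partitionBy("c")
  .save("/path/to/output")

With Delta Lake, the "overwrite only this partition" intent Burak mentions can be expressed atomically on the writer, roughly as .format("delta").option("replaceWhere", "c = 'c1'").mode("overwrite").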