Re: [DISCUSS] "latestFirst" option and metadata growing issue in File stream source

2020-07-29 Thread Jungtaek Lim
Bump: is there any interest in this topic? On Mon, Jul 20, 2020 at 6:21 AM Jungtaek Lim wrote: > (Just to add rationale, you can refer to the original mail thread on the > dev@ list to see efforts on addressing problems in file stream source / > sink - > https://lists.apache.org/thread.html/r1cd5

Re: [DISCUSS] Apache Spark 3.0.1 Release

2020-07-29 Thread Mridul Muralidharan
I agree, that would be a new feature; and unless there is a compelling reason (like security concerns), it would not qualify. Regards, Mridul On Wed, Jul 15, 2020 at 11:46 AM Wenchen Fan wrote: > Supporting Python 3.8.0 sounds like a new feature, and doesn't qualify for a > backport. But I'm open to other opinion

Re: [DISCUSS] Apache Spark 3.0.1 Release

2020-07-29 Thread Jason Moore
Hi all, Discussion around 3.0.1 seems to have trickled away. What was blocking the release process kicking off? I can see some unresolved bugs raised against 3.0.0, but conversely there were quite a few critical correctness fixes waiting to be released. Cheers, Jason. From: Takeshi Yamamuro

Write to same hdfs dir from multiple spark jobs

2020-07-29 Thread Deepak Sharma
Hi, is there any design pattern around writing to the same HDFS directory from multiple Spark jobs? -- Thanks, Deepak www.bigdatabig.com