Re: [DISCUSS] Preferred approach on dealing with SPARK-29322

2019-10-01 Thread Jungtaek Lim
I need to do a full manual test to make sure, but according to an experiment (a small UT), "closeFrameOnFlush" seems to work. There was a relevant change on the master branch, SPARK-26283 [1], which changed the way the zstd event log file is read to "continuous", which seems to read an open frame. With "closeFrame

Re: [DISCUSS] Preferred approach on dealing with SPARK-29322

2019-10-01 Thread Jungtaek Lim
Judging from the change log for zstd v1.4.3, the changes don't seem to be related. https://github.com/facebook/zstd/blob/dev/CHANGELOG#L1-L5 v1.4.3 bug: Fix Dictionary Compression Ratio Regression by @cyan4973 (#1709) bug: Fix Buffer Overflow in v0.3 Decompression by @felixhandte (#1722) build: A

[SS] Possible inconsistent semantics on metric "updated" between stateful operators

2019-10-01 Thread Jungtaek Lim
Hi devs, I've noticed different semantics for the metric "updated" between (Flat)MapGroupsWithState and other stateful operators. * (Flat)MapGroupsWithState: removal is counted as updated * others: removal is not counted as updated Technically, the meanings of "removal" are different: (Flat)Map

Re: [DISCUSS] Preferred approach on dealing with SPARK-29322

2019-10-01 Thread Dongjoon Hyun
Thank you for reporting, Jungtaek. Can we try to upgrade it to the newer version first? Since we are at 1.4.2, the newer version is 1.4.3. Bests, Dongjoon. On Tue, Oct 1, 2019 at 9:18 PM Mridul Muralidharan wrote: > Makes more sense to drop support for zstd assuming the fix is not > somethi

Re: [DISCUSS] Preferred approach on dealing with SPARK-29322

2019-10-01 Thread Mridul Muralidharan
Makes more sense to drop support for zstd, assuming the fix is not something on the Spark end (configuration, etc). Does not make sense to try to detect deadlock in the codec. Regards, Mridul On Tue, Oct 1, 2019 at 8:39 PM Jungtaek Lim wrote: > > Hi devs, > > I've discovered an issue with event logger, s

Re: [SS] How to create a streaming DataFrame (for a custom Source in Spark 2.4.4 / MicroBatch / DSv1)?

2019-10-01 Thread Jungtaek Lim
Looks like it's missing, or intended to force custom streaming sources to be implemented as DSv2. I'm not sure the Spark community wants to expand the DSv1 API: I could propose the change if we get some support here. To the Spark community: given that we are bringing major changes to DSv2, some may want to rely on DSv1 w

[DISCUSS] Preferred approach on dealing with SPARK-29322

2019-10-01 Thread Jungtaek Lim
Hi devs, I've discovered an issue with the event logger, specifically when reading an incomplete event log file that is compressed with 'zstd': the reader thread gets stuck reading that file. This is very easy to reproduce: set the configuration as below - spark.eventLog.enabled=true - spark.eventLog.co

[SS] How to create a streaming DataFrame (for a custom Source in Spark 2.4.4 / MicroBatch / DSv1)?

2019-10-01 Thread Jacek Laskowski
Hi, I think I've got stuck, and without your help I won't move any further. Please help. I'm on Spark 2.4.4 and am developing a streaming Source (DSv1, MicroBatch), and in the getBatch phase, when requested for a DataFrame, there is this assert [1] I can't seem to get past with any DataFrame I managed

ApacheCon North America 2020, project participation

2019-10-01 Thread Rich Bowen
Hi, folks, (Note: You're receiving this email because you're on the dev@ list for one or more Apache Software Foundation projects.) For ApacheCon North America 2019, we asked projects to participate in the creation of project/topic specific tracks. This was very successful, with about 15 projects