I'm not 100% sure I understand the question. Assuming "both" refers to
SPARK-26283 [1] and SPARK-29322 [2]: if you're asking about the fix, then
yes, it's only on the master branch, as the fix for SPARK-26283 was not
ported back to branch-2.4. If you're asking about the issue (the problem)
itself, then maybe no, according to the affected versions of SPARK-26283
(2.4.0 is listed there as well).

On Wed, Oct 2, 2019 at 11:47 PM Dongjoon Hyun <dongjoon.h...@gmail.com>
wrote:

> Thank you for the investigation and making a fix.
>
> So, both issues are on only master (3.0.0) branch?
>
> Bests,
> Dongjoon.
>
>
> On Wed, Oct 2, 2019 at 00:06 Jungtaek Lim <kabhwan.opensou...@gmail.com>
> wrote:
>
>> FYI: patch submitted - https://github.com/apache/spark/pull/25996
>>
>> On Wed, Oct 2, 2019 at 3:25 PM Jungtaek Lim <kabhwan.opensou...@gmail.com>
>> wrote:
>>
>>> I need to run a full manual test to make sure, but according to an
>>> experiment (a small UT), "closeFrameOnFlush" seems to work.
>>>
>>> There was a relevant change on the master branch, SPARK-26283 [1]: it
>>> changed the way the zstd event log file is read to "continuous", which
>>> appears to read from a still-open frame. With "closeFrameOnFlush" set to
>>> false for ZstdOutputStream, the frame is never closed (even when the
>>> output stream is flushed) until the output stream itself is closed.
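For context, a minimal sketch of what flipping that flag looks like against zstd-jni's ZstdOutputStream. This is illustrative only, not Spark's actual codec wiring; the file name is made up, and it assumes zstd-jni exposes setCloseFrameOnFlush:

```scala
import java.io.{BufferedOutputStream, FileOutputStream}
import com.github.luben.zstd.ZstdOutputStream

// Illustrative sketch: wrap a file stream the way a codec might.
val out = new ZstdOutputStream(
  new BufferedOutputStream(new FileOutputStream("eventlog.zstd")))

// With closeFrameOnFlush = true, flush() finishes the current zstd frame,
// so a concurrent reader can decode everything written so far instead of
// blocking on a frame that stays open until close().
out.setCloseFrameOnFlush(true)

out.write("event".getBytes("UTF-8"))
out.flush() // frame is closed here, not only at out.close()
out.close()
```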
>>>
>>> I'll raise a patch once the manual test passes. Sorry for the false alarm.
>>>
>>> Thanks,
>>> Jungtaek Lim (HeartSaVioR)
>>>
>>> 1. https://issues.apache.org/jira/browse/SPARK-26283
>>>
>>> On Wed, Oct 2, 2019 at 2:33 PM Jungtaek Lim <
>>> kabhwan.opensou...@gmail.com> wrote:
>>>
>>>> The changelog for zstd v1.4.3 suggests to me that the changes aren't
>>>> related.
>>>>
>>>> https://github.com/facebook/zstd/blob/dev/CHANGELOG#L1-L5
>>>>
>>>> v1.4.3
>>>> bug: Fix Dictionary Compression Ratio Regression by @cyan4973 (#1709)
>>>> bug: Fix Buffer Overflow in v0.3 Decompression by @felixhandte (#1722)
>>>> build: Add support for IAR C/C++ Compiler for Arm by @joseph0918 (#1705)
>>>> misc: Add NULL pointer check in util.c by @leeyoung624 (#1706)
>>>>
>>>> But it's only a matter of updating the dependency and rebuilding, so
>>>> I'll try it out.
>>>>
>>>> Before that, I just noticed that ZstdOutputStream has a parameter,
>>>> "closeFrameOnFlush", which seems to deal with flushing. We leave it at
>>>> its default value, which is "false". Let me set it to "true" and see if
>>>> that helps. Please let me know if anyone knows why we picked false (or
>>>> left it at the default).
>>>>
>>>>
>>>> On Wed, Oct 2, 2019 at 1:48 PM Dongjoon Hyun <dongjoon.h...@gmail.com>
>>>> wrote:
>>>>
>>>>> Thank you for reporting, Jungtaek.
>>>>>
>>>>> Can we try to upgrade it to the newer version first?
>>>>>
>>>>> Since we are at 1.4.2, the newer version is 1.4.3.
>>>>>
>>>>> Bests,
>>>>> Dongjoon.
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Oct 1, 2019 at 9:18 PM Mridul Muralidharan <mri...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> It makes more sense to drop support for zstd, assuming the fix is not
>>>>>> something on Spark's end (configuration, etc.).
>>>>>> It does not make sense to try to detect a deadlock in the codec.
>>>>>>
>>>>>> Regards,
>>>>>> Mridul
>>>>>>
>>>>>> On Tue, Oct 1, 2019 at 8:39 PM Jungtaek Lim
>>>>>> <kabhwan.opensou...@gmail.com> wrote:
>>>>>> >
>>>>>> > Hi devs,
>>>>>> >
>>>>>> > I've discovered an issue with the event logger, specifically when
>>>>>> > reading an incomplete event log file compressed with 'zstd' - the
>>>>>> > reader thread gets stuck reading that file.
>>>>>> > This is very easy to reproduce: set the configuration as below
>>>>>> >
>>>>>> > - spark.eventLog.enabled=true
>>>>>> > - spark.eventLog.compress=true
>>>>>> > - spark.eventLog.compression.codec=zstd
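For reference, the same three repro settings can also be applied programmatically (a sketch; spark-defaults.conf or `--conf` flags on spark-submit work equally well):

```scala
import org.apache.spark.SparkConf

// Same settings as the list above, applied on a SparkConf.
val conf = new SparkConf()
  .set("spark.eventLog.enabled", "true")
  .set("spark.eventLog.compress", "true")
  .set("spark.eventLog.compression.codec", "zstd")
```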
>>>>>> >
>>>>>> > and start a Spark application. While the application is running, load
>>>>>> > the application in the SHS web page. It may succeed in replaying the
>>>>>> > event log, but it will very likely get stuck, and the loading page
>>>>>> > will be stuck as well.
>>>>>> >
>>>>>> > Please refer SPARK-29322 for more details.
>>>>>> >
>>>>>> > As the issue only occurs with 'zstd', the simplest approach is to
>>>>>> > drop support for 'zstd' for the event log. A more general approach
>>>>>> > would be to introduce a timeout on reading the event log file, but it
>>>>>> > would need to differentiate between a thread being stuck and a thread
>>>>>> > busy reading a huge event log file.
>>>>>> >
>>>>>> > Which approach would be preferred by the Spark community, or would
>>>>>> > someone propose a better idea for handling this?
>>>>>> >
>>>>>> > Thanks,
>>>>>> > Jungtaek Lim (HeartSaVioR)
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>>>>
>>>>>>
