[ https://issues.apache.org/jira/browse/BEAM-12554?focusedWorklogId=772234&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-772234 ]
ASF GitHub Bot logged work on BEAM-12554: ----------------------------------------- Author: ASF GitHub Bot Created on: 19/May/22 03:22 Start Date: 19/May/22 03:22 Worklog Time Spent: 10m Work Description: codecov[bot] commented on PR #17708: URL: https://github.com/apache/beam/pull/17708#issuecomment-1131135568 # [Codecov](https://codecov.io/gh/apache/beam/pull/17708?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report > Merging [#17708](https://codecov.io/gh/apache/beam/pull/17708?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (d0c0dee) into [master](https://codecov.io/gh/apache/beam/commit/ea1f292e9cf31fc8c4803b10d811f0d3ee184ae7?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (ea1f292) will **decrease** coverage by `0.00%`. > The diff coverage is `40.00%`. ```diff @@ Coverage Diff @@ ## master #17708 +/- ## ========================================== - Coverage 73.99% 73.99% -0.01% ========================================== Files 695 695 Lines 91798 91801 +3 ========================================== - Hits 67930 67927 -3 - Misses 22620 22626 +6 Partials 1248 1248 ``` | Flag | Coverage Δ | | |---|---|---| | python | `83.73% <40.00%> (-0.01%)` | :arrow_down: | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more. | [Impacted Files](https://codecov.io/gh/apache/beam/pull/17708?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | | |---|---|---| | [sdks/python/apache\_beam/io/fileio.py](https://codecov.io/gh/apache/beam/pull/17708/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vZmlsZWlvLnB5) | `95.41% <40.00%> (-0.58%)` | :arrow_down: | | [.../python/apache\_beam/transforms/periodicsequence.py](https://codecov.io/gh/apache/beam/pull/17708/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdHJhbnNmb3Jtcy9wZXJpb2RpY3NlcXVlbmNlLnB5) | `96.72% <0.00%> (-1.64%)` | :arrow_down: | | [...eam/runners/portability/fn\_api\_runner/execution.py](https://codecov.io/gh/apache/beam/pull/17708/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9wb3J0YWJpbGl0eS9mbl9hcGlfcnVubmVyL2V4ZWN1dGlvbi5weQ==) | `92.44% <0.00%> (-0.65%)` | :arrow_down: | | [sdks/python/apache\_beam/transforms/combiners.py](https://codecov.io/gh/apache/beam/pull/17708/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdHJhbnNmb3Jtcy9jb21iaW5lcnMucHk=) | `93.05% <0.00%> (-0.39%)` | :arrow_down: | | [...ks/python/apache\_beam/runners/worker/sdk\_worker.py](https://codecov.io/gh/apache/beam/pull/17708/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy93b3JrZXIvc2RrX3dvcmtlci5weQ==) | `89.09% <0.00%> (+0.15%)` | :arrow_up: | | [...ks/python/apache\_beam/runners/worker/data\_plane.py](https://codecov.io/gh/apache/beam/pull/17708/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy93b3JrZXIvZGF0YV9wbGFuZS5weQ==) | `88.13% <0.00%> (+0.56%)` | :arrow_up: | ------ [Continue to review full report at Codecov](https://codecov.io/gh/apache/beam/pull/17708?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data` > Powered by [Codecov](https://codecov.io/gh/apache/beam/pull/17708?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [ea1f292...d0c0dee](https://codecov.io/gh/apache/beam/pull/17708?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Issue Time Tracking ------------------- Worklog Id: (was: 772234) Time Spent: 50m (was: 40m) > WriteToFiles destination no changing condition > ---------------------------------------------- > > Key: BEAM-12554 > URL: https://issues.apache.org/jira/browse/BEAM-12554 > Project: Beam > Issue Type: Bug > Components: sdk-py-core > Affects Versions: 2.27.0, 2.28.0, 2.29.0, 2.30.0, 2.35.0 > Reporter: Inigo San Jose Visiers > Priority: P1 > Time Spent: 50m > Remaining Estimate: 0h > > `WriteToFiles` seems to not be working as it should. I have been running some > tests and my conclusion is that once the condition for the destination is met > once, it is not checked again until a new condition (not seen prev) is met. > This results in wrong distribution of elements across files. > Example code 1: > {code:python} > (p | "Create" >> beam.Create(range(100)) > | beam.Map(lambda x: str(x)) > | fileio.WriteToFiles( > path="./dynamic/", > destination=lambda n: "in" if n in ["17"] else "out", > sink=fileio.TextSink(), > file_naming=fileio.destination_prefix_naming("test")) > ) > {code} > Here, the expected result should be a file called "in-xyz" containing only > "17" and another called "out-xyz" containing the rest. What we see is that > "out" contains numbers 0 to 16 and once 17 condition is met, the rest of > numbers would go to "in". So we would have "out" from 0 to 16, "in" from 17 > on, which is wrong. > Changing the number shows it too. > ____________________ > Example code 2: > {code:python} > def odd_even(x): > value = "even" if int(x) % 2 == 0 else "odd" > print(value, x) > return value > (p | "Create" >> beam.Create(range(100)) > | beam.Map(lambda x: str(x)) > | fileio.WriteToFiles( > path="./dynamic/", > destination=odd_even, > sink=fileio.TextSink(), > file_naming=fileio.destination_prefix_naming("test")) > ) > {code} > We can see that the `odd_even` fn is return the right value, but destination > is still wrong. We get "even" only with 0 and "odd" with the rest of numbers, > since the condition changed with element "1" > ____________________ > Example code 3: > Trying more conditionals or different `file_naming` doesn't fix this > {code:python} > def test_15(n): > three = "three" if int(n) % 3 == 0 else "" > five = "five" if int(n) % 5 == 0 else "" > return f"value-{three}{five}" > def time_format(): > def _inner(window, pane, shard_index, total_shards, compression, > destination): > print(window, pane, shard_index, total_shards, compression, > destination) > return f"dest-{destination}-shards-{shard_index}-of-{total_shards}" > return _inner > (p | "Create" >> beam.Create(range(N)) > | beam.Map(lambda x: str(x)) > | fileio.WriteToFiles( > path="./dynamic/", > destination=test_15, > sink=fileio.TextSink(), > file_naming=time_format()) > ) > {code} > adding shards or other variables don't help either. > I have tested this in different SDKs (27, 29, 30) and Dataflow, DirectRunner, > InteractiveRunner -- This message was sent by Atlassian Jira (v8.20.7#820007)