Thank you for reporting William. Adding @Valentyn Tymofieiev <valen...@google.com>. Perhaps we could start with creating an issue?
On Mon, Oct 24, 2022 at 9:25 AM William Pietri <will...@williampietri.com> wrote: > Hi! I have inherited code using Apache Beam and I'm trying to figure out > the right way to use it. > > This seemed like it would be a good page for me: > https://beam.apache.org/get-started/wordcount-example/ > > I jumped in with the MinimalWordCount example > <https://beam.apache.org/get-started/wordcount-example/#minimalwordcount-example>, > which says it's explaining this python code > <https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/wordcount_minimal.py>. > However, the explanation doesn't match the code. > > The code from the text goes like this: > > > pipeline = beam.Pipeline(options=beam_options) > > pipeline| beam.io.ReadFromText(input_file) > > | 'ExtractWords' >> beam.FlatMap(lambda x: re.findall(r'[A-Za-z\']+', x)) > > | beam.combiners.Count.PerElement() > > | beam.MapTuple(lambda word, count: '%s: %s' % (word, count)) > > | beam.io.WriteToText(output_path) > > Fair enough, I'd say. However, the code in the example file is quite > different: > > with beam.Pipeline(options=pipeline_options) as p: > # Read the text file[pattern] into a PCollection. > lines = p | ReadFromText(known_args.input) > > # Count the occurrences of each word. > counts = ( > lines > | 'Split' >> ( > beam.FlatMap( > lambda x: re.findall(r'[A-Za-z\']+', > x)).with_output_types(str)) > | 'PairWithOne' >> beam.Map(lambda x: (x, 1)) > | 'GroupAndSum' >> beam.CombinePerKey(sum)) > > # Format the counts into a PCollection of strings. > def format_result(word_count): > (word, count) = word_count > return '%s: %s' % (word, count) > > output = counts | 'Format' >> beam.Map(format_result) > > # Write the output using a "Write" transform that has side effects. > # pylint: disable=expression-not-assigned > output | WriteToText(known_args.output) > > > I assume this is also probably good? But there are a number of differences > here in both structure and content. This confusion is exactly what I don't > need in intro documentation, so I'd love it if somebody made this > consistent. > > Thanks, > > William >