Hi Alex,
I'm certainly interested in helping more people use Beam (and beyond
beginner level). I believe there are people who can help, as has already
been mentioned in this thread, and I am also happy to help create training
materials as we identify areas that are in need. Have discusse
Hi Sri,
AFAIK, you have to create a PCollection of GenericRecords and define your
Avro schema manually to write your data into Parquet files.
In this case, you will need to create a ParDo for this translation. Also, I
expect that your schema is the same for all CSV files.
A basic example of usage is sketched below.
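A minimal sketch with the Java SDK (needs the beam-sdks-java-io-parquet
module). The schema, file paths, and the two columns (name, age) are just
placeholders for illustration, and the naive comma split ignores quoting
and header lines:

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.generic.GenericRecordBuilder;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.coders.AvroCoder;
import org.apache.beam.sdk.io.FileIO;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.io.parquet.ParquetIO;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;

public class CsvToParquet {

  // The Avro schema is defined manually and must match every CSV file.
  static final String SCHEMA_JSON =
      "{\"type\":\"record\",\"name\":\"Row\",\"fields\":["
          + "{\"name\":\"name\",\"type\":\"string\"},"
          + "{\"name\":\"age\",\"type\":\"int\"}]}";

  // The ParDo that translates one CSV line into a GenericRecord.
  static class CsvToRecordFn extends DoFn<String, GenericRecord> {
    private transient Schema schema;

    @Setup
    public void setup() {
      schema = new Schema.Parser().parse(SCHEMA_JSON);
    }

    @ProcessElement
    public void processElement(@Element String line, OutputReceiver<GenericRecord> out) {
      // Naive split; real CSV parsing would handle quoting and escapes.
      String[] cols = line.split(",");
      out.output(
          new GenericRecordBuilder(schema)
              .set("name", cols[0])
              .set("age", Integer.parseInt(cols[1]))
              .build());
    }
  }

  public static void main(String[] args) {
    Schema schema = new Schema.Parser().parse(SCHEMA_JSON);
    Pipeline p = Pipeline.create();

    p.apply(TextIO.read().from("/path/to/input*.csv"))
        .apply(ParDo.of(new CsvToRecordFn()))
        .setCoder(AvroCoder.of(GenericRecord.class, schema))
        // Write the GenericRecords out as Parquet files.
        .apply(
            FileIO.<GenericRecord>write()
                .via(ParquetIO.sink(schema))
                .to("/path/to/output/")
                .withSuffix(".parquet"));

    p.run().waitUntilFinish();
  }
}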
Why the older version does not hit the limit while the newer version does is
not entirely clear, but it could simply be an expected difference in resource
usage between versions.
There were some changes to how the network buffers are assigned. But my best
guess is that it's because we changed the default pa
Hi Alex,
I know of
http://www.bigdatainstitute.io/courses/data-engineering-with-apache-beam/
There are also some public materials by Jesse (in CC):
https://github.com/eljefe6a/beamexample
This training uses the above exercises:
https://docs.google.com/presentation/d/1ln5KndBTiskEOGa1QmYSCq16YW
I am not aware of any built-in transform that can do this; however, it
should not be that difficult to do with a group-by-key.
Suppose one reads the CSV file into a PCollection of dictionaries of the
format {'original_column_1': value1, 'original_column_2': value2, ...}.
Suppose further that origi
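A minimal sketch of the group-by-key idea with the Python SDK. Since the
message is cut off here, it assumes the goal is to collect together all rows
that share a value in 'original_column_1'; the file name, header, and
parsing are hypothetical:

import csv

import apache_beam as beam


def parse_line(line, header):
    """Turn one CSV line into a dict keyed by the original column names."""
    values = next(csv.reader([line]))
    return dict(zip(header, values))


header = ['original_column_1', 'original_column_2']  # hypothetical header

with beam.Pipeline() as p:
    grouped = (
        p
        | 'ReadCsv' >> beam.io.ReadFromText('input.csv', skip_header_lines=1)
        | 'ToDict' >> beam.Map(parse_line, header)
        # Key each row by one of the original columns, then group.
        | 'KeyByColumn1' >> beam.Map(
            lambda row: (row['original_column_1'], row))
        | 'GroupByKey' >> beam.GroupByKey()
        # Each element is now (key, iterable of rows sharing that key),
        # which a further Map/FlatMap can reshape into the desired output.
    )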