Re: Beam courses

2019-01-14 Thread Austin Bennett
Hi Alex, I'm certainly interested in helping more people use Beam (and beyond the beginner level). I believe there are people who can help, as has already been mentioned in this thread, and I am also happy to help create training materials as we identify areas that are in need. Have discussed …

Re: ParquetIO write of CSV document data

2019-01-14 Thread Alexey Romanenko
Hi Sri, AFAIK, you have to create a PCollection of GenericRecords and define your Avro schema manually to write your data into Parquet files. In this case, you will need to create a ParDo for this translation. Also, I expect that your schema is the same for all CSV files. Basic example of us…
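[Editor's note] A minimal sketch of the approach Alexey describes, using the Java SDK's ParquetIO (beam-sdks-java-io-parquet). The two-column name,age CSV layout, the Avro schema, and the paths are illustrative assumptions, not from the thread:

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.coders.AvroCoder;
import org.apache.beam.sdk.io.FileIO;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.io.parquet.ParquetIO;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;

public class CsvToParquetSketch {
  // Avro schema defined manually to match the assumed CSV columns (name,age).
  static final String SCHEMA_JSON =
      "{\"type\":\"record\",\"name\":\"CsvRow\",\"fields\":["
          + "{\"name\":\"name\",\"type\":\"string\"},"
          + "{\"name\":\"age\",\"type\":\"int\"}]}";

  public static void main(String[] args) {
    Schema schema = new Schema.Parser().parse(SCHEMA_JSON);
    Pipeline p = Pipeline.create();

    p.apply("ReadCsv", TextIO.read().from("/path/to/input/*.csv"))
        // The ParDo that translates each CSV line into a GenericRecord.
        .apply("CsvToGenericRecord", ParDo.of(new DoFn<String, GenericRecord>() {
          private transient Schema rowSchema;

          @Setup
          public void setup() {
            // Avro's Schema class is not serializable, so re-parse it on the worker.
            rowSchema = new Schema.Parser().parse(SCHEMA_JSON);
          }

          @ProcessElement
          public void processElement(ProcessContext c) {
            String[] fields = c.element().split(",");
            GenericRecord record = new GenericData.Record(rowSchema);
            record.put("name", fields[0]);
            record.put("age", Integer.parseInt(fields[1]));
            c.output(record);
          }
        }))
        .setCoder(AvroCoder.of(GenericRecord.class, schema))
        .apply("WriteParquet",
            FileIO.<GenericRecord>write()
                .via(ParquetIO.sink(schema))
                .to("/path/to/output/")
                .withSuffix(".parquet"));

    p.run().waitUntilFinish();
  }
}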

Re: [flink-runner] taskmanager.network.memory.max ignored for local Flink runner

2019-01-14 Thread Maximilian Michels
Why the older version does not hit the limit and the newer version does is not quite clear, but it could just be an expected difference in resource usage between versions. There were some changes to how the network buffers are assigned. But my best guess is that it's because we changed the default pa…
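[Editor's note] The preview is cut off, but since the number of Flink network buffers a job needs grows with its parallelism, one way to take a changed default out of the equation is to pin the local Flink runner's parallelism explicitly. A hedged Java sketch (the value 2 is arbitrary):

import org.apache.beam.runners.flink.FlinkPipelineOptions;
import org.apache.beam.runners.flink.FlinkRunner;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class LocalFlinkParallelismSketch {
  public static void main(String[] args) {
    FlinkPipelineOptions options = PipelineOptionsFactory.fromArgs(args)
        .withValidation()
        .as(FlinkPipelineOptions.class);
    options.setRunner(FlinkRunner.class);
    options.setFlinkMaster("[local]");  // run against the embedded, local Flink cluster
    options.setParallelism(2);          // pin explicitly instead of relying on the default

    Pipeline p = Pipeline.create(options);
    // ... build the pipeline ...
    p.run().waitUntilFinish();
  }
}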

Re: Beam courses

2019-01-14 Thread Maximilian Michels
Hi Alex, I know of http://www.bigdatainstitute.io/courses/data-engineering-with-apache-beam/ There are also some public materials by Jesse (in CC): https://github.com/eljefe6a/beamexample This training uses the above exercises: https://docs.google.com/presentation/d/1ln5KndBTiskEOGa1QmYSCq16YW…

Re: transpose CSV transform

2019-01-14 Thread Robert Bradshaw
I am not aware of any built-in transform that can do this; however, it should not be that difficult to do with a group-by-key. Suppose one reads in the CSV file to a PCollection of dictionaries of the format {'original_column_1': value1, 'original_column_2': value2, ...}. Suppose further that origi…
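[Editor's note] Robert's description uses Python-style dicts; below is a minimal Java-SDK sketch of the same group-by-key idea. It assumes each element already carries a row index alongside its column-name-to-value map (how that index is produced, e.g. from the CSV reader, is left out). Each grouped element then corresponds to one row of the transposed output.

import java.util.Map;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.GroupByKey;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;

public class TransposeSketch {
  public static PCollection<KV<String, Iterable<KV<Long, String>>>> transpose(
      PCollection<KV<Long, Map<String, String>>> rows) {
    return rows
        // Emit one (columnName, (rowIndex, cellValue)) pair per cell.
        .apply("ExplodeCells", ParDo.of(
            new DoFn<KV<Long, Map<String, String>>, KV<String, KV<Long, String>>>() {
              @ProcessElement
              public void processElement(ProcessContext c) {
                Long rowIndex = c.element().getKey();
                for (Map.Entry<String, String> cell : c.element().getValue().entrySet()) {
                  c.output(KV.of(cell.getKey(), KV.of(rowIndex, cell.getValue())));
                }
              }
            }))
        // Group by original column name: each group becomes one transposed row,
        // whose (rowIndex, value) pairs can be sorted by index when formatting output.
        .apply("GroupByColumn", GroupByKey.create());
  }
}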