Re: The state of external transforms in Beam

2019-11-04 Thread Chamikara Jayalath
Makes sense. I can look into expanding on what we have at the following location and adding links to some of the existing work as a first step. https://beam.apache.org/roadmap/connectors-multi-sdk/ Created https://issues.apache.org/jira/browse/BEAM-8553 We also need more detailed documentation for c

Re: Encoding Problem: Kafka - DataFlow

2019-11-04 Thread Luke Cwik
The only combination that I can think of is to use this hack[1] combined with a JvmInitializer[2]. 1: https://stackoverflow.com/a/14987992/4368200 2: https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/harness/JvmInitializer.java On Mon, Nov 4, 2019 at 1:40
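A sketch of the combination Luke describes: the linked StackOverflow hack sets `file.encoding` and uses reflection to clear `Charset`'s cached default so it is re-read. In a Beam pipeline this would run from a `JvmInitializer` registered via `ServiceLoader` (typically with `@AutoService(JvmInitializer.class)`, omitted here so the snippet is self-contained). The class name `Utf8JvmInitializer` and the standalone `forceUtf8()` helper are illustrative, not from the thread; note the reflective part depends on a private JDK field and fails on newer JVMs that restrict such access.

```java
import java.lang.reflect.Field;
import java.nio.charset.Charset;

// Illustrative sketch: in a real Beam job this logic would live inside
// a JvmInitializer's onStartup() method, registered via ServiceLoader.
public class Utf8JvmInitializer {

    public static void forceUtf8() {
        // Step 1 of the hack: override the system property the JVM
        // consulted at startup to pick its default charset.
        System.setProperty("file.encoding", "UTF-8");
        try {
            // Step 2: clear Charset's cached default so the next call to
            // Charset.defaultCharset() re-reads file.encoding. This touches
            // a private JDK field and is NOT guaranteed by any spec.
            Field defaultCharset = Charset.class.getDeclaredField("defaultCharset");
            defaultCharset.setAccessible(true);
            defaultCharset.set(null, null);
        } catch (ReflectiveOperationException | RuntimeException e) {
            // On JVMs that forbid this reflection (Java 9+ module access
            // checks), the property is still set but the cached default
            // charset cannot be replaced at runtime.
        }
    }

    public static void main(String[] args) {
        forceUtf8();
        System.out.println(System.getProperty("file.encoding"));
    }
}
```

The more robust fix discussed elsewhere in the thread is to avoid relying on the default charset at all and pass an explicit charset at every encode/decode site.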

Re: Encoding Problem: Kafka - DataFlow

2019-11-04 Thread Leonardo Campos | GameDuell
Thanks, Eddie. Just to add to the discussion, I logged the following information: Charset.defaultCharset(): US-ASCII System.getProperty("file.encoding"): ANSI_X3.4-1968 OutputStreamWriter writer = new OutputStreamWriter(new ByteArrayOutputStream()); writer.getEncoding(): ASCII In our case, a
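Leonardo's three probes can be reproduced with a small self-contained program; the class name `CharsetReport` is illustrative. On the Dataflow workers in question these print US-ASCII, ANSI_X3.4-1968 (the POSIX name for ASCII), and ASCII respectively; the exact output depends on the machine's locale and JVM flags.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.nio.charset.Charset;

public class CharsetReport {
    public static void main(String[] args) {
        // The JVM-wide default charset, fixed at startup.
        System.out.println("Charset.defaultCharset(): " + Charset.defaultCharset());
        // The raw system property the default was derived from.
        System.out.println("file.encoding: " + System.getProperty("file.encoding"));
        // What a writer built WITHOUT an explicit charset will actually use.
        OutputStreamWriter writer = new OutputStreamWriter(new ByteArrayOutputStream());
        System.out.println("writer.getEncoding(): " + writer.getEncoding());
    }
}
```

Running this on the worker (e.g. from a logging statement in a DoFn) confirms whether the pipeline is silently encoding through an ASCII default.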

Re: Encoding Problem: Kafka - DataFlow

2019-11-04 Thread Eddy G
Adding to what Jeff pointed out previously: I'm dealing with the same issue writing Parquet files using the ParquetIO module in Dataflow, and the same thing happens even when forcing UTF-8 on all String objects. Maybe it is related to behind-the-scenes decoding/encoding within the previously mentio
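For completeness, "forcing UTF-8 on all String objects" presumably means passing the charset explicitly at every byte/String boundary, since the no-argument `String.getBytes()` and `new String(byte[])` fall back to the platform default (ASCII on these workers) and silently replace unmappable characters with `?`. A minimal illustration, with the class name `ExplicitUtf8` being a hypothetical example:

```java
import java.nio.charset.StandardCharsets;

public class ExplicitUtf8 {
    public static void main(String[] args) {
        String original = "caf\u00e9"; // "café" — contains a non-ASCII character

        // Always name the charset at the byte/String boundary. The
        // no-argument overloads use the platform default instead, which
        // on an ASCII JVM would corrupt 'é' into '?'.
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        String roundTripped = new String(utf8, StandardCharsets.UTF_8);

        System.out.println(roundTripped); // prints "café"
    }
}
```

If corruption persists despite this, the decode/encode is happening inside a layer that the pipeline code does not control (as Eddy suspects), and the JVM-level workaround from earlier in the thread may be the only option.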