Re: Encoding Problem: Kafka - DataFlow

2019-11-04 Thread Luke Cwik
The only combination that I can think of is to use this hack[1] combined with a JvmInitializer[2]. 1: https://stackoverflow.com/a/14987992/4368200 2: https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/harness/JvmInitializer.java
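A rough sketch of what that combination might look like. The reflection trick from the linked answer resets the JVM's cached default charset so later lookups re-read `file.encoding`; in Beam this logic would live inside a `JvmInitializer#onStartup` implementation registered through `META-INF/services`, so it runs on each Dataflow worker before user code. Note this pokes at a private JDK field (`Charset.defaultCharset`) and is known to work on Java 8; newer JVMs may reject the `setAccessible` call, hence the fallback. The class name here is illustrative, not from the thread.

```java
import java.lang.reflect.Field;
import java.nio.charset.Charset;

public final class ForceUtf8 {
  // Attempt to make UTF-8 the JVM default charset at runtime.
  // Returns true if the cached default was successfully reset.
  public static boolean forceUtf8Default() {
    System.setProperty("file.encoding", "UTF-8");
    try {
      // Charset caches the default in a private static field; null it out
      // so the next Charset.defaultCharset() call re-reads file.encoding.
      Field cache = Charset.class.getDeclaredField("defaultCharset");
      cache.setAccessible(true);
      cache.set(null, null);
      return "UTF-8".equals(Charset.defaultCharset().name());
    } catch (ReflectiveOperationException | RuntimeException e) {
      // e.g. InaccessibleObjectException under strong encapsulation (Java 17+)
      return false;
    }
  }
}
```

On workers where the reflection is rejected, only the system property changes, which by itself is too late to affect code that already captured the default charset.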

Re: Encoding Problem: Kafka - DataFlow

2019-11-04 Thread Leonardo Campos | GameDuell
Thanks, Eddie. Just to add to the discussion, I logged the following information: Charset.defaultCharset(): US-ASCII; System.getProperty("file.encoding"): ANSI_X3.4-1968; OutputStreamWriter writer = new OutputStreamWriter(new ByteArrayOutputStream()); writer.getEncoding(): ASCII. In our case, a
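The three checks from the message above, collected into a runnable form. On the Dataflow workers these reportedly printed US-ASCII, ANSI_X3.4-1968, and ASCII, which are all names for the same 7-bit charset and would explain the lost umlauts. The class name is illustrative.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.nio.charset.Charset;

public final class CharsetReport {
  // Returns the three encoding-related values worth logging on a worker:
  // the default Charset, the file.encoding property, and the encoding an
  // OutputStreamWriter picks up when constructed without an explicit charset.
  public static String[] report() {
    OutputStreamWriter writer = new OutputStreamWriter(new ByteArrayOutputStream());
    return new String[] {
      Charset.defaultCharset().name(),      // e.g. "US-ASCII"
      System.getProperty("file.encoding"),  // e.g. "ANSI_X3.4-1968"
      writer.getEncoding()                  // historical name, e.g. "ASCII"
    };
  }
}
```

Logging these from a DoFn (or a JvmInitializer) is a quick way to confirm whether a worker's default charset differs from the one used locally with the Direct Runner.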

Re: Encoding Problem: Kafka - DataFlow

2019-11-04 Thread Eddy G
Adding to what Jeff pointed out previously: I'm dealing with the same issue writing Parquet files using the ParquetIO module in Dataflow, and the same thing happens even when forcing UTF-8 on all String objects. Maybe it is related to behind-the-scenes decoding/encoding within the previously mentio
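For reference, "forcing UTF-8 on all String objects" presumably means passing an explicit charset at every byte-to-String boundary, as below. This round-trips correctly regardless of the worker's `-Dfile.encoding`; if characters are still lost, some library code in between is likely consulting the platform default on its own, which would match the behaviour described in this thread.

```java
import java.nio.charset.StandardCharsets;

public final class ExplicitUtf8 {
  // Encode and decode with an explicit charset so the platform default
  // charset is never consulted.
  public static String roundTrip(String s) {
    byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
    return new String(bytes, StandardCharsets.UTF_8);
  }
}
```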

Re: Encoding Problem: Kafka - DataFlow

2019-10-31 Thread Jeff Klukas
I ran into exactly the same problem: some accented characters were replaced with "?" in a pipeline, but only when running on Dataflow, not with the Direct Runner. KafkaIO was not involved, but I'd bet the root cause is the same. In my case, the input turned out to be properly UTF

Encoding Problem: Kafka - DataFlow

2019-10-31 Thread Leonardo Campos
Hello, Problem: Special characters such as öüä are being saved to our sinks as "?". Setup: We read from Kafka using KafkaIO, run the pipeline with the Dataflow Runner, and save the results to BigQuery and Elasticsearch. We checked that data is being written to Kafka in UTF-8 (code check). We che
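A minimal reproduction of the symptom, under the assumption (confirmed later in the thread) that the workers' default charset is a 7-bit ASCII variant: encoding umlauts through US-ASCII substitutes '?' for every unmappable character, which is exactly what shows up in the sinks.

```java
import java.nio.charset.StandardCharsets;

public final class SymptomDemo {
  // Simulate a byte round-trip through an ASCII-only default charset.
  // getBytes replaces each unmappable character with the charset's
  // replacement byte, which for US-ASCII is '?'.
  public static String throughAscii(String s) {
    byte[] ascii = s.getBytes(StandardCharsets.US_ASCII);
    return new String(ascii, StandardCharsets.US_ASCII);
  }
}
```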