The only combination that I can think of is to use this hack[1] combined
with a JvmInitializer[2]; a sketch of that combination is below.
1: https://stackoverflow.com/a/14987992/4368200
2: https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/harness/JvmInitializer.java
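To make that combination concrete, here is a minimal sketch: a JvmInitializer registered via ServiceLoader (shown with AutoService, which is one common way to do the registration) whose onStartup() applies the reflection hack from [1] to force the default charset back to UTF-8 on each worker JVM. The class name is made up, and the reflection part pokes at JDK internals, so it may break on newer Java versions.

import java.lang.reflect.Field;
import java.nio.charset.Charset;

import com.google.auto.service.AutoService;
import org.apache.beam.sdk.harness.JvmInitializer;

// Runs on worker JVM startup, before any pipeline code executes.
@AutoService(JvmInitializer.class)
public class Utf8DefaultCharsetInitializer implements JvmInitializer {

  @Override
  public void onStartup() {
    try {
      System.setProperty("file.encoding", "UTF-8");
      // Clear the cached default charset so it is re-derived from file.encoding.
      Field defaultCharset = Charset.class.getDeclaredField("defaultCharset");
      defaultCharset.setAccessible(true);
      defaultCharset.set(null, null);
    } catch (ReflectiveOperationException e) {
      throw new RuntimeException("Could not force UTF-8 as the default charset", e);
    }
  }
}

Whether this belongs in production is another question; it is mainly a way to verify that the default charset really is the culprit.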
Thanks, Eddie.
Just to add to the discussion, I logged the following information:
Charset.defaultCharset(): US-ASCII
System.getProperty("file.encoding"): ANSI_X3.4-1968
OutputStreamWriter writer = new OutputStreamWriter(new ByteArrayOutputStream()); writer.getEncoding(): ASCII
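(The values above were presumably logged from inside the pipeline on the worker; the following standalone probe, with an illustrative class name, prints the same three values.)

import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.nio.charset.Charset;

public class CharsetProbe {
  public static void main(String[] args) throws Exception {
    // The charset the JVM considers its default.
    System.out.println("Charset.defaultCharset(): " + Charset.defaultCharset());
    // The system property the default charset is normally derived from.
    System.out.println("file.encoding: " + System.getProperty("file.encoding"));
    // The encoding an OutputStreamWriter falls back to when none is specified.
    try (OutputStreamWriter writer = new OutputStreamWriter(new ByteArrayOutputStream())) {
      System.out.println("OutputStreamWriter encoding: " + writer.getEncoding());
    }
  }
}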
In our case, a
Adding to what Jeff just pointed out: I'm dealing with the same issue
writing Parquet files with the ParquetIO module on Dataflow, and the same thing
happens even when forcing all String objects to UTF-8 (along the lines of the
sketch below). Maybe it is related to behind-the-scenes decoding/encoding
within the previously mentio
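"Forcing UTF-8 on the Strings" is assumed here to mean something like the round trip below, which is lossless no matter what file.encoding is set to; if the data still comes out as "?", the corruption has to happen in whatever writes the bytes afterwards using the default charset.

import java.nio.charset.StandardCharsets;

public class ForceUtf8 {
  // Re-encode a value with an explicit UTF-8 charset on both sides,
  // so the result never depends on the JVM's default charset.
  static String roundTripUtf8(String value) {
    byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
    return new String(bytes, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    String original = "öüä";
    // Prints true even when file.encoding is US-ASCII.
    System.out.println(original.equals(roundTripUtf8(original)));
  }
}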
I ran into exactly this same problem of finding some accented characters
getting replaced with "?" in a pipeline only when running on Dataflow and
not when using the Direct Runner. KafkaIO was not involved, but I'd bet the
root cause is the same.
In my case, the input turned out to be properly UTF
Hello,
Problem: Special characters such as öüä are being saved to our sinks as "?".
Setup: We read from Kafka using KafkaIO, run the pipeline with the
Dataflow runner, and save the results to BigQuery and Elasticsearch.
We checked that data is being written to Kafka in UTF-8 (code check). We
che
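For reference, one way to take the worker's default charset out of the read path entirely is to consume the Kafka values as raw bytes and decode them as UTF-8 explicitly, as in the sketch below. The bootstrap server, topic, and class name are placeholders, the BigQuery/Elasticsearch sinks are omitted, and since Kafka's own StringDeserializer already defaults to UTF-8, this is mainly a way to rule the read side out rather than a guaranteed fix.

import java.nio.charset.StandardCharsets;

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.TypeDescriptors;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;

public class ReadKafkaUtf8 {
  public static void main(String[] args) {
    Pipeline pipeline = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    // Read raw value bytes and decode them as UTF-8 ourselves, so the
    // resulting Strings never go through the worker's default charset.
    PCollection<String> values =
        pipeline
            .apply(
                KafkaIO.<byte[], byte[]>read()
                    .withBootstrapServers("kafka:9092") // placeholder
                    .withTopic("events")                // placeholder
                    .withKeyDeserializer(ByteArrayDeserializer.class)
                    .withValueDeserializer(ByteArrayDeserializer.class)
                    .withoutMetadata())
            .apply(
                MapElements.into(TypeDescriptors.strings())
                    .via((KV<byte[], byte[]> kv) ->
                        new String(kv.getValue(), StandardCharsets.UTF_8)));

    // ... write `values` to BigQuery / Elasticsearch here ...

    pipeline.run();
  }
}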