Hi Ramana,

Interesting -- I see it too when not using Runner v2. Runner v2 shows UTF-8 as expected, but without it I get ANSI_X3.4-1968 for file.encoding.
I'd say it's probably undesired, but we'd need to look further. Curious why it would cause data corruption. Are you relying on Charset.defaultCharset for text operations?

Best,
Bruno

On Fri, Jun 16, 2023 at 12:11 AM Ramana Venkata <vram...@kisi.io> wrote:

> Hi Bruno,
>
> I have added a log statement in a DoFn.
> logger.info(System.getProperty('file.encoding'))
> and that showed ANSI as the file encoding. There isn't anything in our code
> that sets ANSI file encoding. I will check with Google Support.
>
>
> On Fri, Jun 16, 2023 at 7:27 AM Bruno Volpato via user <
> user@beam.apache.org> wrote:
>
>> Hi Ramana,
>>
>> Curious where you got ANSI_X3.4-1968 from -- I don't think there's any
>> trace of this encoding anywhere in Dataflow Workers (as far as I am aware
>> and looked around).
>> The default encoding for JVM is UTF-8, and Dataflow doesn't appear to set
>> it anywhere. I was able to check using:
>>
>> $ docker run -it --entrypoint '/bin/bash'
>> us-central1-artifactregistry.gcr.io/google.com/dataflow-containers/worker/v1beta3/beam_java11_sdk:2.48.0
>>
>> # jshell
>>
>> > System.getProperty("file.encoding");
>> $1 ==> "UTF-8"
>>
>>
>> If you can't figure out if your job is using ANSI, I'd suggest contacting
>> Google support and providing relevant job IDs so this can be looked at
>> further.
>> Best,
>> Bruno
>>
>>
>>
>> On Thu, Jun 15, 2023 at 5:03 AM Ramana Venkata <vram...@kisi.io> wrote:
>>
>>> Hi,
>>>
>>> I accidentally discovered that the default file encoding in my Dataflow
>>> runners is ANSI_X3.4-1968. We expected it to be UTF-8, and as a result,
>>> some of our data has been corrupted.
>>>
>>> I came across this Stack Overflow answer (link:
>>> https://stackoverflow.com/a/362006), but to the best of my knowledge,
>>> there is no way to pass flags to the Java command in Dataflow runners.
>>>
>>> I would appreciate your assistance in resolving this issue.
>>>
>>> Let me know if you have any further questions!
>>>
>>> --
>>> Venkata Ramana
>>> Senior Software Engineer
>>> Kisi Inc, 45 Main Street, Suite 608, Brooklyn, NY 11201
>>> www.getkisi.com
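
For anyone landing on this thread: ANSI_X3.4-1968 is the JDK's alias for US-ASCII, so any text operation that falls back to the platform default charset on such a worker cannot represent non-ASCII characters. Below is a minimal sketch of how that can corrupt data, and of the usual way to avoid depending on the default by passing StandardCharsets.UTF_8 explicitly. The class name CharsetCheck and the helper decode are hypothetical names made up for this illustration. Whether this matches what happened in the pipeline discussed above is an assumption.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetCheck {

    // Hypothetical helper: decodes bytes with an explicitly chosen charset
    // instead of relying on Charset.defaultCharset().
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // What the JVM believes its defaults are (this is what the DoFn log statement reports).
        System.out.println("file.encoding  = " + System.getProperty("file.encoding"));
        System.out.println("defaultCharset = " + Charset.defaultCharset());

        // ANSI_X3.4-1968 resolves to the same charset as US-ASCII.
        Charset ansi = Charset.forName("ANSI_X3.4-1968");
        System.out.println("ANSI_X3.4-1968 is US-ASCII: " + ansi.equals(StandardCharsets.US_ASCII));

        // "café", written with a Unicode escape so the source file encoding does not matter.
        String original = "caf\u00e9";
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);

        // Decoding UTF-8 bytes as US-ASCII replaces the non-ASCII bytes, so the text is lost.
        String viaAscii = decode(utf8, ansi);
        // Decoding with an explicit UTF-8 charset round-trips regardless of the JVM default.
        String viaUtf8 = decode(utf8, StandardCharsets.UTF_8);

        System.out.println("US-ASCII decode matches original: " + viaAscii.equals(original));
        System.out.println("UTF-8 decode matches original:    " + viaUtf8.equals(original));
    }
}
```

The same idea applies to the other APIs that have charset-less overloads (String.getBytes(), InputStreamReader, FileReader, and so on): using the overload that takes a Charset keeps the result independent of file.encoding on the worker.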