On Wed, Jan 24, 2024 at 10:48 AM Mark Striebeck <mark.strieb...@gmail.com> wrote:
>
> If I point Beam to the local jar, will Beam start and also stop the
> expansion service?

Yes it will.

> Thanks
> Mark
>
> On Wed, 24 Jan 2024 at 08:30, Robert Bradshaw via user <user@beam.apache.org>
> wrote:
>>
>> You can also manually designate a replacement jar to be used rather
>> than fetching the jar from maven, either as a pipeline option or (as
>> of the next release) as an environment variable. The format is a json
>> mapping from gradle targets (which is how we identify these jars) to
>> local files (or urls). For example, pass
>>
>> --beam_services='{":sdks:java:extensions:sql:expansion-service:shadowJar":
>> "/path/to/your/copy.jar"}'
>>
>> to use the local jar to automatically expand your SQL transforms.
>>
>> See the docs at
>> https://github.com/apache/beam/blob/7e95776a8d08ef738be49ef47842029c306f2bf5/sdks/python/apache_beam/options/pipeline_options.py#L587
>>
>> On Tue, Jan 23, 2024 at 5:59 PM Chamikara Jayalath via user
>> <user@beam.apache.org> wrote:
>> >
>> > The expansion service jar is needed since sql.py includes cross-language
>> > transforms that use the Java implementation under the hood.
>> >
>> > Once downloaded, the jar is cached, and subsequent jobs should use the jar
>> > from that location.
>> >
>> > If you want to use a locally available jar, you can manually start up an
>> > expansion service [1] and point the Python SQL transform to that [2].
>> >
>> > Thanks,
>> > Cham
>> >
>> > [1]
>> > https://beam.apache.org/documentation/sdks/python-multi-language-pipelines/#choose-an-expansion-service
>> > [2]
>> > https://github.com/apache/beam/blob/7ff25d896250508570b27683bc76523ac2fe3210/sdks/python/apache_beam/transforms/sql.py#L84
>> >
>> > On Tue, Jan 23, 2024 at 3:57 PM Mark Striebeck <mark.strieb...@gmail.com>
>> > wrote:
>> >>
>> >> Hi,
>> >>
>> >> Sorry, this question seems so obvious that I'm sure it came up before.
>> >> But I couldn't find anything in the docs or the mail archives. Feel free
>> >> to point me in the right direction...
>> >>
>> >> We are using the Python API for Beam. Recently we started using Beam SQL
>> >> - which apparently needs a jar file that is not provided with the Python
>> >> pip package. When I run tests, I can see that Beam downloads
>> >> beam-sdks-java-extensions-sql-expansion-service-2.52.0.jar, unpacks it
>> >> into ~/.apache_beam, and uses it to start an RPC server.
>> >>
>> >> While this works for local testing, I am trying to figure out how to work
>> >> this into our CI and deployment process.
>> >>
>> >> Preferably, we would download a pip package that has this jar (and
>> >> others) in it and just use it.
>> >>
>> >> If that doesn't exist (I couldn't find it), then we'd need to check this
>> >> jar file into our source tree, so that we can use it for CI but then also
>> >> make it part of the docker image that we use to run our Beam pipelines on
>> >> GCP Dataflow. How could I tell Beam to use that file instead of
>> >> downloading it? I tried obvious settings like the CLASSPATH environment
>> >> variable - but nothing works. Beam always tries to fetch the file from
>> >> maven.
>> >>
>> >> Again, feel free to point me to any relevant mail discussion or web page.
>> >>
>> >> Thanks
>> >> Mark
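
For the CI and Docker case raised in this thread, here is a minimal sketch of building the --beam_services value programmatically instead of hand-quoting the JSON. The gradle target and the flag name come from the example in the thread; the helper name `beam_services_flag` and the jar path are hypothetical, and the commented-out `PipelineOptions` usage assumes apache_beam is installed in your environment.

```python
import json

# Gradle target that identifies the SQL expansion service jar
# (copied from the example in this thread).
SQL_EXPANSION_TARGET = ":sdks:java:extensions:sql:expansion-service:shadowJar"


def beam_services_flag(jar_path, target=SQL_EXPANSION_TARGET):
    """Build a --beam_services flag mapping a gradle target to a local jar.

    Hypothetical helper for illustration; it is not part of Beam.
    """
    return "--beam_services=" + json.dumps({target: jar_path})


# Hypothetical path; substitute wherever the jar lives in your source tree
# or docker image.
flag = beam_services_flag("/path/to/your/copy.jar")
print(flag)

# Assumed usage (requires apache_beam):
#   from apache_beam.options.pipeline_options import PipelineOptions
#   options = PipelineOptions([flag])
```

Because the flag is passed as a list element rather than through a shell, the surrounding single quotes from the command-line example are not needed here.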