Re: Downloading and executing addition jar file when using Python API

Robert Bradshaw via user Wed, 24 Jan 2024 10:52:19 -0800

On Wed, Jan 24, 2024 at 10:48 AM Mark Striebeck
<mark.strieb...@gmail.com> wrote:
>
> If I point Beam to the local jar, will Beam start and also stop the expansion 
> service?


Yes it will.

> Thanks
>      Mark
>
> On Wed, 24 Jan 2024 at 08:30, Robert Bradshaw via user <user@beam.apache.org> 
> wrote:
>>
>> You can also manually designate a replacement jar to be used rather
>> than fetching the jar from Maven, either as a pipeline option or (as
>> of the next release) as an environment variable. The format is a JSON
>> mapping from Gradle targets (which is how we identify these jars) to
>> local files (or URLs). For example, pass
>>
>>   --beam_services='{":sdks:java:extensions:sql:expansion-service:shadowJar": "/path/to/your/copy.jar"}'
>>
>> to use the local jar to automatically expand your SQL transforms.
>>
>> See the docs at
>> https://github.com/apache/beam/blob/7e95776a8d08ef738be49ef47842029c306f2bf5/sdks/python/apache_beam/options/pipeline_options.py#L587
>>
>> On Tue, Jan 23, 2024 at 5:59 PM Chamikara Jayalath via user
>> <user@beam.apache.org> wrote:
>> >
>> > The expansion service jar is needed since sql.py includes cross-language 
>> > transforms that use the Java implementation under the hood.
>> >
>> > Once downloaded, the jar is cached, and subsequent jobs should use the jar 
>> > from that location.
>> >
>> > If you want to use a locally available jar, you can manually start up an 
>> > expansion service [1] and point the Python SQL transform to that [2].
>> >
>> > Thanks,
>> > Cham
>> >
>> > [1] 
>> > https://beam.apache.org/documentation/sdks/python-multi-language-pipelines/#choose-an-expansion-service
>> > [2] 
>> > https://github.com/apache/beam/blob/7ff25d896250508570b27683bc76523ac2fe3210/sdks/python/apache_beam/transforms/sql.py#L84
>> >
>> > On Tue, Jan 23, 2024 at 3:57 PM Mark Striebeck <mark.strieb...@gmail.com> 
>> > wrote:
>> >>
>> >> Hi,
>> >>
>> >> Sorry, this question seems so obvious that I'm sure it came up before. 
>> >> But I couldn't find anything in the docs or the mail archives. Feel free 
>> >> to point me in the right direction...
>> >>
>> >> We are using the Python API for Beam. Recently we started using Beam SQL 
>> >> - which apparently needs a jar file that is not provided with the Python 
>> >> Pip package. When I run tests, I can see that Beam downloads 
>> >> beam-sdks-java-extensions-sql-expansion-service-2.52.0.jar and unpacks it 
>> >> into ~/.apache_beam and uses it to start an RPC server.
>> >>
>> >> While this works for local testing, I am trying to figure out how to work 
>> >> this into our CI and deployment process.
>> >>
>> >> Preferably, we would download a pip package that has this jar (and 
>> >> others) in it and just use that.
>> >>
>> >> If that doesn't exist (I couldn't find it), then we'd need to check this 
>> >> jar file into our source tree, so that we can use it for CI but then also 
>> >> make it part of the docker image that we use to run our Beam pipelines on 
>> >> GCP Dataflow. How could I tell Beam to use that file instead of 
>> >> downloading it? I tried obvious settings like the CLASSPATH environment 
>> >> variable, but nothing works. Beam always tries to fetch the file from 
>> >> maven.
>> >>
>> >> Again, feel free to point me to any relevant mail discussion or web page.
>> >>
>> >> Thanks
>> >>      Mark
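
The --beam_services override described in the thread takes a JSON mapping from Gradle targets to local files. As a minimal sketch, the flag can be built in Python rather than hand-quoted in a shell script. The helper name beam_services_flag is hypothetical, and the jar path is the placeholder from the thread; only the flag name and the Gradle target come from the messages above.

```python
import json
import shlex

# Gradle target that identifies the SQL expansion service jar (from the thread).
SQL_EXPANSION_TARGET = ":sdks:java:extensions:sql:expansion-service:shadowJar"


def beam_services_flag(overrides):
    """Return a --beam_services=<json> argument for a target-to-path mapping.

    Hypothetical helper: it only formats the flag, it does not call Beam.
    """
    return "--beam_services=" + json.dumps(overrides, sort_keys=True)


# Placeholder path: wherever the jar lives in your source tree or Docker image.
flag = beam_services_flag({SQL_EXPANSION_TARGET: "/path/to/your/copy.jar"})

# Print the flag shell-quoted so it can be pasted onto a command line.
print(shlex.quote(flag))
```

The resulting string should be usable wherever pipeline arguments are passed, for example as one element of the argument list given to PipelineOptions. Building it with json.dumps avoids quoting mistakes when the mapping grows to cover more than one jar.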
