Re: BigTable reader for Python?

Lina Mårtensson via dev Fri, 30 Dec 2022 01:00:41 -0800

And next issue... I'm getting KeyError: 'beam:coders:javasdk:0.1', which I
learned
<https://cwiki.apache.org/confluence/display/BEAM/Multi-language+Pipelines+Tips>
is because the transform is trying to return something that there isn't a
standard Beam coder for
<https://github.com/apache/beam/blob/05428866cdbf1ea8e4c1789dd40327673fd39451/model/pipeline/src/main/proto/beam_runner_api.proto#L784>.
Makes sense, but... how do I fix this? The documentation talks about how to
do this for the input, but not for the output.
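
To illustrate the failure mode described above, here is a minimal toy sketch in
Python. It is not Beam's actual implementation: KNOWN_CODERS, lookup_coder and
try_lookup are hypothetical names, and the table lists only a few of the
standard coder URNs. It only assumes that the Python side resolves coders by
URN and has no entry for the Java-specific one.

```python
# Toy model of resolving a coder by URN. NOT Beam's real code:
# KNOWN_CODERS, lookup_coder and try_lookup are hypothetical names.
KNOWN_CODERS = {
    "beam:coder:bytes:v1": "BytesCoder",
    "beam:coder:string_utf8:v1": "StrUtf8Coder",
    "beam:coder:varint:v1": "VarIntCoder",
    "beam:coder:kv:v1": "TupleCoder",
}


def lookup_coder(urn):
    """Return the coder name registered for a URN (KeyError if unknown)."""
    return KNOWN_CODERS[urn]


def try_lookup(urn):
    """Like lookup_coder, but report the failure as a string."""
    try:
        return lookup_coder(urn)
    except KeyError as err:
        return f"KeyError: {err}"


# A standard coder resolves; the Java SDK-specific URN does not.
print(try_lookup("beam:coder:bytes:v1"))
print(try_lookup("beam:coders:javasdk:0.1"))
```

Under this reading, the output PCollection of the Java transform would need an
element type whose coder has a standard URN (bytes, for instance) for the
Python side to decode it.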


Comparing to Spanner, it looks like Spanner returns a protobuf, which I'm
guessing somehow gets converted to bytes... But CloudBigtableIO
<https://github.com/googleapis/java-bigtable-hbase/blob/main/bigtable-dataflow-parent/bigtable-hbase-beam/src/main/java/com/google/cloud/bigtable/beam/CloudBigtableIO.java>
returns org.apache.hadoop.hbase.client.Result.

My buildExternal method looks as follows:

        @Override
        public PTransform<PBegin, PCollection<Result>> buildExternal(
                BigtableReadBuilder.Configuration configuration) {
            // Build the Bigtable read from the project, instance, and table
            // IDs carried by the external configuration.
            return Read.from(CloudBigtableIO.read(
                new CloudBigtableScanConfiguration.Builder()
                    .withProjectId(configuration.projectId)
                    .withInstanceId(configuration.instanceId)
                    .withTableId(configuration.tableId)
                    .build()
            ));
        }


I also got a warning, which I *believe* is unrelated (but also an issue):

INFO:apache_beam.utils.subprocess_server:b"WARNING: Configuration class
'energy.camus.beam.BigtableRegistrar$BigtableReadBuilder$Configuration' has
no schema registered. Attempting to construct with setter approach."

INFO:apache_beam.utils.subprocess_server:b'Dec 30, 2022 7:46:14 AM
org.apache.beam.sdk.expansion.service.ExpansionService$ExternalTransformRegistrarLoader
payloadToConfig'
What is this schema and what should it look like?

Thanks!
-Lina





On Fri, Dec 30, 2022 at 12:28 AM Lina Mårtensson <[email protected]> wrote:

> Thanks! This was really helpful. It took a while to figure out the details
> - a section in the docs on what's required of these jars for non-Java users
> would be a great addition.
>
> But once I did, the Bazel config was actually quite straightforward and
> makes sense.
> I pasted the first section from here
> <https://github.com/bazelbuild/rules_jvm_external/blob/master/README.md#usage>
>  into
> my WORKSPACE file and changed the artifacts to the ones I needed. (How to
> find the right ones remains confusing.)
>
> After that I updated my BUILD rules, and Bazel had easy and straightforward
> configs for it. All I needed was this:
>
> # From
> # https://github.com/google/bazel-common/blob/master/third_party/java/auto/BUILD
> # The auto service is what registers our Registrar class, and it needs to be
> # a plugin, which makes it run at compile time.
> java_plugin(
>     name = "auto_service_processor",
>     processor_class = "com.google.auto.service.processor.AutoServiceProcessor",
>     deps = [
>         "@maven//:com_google_auto_service_auto_service",
>         "@maven//:com_google_auto_service_auto_service_annotations",
>         "@maven//:org_apache_beam_beam_vendor_guava_26_0_jre",
>     ],
> )
>
> java_binary(
>     name = "java_hbase",
>     main_class = "energy.camus.beam.BigtableRegistrar",
>     plugins = [":auto_service_processor"],
>     srcs = ["src/main/java/energy/camus/beam/BigtableRegistrar.java"],
>     deps = [
>         "@maven//:com_google_auto_service_auto_service",
>         "@maven//:com_google_auto_service_auto_service_annotations",
>         "@maven//:com_google_cloud_bigtable_bigtable_hbase_beam",
>         "@maven//:org_apache_beam_beam_sdks_java_core",
>         "@maven//:org_apache_beam_beam_vendor_guava_26_0_jre",
>         "@maven//:org_apache_hbase_hbase_shaded_client",
>     ],
> )
>
>
> On Thu, Dec 29, 2022 at 2:43 PM Luke Cwik <[email protected]> wrote:
>
>> AutoService relies on Java's compiler annotation processor.
>> https://github.com/google/auto/tree/main/service#getting-started shows
>> that you need to configure Java's compiler to use the annotation processors
>> within AutoService.
>>
>> I saw this public gist that seemed to enable using the AutoService
>> annotation processor with Bazel
>> https://gist.github.com/jart/5333824b94cd706499a7bfa1e086ee00
>>
>>
>>
>> On Thu, Dec 29, 2022 at 2:27 PM Lina Mårtensson via dev <
>> [email protected]> wrote:
>>
>>> That's good news about the direct runner, thanks!
>>>
>>> On Thu, Dec 29, 2022 at 2:02 PM Robert Bradshaw <[email protected]>
>>> wrote:
>>>
>>>> On Thu, Jul 28, 2022 at 5:37 PM Chamikara Jayalath via dev
>>>> <[email protected]> wrote:
>>>> >
>>>> > On Thu, Jul 28, 2022 at 4:51 PM Lina Mårtensson <[email protected]>
>>>> wrote:
>>>> >>
>>>> >> Thanks for the detailed answers!
>>>> >>
>>>> >> I totally get the points about development & maintenance cost, and,
>>>> >> from a user perspective, about getting the performance right.
>>>> >>
>>>> >> I decided to try out the Spanner connector to get a sense of how well
>>>> >> the x-language approach works in our world, since that's an existing
>>>> >> x-language connector.
>>>> >> Overall, it works and with minimal intervention as you say - it is
>>>> >> very slow, though.
>>>> >> I'm a little confused about "portable runners" - if I understand this
>>>> >> correctly, this means we couldn't run with the DirectRunner anymore
>>>> if
>>>> >> using an x-language connector? (At least it didn't work when I tried
>>>> >> it.)
>>>> >
>>>> >
>>>> > You'll have to use the portable DirectRunner -
>>>> https://github.com/apache/beam/tree/master/sdks/python/apache_beam/runners/portability
>>>> >
>>>> > The job service for this can be started using the following command:
>>>> > python apache_beam/runners/portability/local_job_service_main.py -p
>>>> <port>
>>>>
>>>> Note that the Python direct runner is already a portable runner, so
>>>> you shouldn't have to do anything special (like start up a separate
>>>> job service and pass extra options) to run locally. Just use the
>>>> cross-language transforms as you would any normal Python transform.
>>>>
>>>> The goal is to make this as smooth and transparent as possible; please
>>>> keep coming back to us if you find rough edges.
>>>>
>>>
