Boris Shilov created BEAM-10172:
-----------------------------------

             Summary: BigQuerySource external data source support in non-US 
regions
                 Key: BEAM-10172
                 URL: https://issues.apache.org/jira/browse/BEAM-10172
             Project: Beam
          Issue Type: Bug
          Components: io-py-gcp
         Environment: DirectRunner
            Reporter: Boris Shilov


I am attempting to query an [external data 
source|https://cloud.google.com/bigquery/external-data-sources], a MySQL 
database that is exposed via the BigQuery API and located in the EU region. I 
have the following query string:


{code:python}
query = """
    SELECT * 
    FROM EXTERNAL_QUERY("my-project-one-253518.eu.external-source", 
    "SELECT * FROM my schema.mytable;");
    """
{code}

And the following pipeline instantiation:


{code:python}
    pcoll = p | "Load " + name >> beam.io.Read(
        beam.io.BigQuerySource(query=query, use_standard_sql=True)
    )
{code}

When I run this, I see the following output:
{code:python}
WARNING:root:Dataset my-project-two:temp_dataset_f07dd1398b0443edaa67c360f5be6958 does not exist so we will create it as temporary with location=None
ERROR:root:Exception at bundle <apache_beam.runners.direct.bundle_factory._Bundle object at 0x127124640>, due to an exception.
 Traceback (most recent call last):
  File "venv/lib/python3.7/site-packages/apache_beam/runners/direct/executor.py", line 345, in call
    finish_state)
  File "venv/lib/python3.7/site-packages/apache_beam/runners/direct/executor.py", line 385, in attempt_call
    result = evaluator.finish_bundle()
  File "venv/lib/python3.7/site-packages/apache_beam/runners/direct/transform_evaluator.py", line 323, in finish_bundle
    bundles = _read_values_to_bundles(reader)
  File "venv/lib/python3.7/site-packages/apache_beam/runners/direct/transform_evaluator.py", line 310, in _read_values_to_bundles
    read_result = [GlobalWindows.windowed_value(e) for e in reader]
  File "venv/lib/python3.7/site-packages/apache_beam/runners/direct/transform_evaluator.py", line 310, in <listcomp>
    read_result = [GlobalWindows.windowed_value(e) for e in reader]
  File "venv/lib/python3.7/site-packages/apache_beam/io/gcp/bigquery_tools.py", line 937, in __iter__
    flatten_results=self.flatten_results):
  File "venv/lib/python3.7/site-packages/apache_beam/io/gcp/bigquery_tools.py", line 710, in run_query
    page_token, location=location)
  File "venv/lib/python3.7/site-packages/apache_beam/utils/retry.py", line 209, in wrapper
    return fun(*args, **kwargs)
  File "venv/lib/python3.7/site-packages/apache_beam/io/gcp/bigquery_tools.py", line 384, in _get_query_results
    response = self.client.jobs.GetQueryResults(request)
  File "venv/lib/python3.7/site-packages/apache_beam/io/gcp/internal/clients/bigquery/bigquery_v2_client.py", line 312, in GetQueryResults
    config, request, global_params=global_params)
  File "venv/lib/python3.7/site-packages/apitools/base/py/base_api.py", line 731, in _RunMethod
    return self.ProcessHttpResponse(method_config, http_response, request)
  File "venv/lib/python3.7/site-packages/apitools/base/py/base_api.py", line 737, in ProcessHttpResponse
    self.__ProcessHttpResponse(method_config, http_response, request))
  File "venv/lib/python3.7/site-packages/apitools/base/py/base_api.py", line 604, in __ProcessHttpResponse
    http_response, method_config=method_config, request=request)
apitools.base.py.exceptions.HttpBadRequestError: HttpError accessing <https://www.googleapis.com/bigquery/v2/projects/my-project-two/queries/636272a8e026434d85200b3f14f719ed?alt=json&location=US&maxResults=10000>: response: <{'vary': 'Origin, X-Origin, Referer', 'content-type': 'application/json; charset=UTF-8', 'date': 'Tue, 02 Jun 2020 11:29:27 GMT', 'server': 'ESF', 'cache-control': 'private', 'x-xss-protection': '0', 'x-frame-options': 'SAMEORIGIN', 'x-content-type-options': 'nosniff', 'alt-svc': 'h3-27=":443"; ma=2592000,h3-25=":443"; ma=2592000,h3-T050=":443"; ma=2592000,h3-Q050=":443"; ma=2592000,h3-Q049=":443"; ma=2592000,h3-Q048=":443"; ma=2592000,h3-Q046=":443"; ma=2592000,h3-Q043=":443"; ma=2592000,quic=":443"; ma=2592000; v="46,43"', 'transfer-encoding': 'chunked', 'status': '400', 'content-length': '354', '-content-encoding': 'gzip'}>, content <{
  "error": {
    "code": 400,
    "message": "Cannot read and write in different locations: source: EU, destination: US",
    "errors": [
      {
        "message": "Cannot read and write in different locations: source: EU, destination: US",
        "domain": "global",
        "reason": "invalid"
      }
    ],
    "status": "INVALID_ARGUMENT"
  }
}
{code}

This suggests that the logic Beam uses to infer the location in which to 
create the temporary dataset fails when confronted with the special syntax for 
external queries: the warning above reports location=None, and the subsequent 
request is sent with location=US. It therefore seems that the location should 
be exposed as a parameter of BigQuerySource.
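
For illustration, here is a minimal sketch of how the location might be read directly from the connection ID in an EXTERNAL_QUERY call. It assumes the ID always takes the dotted `project.location.connection` form seen in my query above. The helper name `infer_external_query_location` is hypothetical and is not part of Beam; this is not how Beam currently resolves the location.

```python
import re
from typing import Optional

# Matches the first argument of EXTERNAL_QUERY("project.location.connection", ...).
# Assumption: the connection ID uses the short dotted form shown in the query above.
_EXTERNAL_QUERY_RE = re.compile(
    r"""EXTERNAL_QUERY\s*\(\s*["']([^"'.]+)\.([^"'.]+)\.([^"'.]+)["']""",
    re.IGNORECASE,
)


def infer_external_query_location(query: str) -> Optional[str]:
    """Return the upper-cased location from the connection ID, or None.

    Hypothetical helper for illustration only; not a Beam API.
    """
    match = _EXTERNAL_QUERY_RE.search(query)
    if match is None:
        return None
    # Group 2 is the middle component of the connection ID, e.g. "eu".
    return match.group(2).upper()


query = """
    SELECT *
    FROM EXTERNAL_QUERY("my-project-one-253518.eu.external-source",
    "SELECT * FROM my schema.mytable;");
    """
print(infer_external_query_location(query))
```

A query with no EXTERNAL_QUERY call yields `None`, so a caller could fall back to whatever location logic is already in place, or to an explicit user-supplied parameter.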






--
This message was sent by Atlassian Jira
(v8.3.4#803005)
