Re: Python IO Connector

Eugene Kirpichov Mon, 06 Jan 2020 15:13:43 -0800

Agreed with above, it seems prudent to develop a pure-Python connector for
something as common as interacting with a database. It's likely easier to
achieve an idiomatic API, familiar to non-Beam Python SQL users, within
pure Python.


Developing a cross-language connector here might be plain impossible,
because rows read from a database are (at least in JDBC) not encodable:
they require a user-supplied callback to translate them to an encodable user
type, and that callback can't be in Python, because its input would have to
be encoded before it could be handed to Python. The same holds for the write
transform.

Not sure about sqlalchemy though, maybe use plain DB-API
https://www.python.org/dev/peps/pep-0249/ instead? The Python API seems
friendlier than JDBC in the sense that it actually returns rows as tuples
of simple data types.
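
To illustrate the point about DB-API rows, here is a minimal sketch using
sqlite3 from the standard library. It is only an assumption that sqlite3 is
representative: PEP 249 requires rows to be sequences, and sqlite3 happens to
return plain tuples by default, while other drivers may differ in row type and
in parameter style (sqlite3 uses "?" placeholders). The table and column names
are made up for the example.

```python
import sqlite3

# In-memory SQLite database, chosen only because it ships with Python.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical table used purely for illustration.
cur.execute("CREATE TABLE users (id INTEGER, name TEXT, score REAL)")
cur.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",  # qmark paramstyle (sqlite3)
    [(1, "ada", 9.5), (2, "bob", 7.0)],
)
conn.commit()

cur.execute("SELECT id, name, score FROM users ORDER BY id")
rows = cur.fetchall()  # list of plain tuples with int/str/float values
conn.close()

for row in rows:
    # Show each row alongside the Python type of every column value.
    print(row, [type(value).__name__ for value in row])
```

Rows like these are built from simple Python types, so they would probably
not need a user callback just to become encodable.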

On Mon, Jan 6, 2020 at 1:42 PM Robert Bradshaw <[email protected]> wrote:

> On Mon, Jan 6, 2020 at 1:39 PM Chamikara Jayalath <[email protected]>
> wrote:
>
>> Regarding cross-language transforms, we need to add better documentation,
>> but for now you'll have to go with existing examples and tests. For example,
>>
>>
>> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/external/gcp/pubsub.py
>>
>> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/external/kafka.py
>>
>> Note that the cross-language transforms feature is currently only
>> available for the Flink Runner. Dataflow support is in development.
>>
>
> I think it works with all non-Dataflow runners, with the exception of the
> Java and Go Direct runners. (It does work with the Python direct runner.)
>
>
>> I'm fine with developing this natively for Python as well. AFAIK the Java
>> JDBC IO connector is not a super-complicated connector, and it should be
>> fine to make relatively easy-to-maintain and widely usable connectors
>> available in multiple SDKs.
>>
>
> Yes, a case can certainly be made for having native connectors for
> particular common/simple sources. (We certainly don't call cross-language
> to read text files for example.)
>
>
>>
>> Thanks,
>> Cham
>>
>>
>> On Mon, Jan 6, 2020 at 10:56 AM Luke Cwik <[email protected]> wrote:
>>
>>> +Chamikara Jayalath <[email protected]> +Heejong Lee
>>> <[email protected]>
>>>
>>> On Mon, Jan 6, 2020 at 10:20 AM <[email protected]> wrote:
>>>
>>>> How do I go about doing that? From the docs, it appears cross language
>>>> transforms are currently undocumented.
>>>> https://beam.apache.org/roadmap/connectors-multi-sdk/
>>>> On Jan 6, 2020, at 12:55 PM, Luke Cwik <[email protected]> wrote:
>>>>
>>>> What about using a cross language transform between Python and the
>>>> already existing Java JdbcIO transform?
>>>>
>>>> On Sun, Jan 5, 2020 at 5:18 AM Peter Dannemann <[email protected]>
>>>> wrote:
>>>>
>>>>> I’d like to develop the Python SDK’s SQL IO connector. I was thinking
>>>>> it would be easiest to use sqlalchemy to achieve maximum database engine
>>>>> support, but I suppose I could also create an ABC for databases that
>>>>> follow the DB API and create subclasses for each database engine that
>>>>> override a connect method. What are your thoughts on the best way to do
>>>>> this?
>>>>>
>>>>
