Agreed with the above; it seems prudent to develop a pure-Python connector for something as common as interacting with a database. It's likely easier to achieve an idiomatic API, one familiar to non-Beam Python SQL users, in pure Python.
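For reference, the interface most non-Beam Python SQL users already know is DB-API 2.0 (PEP 249). Below is a minimal sketch of it, using the standard-library sqlite3 driver purely as a stand-in; the table and column names are made up for illustration, and nothing here is Beam code.

```python
import sqlite3

# sqlite3 is the stdlib DB-API 2.0 (PEP 249) driver; ":memory:" is a throwaway database.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical table, for illustration only.
cur.execute("CREATE TABLE users (id INTEGER, name TEXT, score REAL)")
# sqlite3 uses the "qmark" parameter style; other DB-API drivers may use a different one.
cur.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [(1, "ada", 9.5), (2, "bob", 7.0)],
)
conn.commit()

cur.execute("SELECT id, name, score FROM users ORDER BY id")
rows = cur.fetchall()  # a list of plain tuples holding int / str / float values
conn.close()

for row in rows:
    print(type(row).__name__, row)
```

Because each row comes back as a plain tuple of simple values, no user callback is needed to make it serializable, unlike a JDBC result set.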
Developing a cross-language connector here might be plain impossible, because rows read from a database are (at least in JDBC) not encodable - they require a user's callback to translate to an encodable user type, and the callback can't be in Python because then you have to encode its input before giving it to Python. Same holds for the write transform.

Not sure about sqlalchemy though, maybe use plain DB-API https://www.python.org/dev/peps/pep-0249/ instead? Seems like the Python one is more friendly than JDBC in the sense that it actually returns rows as tuples of simple data types.

On Mon, Jan 6, 2020 at 1:42 PM Robert Bradshaw <rober...@google.com> wrote:

> On Mon, Jan 6, 2020 at 1:39 PM Chamikara Jayalath <chamik...@google.com>
> wrote:
>
>> Regarding cross-language transforms, we need to add better documentation,
>> but for now you'll have to go with existing examples and tests. For example,
>>
>> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/external/gcp/pubsub.py
>>
>> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/external/kafka.py
>>
>> Note that cross-language transforms feature is currently only available
>> for Flink Runner. Dataflow support is in development.
>>
>
> I think it works with all non-Dataflow runners, with the exception of the
> Java and Go Direct runners. (It does work with the Python direct runner.)
>
>
>> I'm fine with developing this natively for Python as well. AFAIK Java
>> JDBC IO connector is not a super-complicated connector and it should be
>> fine to make relatively easy to maintain and widely usable connectors
>> available in multiple SDKs.
>>
>
> Yes, a case can certainly be made for having native connectors for
> particular common/simple sources. (We certainly don't call cross-language
> to read text files for example.)
>
>
>>
>> Thanks,
>> Cham
>>
>>
>> On Mon, Jan 6, 2020 at 10:56 AM Luke Cwik <lc...@google.com> wrote:
>>
>>> +Chamikara Jayalath <chamik...@google.com> +Heejong Lee
>>> <heej...@google.com>
>>>
>>> On Mon, Jan 6, 2020 at 10:20 AM <pbd...@gmail.com> wrote:
>>>
>>>> How do I go about doing that? From the docs, it appears cross language
>>>> transforms are currently undocumented.
>>>> https://beam.apache.org/roadmap/connectors-multi-sdk/
>>>>
>>>> On Jan 6, 2020, at 12:55 PM, Luke Cwik <lc...@google.com> wrote:
>>>>
>>>> What about using a cross language transform between Python and the
>>>> already existing Java JdbcIO transform?
>>>>
>>>> On Sun, Jan 5, 2020 at 5:18 AM Peter Dannemann <pbd...@gmail.com>
>>>> wrote:
>>>>
>>>>> I’d like to develop the Python SDK’s SQL IO connector. I was thinking
>>>>> it would be easiest to use sqlalchemy to achieve maximum database engine
>>>>> support, but I suppose I could also create an ABC for databases that
>>>>> follow the DB API and create subclasses for each database engine that
>>>>> override a connect method. What are your thoughts on the best way to do this?