Re: JDBCInputFormat preparation with Flink 1.1-SNAPSHOT and Scala 2.11

Robert Metzger Wed, 16 Mar 2016 02:27:53 -0700

Sorry for joining this discussion late. Maybe this is also interesting for
you:
http://www.confluent.io/blog/bottled-water-real-time-integration-of-postgresql-and-kafka/



On Wed, Mar 9, 2016 at 1:47 PM, Prez Cannady <[email protected]>
wrote:

> Thanks.  I need to dig in further, but I did clarify some things in my
> mind that bear mentioning.
>
> 1. Sourcing JDBC data is not a streaming operation, but a batching one.
> Which makes sense, since you generally slurp rather than stream relational
> data, so within the constraints provided you’ll be operating on whole
> result sets.
> 2. Kafka is useful for mating batch processes (like slurping a database)
> with stream ones (reading out the results of a database query and then
> distributing them to various processing nodes).
>
> Prez Cannady
> p: 617 500 3378
> e: [email protected]
> GH: https://github.com/opencorrelate
> LI: https://www.linkedin.com/in/revprez
>
>
> On Mar 9, 2016, at 6:46 AM, Prez Cannady <[email protected]>
> wrote:
>
> I suspected as much (the tuple size limitation).  Creating my own
> InputFormat seems to be the best solution, but before I go down that rabbit
> hole I wanted to see at least a semi-trivial working example of
> JDBCInputFormat with Scala 2.11.
>
> I’d appreciate a look at that prototype if it’s publicly available (even if
> it is Java). I might glean a hint from it.
>
> Prez Cannady
> p: 617 500 3378
> e: [email protected]
> GH: https://github.com/opencorrelate
> LI: https://www.linkedin.com/in/revprez
>
> On Mar 9, 2016, at 3:25 AM, Chesnay Schepler <[email protected]> wrote:
>
> You can always create your own InputFormat, but there is no
> AbstractJDBCInputFormat, if that's what you were looking for.
>
> When you say arbitrary tuple size, do you mean a) a size greater than 25,
> or b) tuples of different sizes?
> If a), unless you are fine with using nested tuples, you won't get around
> the tuple size limitation. Since the user has to be aware of the nesting
> (the fields are accessed directly via tuple.f0 etc.), this can't
> really be done in a general-purpose fashion.
> If b), this will straight-up not work with tuples.
>
> You could use POJOs, though; then you could also group by column names.
>
> I'm not sure about Scala, but in the Java Stream API you can pass the
> InputFormat and the TypeInformation into createInput.
>
> I recently built a prototype where the input type is determined
> automatically by querying the database. If this is a problem for you, feel
> free to ping me.
>
> On 09.03.2016 03:17, Prez Cannady wrote:
>
> I’m attempting to create a stream using JDBCInputFormat.  The objective is
> to convert each record into a tuple and then serialize it for input into a
> Kafka topic.  Here’s what I have so far.
>
> ```
> val env = StreamExecutionEnvironment.getExecutionEnvironment
>
> val inputFormat = JDBCInputFormat.buildJDBCInputFormat()
>       .setDrivername("org.postgresql.Driver")
>       .setDBUrl("jdbc:postgresql:test")
>       .setQuery("select name from persons")
>       .finish()
>
> val stream : DataStream[Tuple1[String]] = env.createInput(...)
> ```
>
> I think this is essentially what I want to do.  It would be nice if I
> could return tuples of arbitrary length, but reading the code suggests I
> have to commit to a defined arity.  So I have some questions.
>
> 1. Is there a better way to read from a database (i.e., defining my own
> `InputFormat` using Slick)?
> 2. To get the above example working, what should I supply to `createInput`?
>
>
> Prez Cannady
> p: 617 500 3378
> e: [email protected]
> GH: https://github.com/opencorrelate
> LI: https://www.linkedin.com/in/revprez
>
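
To make the two options from Chesnay's reply concrete (nested tuples versus
a class with named fields), here is a minimal sketch in plain Scala with no
Flink dependency. Everything in it is made up for illustration: the
`WideRecords` object, the `Person` record, and its four fields, which stand
in for a result set wider than the tuple limit. It only shows why nesting
forces the caller to know the layout, while named fields do not; it is not
meant as the way JDBCInputFormat itself should be wired up.

```scala
// Illustration only: all names here are hypothetical, not part of Flink.
object WideRecords {
  // Named-field alternative (the Scala analogue of a POJO).
  final case class Person(name: String, age: Int, city: String, country: String)

  // Nested-tuple alternative: two inner tuples of two fields each.
  type Nested = ((String, Int), (String, String))

  // Pack a record into the nested layout. Any reader of the result
  // must know that `city` lives at position _2._1.
  def toNested(p: Person): Nested = ((p.name, p.age), (p.city, p.country))

  def main(args: Array[String]): Unit = {
    val p = Person("Ada", 36, "London", "UK")
    val n = toNested(p)
    println(n._2._1) // positional access through the nesting
    println(p.city)  // access by field name
  }
}
```

Both `println` calls read the same value; the difference is that the first
depends on the nesting layout and the second only on the field name.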
