Hey Eugene,

Thanks for this, didn’t realize this was a parameter I could tune. Fixed my problems straight away.

Chet

> On Nov 29, 2017, at 2:14 PM, Eugene Kirpichov <[email protected]> wrote:
>
> Hi,
> I think you're hitting something that can be fixed by configuring the Redshift driver:
> http://docs.aws.amazon.com/redshift/latest/dg/queries-troubleshooting.html#set-the-JDBC-fetch-size-parameter
>
> By default, the JDBC driver collects all the results for a query at one time.
> As a result, when you attempt to retrieve a large result set over a JDBC
> connection, you might encounter a client-side out-of-memory error. To enable
> your client to retrieve result sets in batches instead of in a single
> all-or-nothing fetch, set the JDBC fetch size parameter in your client
> application.
>
> On Wed, Nov 29, 2017 at 1:41 PM Chet Aldrich <[email protected]> wrote:
> Hey all,
>
> I’m running a Dataflow job that uses the JDBC IO transform to pull in a bunch
> of data (20mm rows, for reference) from Redshift, and I’m noticing that I’m
> getting an OutOfMemoryError on the Dataflow workers once I reach around 4mm
> rows.
>
> Given the code that I’m reading inside JDBC IO and the guide here
> (https://beam.apache.org/documentation/io/authoring-overview/#read-transforms),
> it seems like it’s just pulling the data in from the result one by one and
> then emitting each output. Considering that this is sort of a limitation of
> the driver, this makes sense, but is there a way I can get around the memory
> limitation somehow? It seems like Dataflow repeatedly tries to create more
> workers to handle the work, but it can’t, which is part of the problem.
>
> If more info is needed in order to help me sort out what I could do to not
> run into the memory limitations, I’m happy to provide it.
>
> Thanks,
>
> Chet
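
For anyone finding this thread later, here is a minimal sketch of what setting the fetch size looks like in plain JDBC. It is not the code from the job discussed above: the class name `FetchSizeSketch`, the helper `prepareBatched`, and the value 10,000 are all made up for illustration, and how the statement gets wired into a particular pipeline is left out. The only real API used is `java.sql.Statement.setFetchSize`, which is a hint to the driver about how many rows to pull per round trip.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class FetchSizeSketch {
    // Hypothetical batch size; tune it against row width and worker memory.
    static final int FETCH_SIZE = 10_000;

    /**
     * Prepares a query and asks the driver to fetch rows in batches of
     * fetchSize instead of collecting the whole result set at once.
     * setFetchSize is only a hint; the driver decides how to honor it.
     */
    static PreparedStatement prepareBatched(Connection conn, String sql, int fetchSize)
            throws SQLException {
        PreparedStatement stmt = conn.prepareStatement(sql);
        stmt.setFetchSize(fetchSize);
        return stmt;
    }
}
```

With a real connection this would be called as `prepareBatched(conn, "SELECT ...", FetchSizeSketch.FETCH_SIZE)` before `executeQuery()`, and the result set is then iterated as usual. Some drivers need additional settings before they actually fetch in batches, so check the driver documentation linked in the quoted reply.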
