I was hoping to use Spark here to generate that intermediate SQL as part of
its workflow strategy, as a database-independent way of doing my
preprocessing.
Is there any way to capture the generated SQL from Catalyst?
If so, I would just use JDBCRd
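As far as I know, Spark does not expose a supported API for turning a Catalyst plan back into SQL text, but you can inspect the plans Catalyst produces for a DataFrame via `queryExecution`. A minimal sketch (the `originalDataFrame` and column name are assumptions from the example in this thread):

```scala
// Sketch: inspect the Catalyst plans behind a DataFrame.
// "originalDataFrame" and the column name are placeholders.
import org.apache.spark.sql.functions.col

val df = originalDataFrame.filter(col("columnName") === "foobar")

println(df.queryExecution.logical)        // parsed logical plan
println(df.queryExecution.optimizedPlan)  // plan after Catalyst optimizations
println(df.queryExecution.executedPlan)   // physical plan; for a JDBC source
                                          // this shows which filters/projections
                                          // were pushed down
```

The physical plan is usually the most useful of the three here, since it shows what actually gets pushed to the JDBC source rather than executed in Spark.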
Could you create a view of the table on your JDBC data source and just query
that from Spark?
Thanks,
Subhash

> On Mar 7, 2017, at 6:37 AM, El-Hassan Wanas wrote:
As an example, this is basically what I'm doing:

    val myDF = originalDataFrame.select(
      when(col(columnName) === "foobar", 0)
        .when(col(columnName) === "foobarbaz", 1))

(In Spark, a when-chain has to start from the standalone when() function
rather than from Column.when.) Except there are many more columns and many
more conditionals. The generated Spark workflow starts with an
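With many columns and many conditionals, one way to avoid writing the chains by hand is to drive them from a value mapping. A sketch, where `mapping`, `columnNames`, and `originalDataFrame` are assumed names for illustration:

```scala
// Sketch: build a when-chain per column from a shared value mapping.
// "mapping", "columnNames", and "originalDataFrame" are placeholders.
import org.apache.spark.sql.functions.{col, when}

val mapping = Map("foobar" -> 0, "foobarbaz" -> 1)
val columnNames = Seq("colA", "colB")  // placeholder column list

val recoded = columnNames.map { c =>
  // Start the chain with when(), then fold the remaining pairs onto it.
  val (firstKey, firstVal) = mapping.head
  mapping.tail.foldLeft(when(col(c) === firstKey, firstVal)) {
    case (chain, (k, v)) => chain.when(col(c) === k, v)
  }.alias(c)
}

val myDF = originalDataFrame.select(recoded: _*)
```

Catalyst still sees one CASE WHEN expression per column, so this changes only how the code is written, not the generated plan.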
Can you provide some source code? I am not sure I understood the problem.
If you want to do preprocessing at the JDBC data source, then you can write
your own data source. Additionally, you may want to modify the SQL statement
to extract the data in the right format and push some preprocessing to
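On the last point, the standard JDBC source already lets you push a hand-written SQL statement to the database by passing a parenthesized subquery as the dbtable option. A sketch with placeholder URL, table, and column names:

```scala
// Sketch: push preprocessing into the database by supplying a subquery
// as "dbtable". URL, table, and column names are placeholders.
val pushed = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://host/db")  // placeholder URL
  .option("dbtable",
    """(SELECT id,
       |        CASE WHEN c = 'foobar'    THEN 0
       |             WHEN c = 'foobarbaz' THEN 1
       |        END AS c
       |   FROM some_table) AS sub""".stripMargin)
  .load()
```

The database then evaluates the CASE expression, and Spark only sees the already-recoded rows.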