Because of some legacy issues I can't immediately upgrade the Spark version. But I tried filtering the data before loading it into Spark, based on the suggestion below:

    val df = sparkSession.read.format("jdbc")
      .option(...)
      .option("dbtable", "(select .. from ... where url <> '') table_name")
      ...
      .load()

    df.createOrReplaceTempView("new_table")

Then performing the custom operation on top of it does the trick:

    sparkSession.sql("select id, url from new_table")
      .as[(String, String)]
      .map { case (id, url) =>
        val derived_data = ... // operation on url
        (id, derived_data)
      }
      .show()
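Spelled out end to end it looks roughly like the sketch below; the connection settings, the column types, and the URI-path extraction are placeholders standing in for the real values and the real operation on url.

    import org.apache.spark.sql.SparkSession

    object FilterBeforeLoad {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("filter-before-load").getOrCreate()
        import spark.implicits._

        // Push the filter down to PostgreSQL via a subquery, so rows with
        // an empty url never reach Spark at all.
        val df = spark.read.format("jdbc")
          .option("url", "jdbc:postgresql://localhost:5432/mydb") // placeholder
          .option("user", "me")                                   // placeholder
          .option("password", "secret")                           // placeholder
          .option("driver", "org.postgresql.Driver")
          .option("dbtable", "(SELECT id, url FROM table_a WHERE url <> '') t")
          .load()

        df.createOrReplaceTempView("new_table")

        // Typed map over the pre-filtered rows; extracting the URI path
        // stands in for the custom operation on url. Assumes id is text.
        spark.sql("SELECT id, url FROM new_table")
          .as[(String, String)]
          .map { case (id, url) => (id, new java.net.URI(url).getPath) }
          .show()

        spark.stop()
      }
    }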
Thanks for the advice, it's really helpful!

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On August 7, 2018 5:33 PM, Gourav Sengupta <gourav.sengu...@gmail.com> wrote:

> Hi James,
>
> It is always advisable to use the latest Spark version. That said, could you
> please give dataframes and UDFs a try if possible? I think that would be a
> much more scalable way to address the issue.
>
> Also, where possible, it is always advisable to apply the filter before
> fetching the data into Spark.
>
> Thanks and Regards,
> Gourav
>
> On Tue, Aug 7, 2018 at 4:09 PM, James Starks <suse...@protonmail.com.invalid> wrote:
>
>> I am very new to Spark. I just successfully set up Spark SQL connecting to a
>> PostgreSQL database, and I am able to display a table with the code
>>
>> sparkSession.sql("SELECT id, url FROM table_a WHERE col_b <> '' ").show()
>>
>> Now I want to perform filter and map functions on the col_b value. In plain
>> Scala it would be something like
>>
>> Seq((1, "http://a.com/a"), (2, "http://b.com/b"), (3, "unknown"))
>>   .filter { case (_, url) => isValid(url) }
>>   .map { case (id, url) => (id, pathOf(url)) }
>>
>> where filter removes invalid urls, and map turns (id, url) into (id, path of
>> url).
>>
>> However, when applying this concept to Spark SQL with the code snippet
>>
>> sparkSession.sql("...").filter(isValid($"url"))
>>
>> the compiler complains about a type mismatch, because $"url" is of type
>> ColumnName rather than String. How can I extract the column value (i.e.
>> http://...) for the url column in order to apply the filter function?
>>
>> Thanks
>>
>> Java 1.8.0
>> Scala 2.11.8
>> Spark 2.1.0
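For reference, a minimal sketch of the DataFrame-plus-UDF route Gourav suggests: wrapping the plain Scala functions with udf() makes them applicable to Column values, which resolves the type mismatch. isValid and pathOf below are hypothetical stand-ins for the real logic.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.udf

    object UdfRoute {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("udf-route").getOrCreate()
        import spark.implicits._

        // Hypothetical stand-ins for the real validation and path logic,
        // wrapped as UDFs so they accept Column arguments.
        val isValid = udf((url: String) => url != null && url.startsWith("http"))
        val pathOf  = udf((url: String) => new java.net.URI(url).getPath)

        val df = Seq((1, "http://a.com/a"), (2, "http://b.com/b"), (3, "unknown"))
          .toDF("id", "url")

        // filter(isValid($"url")) now type-checks: the UDF takes a Column
        // and returns a boolean Column.
        df.filter(isValid($"url"))
          .select($"id", pathOf($"url").as("path"))
          .show()

        spark.stop()
      }
    }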