Hi Dr Mich, how about reading all the CSV columns as strings and then applying a UDF, something like this?

    import scala.util.control.Exception.allCatch

    // Parse a string as a Double; anything unparseable (e.g. "-") becomes NaN.
    def getDouble(doubleStr: String): Double =
      allCatch.opt(doubleStr.toDouble).getOrElse(Double.NaN)

Out of curiosity, are you reading data from Yahoo Finance? If so, are you downloading a whole .csv file? I am doing something similar, but I use a library from com.github.tototoshi.csv._ to read CSV files as lists of strings, so I have control over how each row is rendered. Presumably, if you have over 1k worth of data, this solution may not help.

HTH,
Marco

On Wed, Sep 28, 2016 at 3:44 PM, Peter Figliozzi <pete.figlio...@gmail.com> wrote:

> In Scala, x.isNaN returns true for Double.NaN, but false for any
> character. I guess the `isnan` function you are using works by ultimately
> looking at x.isNaN.
>
> On Wed, Sep 28, 2016 at 5:56 AM, Mich Talebzadeh <
> mich.talebza...@gmail.com> wrote:
>
>> This is an issue in most databases. Specifically if a field is NaN.. --> (
>> *NaN*, standing for not a number, is a numeric data type value
>> representing an undefined or unrepresentable value, especially in
>> floating-point calculations)
>>
>> There is a method called isnan() in Spark that is supposed to handle this
>> scenario. However, it does not return correct values! For example I
>> defined column "Open" as String (it should be Float) and it has the
>> following 7 rogue entries out of 1272 rows in a csv
>>
>> df2.filter( $"OPen" ===
>> "-").select((changeToDate("TradeDate").as("TradeDate")),
>> 'Open, 'High, 'Low, 'Close, 'Volume).show
>>
>> +----------+----+----+---+-----+------+
>> | TradeDate|Open|High|Low|Close|Volume|
>> +----------+----+----+---+-----+------+
>> |2011-12-23|   -|   -|  -|40.56|     0|
>> |2011-04-21|   -|   -|  -|45.85|     0|
>> |2010-12-30|   -|   -|  -|38.10|     0|
>> |2010-12-23|   -|   -|  -|38.36|     0|
>> |2008-04-30|   -|   -|  -|32.39|     0|
>> |2008-04-29|   -|   -|  -|33.05|     0|
>> |2008-04-28|   -|   -|  -|32.60|     0|
>> +----------+----+----+---+-----+------+
>>
>> However, the following does not work!
>>
>> df2.filter(isnan($"Open")).show
>> +-----+------+---------+----+----+---+-----+------+
>> |Stock|Ticker|TradeDate|Open|High|Low|Close|Volume|
>> +-----+------+---------+----+----+---+-----+------+
>> +-----+------+---------+----+----+---+-----+------+
>>
>> Any suggestions?
>>
>> Thanks
>>
>> Dr Mich Talebzadeh
>>
>> LinkedIn *
>> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw*
>>
>> http://talebzadehmich.wordpress.com
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>> any loss, damage or destruction of data or any other property which may
>> arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary damages
>> arising from such loss, damage or destruction.
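
For illustration, here is a minimal plain-Scala sketch (no Spark) of the parse-then-check idea discussed in this thread. It assumes the "Open" values arrive as strings and that "-" marks a missing price. The names `NanSketch`, `toDoubleOrNaN` and `openColumn` are hypothetical, and the sample strings are just a few values borrowed from the table above. It only shows why a NaN test finds nothing until the strings have been converted to Double.

```scala
import scala.util.Try

object NanSketch {
  // Hypothetical helper: parse a string as a Double, mapping anything
  // unparseable (such as "-") to Double.NaN.
  def toDoubleOrNaN(s: String): Double =
    Try(s.trim.toDouble).getOrElse(Double.NaN)

  def main(args: Array[String]): Unit = {
    // Illustrative sample only: raw string values as they might appear in the CSV.
    val openColumn = Seq("-", "40.56", "-", "45.85")

    // After conversion, isNaN identifies the rogue entries.
    val parsed = openColumn.map(toDoubleOrNaN)
    val rogueCount = parsed.count(_.isNaN)

    println(s"rogue entries: $rogueCount") // prints "rogue entries: 2"
  }
}
```

In Spark, the same function could presumably be wrapped with `org.apache.spark.sql.functions.udf` and applied to the string column, after which `isnan` on the resulting Double column should match the converted rows. That wiring is an assumption and is not tested here.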