I have historical prices for various stocks.

Each CSV file holds 10 years of trades, one row per trading day.

These are the columns defined in the case class:

case class columns(Stock: String, Ticker: String, TradeDate: String,
  Open: Float, High: Float, Low: Float, Close: Float, Volume: Integer)

The problem is with the Open, High, Low and Close columns, which are all
defined as Float.

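Most rows are fine, like the first one below, but rows like the second one,
where the price fields contain "-" instead of a number, cause problems
because those fields are defined as Float: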

Date       Open   High   Low    Close  Volume
27-Sep-16  80.91  80.93  79.87  80.85  1873158
23-Dec-11  -      -      -      40.56  0
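A minimal sketch in plain Scala (no Spark) of why the second row fails: converting the price fields with `String.toFloat`, which is roughly what a Float-typed field forces on every value, throws the same `NumberFormatException` for "-" that the executor log shows. The object and method names (`DashParseDemo`, `parsePrices`) are hypothetical and only used for illustration; Spark's own conversion code path is different.

```scala
import scala.util.Try

object DashParseDemo {
  // The two sample rows from the table, split into fields:
  // Date, Open, High, Low, Close, Volume.
  val goodRow: Array[String] = Array("27-Sep-16", "80.91", "80.93", "79.87", "80.85", "1873158")
  val badRow: Array[String] = Array("23-Dec-11", "-", "-", "-", "40.56", "0")

  // Convert the four price fields (Open, High, Low, Close) with String.toFloat.
  // Try captures the exception instead of letting it propagate.
  def parsePrices(fields: Array[String]): Try[Seq[Float]] =
    Try(fields.slice(1, 5).map(_.toFloat).toSeq)

  def main(args: Array[String]): Unit = {
    println(parsePrices(goodRow)) // a Success holding the four prices
    println(parsePrices(badRow))  // a Failure wrapping NumberFormatException for "-"
  }
}
```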

Because the prices are defined as Float, these rows cause the application
to crash:
scala> val rs = df2.filter(changeToDate("TradeDate") >= monthsago).select(
     |   changeToDate("TradeDate").as("TradeDate"),
     |   (('Close + 'Open) / 2).as("AverageDailyPrice"),
     |   'Low.as("Day's Low"), 'High.as("Day's High")).orderBy("TradeDate").collect
16/09/27 21:48:53 ERROR Executor: Exception in task 0.0 in stage 61.0 (TID 260)
java.lang.NumberFormatException: For input string: "-"
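One possible direction, sketched here under assumptions rather than taken from the post: keep the prices numeric but make them optional, so "-" becomes `None` instead of an exception. `PriceRow`, `SafeParse`, `toPrice` and `averageDailyPrice` are hypothetical names. In Spark Datasets, `Option` fields of a case class are generally encoded as nullable columns, so `None` would appear as null; whether that fits the existing `df2` pipeline is something to verify.

```scala
import scala.util.Try

// Hypothetical variant of the `columns` case class: the four price
// fields become Option[Float] so a "-" can be represented as None.
case class PriceRow(
    Stock: String,
    Ticker: String,
    TradeDate: String,
    Open: Option[Float],
    High: Option[Float],
    Low: Option[Float],
    Close: Option[Float],
    Volume: Integer)

object SafeParse {
  // None for "-" or any other text that is not a valid float,
  // instead of throwing NumberFormatException.
  def toPrice(s: String): Option[Float] = Try(s.trim.toFloat).toOption

  // (Close + Open) / 2, as in the query above, but only when both are present.
  def averageDailyPrice(r: PriceRow): Option[Float] =
    for (o <- r.Open; c <- r.Close) yield (c + o) / 2

  def main(args: Array[String]): Unit = {
    // Stock and Ticker values here are placeholders, not from the post.
    val row = PriceRow("SomeStock", "XYZ", "23-Dec-11",
      toPrice("-"), toPrice("-"), toPrice("-"), toPrice("40.56"), 0)
    println(row)
    println(averageDailyPrice(row)) // None, because Open is missing
  }
}
```

This keeps the 23-Dec-11 row and its Close price, at the cost of having to handle missing values wherever Open, High or Low are used.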


One option is to define the prices as String, but that is not meaningful
for numeric data. Alternatively, I could clean up the files before putting
them into HDFS, but that is tedious and error-prone.
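A middle ground between String-typed prices and pre-cleaning the files is to parse at read time and drop incomplete rows, leaving the raw CSV in HDFS untouched. This sketch assumes simple comma-separated lines with no quoted fields, no header, and the six-column layout of the table above (the real files also carry Stock and Ticker, omitted here); `CleanOnRead`, `Bar` and `parseLine` are hypothetical names. In principle the same `parseLine` could be passed to `flatMap` on an `RDD[String]` from `sc.textFile`, but that is untested here.

```scala
import scala.util.Try

object CleanOnRead {
  // Hypothetical compact row type: Date, Open, High, Low, Close, Volume.
  case class Bar(date: String, open: Float, high: Float, low: Float, close: Float, volume: Long)

  // Parse one comma-separated line. Lines with the wrong number of fields,
  // or with any unparseable number (e.g. "-"), yield None.
  def parseLine(line: String): Option[Bar] = {
    val f = line.split(",").map(_.trim)
    if (f.length != 6) None
    else Try(Bar(f(0), f(1).toFloat, f(2).toFloat, f(3).toFloat, f(4).toFloat, f(5).toLong)).toOption
  }

  def main(args: Array[String]): Unit = {
    val lines = Seq(
      "27-Sep-16,80.91,80.93,79.87,80.85,1873158",
      "23-Dec-11,-,-,-,40.56,0")
    // flatMap keeps the Some results and silently drops the None rows.
    val bars = lines.flatMap(parseLine)
    bars.foreach(println)
  }
}
```

Dropping rows loses the Close price on those days; if that matters, keeping the row with missing fields as null is the alternative. Spark's CSV reader also has options such as `nullValue` and `mode` (PERMISSIVE, DROPMALFORMED, FAILFAST) that may cover this without custom code; check the DataFrameReader documentation for the Spark version in use.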

Any ideas would be appreciated.


Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


