Thanks Michael.
I realised that just checking for Volume > 0 will do:
val rs = df2.filter($"Volume".cast("Integer") > 0)
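For what it's worth, the same predicate can be sanity-checked in plain Scala without Spark. The ticker and values below are hypothetical sample data; the "-" mimics a rogue value in the csv. Spark's cast("Integer") turns "-" into null, which then fails the > 0 comparison, much as Try(_.toInt) fails here:

```scala
import scala.util.Try

// Hypothetical (ticker, Volume-as-string) rows; "-" stands in for a rogue value
val rows = Seq(("ORCL", "1000"), ("ORCL", "0"), ("ORCL", "-"))

// Keep only rows whose Volume parses to an Int greater than zero;
// both the zero row and the unparseable "-" row are dropped
val kept = rows.filter { case (_, vol) =>
  Try(vol.toInt).toOption.exists(_ > 0)
}

println(kept)  // List((ORCL,1000))
```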
Your point on:
"Again why not remove the rows where the volume of trades is 0?"
Are you referring to the below?
scala> val rs = df2.filter($"Volume".cast("Integer") ===
On Sep 29, 2016, at 10:29 AM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
Good points :) did it take "-" as a negative number -123456?
Yeah… you have to go down a level and start to remember that you’re dealing
with a stream or buffer of bytes below any casting.
Good points :) did it take "-" as a negative number -123456?
At this moment in time this is what the code does
1. The csv is imported into HDFS as is. No cleaning for rogue columns is done at shell level.
2. The Spark program then does the following filtration:
3. val rs = df2.filter($"Open" !
"isnan" ends up using a case class, a subclass of UnaryExpression, called
"IsNaN", which evaluates each row of the column like this:
- *False* if the value is Null
- Check the "Expression.Type" (apparently a Spark thing, not a Scala
thing… still learning here)
- DoubleType: cast to Doub
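To make that evaluation order concrete, here is a plain-Scala sketch of the per-row logic described above. This is a simplification for illustration only, not Spark's actual Catalyst code:

```scala
// Sketch of the null-check-then-type-dispatch shape described above.
// Not Spark's real IsNaN implementation; just the idea of it.
def isNaNLike(value: Any): Boolean = value match {
  case null      => false       // False if the value is Null
  case d: Double => d.isNaN     // DoubleType: defer to Double.isNaN
  case f: Float  => f.isNaN     // FloatType handled the same way
  case _         => false       // other types can never be NaN
}

println(isNaNLike(Double.NaN))  // true
println(isNaNLike(null))        // false
```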
Hi,
Just a few thoughts, so take it for what it's worth…
Databases have static schemas and will reject a row’s column on insert.
In your case… you have one data set where you have a column which is supposed
to be a number but you have it as a string.
You want to convert this to a double in your f
Hi Dr Mich,
how about reading all the csv columns as string and then applying a UDF, sort
of like this?
import scala.util.control.Exception.allCatch

def getDouble(doubleStr: String): Double =
  allCatch opt doubleStr.toDouble match {
    case Some(doubleNum) => doubleNum
    case _               => Double.NaN
  }
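A quick standalone check of how that helper behaves (same function repeated so it runs on its own, plain Scala, no Spark needed):

```scala
import scala.util.control.Exception.allCatch

def getDouble(doubleStr: String): Double =
  allCatch opt doubleStr.toDouble match {
    case Some(doubleNum) => doubleNum
    case _               => Double.NaN
  }

println(getDouble("123.45"))   // 123.45
println(getDouble("-"))        // NaN: rogue values no longer throw
println(getDouble("-123456"))  // -123456.0: a real negative still parses
```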
In Scala, x.isNaN returns true for Double.NaN, but false for any ordinary
number. I guess the `isnan` function you are using works by ultimately
looking at x.isNaN.
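A couple of quick REPL-style checks of that behaviour (plain Scala):

```scala
// Double.NaN reports itself as NaN
println(Double.NaN.isNaN)             // true
// but NaN never compares equal, even to itself, so == checks won't catch it
println(Double.NaN == Double.NaN)     // false
// a non-numeric string does not become NaN on its own; toDouble simply throws
println(scala.util.Try("-".toDouble).isFailure)  // true
```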
On Wed, Sep 28, 2016 at 5:56 AM, Mich Talebzadeh
wrote:
This is an issue in most databases, specifically if a field is NaN
(*NaN*, standing for "not a number", is a numeric data type value representing
an undefined or unrepresentable value, especially in floating-point
calculations).
There is a method called isnan() in Spark that is supposed to han