Response to the 1st approach:
When you do spark.read.text("/xyz/a/b/filename") it returns a DataFrame, and
applying the rdd method gives you an RDD[Row]. So when you use map, your
function gets a Row as its parameter (i.e. ip in your code), and you must use
the Row methods to access its members.
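For example (a minimal sketch; getString(0) relies on the single string
column named "value" that spark.read.text produces):

val df = spark.read.text("/xyz/a/b/filename") // DataFrame with one string column, "value"
val lines = df.rdd.map(ip => ip.getString(0)) // ip is a Row; use its accessors, not the Row itself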
This does not seem to be an issue in Spark itself. Does CSVParser work fine
on the data without Spark?
BTW, it seems there is something wrong with your email address. I am
sending this again.
On 20 Sep 2016 8:32 a.m., "Hyukjin Kwon" wrote:
> This does not seem to be an issue in Spark itself. Does CSVParser work fine
> on the data without Spark?
This does not seem to be an issue in Spark itself. Does CSVParser work fine
on the data without Spark?
On 20 Sep 2016 2:15 a.m., "Mohamed ismail" wrote:
> Hi all
>
> I am trying to read:
>
> sc.textFile(DataFile).mapPartitions(lines => {
> val parser = new CSVParser(",")
>
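For reference, a complete version of that read pattern might look like the
following (a sketch assuming opencsv's CSVParser, whose constructor takes a
Char separator rather than the String shown above):

import au.com.bytecode.opencsv.CSVParser

val rows = sc.textFile(DataFile).mapPartitions { lines =>
  val parser = new CSVParser(',') // one parser per partition, reused across lines
  lines.map(line => parser.parseLine(line)) // each line becomes an Array[String] of fields
}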
Wow, really weird. My intuition is the same as everyone else's: some
unprintable character. Here are a couple more debugging tricks I've used in
the past:
// set up accumulators to catch the bad rows as a side-effect
val nBadRows = sc.accumulator(0)
val nGoodRows = sc.accumulator(0)
// collect the offending lines themselves (a sketch; the original message is cut off here)
val badRows = sc.textFile(DataFile).filter { line =>
  val bad = scala.util.Try(line.split(",").map(_.trim.toInt)).isFailure
  if (bad) nBadRows += 1 else nGoodRows += 1; bad
}
There could be some other character, like a space or ^M (carriage return),
etc. You could try the following to see the actual row.
val newstream = datastream.map(row => {
  try {
    val strArray = row.trim().split(",")
    (strArray(0).toInt, strArray(1).toInt)
    // Instead try this: trim each field before converting
    // (strArray(0).trim.toInt, strArray(1).trim.toInt)
  } catch {
    case e: NumberFormatException =>
      println(s"Bad row: '$row'") // prints the actual offending row
      throw e
  }
})
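If a stray ^M (carriage return) is the culprit, stripping it before splitting
is a quick check (a one-line sketch on the same datastream):

val noCR = datastream.map(_.stripSuffix("\r"))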
Hi Yu,
Try this:
val data = csv.map(line => line.split(",").map(elem => elem.trim)) // split each line into trimmed fields
data.map(rec => (rec(0).toInt, rec(1).toInt))
to convert the fields into integers.
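The trim matters because Integer.parseInt, which backs Scala's toInt, does
not skip whitespace; for example:

" 1".toInt      // throws java.lang.NumberFormatException: For input string: " 1"
" 1".trim.toInt // 1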
On 16 December 2014 at 10:49, yu [via Apache Spark User List] <ml-node+s1001560n20694...@n3.nabble.com> wrote:
That certainly looks surprising. Are you sure there are no unprintable
characters in the file?
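One quick way to check for unprintable characters (a sketch; DataFile stands
in for the path from the original question):

sc.textFile(DataFile)
  .filter(_.exists(c => Character.isISOControl(c) || c > '\u007f'))
  .take(5)
  .foreach(line => println(line.map(c => f"${c.toInt}%02x").mkString(" ")))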
On Mon, Dec 15, 2014 at 9:49 PM, yu wrote:
> The exception info is:
> 14/12/15 15:35:03 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 0.0
> (TID 0, h3): java.lang.NumberFormatException: For input string: ...