Thanks all. I am finally able to run my code successfully. It runs on
Spark 1.2.1; I will try it on Spark 1.3 too.
The main cause of all the errors I faced was that the delimiter was not declared
correctly (split takes a regular expression, so a bare "|" has to be escaped).
Previously I was using:
val TABLE_A = sc.textFile("/Myhome/SPARK/files/table_a_file.txt").map(_.split("|")).map(p =>
  ROW_A(p(0), p(1), p(2), p(3), p(4), p(5), p(6)))
Now I am using the following, and that solved most of the issues:
val Delimiter = "\\|"
val TABLE_A = sc.textFile("/Myhome/SPARK/files/table_a_file.txt").map(_.split(Delimiter)).map(p =>
  ROW_A(p(0), p(1), p(2), p(3), p(4), p(5), p(6)))
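For anyone hitting the same problem: `String.split` interprets its argument as a regular expression, and a bare `|` is an alternation of two empty patterns, so it matches between characters instead of at the pipes. The plain-Scala sketch below (no Spark required; the sample lines are made up) compares the options. Escaping as `"\\|"`, quoting with `java.util.regex.Pattern.quote`, and Scala's `split(Char)` overload all split on a literal pipe. It also shows that `split` drops trailing empty fields unless a negative limit is passed, which can cause an index error on `p(6)` when the last columns of a row are empty.

```scala
import java.util.regex.Pattern

object DelimiterDemo {
  // Illustrative pipe-delimited line (not from the original data files)
  val line = "1426800000|abc|3"

  // "|" as a regex matches the empty string, so the line is broken up between
  // characters and the pipes themselves end up as fields
  val wrong: Array[String] = line.split("|")

  // Three ways to split on a literal pipe
  val escaped: Array[String] = line.split("\\|")
  val quoted: Array[String]  = line.split(Pattern.quote("|"))
  val byChar: Array[String]  = line.split('|') // Scala's Char overload escapes the separator

  // Trailing empty fields are dropped by default; a negative limit keeps them
  val trailing = "1|abc||"
  val dropped: Array[String] = trailing.split("\\|")     // yields "1", "abc"
  val kept: Array[String]    = trailing.split("\\|", -1) // yields "1", "abc", "", ""
}
```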
Thanks again. My first program ran successfully, which gives me some
confidence; now I will explore further.
Regards
Ananda
From: BASAK, ANANDA
Sent: Thursday, March 26, 2015 4:55 PM
To: Dean Wampler
Cc: Yin Huai; [email protected]
Subject: RE: Date and decimal datatype not working
Thanks all. I am installing Spark 1.3 now; I thought I had better keep up with
the day-to-day evolution of this new technology.
Once it is installed, I will try the Spark-CSV library.
Regards
Ananda
From: Dean Wampler [mailto:[email protected]]
Sent: Wednesday, March 25, 2015 1:17 PM
To: BASAK, ANANDA
Cc: Yin Huai; [email protected]
Subject: Re: Date and decimal datatype not working
Recall that the input isn't actually read until you do something that forces
evaluation, like calling saveAsTextFile. You didn't show the whole stack trace
here, but the exception probably occurred while parsing an input line where one
of your Long fields is actually an empty string.
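As a rough analogy in plain Scala (not Spark itself), an `Iterator` is lazy in the same way an RDD transformation is. In the sketch below, which uses made-up input lines, building the mapped iterator runs nothing; the `NumberFormatException` for the empty field only appears when the iterator is consumed, just as Spark reports it from the action (saveAsTextFile) rather than from the line that defined the map.

```scala
import scala.util.Try

object LazinessDemo {
  // Hypothetical input: the second line has an empty TSTAMP field
  val lines = List("1426800000|abc", "|def")

  // Defining the transformation does not parse anything yet (like RDD.map)
  def timestamps: Iterator[Long] =
    lines.iterator.map(line => line.split("\\|", -1)(0).trim.toLong)

  // Building the iterator succeeds...
  val defined: Try[Iterator[Long]] = Try(timestamps)

  // ...but consuming it (the analogue of an action) throws on the empty field
  val forced: Try[List[Long]] = Try(timestamps.toList)
}
```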
Because this is such a common problem, I usually define a "parse" method that
converts input text to the desired schema. It catches parse exceptions like
this one and at least reports the bad line. If you can use a default Long in
this case, say 0, it is easier to return a valid record for every line.
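A minimal sketch of such a parse method, in plain Scala so it can be tried without a cluster. `ROW_A` mirrors the case class from the original post; `Parser`, `parseLine`, `longOrDefault`, `intOrDefault` and the defaults of 0 are hypothetical names and choices for illustration, not part of any Spark API. Each line becomes either `Right(row)` or `Left(message containing the bad line)`, so malformed input can be reported instead of failing the whole job.

```scala
import scala.util.{Failure, Success, Try}

// Same shape as ROW_A in the original post
case class ROW_A(
  TSTAMP: Long,
  USIDAN: String,
  SECNT: Int,
  SECT: String,
  BLOCK_NUM: BigDecimal,
  BLOCK_DEN: BigDecimal,
  BLOCK_PCT: BigDecimal)

object Parser {
  val Delimiter = "\\|"
  val ExpectedFields = 7

  // Hypothetical helpers: an empty or non-numeric field falls back to a default
  def longOrDefault(s: String, default: Long = 0L): Long =
    Try(s.trim.toLong).getOrElse(default)

  def intOrDefault(s: String, default: Int = 0): Int =
    Try(s.trim.toInt).getOrElse(default)

  // Right(row) on success; Left(message including the bad line) otherwise.
  // The BigDecimal fields are kept strict, so an empty decimal still yields a Left.
  def parseLine(line: String): Either[String, ROW_A] = {
    val p = line.split(Delimiter, -1) // -1 keeps trailing empty fields
    if (p.length != ExpectedFields)
      Left(s"expected $ExpectedFields fields but found ${p.length}: $line")
    else
      Try(ROW_A(
        longOrDefault(p(0)),
        p(1),
        intOrDefault(p(2)),
        p(3),
        BigDecimal(p(4).trim),
        BigDecimal(p(5).trim),
        BigDecimal(p(6).trim))) match {
        case Success(row) => Right(row)
        case Failure(e)   => Left(s"could not parse line ($e): $line")
      }
  }
}
```

In the Spark shell the same function could presumably be applied with something like sc.textFile(path).map(Parser.parseLine), keeping the Right values and logging or counting the Left ones; treat that as a sketch rather than code verified on 1.2.1.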
dean
Dean Wampler, Ph.D.
Author: Programming Scala, 2nd
Edition<http://shop.oreilly.com/product/0636920033073.do> (O'Reilly)
Typesafe<http://typesafe.com>
@deanwampler<http://twitter.com/deanwampler>
http://polyglotprogramming.com
On Wed, Mar 25, 2015 at 11:48 AM, BASAK, ANANDA
<[email protected]> wrote:
Thanks. This library is only available with Spark 1.3, and I am using version
1.2.1. Before I upgrade to 1.3, I want to see what can be done in 1.2.1.
So I am using the following:
val MyDataset = sqlContext.sql("my select query")
MyDataset.map(t =>
  t(0) + "|" + t(1) + "|" + t(2) + "|" + t(3) + "|" + t(4) + "|" + t(5)).saveAsTextFile("/my_destination_path")
But it gives the following error:
15/03/24 17:05:51 ERROR Executor: Exception in task 1.0 in stage 13.0 (TID 106)
java.lang.NumberFormatException: For input string: ""
        at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
        at java.lang.Long.parseLong(Long.java:453)
        at java.lang.Long.parseLong(Long.java:483)
        at scala.collection.immutable.StringLike$class.toLong(StringLike.scala:230)
Is there something wrong with the TSTAMP field, which is of type Long?
Thanks & Regards
-----------------------
Ananda Basak
From: Yin Huai [mailto:[email protected]]
Sent: Monday, March 23, 2015 8:55 PM
To: BASAK, ANANDA
Cc: [email protected]
Subject: Re: Date and decimal datatype not working
To save the output as a CSV file, you can use the
Spark-CSV<https://github.com/databricks/spark-csv> library.
On Mon, Mar 23, 2015 at 5:35 PM, BASAK, ANANDA
<[email protected]> wrote:
Thanks. This worked well with your suggestions. I had to run the following:
val TABLE_A = sc.textFile("/Myhome/SPARK/files/table_a_file.txt").map(_.split("|")).map(p =>
  ROW_A(p(0).trim.toLong, p(1), p(2).trim.toInt, p(3), BigDecimal(p(4)),
    BigDecimal(p(5)), BigDecimal(p(6))))
Now I am stuck at another step. I have run a SQL query that selects all the
fields with a WHERE clause (TSTAMP filtered by a date range) and an ORDER BY
TSTAMP clause. That runs fine.
Then I am trying to store the output in a CSV file using the
saveAsTextFile("filename") function, but it gives an error. Can you please
help me with the proper syntax to store the output in a CSV file?
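One way to produce the delimited lines yourself, sketched in plain Scala: each row is treated as a `Seq[Any]` (a stand-in for a Spark SQL Row), and the hypothetical `toLine` helper joins the fields with a separator while writing nulls as empty fields rather than the text "null" that a bare `mkString` would produce. The hypothetical `toCsvLine` adds the common RFC 4180 convention of quoting fields that contain a comma, a double quote, or a line break; a library such as Spark-CSV takes care of these details for you.

```scala
object DelimitedOutput {
  // Render one row as a delimited line; nulls become empty fields
  def toLine(fields: Seq[Any], sep: String = "|"): String =
    fields.map {
      case null => ""
      case v    => v.toString
    }.mkString(sep)

  // RFC 4180-style quoting: wrap a field in double quotes when it contains the
  // separator, a quote, or a line break, and double any quotes inside it
  def quoteIfNeeded(field: String, sep: String = ","): String =
    if (field.contains(sep) || field.contains("\"") || field.contains("\n") || field.contains("\r"))
      "\"" + field.replace("\"", "\"\"") + "\""
    else field

  // Comma-separated line with quoting applied to each field
  def toCsvLine(fields: Seq[Any]): String =
    fields.map {
      case null => ""
      case v    => quoteIfNeeded(v.toString)
    }.mkString(",")
}
```

In Spark this would presumably be used along the lines of MyDataset.map(row => DelimitedOutput.toLine(row.toSeq)).saveAsTextFile("/my_destination_path"), which also avoids hard-coding the number of columns.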
Thanks & Regards
-----------------------
Ananda Basak
From: BASAK, ANANDA
Sent: Tuesday, March 17, 2015 3:08 PM
To: Yin Huai
Cc: [email protected]
Subject: RE: Date and decimal datatype not working
OK, thanks for the suggestions. I will try them and confirm.
Regards
Ananda
From: Yin Huai [mailto:[email protected]]
Sent: Tuesday, March 17, 2015 3:04 PM
To: BASAK, ANANDA
Cc: [email protected]
Subject: Re: Date and decimal datatype not working
p(0) is a String, so you need to explicitly convert it to a Long, e.g.
p(0).trim.toLong. You also need to convert p(2) (to an Int, with
p(2).trim.toInt). For the BigDecimal values, you need to create BigDecimal
objects from your String values.
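To illustrate those conversions, here is a small plain-Scala sketch for one line that has already been split on an escaped pipe. `ROW_A` is the case class from the original post; `ConvertDemo` and the sample values are hypothetical. With `trim.toLong`, `trim.toInt` and `BigDecimal(String)` supplying the right types, the `ROW_A` constructor type-checks, which resolves the "found: String, required: Long" mismatch. These conversions still throw on empty fields, so input with blank values needs extra handling.

```scala
// Same shape as ROW_A in the original post
case class ROW_A(
  TSTAMP: Long,
  USIDAN: String,
  SECNT: Int,
  SECT: String,
  BLOCK_NUM: BigDecimal,
  BLOCK_DEN: BigDecimal,
  BLOCK_PCT: BigDecimal)

object ConvertDemo {
  // p holds the fields of one input line after splitting on "\\|"
  def toRow(p: Array[String]): ROW_A =
    ROW_A(
      p(0).trim.toLong,      // String -> Long
      p(1),
      p(2).trim.toInt,       // String -> Int
      p(3),
      BigDecimal(p(4).trim), // String -> scala.math.BigDecimal
      BigDecimal(p(5).trim),
      BigDecimal(p(6).trim))
}
```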
On Tue, Mar 17, 2015 at 5:55 PM, BASAK, ANANDA
<[email protected]> wrote:
Hi All,
I am very new to the Spark world and just started some test coding last week.
I am using spark-1.2.1-bin-hadoop2.4 and coding in Scala.
I am having issues using the Date and decimal data types. Following is my
code, which I am simply running at the Scala prompt. I am trying to define a
table and point it to my flat file containing raw data (pipe-delimited format).
Once that is done, I will run some SQL queries and write the output data into
another pipe-delimited flat file.
*******************************************************
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.createSchemaRDD
// Define row and table
case class ROW_A(
TSTAMP: Long,
USIDAN: String,
SECNT: Int,
SECT: String,
BLOCK_NUM: BigDecimal,
BLOCK_DEN: BigDecimal,
BLOCK_PCT: BigDecimal)
val TABLE_A = sc.textFile("/Myhome/SPARK/files/table_a_file.txt").map(_.split("|")).map(p =>
  ROW_A(p(0), p(1), p(2), p(3), p(4), p(5), p(6)))
TABLE_A.registerTempTable("TABLE_A")
***************************************************
The second-to-last command gives an error like the following:
<console>:17: error: type mismatch;
found : String
required: Long
It looks like the contents of my flat file are always treated as String, not
as Date or decimal. How can I make Spark read them as Date or decimal types?
Regards
Ananda