Thanks. This library is only available with Spark 1.3. I am using version
1.2.1. Before I upgrade to 1.3, I want to try what can be done in 1.2.1.
So I am using the following:
val MyDataset = sqlContext.sql("my select query")
MyDataset.map(t =>
  t(0)+"|"+t(1)+"|"+t(2)+"|"+t(3)+"|"+t(4)+"|"+t(5)).saveAsTextFile("/my_destination_path")
But it is giving the following error:
15/03/24 17:05:51 ERROR Executor: Exception in task 1.0 in stage 13.0 (TID 106)
java.lang.NumberFormatException: For input string: ""
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
    at java.lang.Long.parseLong(Long.java:453)
    at java.lang.Long.parseLong(Long.java:483)
    at scala.collection.immutable.StringLike$class.toLong(StringLike.scala:230)
Is there something wrong with the TSTAMP field, which has the Long datatype?
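One possible cause, which nothing in this thread confirms: the stack trace ends in StringLike.toLong, so the failure looks like a toLong call on an empty string rather than a problem with the Long type itself. String.split takes a regular expression, and an unescaped "|" is an alternation of two empty patterns, so split("|") breaks a line into single characters instead of fields (on some Java versions the first element is an empty string). A blank line or an empty field in the input file could produce the same error. A minimal sketch in plain Scala, using a made-up sample line:

```scala
object SplitPipeDemo {
  // Hypothetical sample line; the real file contents are not shown in the thread.
  val line = "1426600000|abc|7"

  // Unescaped "|" is a regex alternation of empty patterns: it matches between
  // every character, so the result is single characters, not fields.
  val byRegex: Array[String] = line.split("|")

  // Escaping the pipe, or passing a Char, splits on the literal delimiter.
  val byEscaped: Array[String] = line.split("\\|")
  val byChar: Array[String] = line.split('|')

  def main(args: Array[String]): Unit = {
    println(byRegex.mkString("[", ", ", "]"))
    println(byEscaped.mkString("[", ", ", "]"))
    // toLong on an empty string fails with a NumberFormatException.
    println(scala.util.Try("".trim.toLong))
  }
}
```

If this is the cause, changing split("|") to split("\\|") or split('|') where the file is loaded may be enough.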
Thanks & Regards
-----------------------
Ananda Basak
Ph: 425-213-7092
From: Yin Huai [mailto:[email protected]]
Sent: Monday, March 23, 2015 8:55 PM
To: BASAK, ANANDA
Cc: [email protected]
Subject: Re: Date and decimal datatype not working
To store the output in a CSV file, you can use the
Spark-CSV library (https://github.com/databricks/spark-csv).
On Mon, Mar 23, 2015 at 5:35 PM, BASAK, ANANDA
<[email protected]> wrote:
Thanks. This worked well as per your suggestions. I had to run the following:
val TABLE_A =
  sc.textFile("/Myhome/SPARK/files/table_a_file.txt").map(_.split("|")).map(p =>
    ROW_A(p(0).trim.toLong, p(1), p(2).trim.toInt, p(3), BigDecimal(p(4)),
      BigDecimal(p(5)), BigDecimal(p(6))))
Now I am stuck at another step. I have run a SQL query that selects all the
fields, with a WHERE clause that filters TSTAMP by a date range and an
ORDER BY TSTAMP clause. That runs fine.
Then I am trying to store the output in a CSV file. I am using the
saveAsTextFile("filename") function, but it gives an error. Can you please
help me write the proper syntax to store the output in a CSV file?
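A minimal sketch of one way to build the delimited lines, in plain Scala with no Spark dependency. toPipeLine is a hypothetical helper name and the values are invented for illustration. In Spark 1.2.x a query result row should be usable as a Seq[Any], so something like result.map(toPipeLine).saveAsTextFile(path) may work, but that is an assumption and is not tested here:

```scala
object PipeLineDemo {
  // toPipeLine is a hypothetical helper: joins field values with "|".
  def toPipeLine(fields: Seq[Any]): String = fields.mkString("|")

  def main(args: Array[String]): Unit = {
    // Made-up values standing in for one row of a query result.
    val row: Seq[Any] = Seq(1426600000L, "user1", 3, "A", BigDecimal("10.5"))
    println(toPipeLine(row))
  }
}
```

Note that saveAsTextFile writes a directory of part files rather than a single file, and it fails if the output path already exists.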
Thanks & Regards
-----------------------
Ananda Basak
Ph: 425-213-7092
From: BASAK, ANANDA
Sent: Tuesday, March 17, 2015 3:08 PM
To: Yin Huai
Cc: [email protected]
Subject: RE: Date and decimal datatype not working
OK, thanks for the suggestions. Let me try them and I will confirm.
Regards
Ananda
From: Yin Huai [mailto:[email protected]]
Sent: Tuesday, March 17, 2015 3:04 PM
To: BASAK, ANANDA
Cc: [email protected]
Subject: Re: Date and decimal datatype not working
p(0) is a String, so you need to convert it to a Long explicitly, e.g.
p(0).trim.toLong. You need to do the same for p(2), converting it to an Int.
For the BigDecimal values, you need to create BigDecimal objects from your
String values.
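The conversions described above can be sketched in plain Scala as follows. parseLine is a hypothetical helper name, the sample record is made up, and the pipe is escaped as "\\|" because String.split takes a regular expression; treat this as an illustration under those assumptions, not as the exact code to run:

```scala
object ParseRowDemo {
  // Same fields as the ROW_A case class defined in the original question.
  case class ROW_A(
    TSTAMP: Long,
    USIDAN: String,
    SECNT: Int,
    SECT: String,
    BLOCK_NUM: BigDecimal,
    BLOCK_DEN: BigDecimal,
    BLOCK_PCT: BigDecimal)

  // parseLine is a hypothetical helper, not part of Spark.
  def parseLine(line: String): ROW_A = {
    // Escape the pipe so it is a literal delimiter; -1 keeps trailing empty fields.
    val p = line.split("\\|", -1)
    ROW_A(
      p(0).trim.toLong,        // String -> Long
      p(1),
      p(2).trim.toInt,         // String -> Int
      p(3),
      BigDecimal(p(4).trim),   // String -> BigDecimal
      BigDecimal(p(5).trim),
      BigDecimal(p(6).trim))
  }

  def main(args: Array[String]): Unit = {
    // Made-up sample record for illustration only.
    println(parseLine("1426600000|user1|3|A|10.5|20|52.50"))
  }
}
```

In the Spark shell, a function like this could presumably be passed to sc.textFile(...).map(parseLine) in place of the inline lambda.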
On Tue, Mar 17, 2015 at 5:55 PM, BASAK, ANANDA
<[email protected]> wrote:
Hi All,
I am very new to the Spark world and started some test coding just last week. I
am using spark-1.2.1-bin-hadoop2.4 and coding in Scala.
I am having issues using Date and decimal data types. The following is the
code that I am running at the Scala prompt. I am trying to define a table
and point it to my flat file containing raw data (pipe-delimited format).
Once that is done, I will run some SQL queries and write the output data to
another flat file in pipe-delimited format.
*******************************************************
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.createSchemaRDD
// Define row and table
case class ROW_A(
  TSTAMP: Long,
  USIDAN: String,
  SECNT: Int,
  SECT: String,
  BLOCK_NUM: BigDecimal,
  BLOCK_DEN: BigDecimal,
  BLOCK_PCT: BigDecimal)
val TABLE_A =
  sc.textFile("/Myhome/SPARK/files/table_a_file.txt").map(_.split("|")).map(p =>
    ROW_A(p(0), p(1), p(2), p(3), p(4), p(5), p(6)))
TABLE_A.registerTempTable("TABLE_A")
***************************************************
The second-to-last command gives an error like the following:
<console>:17: error: type mismatch;
found : String
required: Long
It looks like the contents of my flat file are always treated as String and
not as Date or decimal. How can I make Spark treat them as Date or decimal
types?
Regards
Ananda