Thanks everyone.
I am not as skilled as you gentlemen.
This is what I did:
1) Read the text file
val textFile = sc.textFile("/tmp/myfile.txt")

2) That produces an RDD of String.
3) Create a DF after splitting each line into an Array
val df = textFile.map(line => line.split(","))
  .map(x => (x(0).toInt, x(1).toString, x(2).toDouble)).toDF
4) Create a case class for column headers
case class Columns(col1: Int, col2: String, col3: Double)
5) Assign the column headers
val h = df.map(p => Columns(p(0).toString.toInt, p(1).toString, p(2).toString.toDouble))
6) Only interested in rows where column 3 > 50
h.filter(col("col3") > 50.0)
7) Now I just want col3 only
h.filter(col("col3") > 50.0).select("col3").show(5)
+-----------------+
|             col3|
+-----------------+
|95.42536350467836|
|61.56297588648554|
|76.73982017179868|
|68.86218120274728|
|67.64613810115105|
+-----------------+
only showing top 5 rows
Does that make sense? Are there shorter ways, gurus? Can I just do all this on an
RDD without a DF?
Thanking you
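
A minimal sketch of both possibilities, assuming the same three-column layout in /tmp/myfile.txt and a spark-shell session (where the SQL implicits and col are already in scope). The DataFrame steps collapse into one pipeline, and the same filter works on the bare RDD with no DF at all:

// Shorter DataFrame route: parse straight into the case class, then filter.
case class Columns(col1: Int, col2: String, col3: Double)

val df = sc.textFile("/tmp/myfile.txt")
  .map(_.split(","))
  .map(a => Columns(a(0).toInt, a(1), a(2).toDouble))
  .toDF()

df.filter(col("col3") > 50.0).select("col3").show(5)

// Pure-RDD route: no DataFrame needed if only the filtered doubles matter.
val bigCol3 = sc.textFile("/tmp/myfile.txt")
  .map(_.split(",")(2).toDouble)
  .filter(_ > 50.0)

bigCol3.take(5).foreach(println)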

    On Monday, 5 September 2016, 15:19, ayan guha <guha.a...@gmail.com> wrote:
 

Then you need to refer to the third term in the array, convert it to your desired
data type, and then use filter.
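
For instance, a sketch of that advice on the RDD directly (textFile and the 50.0 threshold are taken from the messages below):

val matches = textFile
  .map(_.split(","))                    // one Array[String] per line
  .filter(a => a(2).toDouble > 50.0)    // third term, cast to Double, then filter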

On Tue, Sep 6, 2016 at 12:14 AM, Ashok Kumar <ashok34...@yahoo.com> wrote:

Hi,
I want to filter them by value.
This is what is in array
74,20160905-133143,98.11218069128827594148

I want to filter anything > 50.0 in the third column
Thanks

 

    On Monday, 5 September 2016, 15:07, ayan guha <guha.a...@gmail.com> wrote:
 

Hi,
x.split returns an array, so after the first map you will get an RDD of arrays.
What is your expected outcome of the 2nd map?
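
To illustrate, indexing into each array is how you pick out one column (a sketch, assuming the textFile RDD from the message below):

textFile.map(_.split(",")).map(a => a(0))   // RDD[String] of just the first column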
On Mon, Sep 5, 2016 at 11:30 PM, Ashok Kumar <ashok34...@yahoo.com.invalid> wrote:

Thank you sir.
This is what I get:
scala> textFile.map(x => x.split(","))
res52: org.apache.spark.rdd.RDD[Array[String]] = MapPartitionsRDD[27] at map at <console>:27
How can I work on individual columns? I understand they are strings.
scala> textFile.map(x => x.split(",")).map(x => (x.getString(0)))
<console>:27: error: value getString is not a member of Array[String]
       textFile.map(x => x.split(",")).map(x => (x.getString(0)))
regards
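
A note on the error: getString belongs to SQL Row objects, not to Array[String]; plain indexing works on the split arrays. A sketch against the same textFile:

textFile.map(_.split(",")).map(a => (a(0), a(1), a(2)))   // RDD of String 3-tuples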

 

    On Monday, 5 September 2016, 13:51, Somasundaram Sekar <somasundar.sekar@tigeranalytics.com> wrote:
 

Basic error: you get back an RDD on transformations like map.
sc.textFile("filename").map(x => x.split(","))
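
A usage sketch of that line ("filename" stands in for the real path): each transformation returns a new RDD, and an action such as first() pulls a value back to the driver.

val arrays = sc.textFile("filename").map(x => x.split(","))   // RDD[Array[String]]
arrays.first()   // e.g. Array(74, 20160905-133143, 98.11218069128827594148)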
On 5 Sep 2016 6:19 pm, "Ashok Kumar" <ashok34...@yahoo.com.invalid> wrote:

Hi,
I have a text file as below that I read in:
74,20160905-133143,98.11218069128827594148
75,20160905-133143,49.52776998815916807742
76,20160905-133143,56.08029957123980984556
77,20160905-133143,46.63689526544407522777
78,20160905-133143,84.88227141164402181551
79,20160905-133143,68.72408602520662115000
val textFile = sc.textFile("/tmp/mytextfile.txt")
Now I want to split the rows separated by ","
scala> textFile.map(x => x.toString).split(",")
<console>:27: error: value split is not a member of org.apache.spark.rdd.RDD[String]
       textFile.map(x => x.toString).split(",")
However, the above throws an error. Any ideas what is wrong, or how I can do this,
ideally avoiding the conversion to String?
Thanking you

-- 
Best Regards,
Ayan Guha

-- 
Best Regards,
Ayan Guha