data.filter(_.split("\t")(1) == "A2") ?

--
Sean Owen | Director, Data Science | London
On Tue, Mar 4, 2014 at 1:06 PM, trottdw <trot...@gmail.com> wrote:
> Hello, I am using Spark with Scala and I am attempting to understand the
> different filtering and mapping capabilities available. I haven't found an
> example of the specific task I would like to do.
>
> I am trying to read in a tab-separated text file and filter specific entries.
> I would like this filter to be applied to different "columns", not whole lines.
> I was using the following to split the data, but my attempts to filter by
> "column" afterwards are not working.
> -----------------------------
> val data = sc.textFile("test_data.txt")
> var parsedData = data.map( _.split("\t").map(_.toString))
> ------------------------------
>
> To give a more concrete example of my goal, suppose the data file is:
>
> A1 A2 A3 A4
> B1 B2 A3 A4
> C1 A2 C2 C3
>
> How would I filter the data based on the second column, so that only the
> entries which have A2 in column two are returned? That is, the resulting
> RDD would just be:
>
> A1 A2 A3 A4
> C1 A2 C2 C3
>
> Is there a convenient way to do this? Any suggestions or assistance would
> be appreciated.
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/RDD-Manipulation-in-Scala-tp2285.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
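
[Editor's note: Sean's one-liner above can be expanded into a runnable sketch. Since `RDD.filter` has the same shape as `filter` on ordinary Scala collections, the snippet below demonstrates the logic on a plain `List[String]` so it runs without a SparkContext; in real Spark code, `data` would instead come from `sc.textFile("test_data.txt")`. The object name `FilterByColumn` and the inlined sample rows are illustrative, not from the thread.]

```scala
// A minimal sketch of the suggested approach, using a plain Scala List
// in place of an RDD so it can run outside Spark. The RDD version is
// identical except that `data` comes from sc.textFile("test_data.txt").
object FilterByColumn {
  def main(args: Array[String]): Unit = {
    // Sample rows matching the example in the question, tab-separated.
    val data = List(
      "A1\tA2\tA3\tA4",
      "B1\tB2\tA3\tA4",
      "C1\tA2\tC2\tC3"
    )

    // Keep only lines whose second tab-separated field is "A2".
    // split("\t") yields an Array[String]; index 1 is column two
    // (array indices are zero-based).
    val filtered = data.filter(_.split("\t")(1) == "A2")

    filtered.foreach(println)
  }
}
```

Note that the filter runs on the raw lines, so there is no need for the intermediate `map(_.split("\t"))` step from the question unless the split columns are needed downstream as well.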