Use this package: https://github.com/databricks/spark-csv
and change the delimiter to a tab. The documentation is straightforward; you'll get a DataFrame back from the parser.

-Don

On Thu, Jun 25, 2015 at 4:39 AM, Ravikant Dindokar <ravikant.i...@gmail.com> wrote:

> I have a file where each line represents an edge in the graph and has two
> values separated by a tab. Both values are vertex ids (source and sink). I
> want to parse this file as a dictionary in a Spark RDD.
> My question is: how do I get these values in the form of a dictionary in an RDD?
>
> sample file:
> 1	2
> 1	5
> 2	3
>
> expected output: RDD (<1,2>, <1,5>, <2,3>)
>
> Thanks
> Ravikant
>
> On Thu, Jun 25, 2015 at 2:59 PM, anshu shukla <anshushuk...@gmail.com> wrote:
>
>> Can you be more specific, or provide a sample file?
>>
>> On Thu, Jun 25, 2015 at 11:00 AM, Ravikant Dindokar <ravikant.i...@gmail.com> wrote:
>>
>>> Hi Spark users,
>>>
>>> I am new to Spark, so forgive me for asking a basic question. I'm trying
>>> to import my TSV file into Spark. Each line has a key and a value
>>> separated by a \t. I want to import this file as a dictionary of
>>> key-value pairs in Spark.
>>>
>>> I came across this code to do the same for a CSV file:
>>>
>>> import csv
>>> import StringIO
>>> ...
>>> def loadRecord(line):
>>>     """Parse a CSV line"""
>>>     input = StringIO.StringIO(line)
>>>     reader = csv.DictReader(input, fieldnames=["name", "favouriteAnimal"])
>>>     return reader.next()
>>>
>>> input = sc.textFile(inputFile).map(loadRecord)
>>>
>>> Can you point out the changes required to parse a TSV file?
>>>
>>> After the following operation:
>>>
>>> split_lines = lines.map(_.split("\t"))
>>>
>>> what should I do to read the key-value pairs into a dictionary?
>>>
>>> Thanks
>>> Ravikant
>>
>> --
>> Thanks & Regards,
>> Anshu Shukla

--
Donald Drake
Drake Consulting
http://www.drakeconsulting.com/
http://www.MailLaunder.com/
800-733-2143
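[Editor's note] The per-line parsing discussed above can be sketched in plain Python 3 so it runs outside Spark; only the delimiter changes versus the CSV version. Note that `lines.map(_.split("\t"))` is Scala syntax; in PySpark you would write `lines.map(lambda line: line.split("\t"))`. The field names `src` and `dst`, the variable `input_file`, and the Spark calls in the comments are illustrative, not from the thread:

```python
import csv
import io


def load_record(line):
    """Parse one tab-separated line into a dict of named fields
    (Python 3 version of the thread's loadRecord, with delimiter="\t")."""
    reader = csv.DictReader(io.StringIO(line),
                            fieldnames=["src", "dst"],
                            delimiter="\t")
    return next(reader)


def parse_edge(line):
    """Parse one tab-separated line into a (source, sink) pair."""
    src, dst = line.split("\t")
    return (src, dst)


print(load_record("1\t2"))
print(parse_edge("1\t5"))

# In PySpark these functions would be mapped over the RDD, e.g.:
#   pairs = sc.textFile(input_file).map(parse_edge)
#   lookup = pairs.collectAsMap()
#
# Caveat: collectAsMap() returns a plain dict, so duplicate keys (like the
# two edges starting at vertex 1 in the sample) keep only one value; use
# groupByKey() if you need all sinks per source.
#
# Don's spark-csv suggestion (Spark 1.x) would look roughly like:
#   df = sqlContext.read.format("com.databricks.spark.csv") \
#            .option("delimiter", "\t").load(input_file)
```

`parse_edge` is the lighter-weight choice when the file has exactly two fields per line; `load_record` is useful when you want named fields.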