Use this package:

https://github.com/databricks/spark-csv

and change the delimiter to a tab.

The documentation is pretty straightforward; you'll get a DataFrame back
from the parser.
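For what it's worth, the DictReader snippet quoted below needs only a `delimiter="\t"` argument to handle tab-separated lines. A minimal standalone sketch (the "source"/"sink" field names are my own illustrative choice, not anything the package requires):

```python
import csv
import io

def load_record(line):
    """Parse one tab-separated line into a dict keyed by column name."""
    # delimiter="\t" is the only change needed versus the CSV version;
    # the field names "source" and "sink" are illustrative
    reader = csv.DictReader(io.StringIO(line),
                            fieldnames=["source", "sink"],
                            delimiter="\t")
    return next(reader)

# Plain-Python stand-in for lines.map(load_record) on an RDD
lines = ["1\t2", "1\t5", "2\t3"]
records = [load_record(line) for line in lines]
# records[0] == {"source": "1", "sink": "2"}
```

In Spark itself that last step would be something like `sc.textFile(path).map(load_record)`; with spark-csv you would instead pass the tab as the delimiter option when loading and skip the hand-rolled parser entirely.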

-Don

On Thu, Jun 25, 2015 at 4:39 AM, Ravikant Dindokar <ravikant.i...@gmail.com>
wrote:

> So I have a file where each line represents an edge in the graph and has
> two values separated by a tab. Both values are vertex IDs (source and
> sink). I want to parse this file as a dictionary in a Spark RDD.
> So my question is: how do I get these values in the form of a dictionary
> in an RDD?
> sample file :
> 1    2
> 1    5
> 2    3
>
> expected output : RDD (<1,2>,<1,5>,<2,3>)
>
> Thanks
> Ravikant
>
> On Thu, Jun 25, 2015 at 2:59 PM, anshu shukla <anshushuk...@gmail.com>
> wrote:
>
>> Can you be more specific, or can you provide a sample file?
>>
>> On Thu, Jun 25, 2015 at 11:00 AM, Ravikant Dindokar <
>> ravikant.i...@gmail.com> wrote:
>>
>>> Hi Spark user,
>>>
>>> I am new to Spark, so forgive me for asking a basic question. I'm trying
>>> to import my TSV file into Spark. This file has a key and a value
>>> separated by a \t on each line. I want to import this file as a
>>> dictionary of key-value pairs in Spark.
>>>
>>> I came across this code to do the same for csv file:
>>>
>>> import csv
>>> import StringIO
>>> ...
>>> def loadRecord(line):
>>>     """Parse a CSV line"""
>>>     input = StringIO.StringIO(line)
>>>     reader = csv.DictReader(input, fieldnames=["name", "favouriteAnimal"])
>>>     return reader.next()
>>> input = sc.textFile(inputFile).map(loadRecord)
>>>
>>> Can you point out the changes required to parse a tsv file?
>>>
>>> After the following operation:
>>>
>>> split_lines = lines.map(lambda line: line.split("\t"))
>>>
>>> what should I do to read the key-value pairs into a dictionary?
>>>
>>>
>>> Thanks
>>>
>>> Ravikant
>>>
>>>
>>>
>>
>>
>> --
>> Thanks & Regards,
>> Anshu Shukla
>>
>
>


-- 
Donald Drake
Drake Consulting
http://www.drakeconsulting.com/
http://www.MailLaunder.com/
800-733-2143
