DOH! Looks like I did not have enough coffee before I asked this :-) I added the if statement...

var demoRddFilter = demoRdd.filter(line =>
  !line.contains("ISR$CASE$I_F_COD$FOLL_SEQ") ||
  !line.contains("primaryid$caseid$caseversion"))

var demoRddFilterMap = demoRddFilter.map(line => {
  if (line.split('$').length >= 13) {
    line.split('$')(0) + "~" + line.split('$')(5) + "~" +
      line.split('$')(11) + "~" + line.split('$')(12)
  }
})
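
One thing to watch with the if statement above: a Scala `if` without an `else` evaluates to `()` when the condition is false, so short lines would still produce an output element (written as "()" by saveAsTextFile). A minimal sketch of one way around that is to return an Option and use flatMap, which drops the None cases. The helper name `extractFields` and the sample lines are made up for illustration, and a local Seq stands in for the RDD so the snippet runs without a cluster.

```scala
// Hypothetical helper (not from the original thread): returns the four
// fields joined by "~", or None when the line has fewer than 13 fields.
def extractFields(line: String): Option[String] = {
  val fields = line.split('$') // split once instead of once per field
  if (fields.length >= 13)
    Some(Seq(fields(0), fields(5), fields(11), fields(12)).mkString("~"))
  else
    None
}

// Local stand-in for the RDD; the data is invented for this sketch.
val sample = Seq(
  "a$b$c$d$e$f$g$h$i$j$k$l$m", // 13 fields, kept
  "too$short"                  // 2 fields, dropped
)

// flatMap keeps the Some values and discards the None values.
val kept = sample.flatMap(line => extractFields(line))
```

On the RDD the same shape should apply, i.e. demoRddFilter.flatMap(line => extractFields(line)). Separately, the header filter as written uses `||`, so it only rejects a line that contains both header strings; `&&` may be what was intended.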

From: Sanjay Subramanian <sanjaysubraman...@yahoo.com.INVALID>
To: "user@spark.apache.org" <user@spark.apache.org>
Sent: Wednesday, December 24, 2014 8:28 AM
Subject: How to identify erroneous input record ?

Hey guys,

One of my input records has a problem that makes the code fail.

var demoRddFilter = demoRdd.filter(line =>
  !line.contains("ISR$CASE$I_F_COD$FOLL_SEQ") ||
  !line.contains("primaryid$caseid$caseversion"))

var demoRddFilterMap = demoRddFilter.map(line =>
  line.split('$')(0) + "~" + line.split('$')(5) + "~" +
  line.split('$')(11) + "~" + line.split('$')(12))
demoRddFilterMap.saveAsTextFile("/data/aers/msfx/demo/" + outFile)

This is possibly happening because one input record may not have 13 fields. If this were Hadoop mapper code, I would have two ways to solve this:

1. Test the number of fields of each line before applying the map function.
2. Enclose the mapping function in a try/catch block so that the mapping function only fails for the erroneous record.

How do I implement 1. or 2. in the Spark code?

Thanks
sanjay
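
For the try/catch option, one possible sketch wraps the per-line mapping in scala.util.Try and keeps each input line next to its result, so the erroneous records can be listed rather than just skipped. The helper name `parseLine` and the sample lines are assumptions for illustration, and a local Seq stands in for the RDD so the snippet runs on its own.

```scala
import scala.util.{Try, Success, Failure}

// Hypothetical helper: the mapping from the question wrapped in Try, so a
// short line yields a Failure instead of throwing out of the map function.
def parseLine(line: String): Try[String] = Try {
  val fields = line.split('$')
  fields(0) + "~" + fields(5) + "~" + fields(11) + "~" + fields(12)
}

// Local stand-in for the RDD; the data is invented for this sketch.
val sampleLines = Seq("a$b$c$d$e$f$g$h$i$j$k$l$m", "too$short")

// Pair each line with its result so the bad input stays visible.
val results = sampleLines.map(line => (line, parseLine(line)))
val good = results.collect { case (_, Success(out)) => out }
val bad  = results.collect { case (line, Failure(_)) => line }
```

RDDs also offer map and a collect overload that takes a partial function, so the same two-step split should carry over; the lines in `bad` are the records to inspect.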
