DOH, looks like I did not have enough coffee before I asked this :-)

I added the if statement...

    var demoRddFilter = demoRdd.filter(line => !line.contains("ISR$CASE$I_F_COD$FOLL_SEQ") || !line.contains("primaryid$caseid$caseversion"))
    var demoRddFilterMap = demoRddFilter.map(line => {
      if (line.split('$').length >= 13) {
        line.split('$')(0) + "~" + line.split('$')(5) + "~" + line.split('$')(11) + "~" + line.split('$')(12)
      }
    })

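For anyone reading this later, here is a minimal sketch of both options from the original question (1. check the field count, 2. wrap the mapping in a try), written as plain Scala functions so it runs without a cluster. Two things it works around: in Scala an `if` with no `else` evaluates to `()` when the condition is false, so the map above still emits one value per short line; and two `!contains` tests joined with `||` only drop a line that contains both header strings. The names `BadRecordSketch`, `isHeader`, `extractChecked` and `extractTried` are made up for illustration, and the idea that a line with either marker is a header is my assumption.

```scala
import scala.util.Try

object BadRecordSketch {
  // Header markers taken from the code in this thread.
  val headerMarkers = Seq("ISR$CASE$I_F_COD$FOLL_SEQ", "primaryid$caseid$caseversion")

  // Assumption: a line containing either marker is a header line.
  def isHeader(line: String): Boolean = headerMarkers.exists(m => line.contains(m))

  // Option 1: test the number of fields first; short lines become None.
  def extractChecked(line: String): Option[String] = {
    val f = line.split('$') // split once instead of four times
    if (f.length >= 13) Some(Seq(f(0), f(5), f(11), f(12)).mkString("~")) else None
  }

  // Option 2: let the lookup throw and turn the failure into None.
  def extractTried(line: String): Option[String] = Try {
    val f = line.split('$')
    Seq(f(0), f(5), f(11), f(12)).mkString("~")
  }.toOption

  def main(args: Array[String]): Unit = {
    val good = (0 to 12).map(i => "f" + i).mkString("$") // 13 fields
    val bad = "a$b$c"                                    // only 3 fields
    val lines = Seq("primaryid$caseid$caseversion", good, bad)

    // Same shape as the RDD pipeline: drop headers, then keep only good records.
    val out = lines.filterNot(isHeader).flatMap(line => extractChecked(line))
    out.foreach(println)
  }
}
```

On an RDD the same functions should slot in as `demoRdd.filter(line => !isHeader(line)).flatMap(line => extractChecked(line))`, since `flatMap` discards the `None` results. I have not run that part against a cluster here. To see which records are bad rather than silently dropping them, filtering on `extractChecked(line).isEmpty` and saving that RDD separately is one way to go.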
From: Sanjay Subramanian <sanjaysubraman...@yahoo.com.INVALID>
To: "user@spark.apache.org" <user@spark.apache.org>
Sent: Wednesday, December 24, 2014 8:28 AM
Subject: How to identify erroneous input record ?

hey guys

One of my input records has a problem that makes the code fail.

    var demoRddFilter = demoRdd.filter(line => !line.contains("ISR$CASE$I_F_COD$FOLL_SEQ") || !line.contains("primaryid$caseid$caseversion"))
    var demoRddFilterMap = demoRddFilter.map(line => line.split('$')(0) + "~" + line.split('$')(5) + "~" + line.split('$')(11) + "~" + line.split('$')(12))
    demoRddFilterMap.saveAsTextFile("/data/aers/msfx/demo/" + outFile)

This is possibly happening because one input record may not have 13 fields. If this were Hadoop mapper code, I would have 2 ways to solve this:

1. test the number of fields of each line before applying the map function
2. enclose the mapping function in a try catch block so that the mapping function only fails for the erroneous record

How do I implement 1. or 2.
in the Spark code?

Thanks

sanjay