Hi,
I have a bunch of files stored in the HDFS directory /unix_files, 319 files in total.

scala> val errlog = sc.textFile("/unix_files/*.ksh")
scala> errlog.filter(line => line.contains("sed")).count()
res104: Long = 1113

So it returns 1113 lines that contain the word "sed".

If I want to see the collection I can do

scala> errlog.filter(line => line.contains("sed")).collect()
res105: Array[String] = Array(" DSQUERY=${1} ; DBNAME=${2} ; ERROR=0 ; PROGNAME=$(basename $0 | sed -e s/.ksh//)", # . in environment based on argument for script., " exec sp_spaceused", " exec sp_spaceused", PROGNAME=$(basename $0 | sed -e s/.ksh//), " BACKUPSERVER=$5 # Server that is used to load the transaction dump", " BACKUPSERVER=$5 # Server that is used to load the transaction dump", " BACKUPSERVER=$5 # Server that is used to load the transaction dump", " cat $TMPDIR/${DBNAME}_trandump.sql | sed s/${DSQUERY}/${REMOTESERVER}/ > $TMPDIR/${DBNAME}_trandump.tmpsql", cat $TMPDIR/${DBNAME}_tran_transfer.sql | sed s/${DSQUERY}/${REMOTESERVER}/ > $TMPDIR/${DBNAME}_tran_transfer.tmpsql, PROGNAME=$(basename $0 | sed -e > s/.ksh//), " B...

scala>

Now, is there any way I can retrieve all of these lines, or are they all there already but wrapped up, so that I only see a few of them?

Thanks,

Mich
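
For reference, here is a minimal sketch of one way to print every matching line instead of relying on the shell's one-line summary of the array. It assumes that collect() has already returned the full result to the driver and that only the REPL display is cut off at "...". The object name ShowAllMatches and the helper linesContaining are hypothetical names used for illustration; the path and search word are the ones from the transcript above. The local master is only a fallback assumption for running outside spark-submit.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object ShowAllMatches {

  // Hypothetical helper: lines under `path` that contain `word`.
  def linesContaining(sc: SparkContext, path: String, word: String): RDD[String] =
    sc.textFile(path).filter(line => line.contains(word))

  def main(args: Array[String]): Unit = {
    // Assumption: fall back to a local master when none is supplied by spark-submit.
    val conf = new SparkConf()
      .setAppName("ShowAllMatches")
      .setIfMissing("spark.master", "local[*]")
    val sc = new SparkContext(conf)
    try {
      val matches = linesContaining(sc, "/unix_files/*.ksh", "sed")

      // collect() brings every matching line to the driver as an Array[String].
      val all: Array[String] = matches.collect()
      println(s"Collected ${all.length} matching lines")

      // Print one line per element rather than the truncated Array(...) summary.
      all.foreach(println)

      // To look at a sample only, take(n) avoids collecting everything.
      matches.take(20).foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

In the Spark shell itself, where sc already exists, the equivalent would be errlog.filter(line => line.contains("sed")).collect().foreach(println). If the result were much larger than a thousand or so lines, writing it back out with saveAsTextFile would avoid pulling it all onto the driver.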