Hi,

I have a bunch of files stored in the HDFS directory /unix_files,
319 files in total.

scala> val errlog = sc.textFile("/unix_files/*.ksh")

scala> errlog.filter(line => line.contains("sed")).count()
res104: Long = 1113

So it returns 1113 lines that contain the string "sed".
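Note that contains("sed") is a substring test, so lines such as " exec sp_spaceused" or "... that is used to ..." also match because of "used". A minimal sketch of the difference, using plain Scala collections as a stand-in for the RDD; the helper name countWord and the sample lines are assumptions made for illustration only:

```scala
import scala.util.matching.Regex

// Hypothetical helper: counts whole-word occurrences of `word` in one line.
def countWord(line: String, word: String): Int = {
  val pattern = ("\\b" + Regex.quote(word) + "\\b").r
  pattern.findAllIn(line).size
}

// Small local stand-in for the lines read from the .ksh files.
val sample = Seq(
  "PROGNAME=$(basename $0 | sed -e s/.ksh//)",
  "cat in.sql | sed s/a/b/ | sed s/b/c/ > out.sql",
  " exec sp_spaceused",
  "echo done"
)

// Lines containing the substring "sed" (what the filter above counts).
val substringLines = sample.count(_.contains("sed"))          // 3

// Lines containing "sed" as a separate word.
val wordLines = sample.count(line => countWord(line, "sed") > 0)  // 2

// Total number of whole-word occurrences across all lines.
val wordOccurrences = sample.map(countWord(_, "sed")).sum     // 3

println(s"substring lines: $substringLines, word lines: $wordLines, occurrences: $wordOccurrences")
```

In the shell the same helper could presumably be applied to the RDD, for example errlog.filter(line => countWord(line, "sed") > 0).count().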

If I want to see the collection I can do

scala> errlog.filter(line => line.contains("sed")).collect()

res105: Array[String] = Array(" DSQUERY=${1} ; DBNAME=${2} ; ERROR=0 ;
PROGNAME=$(basename $0 | sed -e s/.ksh//)", # . in environment based on
argument for script., " exec sp_spaceused", " exec sp_spaceused",
PROGNAME=$(basename $0 | sed -e s/.ksh//), " BACKUPSERVER=$5 # Server
that is used to load the transaction dump", " BACKUPSERVER=$5 # Server
that is used to load the transaction dump", " BACKUPSERVER=$5 # Server
that is used to load the transaction dump", " cat
$TMPDIR/${DBNAME}_trandump.sql | sed s/${DSQUERY}/${REMOTESERVER}/ >
$TMPDIR/${DBNAME}_trandump.tmpsql", cat
$TMPDIR/${DBNAME}_tran_transfer.sql | sed s/${DSQUERY}/${REMOTESERVER}/
> $TMPDIR/${DBNAME}_tran_transfer.tmpsql, PROGNAME=$(basename $0 | sed -e
s/.ksh//), " B...
scala>

Now, is there any way I can retrieve all of these instances, or are they
all there and the shell is just truncating the output so that I only see
a few lines?
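One way to check, sketched under the assumption that the shell is only shortening what it echoes (the trailing "B..." suggests this) while the collected array itself is complete. The sample array is a hypothetical stand-in for the real result of collect():

```scala
// Stand-in for what collect() returns in the shell (hypothetical sample data).
val collected: Array[String] = Array(
  "PROGNAME=$(basename $0 | sed -e s/.ksh//)",
  " exec sp_spaceused",
  " BACKUPSERVER=$5 # Server that is used to load the transaction dump"
)

// The element count shows whether anything is actually missing.
val total = collected.length

// Printing element by element avoids relying on the shell's shortened echo.
collected.foreach(println)

// A bounded preview, useful when the full result is large.
val preview = collected.take(2)
preview.foreach(println)
```

Against the real data this would presumably become errlog.filter(line => line.contains("sed")).collect().foreach(println), or take(20) in place of collect() to look at only the first few matches without pulling everything to the driver.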

Thanks,

Mich

 
