Hi Mich,
If you would like to print everything to the console, you could use:
errlog.filter(line => line.contains("sed")).collect().foreach(println)
Or you could save the output to a file using one of the saveAs methods, such as saveAsTextFile.
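To expand on that, here is a rough sketch for spark-shell (where sc already exists). The line ending in "..." in your output is most likely just the REPL shortening how it displays the array; collect() itself returns every matching line. The name sedLines, the 20 in take(20), and the output path /tmp/sed_lines are my own placeholders, not anything from your setup.

```scala
// Paste into spark-shell, where `sc` (the SparkContext) is already defined.
val errlog = sc.textFile("/unix_files/*.ksh")

// Keep only the lines that contain "sed". `sedLines` is a placeholder name.
val sedLines = errlog.filter(line => line.contains("sed"))

// collect() brings every matching line back to the driver;
// foreach(println) then prints each one in full, with no "..." cut-off.
sedLines.collect().foreach(println)

// For a quick look without pulling the whole result to the driver:
sedLines.take(20).foreach(println)

// Write all matches out as a directory of part files.
// The path is hypothetical and must not already exist.
sedLines.saveAsTextFile("/tmp/sed_lines")
```

With only around a thousand matching lines, collect() should be fine. For much larger results, take(n) or saveAsTextFile is the safer choice because collect() loads everything into driver memory.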
Thanks,
Chandeep
On Wed, Feb 10, 2016 at 8:14 PM, <[email protected]> wrote:
>
>
> Hi,
>
> I have a bunch of files stored in the HDFS directory /unix_files, 319 files
> in total.
> scala> val errlog = sc.textFile("/unix_files/*.ksh")
>
> scala> errlog.filter(line => line.contains("sed")).count()
> res104: Long = 1113
> So it returns 1113 lines containing the word "sed".
>
> If I want to see the collection I can do
>
>
> *scala> errlog.filter(line => line.contains("sed")).collect()*
>
> res105: Array[String] = Array(" DSQUERY=${1} ;
> DBNAME=${2} ; ERROR=0 ; PROGNAME=$(basename $0 | sed -e s/.ksh//)", # . in
> environment based on argument for script., " exec sp_spaceused", "
> exec sp_spaceused", PROGNAME=$(basename $0 | sed -e s/.ksh//), "
> BACKUPSERVER=$5 # Server that is used to load the transaction dump", "
> BACKUPSERVER=$5 # Server that is used to load the transaction
> dump", " BACKUPSERVER=$5 # Server that is used to load the
> transaction dump", " cat $TMPDIR/${DBNAME}_trandump.sql | sed
> s/${DSQUERY}/${REMOTESERVER}/ > $TMPDIR/${DBNAME}_trandump.tmpsql", cat
> $TMPDIR/${DBNAME}_tran_transfer.sql | sed s/${DSQUERY}/${REMOTESERVER}/ >
> $TMPDIR/${DBNAME}_tran_transfer.tmpsql, PROGNAME=$(basename $0 | sed -e
> s/.ksh//), " B...
> scala>
>
>
> Now, is there any way I can retrieve all these instances? Or perhaps they are
> all wrapped up and I only see a few lines?
>
> Thanks,
>
> Mich
>
>