Hi Chandeep 

Many thanks for your help 

In the line below:

errlog.filter(line => line.contains("sed")).collect().foreach(println)

Could you please explain what each component is called? I am
new to Scala.

        * errlog is the RDD?
        * filter(line => line.contains("sed")) is a method?
        * collect() is another method?
        * foreach(println)?
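
For reference, here is a hedged sketch of the same chain written with explicit dots, one step per line, and a type on each intermediate value. So that it runs without Spark, it uses a plain Scala Seq[String] called sampleLines (hypothetical data) in place of the RDD, and toArray in place of collect(). On a real cluster, errlog is an RDD[String] (what sc.textFile returns), filter is a lazy transformation that returns another RDD, collect() is an action that returns an Array[String] to the driver, and foreach(println) is the ordinary Scala Array method. Because foreach(println) prints each element separately, it shows every matching line, whereas the REPL echo of res105 appears to be cut off at "B...".

```scala
// Hypothetical sample data standing in for the lines of an RDD such as errlog.
val sampleLines: Seq[String] = Seq(
  "PROGNAME=$(basename $0 | sed -e s/.ksh//)",
  "DBNAME=${2} ; ERROR=0",
  "cat file.sql | sed s/A/B/ > file.tmpsql"
)

// filter: keeps only the elements for which the function returns true.
// `line => line.contains("sed")` is an anonymous function (a lambda).
// On an RDD, filter is a lazy transformation that returns another RDD.
val sedLines: Seq[String] = sampleLines.filter(line => line.contains("sed"))

// collect(): on an RDD this is an action that runs the job and returns
// every matching element to the driver as an Array[String].
// toArray plays that role for this local stand-in.
val collected: Array[String] = sedLines.toArray

// foreach(println): calls println once per element, so every line is
// printed in full instead of one abbreviated value echoed by the REPL.
collected.foreach(println)
```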

Thanks 

On 10/02/2016 21:28, Chandeep Singh wrote: 

> Hi Mich, 
> 
> If you would like to print everything to the console, you could use:
> errlog.filter(line => line.contains("sed")).collect().foreach(println)
> 
> Or you could save to a file using any of the saveAs methods (for example, saveAsTextFile).
> 
> Thanks, 
> Chandeep 
> 
> On Wed, Feb 10, 2016 at 8:14 PM, 
> <mich.talebza...@cloudtechnologypartners.co.uk> wrote:
> 
>> Hi,
>> 
>> I have a bunch of files, 319 in total, stored in the HDFS directory
>> /unix_files.
>> 
>> scala> val errlog = sc.textFile("/unix_files/*.ksh")
>> 
>> scala> errlog.filter(line => line.contains("sed")).count()
>> res104: Long = 1113
>> 
>> So it returns 1113 lines containing the word "sed".
>> 
>> If I want to see the collection, I can do:
>> 
>> scala> errlog.filter(line => line.contains("sed")).collect()
>> 
>> res105: Array[String] = Array(" DSQUERY=${1} ; DBNAME=${2} ; ERROR=0 ; 
>> PROGNAME=$(basename $0 | sed -e s/.ksh//)", # . in environment based on 
>> argument for script., " exec sp_spaceused", " exec sp_spaceused", 
>> PROGNAME=$(basename $0 | sed -e s/.ksh//), " BACKUPSERVER=$5 # Server that 
>> is used to load the transaction dump", " BACKUPSERVER=$5 # Server that is 
>> used to load the transaction dump", " BACKUPSERVER=$5 # Server that is used 
>> to load the transaction dump", " cat $TMPDIR/${DBNAME}_trandump.sql | sed 
>> s/${DSQUERY}/${REMOTESERVER}/ > $TMPDIR/${DBNAME}_trandump.tmpsql", cat 
>> $TMPDIR/${DBNAME}_tran_transfer.sql | sed s/${DSQUERY}/${REMOTESERVER}/ > 
>> $TMPDIR/${DBNAME}_tran_transfer.tmpsql, PROGNAME=$(basename $0 | sed -e 
>> s/.ksh//), " B...
>> scala>
>> 
>> Now, is there any way I can retrieve all of these lines, or is the output
>> truncated so that I only see a few of them?
>> 
>> Thanks,
>> 
>> Mich
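
On the count in the quoted message: filter(...).count() returns the number of lines that contain "sed", so a line with two sed calls is counted once. contains is also a plain substring test, which is why lines such as " exec sp_spaceused" appear in res105: "spaceused" contains "sed". Below is a hedged sketch of the difference, using a plain Scala Seq[String] of hypothetical lines instead of the RDD so it runs without Spark. The helper occurrences and the regex sedWord are illustrative additions, not Spark API; the same per-line functions could in principle be passed to errlog.map(...) on the RDD.

```scala
import scala.annotation.tailrec
import scala.util.matching.Regex

// Hypothetical sample lines standing in for the contents of errlog.
val lines: Seq[String] = Seq(
  "PROGNAME=$(basename $0 | sed -e s/.ksh//)",
  "cat a.sql | sed s/X/Y/ | sed s/P/Q/ > a.tmpsql", // two sed calls on one line
  "exec sp_spaceused"                               // "spaceused" contains "sed"
)

// Non-overlapping substring occurrences of `word` in `line` (hypothetical helper).
def occurrences(line: String, word: String): Int = {
  @tailrec
  def loop(from: Int, acc: Int): Int = {
    val i = line.indexOf(word, from)
    if (i < 0) acc else loop(i + word.length, acc + 1)
  }
  if (word.isEmpty) 0 else loop(0, 0)
}

// Whole-word matches only: \b stops "sed" matching inside "spaceused".
val sedWord: Regex = """\bsed\b""".r

// What filter(line => line.contains("sed")).count() measures: matching lines.
val matchingLines: Int = lines.count(line => line.contains("sed"))
// Substring occurrences summed over all lines.
val substringHits: Int = lines.map(line => occurrences(line, "sed")).sum
// Whole-word occurrences summed over all lines.
val wordHits: Int = lines.map(line => sedWord.findAllIn(line).size).sum

println(s"matching lines:        $matchingLines") // 3
println(s"substring occurrences: $substringHits") // 4
println(s"whole-word sed:        $wordHits")      // 3
```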

-- 

Dr Mich Talebzadeh

LinkedIn
https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com
