Thanks Jörn. Is the data in the HDFS directory stored in a binary format that Spark can read directly, or does it need to be converted into JSON or similar first? I am not familiar with the nature of the Twitter logs.
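
One way I could check what the files actually are is to copy one locally (for example with hdfs dfs -get) and look at its first few bytes. Below is a minimal sketch of that idea, not a tested tool. It relies on two facts: Hadoop SequenceFiles start with the bytes SEQ, and Avro object container files start with Obj followed by the byte 1. The object name SniffFormat and its helpers are made up for illustration, and the JSON check is only a rough guess based on the first character.

```scala
import java.io.FileInputStream
import java.nio.charset.StandardCharsets.US_ASCII

// Hypothetical helper: guesses a file's container format from its leading bytes.
object SniffFormat {
  private val seqMagic: Array[Byte]  = "SEQ".getBytes(US_ASCII)            // Hadoop SequenceFile
  private val avroMagic: Array[Byte] = "Obj".getBytes(US_ASCII) :+ 1.toByte // Avro container file

  // Classify a header (the first few bytes of a file).
  def sniff(header: Array[Byte]): String = {
    def startsWith(magic: Array[Byte]): Boolean =
      header.length >= magic.length && header.take(magic.length).sameElements(magic)

    if (startsWith(seqMagic)) "Hadoop SequenceFile"
    else if (startsWith(avroMagic)) "Avro object container file"
    else if (header.nonEmpty && (header(0) == '{'.toByte || header(0) == '['.toByte))
      "probably JSON text" // rough heuristic only
    else "unknown"
  }

  // Read up to n bytes from the start of a local file.
  def readHeader(path: String, n: Int = 4): Array[Byte] = {
    val in = new FileInputStream(path)
    try {
      val buf  = new Array[Byte](n)
      val read = in.read(buf)
      if (read <= 0) Array.empty[Byte] else buf.take(read)
    } finally in.close()
  }

  // Usage: pass one or more local file paths as arguments.
  def main(args: Array[String]): Unit =
    args.foreach(p => println(s"$p: ${sniff(readHeader(p))}"))
}
```

If the answer turns out to be a SequenceFile, my understanding is that Spark can read it with sc.sequenceFile, and that the Flume HDFS sink writes SequenceFiles unless hdfs.fileType is set to DataStream. I have not verified either against my own setup.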

In short, what tool can I use to convert the log files into a useful format, and what format would that be?

Thanks

Dr Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com

On 3 June 2016 at 09:40, Jörn Franke <jornfra...@gmail.com> wrote:

> Or combine both! It is possible with Spark Streaming to combine streaming
> data and data on HDFS. In the end it always depends on what you want to do
> and when you need what.
>
> On 03 Jun 2016, at 10:26, Mich Talebzadeh <mich.talebza...@gmail.com>
> wrote:
>
> I use Spark Streaming to experiment with Twitter data. Basic stuff:
>
> val ssc = new StreamingContext(sparkConf, Seconds(2))
> val tweets = TwitterUtils.createStream(ssc, None)
> val statuses = tweets.map(status => status.getText())
> statuses.print()
>
> Another alternative is to use Apache Flume to get the Twitter data and
> store it as log files in HDFS.
>
> <image.png>
>
> I notice that these log files are stored as binary log files.
>
> I presume the log files can be read and converted to JSON through another
> process, or used with a machine learning language.
>
> I know this question may not be directly relevant, but what is the main
> approach: real-time analysis of Twitter data using Spark Streaming, or
> storing the data in HDFS and using it later?
>
> Thanks
>
> Dr Mich Talebzadeh