Thanks Jörn.

Is the data in the HDFS directory stored in a binary format that Spark can
read directly, or does it first need to be converted into JSON or similar?
I am not familiar with the nature of the Twitter logs.

In short, what tool can I use to convert the log files into a useful format,
and what format would that be?
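
To make the question concrete, here is a rough sketch of what I imagine the
read side could look like. It rests on assumptions I have not verified: that
the Flume HDFS sink wrote Hadoop SequenceFiles with LongWritable keys and
BytesWritable values, and that each event body is UTF-8 JSON text. If the
bodies are actually Avro records, the decoding step would need to change.
The path and the names ReadFlumeTweets and decodeBody are made up for
illustration.

```scala
import org.apache.hadoop.io.{BytesWritable, LongWritable}
import org.apache.spark.{SparkConf, SparkContext}

object ReadFlumeTweets {

  // Copy the bytes out of the Writable and decode them as UTF-8 text.
  // Hadoop reuses Writable instances, so the copy must happen per record.
  def decodeBody(value: BytesWritable): String =
    new String(value.copyBytes(), "UTF-8")

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("ReadFlumeTweets")
    val sc = new SparkContext(conf)

    // Hypothetical location of the files written by Flume.
    val path = "hdfs:///user/flume/tweets/*"

    // Assumption: SequenceFile with (LongWritable, BytesWritable) records.
    val events = sc.sequenceFile(path, classOf[LongWritable], classOf[BytesWritable])

    // Keep only the event body, decoded to a String.
    val bodies = events.map { case (_, value) => decodeBody(value) }

    // Print a few records to see what the data actually looks like.
    bodies.take(5).foreach(println)

    sc.stop()
  }
}
```

If the printed records turn out to be readable JSON, no separate conversion
tool would be needed and Spark could parse the strings itself.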

Thanks

Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 3 June 2016 at 09:40, Jörn Franke <jornfra...@gmail.com> wrote:

> Or combine both! With Spark Streaming it is possible to combine streaming
> data with data stored on HDFS. In the end it always depends on what you want
> to do and when you need which data.
>
> On 03 Jun 2016, at 10:26, Mich Talebzadeh <mich.talebza...@gmail.com>
> wrote:
>
> I use Spark Streaming to experiment with Twitter data.
> Basic stuff:
>
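>     import org.apache.spark.streaming.{Seconds, StreamingContext}
>     import org.apache.spark.streaming.twitter.TwitterUtils
>     val ssc = new StreamingContext(sparkConf, Seconds(2))  // 2-second batches
>     val tweets = TwitterUtils.createStream(ssc, None)      // None: credentials from twitter4j system properties
>     val statuses = tweets.map(status => status.getText())  // keep only the tweet text
>     statuses.print()
>     ssc.start()             // nothing runs until the context is started
>     ssc.awaitTermination()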
>
>
> An alternative is to use Apache Flume to get the Twitter data and
> store it as log files in HDFS.
>
> <image.png>
>
>
> I notice that these log files are stored as binary log files.
>
> I presume the log files can be read and converted to JSON by another
> process, or used with a machine learning library.
>
> I know this question may not be directly relevant, but what are the main
> approaches: real-time analysis of Twitter data with Spark Streaming on the
> one hand, and storing the data in HDFS for later use on the other?
>
> Thanks
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
>
