Thanks Jörn,

To start, I would like to explore how one can turn some of this data into
useful information.

I would like to look at trend analysis. A simple correlation shows that
the more often a topic, say "organic food", is mentioned, the more people
are inclined to go for it. From that, one can deduce that organic food is
a potential growth area.
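
A minimal sketch of that correlation step in plain Scala, assuming you already have one value per day for the topic's mention count and one for some demand measure (for example sales) over the same days. The `pearson` function and the sample figures are hypothetical; on a cluster, `org.apache.spark.mllib.stat.Statistics.corr` computes the same Pearson coefficient for two `RDD[Double]`s.

```scala
// Pearson correlation between two equally long series, e.g. daily mention
// counts of a topic and a (hypothetical) daily demand figure for the same days.
def pearson(xs: Seq[Double], ys: Seq[Double]): Double = {
  require(xs.length == ys.length && xs.length > 1, "need two series of equal length >= 2")
  val n = xs.length
  val meanX = xs.sum / n
  val meanY = ys.sum / n
  // Covariance and variance sums; the common 1/n factors cancel in the ratio.
  val cov  = xs.zip(ys).map { case (x, y) => (x - meanX) * (y - meanY) }.sum
  val varX = xs.map(x => (x - meanX) * (x - meanX)).sum
  val varY = ys.map(y => (y - meanY) * (y - meanY)).sum
  cov / math.sqrt(varX * varY)
}

// Made-up numbers, for illustration only.
val mentions = Seq(120.0, 150.0, 170.0, 210.0, 260.0)
val sales    = Seq(30.0, 34.0, 37.0, 45.0, 52.0)
println(f"r = ${pearson(mentions, sales)}%.3f")
```

A value close to +1 means the two series rise and fall together; `pearson` returns NaN if either series is constant.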

I now have all the infrastructure to ingest that data, for example Flume
to store it or Spark Streaming for near-real-time work.

Now I want to slice and dice that data for, say, organic food.
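
A minimal sketch of the slicing step in plain Scala, assuming each tweet has already been reduced to a date and its text (the `Tweet` case class, helper names and sample tweets are hypothetical). It keeps tweets that mention the phrase and counts them per day, which gives the mention series needed for the trend analysis described above. In Spark Streaming the same text predicate could be passed to `filter` on the `statuses` stream in the code quoted below; alternatively, `TwitterUtils.createStream` accepts an optional `filters: Seq[String]` argument so that Twitter only sends tweets matching those keywords.

```scala
import java.time.LocalDate

// Hypothetical, simplified tweet record holding only the fields used here.
final case class Tweet(date: LocalDate, text: String)

// True if the tweet text contains any of the phrases (case-insensitive).
def mentionsAny(t: Tweet, phrases: Seq[String]): Boolean = {
  val lower = t.text.toLowerCase
  phrases.exists(p => lower.contains(p.toLowerCase))
}

// Number of matching tweets per day, in date order.
def dailyCounts(tweets: Seq[Tweet], phrases: Seq[String]): Seq[(LocalDate, Int)] =
  tweets
    .filter(t => mentionsAny(t, phrases))
    .groupBy(_.date)
    .map { case (day, ts) => (day, ts.size) }
    .toSeq
    .sortBy { case (day, _) => day.toEpochDay }

// Made-up sample tweets, for illustration only.
val sample = Seq(
  Tweet(LocalDate.of(2016, 6, 6), "Trying the new Organic Food shop"),
  Tweet(LocalDate.of(2016, 6, 6), "Great match last night"),
  Tweet(LocalDate.of(2016, 6, 7), "organic food prices are coming down"),
  Tweet(LocalDate.of(2016, 6, 7), "Is organic food worth it?")
)
dailyCounts(sample, Seq("organic food")).foreach { case (day, n) => println(s"$day $n") }
```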

I presume this is a typical question.

You mentioned Spark ML (machine learning?). Is that something viable?
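
As a rough illustration of what a supervised text classifier does, here is a tiny multinomial naive Bayes in plain Scala that labels a tweet as being about organic food or not. Naive Bayes is chosen here only for brevity (MLlib also ships a `NaiveBayes` classifier, alongside the SVMs Jörn mentions); the labels, training texts and helper names (`tokenize`, `NBModel`, `train`) are hypothetical.

```scala
// Lower-case the text and split on anything that is not a letter, digit, '#' or '@'.
def tokenize(s: String): Seq[String] =
  s.toLowerCase.split("[^a-z0-9#@]+").toSeq.filter(_.nonEmpty)

// Per-label statistics learned from the training data.
final case class NBModel(
    logPriors: Map[String, Double],            // log P(label)
    wordCounts: Map[String, Map[String, Int]], // label -> word -> count
    totalWords: Map[String, Int],              // label -> number of tokens
    vocabSize: Int) {

  // Pick the label with the highest log-posterior, using add-one (Laplace) smoothing.
  def predict(text: String): String = {
    val words = tokenize(text)
    logPriors.keys.maxBy { label =>
      val counts = wordCounts(label)
      val denom = totalWords(label) + vocabSize.toDouble
      logPriors(label) + words.map(w => math.log((counts.getOrElse(w, 0) + 1) / denom)).sum
    }
  }
}

// Train from (label, text) pairs.
def train(examples: Seq[(String, String)]): NBModel = {
  val byLabel = examples.groupBy(_._1)
  val n = examples.size.toDouble
  val logPriors = byLabel.map { case (label, ex) => label -> math.log(ex.size / n) }
  val wordCounts = byLabel.map { case (label, ex) =>
    val tokens = ex.flatMap { case (_, text) => tokenize(text) }
    label -> tokens.groupBy(identity).map { case (w, ws) => w -> ws.size }
  }
  val totalWords = wordCounts.map { case (label, m) => label -> m.values.sum }
  val vocabSize = wordCounts.values.flatMap(_.keys).toSet.size
  NBModel(logPriors, wordCounts, totalWords, vocabSize)
}

// Made-up training data, for illustration only.
val model = train(Seq(
  ("organic", "organic food is great"),
  ("organic", "buying organic vegetables at the farm market"),
  ("other", "football match tonight"),
  ("other", "great goal in the match")
))
println(model.predict("organic food market")) // prints: organic
```

Working in log space avoids underflow when multiplying many small word probabilities; add-one smoothing keeps unseen words from zeroing out a label.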

Cheers

Dr Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com

On 7 June 2016 at 12:22, Jörn Franke <jornfra...@gmail.com> wrote:

> Spark ML support vector machines or neural networks could be candidates.
> For unsupervised learning, clustering could be an option.
> For graph analysis on the followers, you can easily use Spark GraphX.
> Keep in mind that each tweet contains a lot of metadata (location,
> followers, etc.) that is more or less structured.
> For unstructured text analytics (e.g. the tweet text itself) I recommend
> Solr/Elasticsearch.
>
> However, I am not sure exactly what you want to do with the data.
>
>
> On 07 Jun 2016, at 13:16, Mich Talebzadeh <mich.talebza...@gmail.com>
> wrote:
>
> Hi,
>
> This is really a general question.
>
> I use Spark to get Twitter data. I have had a first look at it with:
>
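>     import org.apache.spark.streaming.{Seconds, StreamingContext}
>     import org.apache.spark.streaming.twitter.TwitterUtils
>     val ssc = new StreamingContext(sparkConf, Seconds(2)) // sparkConf: an existing SparkConf
>     val tweets = TwitterUtils.createStream(ssc, None)     // None = OAuth keys from twitter4j system properties
>     val statuses = tweets.map(status => status.getText())  // keep only the tweet text
>     statuses.print()
>     ssc.start()            // nothing runs until the context is started
>     ssc.awaitTermination()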
>
> Ok
>
> I can also use Apache Flume to store the data in an HDFS directory:
>
>     $FLUME_HOME/bin/flume-ng agent --conf ./conf/ -f conf/twitter.conf \
>         -Dflume.root.logger=DEBUG,console -n TwitterAgent
>
> That stores the Twitter data in binary format in an HDFS directory.
>
> My question is pretty basic.
>
> What is the best tool/language to dig into that data, for example Twitter
> streaming data? I am getting all sorts of stuff coming in. Say I am only
> interested in certain topics, like sport. Which tool and language can I use
> to separate the signal from the noise?
>
> Thanks
>
> Dr Mich Talebzadeh
