Hi,

This is really a general question.

I use Spark Streaming to pull in Twitter data, and had a quick look at it
with the following:

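    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.twitter.TwitterUtils

    // sparkConf is an existing SparkConf; the batch interval is 2 seconds
    val ssc = new StreamingContext(sparkConf, Seconds(2))
    val tweets = TwitterUtils.createStream(ssc, None)
    val statuses = tweets.map(status => status.getText())  // keep only the text
    statuses.print()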

That works OK.

I can also use Apache Flume to store the data in an HDFS directory:

$FLUME_HOME/bin/flume-ng agent --conf ./conf/ -f conf/twitter.conf \
  -Dflume.root.logger=DEBUG,console -n TwitterAgent
That stores the Twitter data in binary format in the HDFS directory.

My question is pretty basic.

What is the best tool or language to dig into that data? Take the Twitter
streaming data, for example: I am getting all sorts of stuff coming in. Say
I am only interested in certain topics, such as sport. How can I separate
the signal from the noise, and with which tool and language?
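
To make the question concrete, the simplest thing I can think of is a plain
keyword match on the tweet text. The sketch below is only an illustration of
that idea: the object name TopicFilter, the helper isOnTopic and the keyword
list are all made up for this example and are not part of Spark or Twitter4J,
and the sample strings stand in for real tweets.

```scala
// Hypothetical helper for illustration; nothing here comes from Spark or Twitter4J.
object TopicFilter {
  // Assumed keyword list for the "sport" topic.
  val sportKeywords: Set[String] = Set("football", "cricket", "tennis", "rugby")

  // True when any lower-cased word in the text is one of the keywords.
  def isOnTopic(text: String, keywords: Set[String]): Boolean =
    text.toLowerCase.split("\\W+").exists(keywords.contains)

  def main(args: Array[String]): Unit = {
    // Made-up sample strings standing in for tweet text.
    val sample = Seq(
      "Great football match tonight",
      "Traffic is terrible again",
      "Tennis final starts at 3pm"
    )
    // Keep only the strings that mention a sport keyword and print them.
    sample.filter(isOnTopic(_, sportKeywords)).foreach(println)
  }
}
```

On the stream itself, the same predicate could presumably be applied with
statuses.filter(text => TopicFilter.isOnTopic(text, TopicFilter.sportKeywords)).
A fixed word list like this is obviously crude, which is really why I am asking
what people use in practice.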

Thanks

Dr Mich Talebzadeh



LinkedIn:
https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com
