Hi,

This is really a general question.
I use Spark to get Twitter data. This is what I have so far:

    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.twitter.TwitterUtils
    val ssc = new StreamingContext(sparkConf, Seconds(2))  // sparkConf is created elsewhere
    val tweets = TwitterUtils.createStream(ssc, None)
    val statuses = tweets.map(status => status.getText())
    statuses.print()
    ssc.start()             // nothing is received until the context is started
    ssc.awaitTermination()

OK. I can also use Apache Flume to store the data in an HDFS directory:

    $FLUME_HOME/bin/flume-ng agent --conf ./conf/ -f conf/twitter.conf -Dflume.root.logger=DEBUG,console -n TwitterAgent

That stores the Twitter data in binary format in the HDFS directory.

My question is pretty basic: what is the best tool/language to dig into that data? Take the Twitter streaming data, for example. I am getting all sorts of stuff coming in. Say I am only interested in certain topics, such as sport. Which tool and language should I use to separate the signal from the noise?

Thanks

Dr Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com
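A simple first pass at pulling topical tweets (for example sport) out of the stream is keyword matching on the status text. The sketch below is a self-contained Scala object and does not need Spark to run. The object name `TopicFilter`, its methods and the keyword list are all hypothetical, chosen only for illustration. Keyword matching is crude and will miss or mislabel many tweets, so treat this as a baseline rather than a definitive method.

```scala
// Hypothetical helper for tagging tweet text with topics by keyword matching.
object TopicFilter {
  // Assumed keyword lists; these are illustrative, not a curated vocabulary.
  val topics: Map[String, Set[String]] = Map(
    "sport" -> Set("football", "soccer", "tennis", "cricket", "nba", "goal")
  )

  // Lower-case the text, split on anything that is not a letter, digit or '#',
  // and drop a leading '#' so "#football" matches "football".
  def tokens(text: String): Set[String] =
    text.toLowerCase
      .split("[^\\p{L}\\p{N}#]+")
      .filter(_.nonEmpty)
      .map(_.stripPrefix("#"))
      .toSet

  // Names of all topics with at least one keyword present in the text.
  def matchTopics(text: String): Set[String] = {
    val words = tokens(text)
    topics.collect { case (topic, keywords) if keywords.exists(words.contains) => topic }.toSet
  }

  // Predicate usable with any filter(...) call, e.g. on a Seq or a DStream.
  def isAbout(topic: String): String => Boolean =
    text => matchTopics(text).contains(topic)

  def main(args: Array[String]): Unit = {
    val sample = Seq(
      "What a goal! #football",
      "New phone launched today",
      "Tennis final tonight"
    )
    // Prints only the two sport-related lines.
    sample.filter(isAbout("sport")).foreach(println)
  }
}
```

In the streaming job, the same predicate can be applied to the `statuses` DStream, for example `statuses.filter(TopicFilter.isAbout("sport")).print()`. Alternatively, `TwitterUtils.createStream` accepts an optional `filters: Seq[String]` argument, e.g. `TwitterUtils.createStream(ssc, None, Seq("football", "tennis"))`, which asks Twitter to send only tweets that track those keywords before they reach Spark.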