Thanks Jorn,

To start, I would like to explore how one can turn some of the data into useful information.
I would like to do some trend analysis. A simple correlation suggests that the more a particular topic, say "organic food", is mentioned, the more people are inclined to go for it. From that one could deduce that organic food is a potential growth area.

I already have the infrastructure to ingest that data, for example Flume to store it or Spark Streaming for near-real-time work. Now I want to slice and dice that data for, say, organic food. I presume this is a typical question.

You mentioned Spark ml (machine learning?). Is that something viable?

Cheers

Dr Mich Talebzadeh

LinkedIn
https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com

On 7 June 2016 at 12:22, Jörn Franke <jornfra...@gmail.com> wrote:

> Spark ml Support Vector machines or neural networks could be candidates.
> For unsupervised learning it could be clustering.
> For doing a graph analysis on the followers you can easily use Spark GraphX.
> Keep in mind that each tweet contains a lot of metadata (location,
> followers etc.) that is more or less structured.
> For unstructured text analytics (e.g. the tweet itself) I recommend
> Solr/Elasticsearch.
>
> However I am not sure what you want to do with the data exactly.
>
> On 07 Jun 2016, at 13:16, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>
> Hi,
>
> This is really a general question.
>
> I use Spark to get Twitter data. I did some looking at it:
>
> val ssc = new StreamingContext(sparkConf, Seconds(2))
> val tweets = TwitterUtils.createStream(ssc, None)
> val statuses = tweets.map(status => status.getText())
> statuses.print()
>
> OK.
>
> I can also use Apache Flume to store the data in an HDFS directory:
>
> $FLUME_HOME/bin/flume-ng agent --conf ./conf/ -f conf/twitter.conf -Dflume.root.logger=DEBUG,console -n TwitterAgent
>
> That stores the Twitter data in binary format in an HDFS directory.
>
> My question is pretty basic.
>
> What is the best tool/language to dig into that data, for example Twitter
> streaming data? I am getting all sorts of stuff coming in. Say I am only
> interested in certain topics like sport etc. How can I detect the signal
> from the noise, and using what tool and language?
>
> Thanks
>
> Dr Mich Talebzadeh
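
On the question raised in this thread of picking out tweets about a topic such as organic food or sport, a simple starting point before reaching for Spark ml might be a plain keyword filter. The sketch below is only an illustration in plain Scala with no Spark dependency. The object name TopicFilter, its two methods and the keyword lists are all made up for this example and would need tuning against real data.

```scala
// Hypothetical helper for illustration; not part of Spark or the Twitter API.
object TopicFilter {
  // Topics of interest mapped to keywords that signal them (assumed values).
  val topics: Map[String, Seq[String]] = Map(
    "organic food" -> Seq("organic food", "organic produce"),
    "sport"        -> Seq("football", "tennis", "cricket")
  )

  // Return the topics whose keywords occur in the tweet text (case-insensitive).
  def matchTopics(text: String): Set[String] = {
    val lower = text.toLowerCase
    topics.collect {
      case (topic, keywords) if keywords.exists(k => lower.contains(k)) => topic
    }.toSet
  }

  // Count how many tweets mention each topic; tweets matching nothing are dropped.
  def countMentions(tweets: Seq[String]): Map[String, Int] =
    tweets.flatMap(matchTopics).groupBy(identity).map { case (t, xs) => t -> xs.size }
}
```

In the streaming job quoted above, the same predicate could probably be applied as statuses.filter(text => TopicFilter.matchTopics(text).nonEmpty), so that only on-topic tweets are kept for later trend counting. Substring matching is crude (it misses synonyms and misspellings), which is where the Solr/Elasticsearch or Spark ml suggestions would come in.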