Well, I have seen the algorithms mentioned used for this. However, some preprocessing through Solr makes sense: it takes care of synonyms, homonyms, stemming, etc.
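
To make the preprocessing idea concrete, here is a minimal sketch in plain Scala of what synonym mapping and stemming do before topic matching. It is only a stand-in for illustration and is not Solr's API: the object name TweetTopicFilter, the synonym table and the one-rule stemmer are all assumptions of mine, and a real analyzer chain handles far more cases.

```scala
object TweetTopicFilter {
  // Hypothetical synonym table: maps variants onto one canonical term.
  val synonyms: Map[String, String] = Map(
    "bio" -> "organic",
    "veggies" -> "vegetable"
  )

  // Very crude stemmer for illustration only: strips a trailing "s".
  // A real stemmer (for example Porter) covers many more suffixes.
  def stem(token: String): String =
    if (token.length > 3 && token.endsWith("s")) token.dropRight(1) else token

  // Lowercase, split on non-letter characters, map synonyms, then stem.
  def normalize(text: String): Seq[String] =
    text.toLowerCase
      .split("[^a-z]+")
      .filter(_.nonEmpty)
      .map(t => synonyms.getOrElse(t, t))
      .map(stem)
      .toSeq

  // A tweet matches a topic when every topic term appears after normalization.
  def matchesTopic(text: String, topic: Seq[String]): Boolean = {
    val tokens = normalize(text).toSet
    topic.map(stem).forall(tokens.contains)
  }
}
```

With the statuses stream from the snippet quoted below, something like statuses.filter(t => TweetTopicFilter.matchesTopic(t, Seq("organic", "food"))) should keep only the tweets on that topic, though I have not run it against a live stream.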

> On 07 Jun 2016, at 13:33, Mich Talebzadeh <[email protected]> wrote:
>
> Thanks Jorn,
>
> To start, I would like to explore how one can turn some of the data into
> useful information.
>
> I would like to look at certain trend analysis. Simple correlation shows that
> the more mentions there are of a typical topic, say "organic food", the more
> people are inclined to go for it. So one can deduce that organic food is a
> potential growth area.
>
> Now I have all the infrastructure to ingest that data, such as using Flume to
> store it or Spark Streaming to do near-real-time work.
>
> Now I want to slice and dice that data for, say, organic food.
>
> I presume this is a typical question.
>
> You mentioned Spark ml (machine learning?). Is that something viable?
>
> Cheers
>
> Dr Mich Talebzadeh
>
> LinkedIn
> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
>
>> On 7 June 2016 at 12:22, Jörn Franke <[email protected]> wrote:
>> Spark ml support vector machines or neural networks could be candidates.
>> For unsupervised learning it could be clustering.
>> For graph analysis on the followers you can easily use Spark GraphX.
>> Keep in mind that each tweet contains a lot of metadata (location,
>> followers etc.) that is more or less structured.
>> For unstructured text analytics (e.g. the tweet itself) I recommend
>> Solr/Elasticsearch.
>>
>> However, I am not sure what exactly you want to do with the data.
>>
>>> On 07 Jun 2016, at 13:16, Mich Talebzadeh <[email protected]> wrote:
>>>
>>> Hi,
>>>
>>> This is really a general question.
>>>
>>> I use Spark to get Twitter data. I had a look at it:
>>>
>>> val ssc = new StreamingContext(sparkConf, Seconds(2))
>>> val tweets = TwitterUtils.createStream(ssc, None)
>>> val statuses = tweets.map(status => status.getText())
>>> statuses.print()
>>>
>>> Ok
>>>
>>> I can also use Apache Flume to store data in an HDFS directory:
>>>
>>> $FLUME_HOME/bin/flume-ng agent --conf ./conf/ -f conf/twitter.conf -Dflume.root.logger=DEBUG,console -n TwitterAgent
>>>
>>> That stores the Twitter data in binary format in the HDFS directory.
>>>
>>> My question is pretty basic.
>>>
>>> What is the best tool/language to dig into that data, for example Twitter
>>> streaming data? I am getting all sorts of stuff coming in. Say I am only
>>> interested in certain topics like sport. How can I separate the signal
>>> from the noise, and with what tool and language?
>>>
>>> Thanks
>>> Dr Mich Talebzadeh
>>>
>>> LinkedIn
>>> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>
>>> http://talebzadehmich.wordpress.com
