Well, I have seen the algorithms mentioned being used for this. However, some
preprocessing through Solr makes sense: it takes care of synonyms, homonyms,
stemming, etc.
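
To make the preprocessing point concrete, here is a minimal plain-Scala sketch
of the kind of normalisation an analyzer chain performs before matching. It is
not how Solr implements it: the synonym table, the suffix rules and the names
TextNormalizer, stem and normalize are made up for illustration, and the
suffix stripping is far cruder than a real stemmer.

```scala
object TextNormalizer {
  // Hypothetical synonym table: variant -> canonical term.
  val synonyms: Map[String, String] =
    Map("bio" -> "organic", "soccer" -> "football")

  // Very crude suffix stripping; a real stemmer is far more careful.
  def stem(token: String): String =
    if (token.length > 4 && token.endsWith("ing")) token.dropRight(3)
    else if (token.length > 3 && token.endsWith("s")) token.dropRight(1)
    else token

  // Lowercase, split on non-alphanumeric characters, stem, then map synonyms.
  def normalize(text: String): Seq[String] =
    text.toLowerCase
      .split("[^a-z0-9]+")
      .filter(_.nonEmpty)
      .map(stem)
      .map(t => synonyms.getOrElse(t, t))
      .toSeq
}
```

With something like this in front of the matching step, "BIO foods" and
"organic food" end up as the same tokens.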

> On 07 Jun 2016, at 13:33, Mich Talebzadeh <[email protected]> wrote:
> 
> Thanks Jorn,
> 
> To start, I would like to explore how one can turn some of the data into
> useful information.
> 
> I would like to look at some trend analysis. Simple correlation shows that
> the more a typical topic is mentioned, say "organic food", the more people
> are inclined to go for it. From that one can deduce that organic food is a
> potential growth area.
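
One simple way to look at such a trend is to count, per day, how many tweets
mention the topic and compare the counts over time. A rough sketch in plain
Scala follows; the Tweet case class, its fields and mentionsPerDay are
hypothetical names, and a real job would presumably run the same grouping on
a Spark DataFrame or DStream rather than on a local collection.

```scala
object TopicTrend {
  // Hypothetical minimal tweet record: ISO date (yyyy-MM-dd) plus the text.
  final case class Tweet(day: String, text: String)

  // Count tweets per day whose text contains the phrase (case-insensitive),
  // returned in chronological order (ISO dates sort lexicographically).
  def mentionsPerDay(tweets: Seq[Tweet], phrase: String): Seq[(String, Int)] =
    tweets
      .filter(_.text.toLowerCase.contains(phrase.toLowerCase))
      .groupBy(_.day)
      .map { case (day, ts) => (day, ts.size) }
      .toSeq
      .sortBy(_._1)
}
```

A rising count only shows that the topic is talked about more; whether that
translates into demand still needs checking against other data.
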
> 
> Now I have all the infrastructure to ingest that data, such as Flume to store
> it or Spark Streaming to do near-real-time work.
> 
> Now I want to slice and dice that data for say organic food.
> 
> I presume this is a typical question.
> 
> You mentioned Spark ML (machine learning?). Is that something viable?
> 
> Cheers
> 
> Dr Mich Talebzadeh
>  
> LinkedIn  
> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>  
> http://talebzadehmich.wordpress.com
>  
> 
>> On 7 June 2016 at 12:22, Jörn Franke <[email protected]> wrote:
>> Spark ML support vector machines or neural networks could be candidates.
>> For unsupervised learning it could be clustering.
>> For graph analysis on the followers you can easily use Spark GraphX.
>> Keep in mind that each tweet contains a lot of metadata (location,
>> followers, etc.) that is more or less structured.
>> For unstructured text analytics (e.g. the tweet itself) I recommend
>> Solr/Elasticsearch.
>> 
>> However I am not sure what you want to do with the data exactly.
>> 
>> 
>>> On 07 Jun 2016, at 13:16, Mich Talebzadeh <[email protected]> wrote:
>>> 
>>> Hi,
>>> 
>>> This is really a general question.
>>> 
>>> I use Spark to get Twitter data. I had a quick look at it:
>>> 
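>>>     import org.apache.spark.streaming.{Seconds, StreamingContext}
>>>     import org.apache.spark.streaming.twitter.TwitterUtils
>>>     val ssc = new StreamingContext(sparkConf, Seconds(2))  // 2-second batches
>>>     val tweets = TwitterUtils.createStream(ssc, None)
>>>     val statuses = tweets.map(status => status.getText())
>>>     statuses.print()
>>>     ssc.start()              // nothing runs until the context is started
>>>     ssc.awaitTermination()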
>>> 
>>> Ok
>>> 
>>> I can also use Apache Flume to store the data in an HDFS directory:
>>> 
>>> $FLUME_HOME/bin/flume-ng agent --conf ./conf/ -f conf/twitter.conf \
>>>     -Dflume.root.logger=DEBUG,console -n TwitterAgent
>>> 
>>> That stores the Twitter data in binary format in an HDFS directory.
>>> 
>>> My question is pretty basic.
>>> 
>>> What is the best tool/language to dig into that data, for example Twitter
>>> streaming data? I am getting all sorts of stuff coming in. Say I am only
>>> interested in certain topics like sport. How can I separate the signal
>>> from the noise, and with what tool and language?
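
As a first cut at separating signal from noise, the tweet text can be filtered
against a set of topic keywords before anything heavier is applied. A minimal
sketch, assuming simple keyword matching is good enough to start with;
TopicFilter, isOnTopic and the keyword set are illustrative names, not part of
Spark or the Twitter client library.

```scala
object TopicFilter {
  // Hypothetical keyword list for the "sport" topic.
  val sportKeywords: Set[String] =
    Set("football", "tennis", "cricket", "olympics")

  // True if any word of the tweet (lowercased, split on non-letters) is a keyword.
  def isOnTopic(text: String, keywords: Set[String]): Boolean =
    text.toLowerCase.split("[^a-z]+").exists(keywords.contains)
}

// With the DStream from the snippet above, this would plug in as:
//   val sportTweets =
//     statuses.filter(t => TopicFilter.isOnTopic(t, TopicFilter.sportKeywords))
```

Keyword matching misses synonyms and misspellings, which is where an
analyzer-style preprocessing step or a trained classifier would come in.
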
>>> 
>>> Thanks
>>> Dr Mich Talebzadeh
> 
