I am not an expert in Latent Semantic Indexing, but it is very simple to implement. Give it a try, we will help.
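To give an idea, here is a rough sketch in plain Pharo (only core collections, not the Moose-Algo-Information-Retrieval classes, and with made-up example documents): it builds a raw term-frequency term-document matrix and compares two documents with cosine similarity, which is the vector space model that LSI starts from. Full LSI would additionally factor that matrix with a truncated SVD (keeping the top k singular values) before comparing vectors, which needs a linear-algebra package.

"Sketch: term-document matrix + cosine similarity (the part LSI builds on)."
| docs vocabulary vectors cosine |
docs := #('the cat sat on the mat'
           'the dog sat on the log'
           'cats and dogs are animals').

"Vocabulary = every distinct lowercased word over all documents."
vocabulary := (docs
    inject: Set new
    into: [ :all :doc |
        all addAll: (doc substrings collect: [ :w | w asLowercase ]); yourself ])
    asArray.

"One raw term-frequency vector per document."
vectors := docs collect: [ :doc | | bag |
    bag := (doc substrings collect: [ :w | w asLowercase ]) asBag.
    vocabulary collect: [ :term | bag occurrencesOf: term ] ].

"Cosine similarity between two term vectors."
cosine := [ :a :b | | dot normA normB |
    dot := (a with: b collect: [ :x :y | x * y ]) sum.
    normA := (a inject: 0 into: [ :s :x | s + (x * x) ]) sqrt.
    normB := (b inject: 0 into: [ :s :x | s + (x * x) ]) sqrt.
    dot / (normA * normB) ].

"Documents 1 and 2 share words, so they score higher than 1 and 3."
cosine value: (vectors at: 1) value: (vectors at: 2).

Evaluating the last expression in a playground should show that documents 1 and 2 (which share words) score higher than documents 1 and 3.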
Alexandre
--
_,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:
Alexandre Bergel  http://www.bergel.eu
^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;.

> On Jun 16, 2016, at 3:05 AM, Brice GOVIN <brice.go...@ensta-bretagne.org> wrote:
>
> Information theory is about quantifying and qualifying the content of information in a data set. Basically, it means that for a specific data set I could say which data is interesting or not (according to the algorithm I use).
>
> It is used in information retrieval (IR).
>
> I should have started with that, maybe…
>
> Actually, I made a mistake talking about information theory; it is more about information retrieval (my bad). However, with Moose-Algo-Information-Retrieval I only have the set of words used in the documents, and I would like to know whether there has been any effort on an algorithm to qualify these words.
>
> There are different kinds of model:
> - set-theoretic model
>   - documents are represented as sets of words or phrases, and similarity derives from set-theoretic operations on those sets (I don't understand this one very well yet)
>   - common techniques are the Boolean model (several kinds) and fuzzy retrieval
> - algebraic model
>   - documents and queries are represented as vectors (or matrices, or tuples), and similarity between a query and a document is computed from this representation
>   - common techniques are Latent Semantic Indexing (LSI) and the Vector Space Model (several kinds)
> - probabilistic model
>   - there is no particular representation for documents here; similarity is computed from the probability that the document is relevant for the query
>   - common techniques are Latent Dirichlet Allocation and others
>
> I am more interested in using an algebraic model; is there maybe something on Latent Semantic Indexing?
>
> I'm not sure I explained my thinking well…
>
> Regards,
> --------------
> Brice Govin
> PhD student in the RMoD research team at INRIA Lille
> Software Engineer at THALES AIR SYSTEMS Rungis
> ENSTA-Bretagne ENSI2014
> 22 Avenue du General Leclerc 92340 BOURG-LA-REINE
>
>> On 15 Jun 2016, at 22:54, Alexandre Bergel <alexandre.ber...@me.com> wrote:
>>
>> Yes, tell us more.
>> This is an interesting topic.
>>
>> Alexandre
>> --
>> _,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:
>> Alexandre Bergel  http://www.bergel.eu
>> ^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;.
>>
>>> On Jun 15, 2016, at 4:38 PM, stepharo <steph...@free.fr> wrote:
>>>
>>> On 15/6/16 at 19:03, Brice GOVIN wrote:
>>>> Hi,
>>>> I'd like to know whether someone has worked on an information theory algorithm.
>>>
>>> Tell us more.
>>> What is it?
>>>
>>>> I saw a package in Moose about information theory, but it is just a kind of document indexation.
>>>>
>>>> Is there something more complete (quantifying information)?
>>>>
>>>> Thanks,
>>>>
>>>> --------------
>>>> Brice Govin
>>>> PhD student in the RMoD research team at INRIA Lille
>>>> Software Engineer at THALES AIR SYSTEMS Rungis
>>>> ENSTA-Bretagne ENSI2014
>>>> 22 Avenue du General Leclerc 92340 BOURG-LA-REINE