Hi all! I have a .txt file in which each row is the collection of terms of one document, separated by spaces. For example:

1 "Hola spark"
2 ..

I followed the example on the Spark site, https://spark.apache.org/docs/latest/mllib-feature-extraction.html, and I get something like this:

tfidf.first()
org.apache.spark.mllib.linalg.Vector = (1048576,[35587,884670],[3.458767233,3.458767233])

This is what I think it means:

1. First parameter "1048576": I don't know what it is, but it is always the same number (maybe the number of terms).
2. Second parameter "[35587,884670]": I think these are the terms of the first line of my .txt file.
3. Third parameter "[3.458767233,3.458767233]": I think these are the tf-idf values for my terms.

Does anyone know the exact interpretation of this? And for the second point, if these values are the terms, how can I match them with the original terms (e.g. "[35587=>Hola,884670=>spark]")?

Regards, and thanks in advance.

Franco Barrientos
Data Scientist
Málaga #115, Of. 1003, Las Condes.
Santiago, Chile.
(+562)-29699649
(+569)-76347893
franco.barrien...@exalitica.com
www.exalitica.com
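
One possible reading of the (size, indices, values) triple, sketched in plain Python. It assumes the indices are hash buckets: each term is hashed into one of 1048576 = 2^20 slots, which is what the HashingTF step in the linked example suggests. Under that assumption the indices cannot be inverted directly, so the sketch rebuilds the mapping by hashing the known terms again. The function names are hypothetical, and crc32 is only a stand-in for Spark's own hash, so the bucket numbers will not match the ones printed above; the weights are placeholders copied from that output.

```python
import zlib

# Assumption: the first component of the printed vector is the number of
# hash buckets (2**20 = 1048576), not the number of distinct terms.
NUM_FEATURES = 1 << 20


def term_index(term, num_features=NUM_FEATURES):
    """Map a term to a bucket. crc32 is a stand-in, NOT Spark's hash."""
    return zlib.crc32(term.encode("utf-8")) % num_features


def build_reverse_lookup(documents, num_features=NUM_FEATURES):
    """Hash every known term and remember which terms land in each bucket."""
    lookup = {}
    for terms in documents:
        for term in terms:
            lookup.setdefault(term_index(term, num_features), set()).add(term)
    return lookup


def label_sparse_vector(size, indices, values, lookup):
    """Pair each weight with the term(s) that hash to its index."""
    if any(i < 0 or i >= size for i in indices):
        raise ValueError("index outside the vector size")
    return [(sorted(lookup.get(i, [])), v) for i, v in zip(indices, values)]


# Each row of the .txt file, already split on spaces.
documents = [["Hola", "spark"]]
lookup = build_reverse_lookup(documents)

# Rebuild a vector shaped like the one above; the weights are placeholders.
indices = sorted(lookup)
values = [3.458767233] * len(indices)

for terms, weight in label_sparse_vector(NUM_FEATURES, indices, values, lookup):
    print(terms, weight)
```

A bucket can hold more than one term if two terms collide, which is why the lookup stores a set per index rather than a single string.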