import org.apache.spark.launcher.SparkLauncher

val spark = new SparkLauncher()
  .setSparkHome(".../spark-1.4.1-bin-hadoop2.6")
  .setAppResource("/home/user/MyCode/forSpark/wordcount.py")
  .addPyFile("/home/andabe/MyCode/forSpark/wordcount.py")
  .setMaster("myServerName")
  .setAppName("pytho2word")
  .launch()
println("finishing")
spark.waitFor()
println("finished")
Any help is appreciated.
Cheers,
Andrejs
Thanks Ted,
that helped me. It turned out that I had formatted the server
name incorrectly: I had to add spark:// in front of the server name.
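For anyone hitting the same error: a standalone master must be given as a full URL, not a bare hostname. A minimal plain-Scala sketch of that check (an illustration, not Spark's actual validation logic):

```scala
// A standalone master must look like "spark://host:port"; a bare hostname
// such as "myServerName" is rejected by the launcher.
def isStandaloneMasterUrl(url: String): Boolean =
  url.matches("spark://[^/:,]+(:\\d+)?(,[^/:,]+(:\\d+)?)*")

println(isStandaloneMasterUrl("myServerName"))              // false
println(isStandaloneMasterUrl("spark://myServerName:7077")) // true
```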
Cheers,
Andrejs
On 11/11/15 14:26, Ted Yu wrote:
Please take a look
at launcher/src/test/java/org/apache/spark/launcher/SparkLauncherSuite.java
to see how
hip, Butler County, Ohio, in
the United States. It is located about ten miles southwest of Hamilton on
Howards Creek, a tributary of the Great Miami River in section 28 of R1ET3N of
the Congress Lands. It is three miles west of Shandon and two miles south of
Okeana.", _metadata -> Map(_index -> dbpedia, _type -> docs, _id ->
AUy5aQs7895C6HE5GmG4, _score -> 0.0))
As you can see, the _score is 0.
Would appreciate any help,
Cheers,
Andrejs
Thank you for the information.
Cheers,
Andrejs
On 04/18/2015 10:23 AM, Nick Pentreath wrote:
> ES-hadoop uses a scan & scroll search to efficiently retrieve large
> result sets. Scores are not tracked in a scan and sorting is not
> supported hence 0 scores.
>
> http://www.
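Since scan & scroll leaves _score at 0.0, any ranking has to come from the document fields themselves. A plain-Scala sketch of that idea, with hypothetical hit data and a hypothetical "length" field standing in for whatever field you actually rank on (no ES dependency):

```scala
// ES-hadoop hits look roughly like (id, Map(field -> value)). Under scan &
// scroll _score is always 0.0, so sort by a real document field instead.
val hits: Seq[(String, Map[String, Any])] = Seq(
  "doc1" -> Map("_score" -> 0.0, "length" -> 120),
  "doc2" -> Map("_score" -> 0.0, "length" -> 340)
)
val ranked = hits.sortBy { case (_, doc) => -doc("length").asInstanceOf[Int] }
println(ranked.map(_._1))  // List(doc2, doc1)
```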
Hi,
I'm new to MLlib and Spark. I'm trying to use TF-IDF and use those values
for term ranking.
I'm getting the tf values in vector format, but how can I get the values
out of the vector?
val sc = new SparkContext(conf)
val documents: RDD[Seq[String]] =
  sc.textFile("/home/andr
where the tf values are calculated.
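For context, here is a plain-Scala sketch of what HashingTF computes: term counts keyed by a hashed index. On a real MLlib Vector the values can then be read with v.toArray. This is an illustration only, not MLlib's implementation:

```scala
// Hash each term into one of numFeatures buckets and count occurrences,
// mimicking HashingTF's idea. Plain Scala, no Spark dependency.
def termFrequencies(doc: Seq[String], numFeatures: Int = 16): Map[Int, Double] =
  doc.groupBy(t => ((t.hashCode % numFeatures) + numFeatures) % numFeatures)
     .map { case (idx, terms) => idx -> terms.size.toDouble }

val tf = termFrequencies(Seq("spark", "mllib", "spark"))
println(tf.values.sum)  // 3.0
```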
Best regards,
Andrejs
I found my problem. I assumed, based on the TF-IDF article on Wikipedia, that
log base 10 is used, but as I found in this discussion
<https://groups.google.com/forum/#!topic/scala-language/K5tbYSYqQc8>, in
Scala it is actually ln (the natural logarithm).
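If I read the MLlib docs right, the IDF it computes is ln((m + 1) / (df + 1)), where m is the number of documents and df the document frequency. A quick check of that formula:

```scala
// IDF with the natural log, as MLlib's docs describe it:
// idf(t) = ln((m + 1) / (df(t) + 1))
def idf(numDocs: Long, docFreq: Long): Double =
  math.log((numDocs + 1.0) / (docFreq + 1.0))

println(idf(100, 100))  // 0.0 -- a term appearing in every document
```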
Regards,
Andrejs
On Thu, Oct 30, 2014 at 10:49 PM,
Hi,
Can someone please suggest the best way to output Spark data as a
JSON file (a file where each line is a JSON object)?
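One common approach is to serialize each record to a JSON string and write with saveAsTextFile, which gives exactly one JSON object per line. A plain-Scala sketch of the per-record serializer (hand-rolled here for illustration; in practice a library such as json4s or Jackson would do this), with hypothetical field names:

```scala
// Turn one record into a single-line JSON object; with Spark you would then
// do rdd.map(toJsonLine).saveAsTextFile(path). Handles strings and numbers.
def escape(s: String): String =
  s.flatMap {
    case '"'  => "\\\""
    case '\\' => "\\\\"
    case c    => c.toString
  }

def toJsonLine(fields: Seq[(String, Any)]): String =
  fields.map {
    case (k, v: String) => "\"" + escape(k) + "\":\"" + escape(v) + "\""
    case (k, v)         => "\"" + escape(k) + "\":" + v
  }.mkString("{", ",", "}")

println(toJsonLine(Seq("word" -> "spark", "count" -> 3)))
// {"word":"spark","count":3}
```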
Cheers,
Andrejs