Hi All,
I have some avro data, which I am reading in the following way.
Query :
val data = sc.newAPIHadoopFile(file,
  classOf[AvroKeyInputFormat[GenericRecord]],
  classOf[AvroKey[GenericRecord]],
  classOf[NullWritable]).map(_._1.datum)
But, when I try to print the data, it is generating duplicate records.
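For context, a common cause of duplicates with this read pattern (a sketch of the mechanism, not necessarily the issue here): Hadoop input formats reuse one mutable key object across records, so holding on to the raw reference leaves every element pointing at the last record read. The plain-Scala sketch below illustrates this without Spark; `Holder` is a hypothetical stand-in for the single `AvroKey[GenericRecord]` instance a RecordReader mutates in place.

```scala
// Plain-Scala sketch (no Spark needed) of the record-reuse pitfall behind
// "duplicate" records: a Hadoop RecordReader mutates ONE key object in place.
// Holder is a hypothetical stand-in for that shared AvroKey[GenericRecord].
class Holder { var value: String = _ }

// Naive collection keeps the shared holder itself: after the read finishes,
// every element points at the last record that was read.
def collectNaive(records: Seq[String], reader: Holder): Seq[Holder] =
  records.map { r => reader.value = r; reader }

// Copying the datum out before the reader overwrites it preserves each record
// (with real Avro data, GenericData.get().deepCopy(schema, datum) serves the
// same purpose).
def collectCopied(records: Seq[String], reader: Holder): Seq[String] =
  records.map { r => reader.value = r; reader.value }
```

Mapping each record through a copy before any collect or cache avoids the effect.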
> https://github.com/databricks/spark-avro
> On Thu, Feb 11, 2016 at 10:38 PM, Anoop Shiralige <anoop.shiral...@gmail.com> wrote:
Hi All,
I am working with Spark 1.6.0 and the pySpark shell specifically. I have a
JavaRDD[org.apache.avro.GenericRecord], which I have converted to a Python RDD
in the following way.
javaRDD = sc._jvm.java.package.loadJson("path to data", sc._jsc)
javaPython = sc._jvm.SerDe.javaToPython(javaRDD)
from
Hi All,
I have written some functions in Scala which I want to expose in pySpark
(interactively, Spark 1.6.0).
The Scala function (loadAvro) returns a JavaRDD[AvroGenericRecord].
AvroGenericRecord is my wrapper class over
org.apache.avro.generic.GenericRecord. I am trying to convert this
Hi All,
I am trying to do a comparison by building the model locally using R and
on a cluster using Spark.
There is some difference in the results.
Any idea what the internal implementation of Decision Tree in Spark
MLlib is (ID3, C4.5, C5.0, or CART)?
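One likely source of divergence (a hedged sketch, not an authoritative answer): Spark MLlib's decision tree grows CART-style greedy binary splits scored by an impurity measure such as Gini or entropy, and it evaluates candidate splits over approximate feature bins rather than all exact split points, which by itself can yield trees different from R's exact-split implementations. The impurity arithmetic behind such split selection can be sketched as follows; the function names here are illustrative, not MLlib's API.

```scala
// Sketch of the impurity computation behind CART-style binary splits
// (Gini shown; MLlib also supports entropy). Names are illustrative.
def gini(labels: Seq[Int]): Double = {
  val n = labels.size.toDouble
  // 1 - sum over classes of (class frequency)^2
  1.0 - labels.groupBy(identity).values
    .map(g => (g.size / n) * (g.size / n)).sum
}

// Weighted impurity of a candidate binary split; the greedy algorithm picks
// the split minimizing this (equivalently, maximizing impurity decrease).
def splitImpurity(left: Seq[Int], right: Seq[Int]): Double = {
  val n = (left.size + right.size).toDouble
  (left.size / n) * gini(left) + (right.size / n) * gini(right)
}
```

A 50/50 binary node has Gini 0.5, a pure node 0.0, and a split that separates the classes perfectly scores 0.0, so it would always be preferred.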
Thanks,
AnoopShiralige