Hi all,
I am trying to calculate a histogram of every column of a CSV file using
Spark with Scala.
I found that DoubleRDDFunctions supports histogram(), so I coded the
following steps to get the histograms of all columns:
1. Get the column count.
2. Create an RDD[Double] for each column and calculate its histogram (a
minimal sketch follows).
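A minimal sketch of step 2, assuming the CSV has only numeric columns and
no header (the file path is hypothetical):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.SparkContext._ // implicit DoubleRDDFunctions

    val sc = new SparkContext(new SparkConf().setAppName("histograms"))
    // Parse every line into an Array[Double]; assumes purely numeric columns.
    val rows = sc.textFile("data.csv").map(_.split(",").map(_.toDouble)).cache()
    val numCols = rows.first().length
    for (c <- 0 until numCols) {
      val col = rows.map(_(c))                  // RDD[Double] for column c
      val (buckets, counts) = col.histogram(10) // 10 evenly spaced buckets
      println(s"col $c: ${buckets.mkString(",")} / ${counts.mkString(",")}")
    }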
Hi devs,
Is there any connection between the input file size and the RAM size for
sorting using SparkSQL?
I tried a 1 GB file with 8 GB RAM and 4 cores, and got
java.lang.OutOfMemoryError: GC overhead limit exceeded.
Or could it be for some other reason? It's working for other SparkSQL
operations.
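Not an answer from this thread, but one mitigation commonly tried for GC
overhead errors during SparkSQL sorts is raising the shuffle partition
count so each task buffers less data. A hedged sketch, assuming an
existing SparkContext sc and a hypothetical table and output path:

    val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    // More shuffle partitions -> smaller per-task sort buffers.
    sqlContext.setConf("spark.sql.shuffle.partitions", "400") // default is 200
    val sorted = sqlContext.sql("SELECT * FROM logs ORDER BY ts") // hypothetical table
    sorted.saveAsTextFile("hdfs:///sorted-out") // hypothetical path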
Hi all,
I need some help.
When I try to run my Spark project, it shows: "Exception in
thread "main" java.lang.SecurityException: class
"javax.servlet.ServletRegistration"'s signer information does not match
signer information of other classes in the same package".
After deleting "/home/d
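As an assumption rather than a diagnosis of this exact setup: that error
usually means two jars on the classpath supply javax.servlet classes and
one of them is signed. A hedged sbt sketch of excluding the usual culprit:

    // Hypothetical workaround: drop the signed servlet jar pulled in transitively.
    libraryDependencies += "org.apache.spark" %% "spark-core" % "1.2.0" excludeAll ExclusionRule(organization = "org.eclipse.jetty.orbit")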
Rdd.coalesce(1) will coalesce the RDD and give only one output file;
coalesce(2) will give two, and so on.
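A minimal sketch (paths are hypothetical, sc as in spark-shell):

    val rdd = sc.textFile("hdfs:///input")
    rdd.coalesce(1).saveAsTextFile("hdfs:///out-one") // one part- file
    rdd.coalesce(2).saveAsTextFile("hdfs:///out-two") // two part- files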
On Jan 23, 2015 4:58 AM, "Sean Owen" wrote:
> One output file is produced per partition. If you want fewer, use
> coalesce() before saving the RDD.
>
> On Thu, Jan 22, 2015 at 10:46 PM, Kane Kim
der to compute k-nearest
> neighbors locally. You can start with LSH + k-nearest in Google
> scholar: http://scholar.google.com/scholar?q=lsh+k+nearest -Xiangrui
>
> On Tue, Jan 20, 2015 at 9:55 PM, DEVAN M.S. wrote:
> > Hi all,
> >
> > Please help me to find out best
Hi all,
Please help me find the best approach to k-nearest neighbors using Spark
for large data sets.
Can you share your code?
Devan M.S. | Research Associate | Cyber Security | AMRITA VISHWA
VIDYAPEETHAM | Amritapuri | Cell +919946535290 |
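Following up on the LSH + local k-NN pointer above, a hedged sketch; the
dense Array[Double] points, sign-of-dot-product hashing against random
hyperplanes, and toy random data are all assumptions for illustration,
not anything from this thread:

    import org.apache.spark.{SparkConf, SparkContext}
    import scala.util.Random

    val sc = new SparkContext(new SparkConf().setAppName("lsh-knn"))
    val dim = 10; val nBits = 8; val k = 5
    val rnd = new Random(42)
    // Random hyperplanes shared by all points; the sign pattern is the bucket id.
    val planes = Array.fill(nBits, dim)(rnd.nextGaussian())
    def signature(p: Array[Double]): Int =
      (0 until nBits).map { i =>
        val dot = (planes(i), p).zipped.map(_ * _).sum
        if (dot >= 0) 1 << i else 0
      }.sum
    def dist(a: Array[Double], b: Array[Double]): Double =
      math.sqrt((a, b).zipped.map((x, y) => (x - y) * (x - y)).sum)

    val points = sc.parallelize(Seq.fill(1000)(Array.fill(dim)(rnd.nextDouble())))
    val knn = points
      .keyBy(signature)  // points sharing a signature become candidates
      .groupByKey()
      .flatMap { case (_, bucket) =>
        val pts = bucket.toArray
        // Exact k-NN, but only inside this bucket.
        pts.map(q => (q, pts.filter(_ ne q).sortBy(dist(_, q)).take(k)))
      }

A single hash table misses neighbors that land in different buckets; real
uses run several tables with independent hyperplanes and union the
candidates.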
On Tue, Jan 20, 2015 at 5:03 PM, Xuelin Cao wrote:
>
> Hi,
>
> Yes, this is what I'm doing. I'm using hiveContext.h
Add one more library:
libraryDependencies += "org.apache.spark" % "spark-hive_2.10" % "1.2.0"
val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
and replace sqlContext with hiveContext. It's working for me while using
HiveContext.
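For example, once the context is swapped (the table name is hypothetical):

    // Hedged sketch: same query entry point, but with HiveQL support.
    val result = hiveContext.sql("SELECT key, count(*) FROM src GROUP BY key")
    result.collect().foreach(println)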
Devan M.S. | Resea
Which context are you using, HiveContext or SQLContext? Can you try
with HiveContext?
Devan M.S. | Research Associate | Cyber Security | AMRITA VISHWA
VIDYAPEETHAM | Amritapuri | Cell +919946535290 |
On Tue, Jan 20, 2015 at 3:49 PM, Xuelin Cao wrote:
>
> Hi, I'm using Spark 1
Hi all,
I have one large data set. When I get the number of partitions, it shows
43.
We can't collect() the whole data set into memory, so I am thinking of
collecting each partition separately, since each one is small.
Any thoughts?
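If that is the goal, a hedged sketch that pulls one partition at a time to
the driver (the input path is hypothetical):

    val rdd = sc.textFile("hdfs:///big-data").cache() // shows 43 partitions here
    for (p <- 0 until rdd.partitions.length) {
      val part = rdd
        .mapPartitionsWithIndex((i, it) => if (i == p) it else Iterator.empty)
        .collect() // only partition p reaches the driver
      println(s"partition $p: ${part.length} records")
    }

Note that each pass re-evaluates the RDD, hence the cache(), and a single
partition still has to fit in driver memory.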