We are looking at the issue and will likely fix it for Spark 1.3.1.
On Thu, Mar 12, 2015 at 8:25 PM, giive chen wrote:
> Hi all
>
> My team has the same issue. It looks like Spark 1.3's Spark SQL cannot read
> Parquet files generated by Spark 1.1. It will cost a lot of migration work
> when we want to upgrade to Spark 1.3.
Thanks for chiming in, John. I missed your meetup last night - do you have
any writeups or slides about roofline design? In particular, I'm curious
what optimizations are available for power-law dense * sparse? (I don't
have any background in optimizations.)
On Thu, Mar 12, 2015 at 8:50 PM,
If you're contemplating GPU acceleration in Spark, it's important to look
beyond BLAS. Dense BLAS probably account for only 10% of the cycles in the
datasets we've tested in BIDMach, and we've tried to make them
representative of industry machine learning workloads. Unless you're
crunching images or
Hi all
My team has the same issue. It looks like Spark 1.3's Spark SQL cannot read
Parquet files generated by Spark 1.1. It will cost a lot of migration work
when we want to upgrade to Spark 1.3.
Is there anyone who can help me?
Thanks
Wisely Chen
On Tue, Mar 10, 2015 at 5:06 PM, Pei-Lun Lee wrote:
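For reference, a minimal sketch of the read path in question under Spark 1.3. The input path is hypothetical, and the useDataSourceApi fallback is only an assumption about a possible workaround (it switches to the pre-1.3 Parquet code path), not a confirmed fix:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object ReadOldParquet {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("read-1.1-parquet"))
    val sqlContext = new SQLContext(sc)
    // Assumption: fall back to the pre-1.3 Parquet code path. Whether this
    // helps with files written by Spark 1.1 is untested here.
    sqlContext.setConf("spark.sql.parquet.useDataSourceApi", "false")
    val df = sqlContext.parquetFile("/data/written-by-spark-1.1.parquet") // hypothetical path
    df.printSchema()
    println(df.count())
  }
}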
The checks against maxCategories are not for statistical purposes; they are
to make sure communication does not blow up. There are currently no checks
to ensure that there are enough entries per category for statistically
significant results; that is up to the user.
I do like the idea of adding a warning
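A minimal sketch of the kind of user-side check meant here; the helper name and the 30-rows-per-category threshold are assumptions, not anything in MLlib:

import org.apache.spark.SparkContext._   // pair-RDD implicits on pre-1.3 Spark
import org.apache.spark.rdd.RDD

// Count rows per category and warn on any that look too thin to trust.
def warnOnSparseCategories(categories: RDD[Double], minPerCategory: Long = 30L): Unit = {
  val counts = categories.map(c => (c, 1L)).reduceByKey(_ + _).collect()
  counts.filter { case (_, n) => n < minPerCategory }.foreach { case (cat, n) =>
    Console.err.println(s"WARNING: category $cat has only $n rows (< $minPerCategory)")
  }
}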
Hi,
I am working on artificial neural networks for Spark. Training is done with
gradient descent: on each step the data is read, a sum of gradients is
computed for each data partition (on each worker), aggregated (on the driver),
and broadcast back. I noticed that the gradient computation time is
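A minimal sketch of the loop described above, assuming Breeze vectors and a hypothetical computeGradient (least squares stands in for the actual ANN gradient):

import breeze.linalg.DenseVector
import org.apache.spark.rdd.RDD

// Hypothetical per-example gradient; least squares stands in for the ANN loss.
def computeGradient(w: DenseVector[Double], label: Double,
                    x: DenseVector[Double]): DenseVector[Double] =
  x * ((w dot x) - label)

def step(data: RDD[(Double, DenseVector[Double])],
         weights: DenseVector[Double],
         stepSize: Double,
         n: Long): DenseVector[Double] = {
  val bcW = data.sparkContext.broadcast(weights)   // ship current weights to workers
  val gradSum = data.treeAggregate(DenseVector.zeros[Double](weights.length))(
    (acc, p) => acc + computeGradient(bcW.value, p._1, p._2), // per-partition sum
    (a, b) => a + b)                                          // merge partials
  weights - gradSum * (stepSize / n.toDouble)
}

treeAggregate merges the per-partition partials in a tree pattern so the driver combines only a few results at a time; on Spark 1.2 the same method lives in mllib.rdd.RDDFunctions rather than on RDD itself.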
Hi all,
I'm running the teraSort benchmark with a relatively small input set: 5GB.
During profiling, I can see I am using a total of 68GB. I've got a terabyte
of memory in my system, and set
spark.executor.memory 900g
spark.driver.memory 900g
I use the defaults for
spark.shuffle.memoryFraction
spar
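For reference, the same knobs expressed through SparkConf; the fraction values shown are the Spark 1.x documented defaults, not tuned settings:

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("terasort")
  .set("spark.executor.memory", "900g")
  .set("spark.driver.memory", "900g")
  // Spark 1.x defaults: shuffle gets 20% of the heap, storage 60%.
  .set("spark.shuffle.memoryFraction", "0.2")
  .set("spark.storage.memoryFraction", "0.6")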
the big 1.3 push is over, so i'll be reclaiming these three extra workers.
:)
On Mon, Feb 9, 2015 at 5:18 PM, shane knapp wrote:
> ...to help w/the build backlog. let's all welcome
> amp-jenkins-slave-{01..03} back to the fray!
>
I have run some BLAS comparison benchmarks on different EC2 instance sizes
and also on NERSC supercomputers. I can put together a GitHub-backed
website where we can host the latest benchmark results and update them over
time.
Sam -- does that sound like what you had in mind?
Thanks
Shivaram
On Tue
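A minimal sketch of the kind of single-node measurement involved, timing a square dgemm through netlib-java (the BLAS bridge MLlib used at the time); the matrix size is an arbitrary choice:

import com.github.fommil.netlib.BLAS.{getInstance => blas}

// Time one n x n double-precision matrix multiply: C := A * B.
def timeDgemm(n: Int): Double = {
  val a = Array.fill(n * n)(math.random)
  val b = Array.fill(n * n)(math.random)
  val c = new Array[Double](n * n)
  val t0 = System.nanoTime()
  blas.dgemm("N", "N", n, n, n, 1.0, a, n, b, n, 0.0, c, n)
  (System.nanoTime() - t0) / 1e9   // seconds
}

val n = 2048
val secs = timeDgemm(n)
// ~2*n^3 floating-point ops gives an effective GFLOP/s figure.
println(f"dgemm $n%d: $secs%.3f s, ${2.0 * n * n * n / secs / 1e9}%.1f GFLOP/s")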
Hi everyone!
I am digging into MLlib of Spark 1.2.1 currently. While reading the code of
mllib.stat.test, in the file ChiSqTest.scala under
/spark/mllib/src/main/scala/org/apache/spark/mllib/stat/test, I am confused
by the usage of the mapPartitions API in the function
def chiSquaredFeatures(data: RDD[La
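For anyone else tracing that function: the general mapPartitions pattern it relies on is to build one mutable count map per partition, emit it as a single record, and merge the partials. A sketch of the idea (not ChiSqTest's exact code):

import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

// Count (featureIndex, featureValue, label) triples with one map per partition.
def pairCounts(data: RDD[LabeledPoint]): Map[(Int, Double, Double), Long] =
  data.mapPartitions { iter =>
    val counts = scala.collection.mutable.Map.empty[(Int, Double, Double), Long]
    iter.foreach { lp =>
      lp.features.toArray.zipWithIndex.foreach { case (v, i) =>
        val key = (i, v, lp.label)
        counts(key) = counts.getOrElse(key, 0L) + 1L
      }
    }
    Iterator(counts.toMap)          // one partial result per partition
  }.reduce { (a, b) =>
    (a.keySet ++ b.keySet).iterator
      .map(k => k -> (a.getOrElse(k, 0L) + b.getOrElse(k, 0L))).toMap
  }

Emitting one map per partition instead of one record per data point is what keeps the shuffle small; the chi-squared statistics can then be computed from the merged contingency counts.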