Hi,
I am seeing perf degradation in the Spark Pi example on a single-node
setup (using local[K]).
Using 1, 2, 4, and 8 cores, these are the execution times in seconds for
the same number of iterations:
Random: 4.0, 7.0, 12.96, 17.96
If I change the code to use ThreadLocalRandom
(https://github.com/
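The slowdown pattern (more cores, worse time) is what contention on a single shared java.util.Random looks like, since its seed is updated with a CAS that all threads fight over. A minimal standalone sketch of the Monte Carlo pi loop using ThreadLocalRandom, outside Spark (class and method names here are hypothetical, not from the SparkPi source):

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.stream.IntStream;

public class PiSketch {
    // Estimate pi by sampling points in the unit square, in parallel.
    // ThreadLocalRandom.current() hands each worker thread its own
    // generator, avoiding the CAS contention of a shared java.util.Random.
    static double estimatePi(int samples) {
        long inside = IntStream.range(0, samples).parallel().filter(i -> {
            double x = ThreadLocalRandom.current().nextDouble();
            double y = ThreadLocalRandom.current().nextDouble();
            return x * x + y * y < 1.0;
        }).count();
        return 4.0 * inside / samples;
    }

    public static void main(String[] args) {
        System.out.println(estimatePi(5_000_000));
    }
}
```

With a shared Random in the filter instead, throughput typically degrades as threads are added, which matches the timings reported above.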
It's already there, isn't it? The in-memory columnar cache format.
On Thu, Nov 24, 2016 at 9:06 PM, Nitin Goyal wrote:
> Hi,
>
> Do we have any plan of supporting parquet-like partitioning support in
> Spark SQL in-memory cache? Something like one RDD[CachedBatch] per
> in-memory cache partition
Hi,
Do we have any plan of supporting parquet-like partitioning support in
Spark SQL in-memory cache? Something like one RDD[CachedBatch] per
in-memory cache partition.
-Nitin
Hi Maciej,
Thanks again for the reply. One small clarification about the answer to
my #1 point.
I set local[4]; shouldn't this force Spark to read from 4
partitions in parallel and write in parallel (by parallel I mean the order
from which partition, the data is read from a set of 4 p
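For context on what local[4] buys you: it gives the single-node scheduler 4 task slots, so up to 4 partition tasks run concurrently, but it does not fix the order in which partitions complete. A minimal JVM sketch of that scheduling model (this is not Spark code; the class and method names are hypothetical):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class LocalKSketch {
    // local[K] behaves like a fixed pool of K task slots: at most K
    // partition tasks run at once, but which partition finishes first
    // is not deterministic.
    public static List<Integer> runPartitions(int numPartitions, int slots) {
        ExecutorService pool = Executors.newFixedThreadPool(slots);
        List<Integer> completionOrder =
                Collections.synchronizedList(new ArrayList<>());
        for (int p = 0; p < numPartitions; p++) {
            final int partition = p;
            // Stand-in for reading/processing one partition.
            pool.execute(() -> completionOrder.add(partition));
        }
        pool.shutdown();
        try {
            pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return completionOrder;
    }

    public static void main(String[] args) {
        System.out.println(runPartitions(8, 4));
    }
}
```

All 8 partitions get processed, 4 at a time, but the completion order can differ from run to run; any ordering guarantee has to come from the job itself, not from local[4].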
Port forwarding will help you out.
marco rocchi wrote on Thu., Nov. 24,
2016 at 16:33:
> Hi,
> I'm working with Apache Spark in order to develop my master's thesis. I'm new
> to Spark and to working with clusters. I searched the internet but I didn't
> find a way to solve this.
> My problem is the
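The port-forwarding suggestion above usually means SSH local forwarding: tunnel a local port through the proxy to the master node. A hedged command-line sketch, with hypothetical hostnames and the default Spark standalone master port:

```shell
# 'proxy.example.org' and 'master.internal' are placeholder hostnames;
# substitute the real proxy host and the master's address as seen from it.
# Forward local port 7077 through the proxy to the master's Spark port:
ssh -L 7077:master.internal:7077 user@proxy.example.org

# While the tunnel is up, the driver on this PC can target
# spark://localhost:7077 as if the master were reachable directly.
```

If only HTTP access to the web UI is needed, the same pattern works with port 8080 instead.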
Hi,
I'm working with Apache Spark in order to develop my master's thesis. I'm new
to Spark and to working with clusters. I searched the internet but I didn't
find a way to solve this.
My problem is the following: from my PC I can access the master node
of a cluster only via a proxy.
To connect to proxy
Besides the eventual traffic issue, I don't believe that it would benefit
users to get a standalone site. Some great answers are provided by users
who aren't Spark experts but are Java, Python, AWS or even systems
experts; why would we want to play alone?
We are trying nevertheless the animat
…my 0.1 cent ☺
As a Spark and SO user, I would not consider a separate SE a good thing.
* Part of SO's beauty is that you can easily filter and track different topics
from one dashboard.
* Being part of SO also gets good exposure, as it raises awareness of Spark
across a wider audience.
* High rep
Here's a view into the requirements, for example:
http://area51.stackexchange.com/proposals/76571/emacs
You're right, there is a lot of activity on SO, easily 30-40 questions per
day. One thing I noticed about, for example, the Data Science SE is that
most questions relevant to it were still posted
I am not sure what counts as enough traffic. Some of the existing SE groups do
not have that much traffic.
Specifically, the user mailing list has ~50 emails per day. It wouldn't be much
of a stretch to extract 1-2 questions per day from that. On the regular
Stack Overflow, the apache-spark had
I don't think there's nearly enough traffic to sustain a stand-alone SE. I
helped mod the Data Science SE, and it still hasn't technically reached
critical mass after 2 years. It would just fracture the discussion into yet
another place.
On Thu, Nov 24, 2016 at 6:52 AM assaf.mendelson
wrote:
> Sorry to reaw