+1
On Tue, Dec 22, 2015 at 7:01 PM, Josh Rosen
wrote:
> +1
>
> On Tue, Dec 22, 2015 at 7:00 PM, Jeff Zhang wrote:
>
>> +1
>>
>> On Wed, Dec 23, 2015 at 7:36 AM, Mark Hamstra
>> wrote:
>>
>>> +1
>>>
>>> On Tue, Dec 22, 2015 at 12:10 PM, Michael Armbrust <
>>> mich...@databricks.com> wrote:
>>>
A second advantage is that it allows an individual Executor to go into a GC
pause (or even crash) while still allowing other Executors to read its
shuffle data and make progress, which tends to improve the stability of
memory-intensive jobs.
On Thu, Jun 25, 2015 at 11:42 PM, Sandy Ryza
wrote:
> Hi Yash,
>
> One
Should we mention that you should synchronize
on HadoopRDD.CONFIGURATION_INSTANTIATION_LOCK to avoid a possible race
condition in cloning Hadoop Configuration objects prior to Hadoop 2.7.0? :)
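For context, a minimal sketch of the locking pattern in question; the lock
object here is a stand-in for Spark's actual field, not the real definition:

import org.apache.hadoop.conf.Configuration

// Sketch only: prior to Hadoop 2.7.0 the Configuration copy constructor is
// not thread-safe, so every clone is made while holding one shared lock.
object ConfCloneHelper {
  private val CONFIGURATION_INSTANTIATION_LOCK = new Object

  def cloneConf(conf: Configuration): Configuration =
    CONFIGURATION_INSTANTIATION_LOCK.synchronized {
      new Configuration(conf) // copy constructor reads the source conf's state
    }
}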
On Wed, Mar 25, 2015 at 7:16 PM, Patrick Wendell wrote:
> Great - that's even easier. Maybe we could ha
The only issue I knew of with Java enums was that they do not appear in the
Scala documentation.
On Mon, Mar 23, 2015 at 1:46 PM, Sean Owen wrote:
> Yeah the fully realized #4, which gets back the ability to use it in
> switch statements (? in Scala but not Java?) does end up being kind of
> hug
Out of curiosity, why could we not use Netty's SslHandler injected into the
TransportContext pipeline?
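Just to illustrate what I mean, a rough sketch of injecting SslHandler into a
Netty channel pipeline; the initializer below is hypothetical, not the actual
TransportContext code:

import io.netty.channel.socket.SocketChannel
import io.netty.handler.ssl.SslHandler
import javax.net.ssl.SSLContext

object SslInjection {
  // Rough sketch: wrap an SSLEngine in Netty's SslHandler and add it at the
  // front of the pipeline, ahead of the existing transport handlers.
  def addSslHandler(ch: SocketChannel): Unit = {
    val engine = SSLContext.getDefault.createSSLEngine()
    engine.setUseClientMode(false)
    ch.pipeline().addFirst("ssl", new SslHandler(engine))
  }
}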
On Mon, Mar 16, 2015 at 7:56 PM, turp1twin wrote:
> Hey Patrick,
>
> Sorry for the delay, I was at Elastic{ON} last week and well, my day job
> has
> been keeping me busy... I went ahead and op
>>>>
>>>> sealed abstract class StorageLevel // cannot be a trait
>>>>
>>>> object StorageLevel {
>>>>   private[this] case object _MemoryOnly extends StorageLevel
>>>>   final val MemoryOnly: StorageLevel = _MemoryOnly
>>
is the
> > >> minimal code I found to make everything show up correctly in both
> > >> Scala and Java:
> > >>
> > >> sealed abstract class StorageLevel // cannot be a trait
> > >>
> > >> object StorageLevel {
> > >> private[this] case obje
one runs into the same problem I had.
> >>
> >> By setting --hadoop-major-version=2 when using the ec2 scripts,
> >> everything worked fine.
> >>
> >> Darin.
> >>
> >>
> >> - Original Message -
> >> From: Darin McBeath
agree with Aaron's suggestion.
> >
> > - Patrick
> >
> > On Wed, Mar 4, 2015 at 6:07 PM, Aaron Davidson
> wrote:
> >> I'm cool with #4 as well, but make sure we dictate that the values
> should
> >> be defined within an object with the sa
I'm cool with #4 as well, but make sure we dictate that the values should
be defined within an object with the same name as the enumeration (like we
do for StorageLevel). Otherwise we may pollute a higher namespace.
e.g. we SHOULD do:

trait StorageLevel
object StorageLevel {
  case object MemoryOnly extends StorageLevel
}
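For completeness, a hedged sketch of the fully realized version of this
pattern plus how a call site would read; the DiskOnly value and the describe
helper are illustrative, not existing Spark code:

sealed abstract class StorageLevel // cannot be a trait

object StorageLevel {
  private[this] case object _MemoryOnly extends StorageLevel
  final val MemoryOnly: StorageLevel = _MemoryOnly

  private[this] case object _DiskOnly extends StorageLevel
  final val DiskOnly: StorageLevel = _DiskOnly
}

object Example {
  // Call-site sketch: values are reached through the same-named object, so
  // nothing leaks into the enclosing namespace, and stable-identifier
  // patterns keep matches readable from Scala.
  def describe(level: StorageLevel): String = level match {
    case StorageLevel.MemoryOnly => "memory only"
    case StorageLevel.DiskOnly   => "disk only"
    case _                       => "other"
  }
}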
You might be seeing the result of this patch:
https://github.com/apache/spark/commit/d069c5d9d2f6ce06389ca2ddf0b3ae4db72c5797
which was introduced in 1.1.1. This patch disabled the ability for take()
to run without launching a Spark job, which means that the latency is
significantly increased for
For the specific question of supplementing Standalone Mode with a custom
leader election protocol, this was actually already committed in master and
will be available in Spark 1.3:
https://github.com/apache/spark/pull/771/files
You can specify spark.deploy.recoveryMode = "CUSTOM"
and spark.deploy
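As a rough configuration sketch (the factory property name and the factory
class below are assumptions to verify against the linked pull request, not
something confirmed by this snippet):

import org.apache.spark.SparkConf

// Sketch only: point the standalone Master at a custom recovery /
// leader-election implementation.
val conf = new SparkConf()
  .set("spark.deploy.recoveryMode", "CUSTOM")
  .set("spark.deploy.recoveryMode.factory", "com.example.MyRecoveryModeFactory")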
I think I've seen something like +2 = "strong LGTM" and +1 = "weak LGTM;
someone else should review" before. It's nice to have a shortcut which
isn't a sentence when talking about weaker forms of LGTM.
On Sat, Jan 17, 2015 at 6:59 PM, wrote:
> I think clarifying these semantics is definitely wor
This may be related: https://github.com/Parquet/parquet-mr/issues/211
Perhaps if we change our configuration settings for Parquet it would get
better, but the performance characteristics of Snappy are pretty bad here
under some circumstances.
On Tue, Sep 23, 2014 at 10:13 AM, Cody Koeninger wrot
>
>
> On Thu, Jul 24, 2014 at 6:09 PM, Aaron Davidson
> wrote:
>
>> Whoops, I was mistaken in my original post last year. By default, there
>> is one executor per node per Spark Context, as you said.
>> "spark.executor.memory" i
ne mode? This is after having closely
>>>>> read the documentation several times:
>>>>>
>>>>> http://spark.apache.org/docs/latest/configuration.html
>>>>>
This one is typically due to a mismatch between the Hadoop versions --
i.e., Spark is compiled against 1.0.4 but is running with 2.3.0 in the
classpath, or something like that. Not certain why you're seeing this with
spark-ec2, but I'm assuming this is related to the issues you posted in a
separate
age -
> > From: "Aaron Davidson"
> > To: dev@spark.apache.org
> > Sent: Monday, July 14, 2014 5:21:10 PM
> > Subject: Re: Profiling Spark tests with YourKit (or something else)
> >
> > Out of curiosity, what problems are you seeing with Utils.getCallSite?
One of the core problems here is the number of open streams we have, which
is (# cores * # reduce partitions), which can easily climb into the tens of
thousands for large jobs. This is a more general problem that we are
planning on fixing for our largest shuffles, as even moderate buffer sizes
can
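As a quick illustration with made-up numbers: 16 cores per node and 2,000
reduce partitions already means 16 * 2,000 = 32,000 streams open on a single
node at once.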
> > >> > - Patrick
> > >> >
> > >> > On Mon, Jul 14, 2014 at 3:28 PM, Nishkam Ravi
> > >> wrote:
> > >> > > Hi Aaron, I'm not sure if synchronizing on an arbitrary lock
> object
> > >> would
> > >> >
Out of curiosity, what problems are you seeing with Utils.getCallSite?
On Mon, Jul 14, 2014 at 2:59 PM, Will Benton wrote:
> Thanks, Matei; I have also had some success with jmap and friends and will
> probably just stick with them!
>
>
> best,
> wb
>
>
> - Original Message -
> > From:
The full jstack would still be useful, but our current working theory is
that this is due to the fact that Configuration#loadDefaults goes through
every Configuration object that was ever created (via
Configuration.REGISTRY) and locks it, thus introducing a dependency from
new Configuration to old,
Agreed that the behavior of the Master killing off an Application when
Executors from the same set of nodes repeatedly die is silly. This can also
strike if a single node enters a state where any Executor created on it
quickly dies (e.g., a block device becomes faulty). This prevents the
Applicatio
Shark's in-memory format is already serialized (it's compressed and
column-based).
On Tue, Jul 8, 2014 at 9:50 AM, Mridul Muralidharan
wrote:
> You are ignoring serde costs :-)
>
> - Mridul
>
> On Tue, Jul 8, 2014 at 8:48 PM, Aaron Davidson wrote:
> > Tachyon
Tachyon should only be marginally less performant than memory_only, because
we mmap the data from Tachyon's ramdisk. We do not have to, say, transfer
the data over a pipe from Tachyon; we can directly read from the buffers in
the same way that Shark reads from its in-memory columnar format.
On T
e is the log:
> >>
> >> E0702 10:32:07.599364 14915 slave.cpp:2686] Failed to unmonitor
> container
> >> for executor 20140616-104524-1694607552-5050-26919-1 of framework
> >> 20140702-102939-1694607552-5050-14846-: Not monitored
> >>
> >>
> >
Either Serializable works; Scala's Serializable extends Java's (it was
originally intended as a common interface for people who didn't want to run
Scala on a JVM).
Class fields require the enclosing class to be serialized along with the
object in order to be accessed. If you declared "val n" inside a method's
scope instead, though, we
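A minimal sketch of the distinction being described, with made-up class and
field names:

import org.apache.spark.SparkContext

// Sketch: referencing a class field inside a closure captures `this`, so the
// whole enclosing class must be serializable; copying the value into a local
// val first means only that value is captured by the closure.
class Job(sc: SparkContext) {      // Job itself is not Serializable
  val n = 10                       // class field

  def badCount(): Long =
    sc.parallelize(1 to 100).filter(_ < n).count()  // captures `this`; fails

  def goodCount(): Long = {
    val localN = n                 // local copy captured by value
    sc.parallelize(1 to 100).filter(_ < localN).count()
  }
}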
Can you post the logs from any of the dying executors?
On Tue, Jul 1, 2014 at 1:25 AM, qingyang li
wrote:
> i am using mesos0.19 and spark0.9.0 , the mesos cluster is started, when I
> using spark-shell to submit one job, the tasks always lost. here is the
> log:
> --
> 14/07/01 16:24
I don't know of any way to avoid Akka doing a copy, but I would like to
mention that it's on the priority list to piggy-back only the map statuses
relevant to a particular map task on the task itself, thus reducing the
total amount of data sent over the wire by a factor of N for N physical
machines
There's some discussion here as well on just using the Scala REPL for 2.11:
http://apache-spark-developers-list.1001551.n3.nabble.com/Spark-on-Scala-2-11-td6506.html#a6523
Matei's response mentions the features we needed to change from the Scala
REPL (class-based wrappers and where to output the g
In Spark 0.9.0 and 0.9.1, we stopped using the FileSystem cache correctly,
and we just recently resumed using it in 1.0 (and in 0.9.2) when this issue
was fixed: https://issues.apache.org/jira/browse/SPARK-1676
Prior to this fix, each Spark task created and cached its own FileSystems
due to a bug
No. Only 3 of the responses.
On Fri, May 16, 2014 at 10:38 AM, Nishkam Ravi wrote:
> Yes.
>
>
> On Fri, May 16, 2014 at 8:40 AM, DB Tsai wrote:
>
> > Yes.
> > On May 16, 2014 8:39 AM, "Andrew Or" wrote:
> >
> > > Apache has been having some problems lately. Do you guys see this
> > message?
>
It was, but due to the apache infra issues, some may not have received the
email yet...
On Fri, May 16, 2014 at 10:48 AM, Henry Saputra wrote:
> Hi Patrick,
>
> Just want to make sure that VOTE for rc6 also cancelled?
>
>
> Thanks,
>
> Henry
>
> On Thu, May 15, 2014 at 1:15 AM, Patrick Wendell
>
I reverted my changes. The test result is the same.
> >
> > I touched the ReplSuite.scala file (using the touch command); the test
> order is reversed, same as at the very beginning. And the output is also
> the same (the result in my first post).
> >
> >
> > --
> > Ye Xianjin
> > Sent
This may have something to do with running the tests on a Mac, as there is
a lot of File/URI/URL stuff going on in that test which may just have
happened to work if run on a Linux system (like Jenkins). Note that this
suite was added relatively recently:
https://github.com/apache/spark/pull/217
O
Matei's link seems to point to a specific starter project as part of the
starter list, but here is the list itself:
https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20labels%20%3D%20Starter%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)
On Mon, Apr 7, 20
One solution for typesafe config is to use
"spark.speculation" = true
Typesafe will recognize the key as a string rather than a path, so the name
will actually be "\"spark.speculation\"", and you need to handle this
contingency when passing the config entries to Spark (stripping the
quotes from
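A hedged sketch of that workaround using the Typesafe Config API; the
quote-stripping logic below is illustrative, not a recommended final form:

import scala.collection.JavaConverters._
import com.typesafe.config.ConfigFactory
import org.apache.spark.SparkConf

object SpeculationConfig {
  def main(args: Array[String]): Unit = {
    // Quote the dotted key so Typesafe Config treats it as a single key
    // rather than a path.
    val config = ConfigFactory.parseString(""" "spark.speculation" = true """)
    val sparkConf = new SparkConf()

    // entrySet returns the quoted key literally (e.g. "\"spark.speculation\""),
    // so strip the surrounding quotes before handing it to SparkConf.
    for (entry <- config.entrySet().asScala) {
      val key = entry.getKey.stripPrefix("\"").stripSuffix("\"")
      sparkConf.set(key, entry.getValue.unwrapped().toString)
    }
  }
}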
Should we try to deprecate these types of configs for 1.0.0? We can start
by accepting both and giving a warning if you use the old one, and then
actually remove them in the next minor release. I think
"spark.speculation.enabled=true" is better than "spark.speculation=true",
and if we decide to use
By the way, we still need to get our JIRAs migrated over to the Apache
system. Unrelated, just... saying.
On Mon, Feb 24, 2014 at 10:55 PM, Matei Zaharia wrote:
> This is probably a snafu because we had a GitHub hook that was sending
> messages to d...@spark.incubator.apache.org, and that list w