That's kinda annoying, but it's just a little extra boilerplate. Can you
call it as StorageLevel.DiskOnly() from Java? Would it also work if they
were case classes with empty constructors, without the field?
On Wed, Mar 4, 2015 at 11:35 PM, Xiangrui Meng wrote:
> `case object` inside an `object` doesn't show up in Java.
`case object` inside an `object` doesn't show up in Java. This is the
minimal code I found to make everything show up correctly in both
Scala and Java:
sealed abstract class StorageLevel // cannot be a trait
object StorageLevel {
  private[this] case object _MemoryOnly extends StorageLevel
  final val MemoryOnly: StorageLevel = _MemoryOnly
}
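For illustration, here is a hedged sketch of how that pattern would extend to the DiskOnly value asked about above. The Java call presumably resolves through the static forwarder Scala emits on the companion class, which is where the trailing parentheses come from; _DiskOnly/DiskOnly are assumed names, not code from the email:

sealed abstract class StorageLevel // cannot be a trait
object StorageLevel {
  private[this] case object _DiskOnly extends StorageLevel
  final val DiskOnly: StorageLevel = _DiskOnly
}
// From Java, each value is then reachable as a no-arg call:
//   StorageLevel level = StorageLevel.DiskOnly();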
Yep, that makes sense. Thanks for the clarification!
Mingyu
On 3/4/15, 8:05 PM, "Patrick Wendell" wrote:
>Yeah, it will result in a second serialized copy of the array (costing
>some memory). But the computational overhead should be very small. The
>absolute worst case here will be when doing a collect() or something
>similar that just bundles the entire partition.
I am trying to read an Avro RDD, transform it, and write it back.
It runs fine locally, but when I run it on the cluster I see issues
with Avro.
export SPARK_HOME=/home/dvasthimal/spark/spark-1.0.2-bin-2.4.1
export SPARK_YARN_USER_ENV="CLASSPATH=/apache/hadoop/conf"
export HADOOP_CONF_DIR=/apache/hadoop/
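Not from the original message, but for context, a minimal sketch of the kind of Avro read being described (Spark 1.x-era API; the input path is a placeholder, and the avro/avro-mapred jars would typically also need to reach the executors, e.g. via spark-submit --jars):

import org.apache.avro.generic.GenericRecord
import org.apache.avro.mapred.AvroKey
import org.apache.avro.mapreduce.AvroKeyInputFormat
import org.apache.hadoop.io.NullWritable
import org.apache.spark.{SparkConf, SparkContext}

// Read Avro records as an RDD of (AvroKey[GenericRecord], NullWritable) pairs.
val sc = new SparkContext(new SparkConf().setAppName("avro-read"))
val records = sc.newAPIHadoopFile[AvroKey[GenericRecord], NullWritable,
  AvroKeyInputFormat[GenericRecord]]("hdfs:///path/to/input.avro")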
Thanks for your reply, Evan.
> It may make sense to have a more general Gibbs sampling
> framework, but it might be good to have a few desired applications
> in mind (e.g. higher level models that rely on Gibbs) to help API
> design, parallelization strategy, etc.
I think I'm more interested in a
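Purely as an illustration of what a "general Gibbs sampling framework" could look like — none of these names exist in MLlib, this is just a sketch of the shape such an API might take:

trait GibbsSampler[State] {
  def numBlocks: Int
  def initialState: State
  // Resample one block of variables conditioned on all the others.
  def sampleBlock(state: State, block: Int): State
  // One full sweep over all conditional blocks.
  final def sweep(state: State): State =
    (0 until numBlocks).foldLeft(state)(sampleBlock)
}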
I like #4 as well and agree with Aaron's suggestion.
- Patrick
On Wed, Mar 4, 2015 at 6:07 PM, Aaron Davidson wrote:
> I'm cool with #4 as well, but make sure we dictate that the values should
> be defined within an object with the same name as the enumeration (like we
> do for StorageLevel). Otherwise we may pollute a higher namespace.
Yeah, it will result in a second serialized copy of the array (costing
some memory). But the computational overhead should be very small. The
absolute worst case here will be when doing a collect() or something
similar that just bundles the entire partition.
- Patrick
On Wed, Mar 4, 2015 at 5:47
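A quick illustration of the claim (not from the thread): Java-serializing a byte array that already contains serialized data costs one extra copy plus a few bytes of framing, not a second full encoding pass. The 64MB size is borrowed from the partition mentioned later in this thread:

import java.io.{ByteArrayOutputStream, ObjectOutputStream}

val payload = new Array[Byte](64 * 1024 * 1024) // stand-in for an already-serialized partition
val bos = new ByteArrayOutputStream()
val oos = new ObjectOutputStream(bos)
oos.writeObject(payload) // writes a small header, then the raw bytes
oos.close()
println(s"framing overhead: ${bos.size - payload.length} bytes")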
I'm cool with #4 as well, but make sure we dictate that the values should
be defined within an object with the same name as the enumeration (like we
do for StorageLevel). Otherwise we may pollute a higher namespace.
e.g. we SHOULD do:
trait StorageLevel
object StorageLevel {
  case object MemoryOnly extends StorageLevel
}
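For contrast, a sketch of the namespace pollution being warned against (assumed wording, not from the message): defining the values at the top level instead of inside the companion object.

trait StorageLevel
case object MemoryOnly extends StorageLevel // leaks MemoryOnly into the enclosing package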
The concern is really just the runtime overhead and memory footprint of
Java-serializing an already-serialized byte array again. We originally
noticed this when we were using RDD.toLocalIterator() which serializes the
entire 64MB partition. We worked around this issue by kryo-serializing and
snappy
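A hedged sketch of the workaround being described, with invented helper names; the idea is to hand the closure serializer a compressed blob instead of the raw serialized partition (org.xerial.snappy is the Snappy binding Spark already bundles):

import org.xerial.snappy.Snappy

// Compress already-serialized partition bytes before they get wrapped
// (and copied) again by the Java closure serializer.
def pack(serialized: Array[Byte]): Array[Byte] = Snappy.compress(serialized)
def unpack(packed: Array[Byte]): Array[Byte] = Snappy.uncompress(packed)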
#4 with a preference for CamelCaseEnums
On Wed, Mar 4, 2015 at 5:29 PM, Joseph Bradley
wrote:
> another vote for #4
> People are already used to adding "()" in Java.
>
>
> On Wed, Mar 4, 2015 at 5:14 PM, Stephen Boesch wrote:
>
> > #4 but with MemoryOnly (more scala-like)
> >
> > http://docs.scala-lang.org/style/naming-conventions.html
another vote for #4
People are already used to adding "()" in Java.
On Wed, Mar 4, 2015 at 5:14 PM, Stephen Boesch wrote:
> #4 but with MemoryOnly (more scala-like)
>
> http://docs.scala-lang.org/style/naming-conventions.html
>
> Constants, Values, Variable and Methods
>
> Constant names should be in upper camel case.
#4 but with MemoryOnly (more scala-like)
http://docs.scala-lang.org/style/naming-conventions.html
Constants, Values, Variable and Methods
Constant names should be in upper camel case. That is, if the member is
final, immutable and it belongs to a package object or an object, it may be
considered a constant.
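Illustrating the cited convention with invented names:

object Defaults {
  final val MaxRetries = 3         // upper camel case: final, immutable, inside an object
  final val DefaultTimeoutMs = 1000L
}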
Hi all,
There are many places where we use enum-like types in Spark, but in
different ways. Every approach has both pros and cons. I wonder
whether there should be an “official” approach for enum-like types in
Spark.
1. Scala’s Enumeration (e.g., SchedulingMode, WorkerState, etc)
* All types show up as Enumeration.Value in Java.
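For reference, approach 1 in miniature, modeled on Spark's SchedulingMode:

object SchedulingMode extends Enumeration {
  type SchedulingMode = Value
  // Java callers see each of these only as the opaque type Enumeration.Value.
  val FAIR, FIFO, NONE = Value
}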
Hey Mingyu,
I think it's broken out separately so we can record the time taken to
serialize the result. Once we've serialized it once, the second
serialization should be really simple since it's just wrapping
something that has already been turned into a byte buffer. Do you see
a specific issue with
Hi all,
It looks like the result of a task is serialized twice: once by the serializer
(i.e., Java/Kryo depending on configuration) and once again by the closure
serializer (i.e., Java). To link the actual code,
The first one:
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spar
I think we will have to fix
https://issues.apache.org/jira/browse/SPARK-5143 as well before the
final 1.3.x.
But yes everything else checks out for me, including sigs and hashes
and building the source release.
I have been following JIRA closely and am not aware of other blockers
besides the ones
The master and workers need some system and package updates, and I'll also
be rebooting the machines.
This shouldn't take very long, and I expect Jenkins to be back up and
building by 9am at the *latest*.
Important note: I will NOT be updating Jenkins or any of the plugins
dur
-1 (non-binding) because of SPARK-6144.
But aside from that I ran a set of tests on top of standalone and yarn
and things look good.
On Tue, Mar 3, 2015 at 8:19 PM, Patrick Wendell wrote:
> Please vote on releasing the following candidate as Apache Spark version
> 1.3.0!
>
> The tag to be voted
Hey Marcelo,
Yes - I agree. That one trickled in just as I was packaging this RC.
However, I still put this out here to allow people to test the
existing fixes, etc.
- Patrick
On Wed, Mar 4, 2015 at 9:26 AM, Marcelo Vanzin wrote:
> I haven't tested the rc2 bits yet, but I'd consider
> https://issues.apache.org/jira/browse/SPARK-6144 a serious regression
> from 1.2.
I haven't tested the rc2 bits yet, but I'd consider
https://issues.apache.org/jira/browse/SPARK-6144 a serious regression
from 1.2 (since it affects existing "addFile()" functionality if the
URL is "hdfs:...").
Will test other parts separately.
On Tue, Mar 3, 2015 at 8:19 PM, Patrick Wendell wrote:
Hi Manoj,
this question is best asked on the Spark mailing lists (copied). From a
formal point of view, all that counts is your proposal in Melange once
applications start, but your mentor or the project you wish to contribute
to may have additional requirements.
Cheers,
Uli
On 2015-03-03 08:54
+1 (subject to comments on ec2 issues below)
machine 1: Macbook Air, OSX 10.10.2 (Yosemite), Java 8
machine 2: iMac, OSX 10.8.4, Java 7
1. mvn clean package -DskipTests (33min/13min)
2. ran SVM benchmark https://github.com/insidedctm/spark-mllib-benchmark
EC2 issues:
1) Unable to successfully
Hi, in the roadmap of Spark in 2015 (link:
http://files.meetup.com/3138542/Spark%20in%202015%20Talk%20-%20Wendell.pptx),
I saw SchemaRDD is designed to be the basis of BOTH Spark
Streaming and Spark SQL.
My question is: what's the typical usage of SchemaRDD in a Spark
Streaming application? Thank you.
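Not an answer from the thread, but a sketch of what such usage typically looked like with the 1.2-era API (SchemaRDD was renamed DataFrame in 1.3); Event and the trivial parsing are invented for illustration:

import org.apache.spark.sql.SQLContext
import org.apache.spark.streaming.dstream.DStream

case class Event(id: Long, value: String)

def query(sqlContext: SQLContext, stream: DStream[String]): Unit = {
  import sqlContext.createSchemaRDD // implicit RDD[Product] => SchemaRDD
  stream.foreachRDD { rdd =>
    // Turn each micro-batch into a SchemaRDD and run SQL over it.
    rdd.map(line => Event(0L, line)).registerTempTable("events")
    sqlContext.sql("SELECT id, COUNT(*) FROM events GROUP BY id").collect()
  }
}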
It is the LR over car-data at https://github.com/xsankar/cloaked-ironman.
1.2.0 gives Mean Squared Error = 40.8130551358
1.3.0 gives Mean Squared Error = 105.857603953
I will verify it one more time tomorrow.
Cheers
On Tue, Mar 3, 2015 at 11:28 PM, Xiangrui Meng wrote:
> On Tue, Mar 3, 2015 a
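For anyone reproducing the numbers above, the MSE in the MLlib regression examples is the standard definition over (label, prediction) pairs:

import org.apache.spark.SparkContext._
import org.apache.spark.rdd.RDD

// Mean squared error over (label, prediction) pairs.
def mse(valuesAndPreds: RDD[(Double, Double)]): Double =
  valuesAndPreds.map { case (v, p) => math.pow(v - p, 2) }.mean()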