Hi,
We occasionally encounter OutOfMemoryError when running Spark 3.1
with Java 17, the G1 garbage collector (region size = 32MB), and a 200GB heap.
The OOM happens in the ShuffleExternalSorter when it attempts to allocate a
1GB array for the pointer array, despite about 80GB of heap being
available.
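For reference, the configuration described above would look roughly like this (values are the ones mentioned here; flag names are standard Spark/JVM options, and whether these settings alone reproduce the OOM is not verified):

import org.apache.spark.sql.SparkSession

// Sketch only: executor heap and G1 region size as described above.
val spark = SparkSession.builder()
  .appName("shuffle-oom-setup")
  .config("spark.executor.memory", "200g")
  .config("spark.executor.extraJavaOptions",
    "-XX:+UseG1GC -XX:G1HeapRegionSize=32m")
  .getOrCreate()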
Hi All,
What is the most efficient way of converting a static DataFrame to a
streaming one (Structured Streaming)?
I have a custom sink implemented for Structured Streaming and I would like
to use it to write a static DataFrame.
I know that I can write the DataFrame to files and then source them to
create a streaming DataFrame.
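For what it's worth, that file-based workaround looks roughly like this (paths and the sink name are placeholders):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("static-to-streaming").getOrCreate()

// Any static DataFrame; the input path is illustrative.
val staticDf = spark.read.parquet("/data/input")

// Write the static DataFrame out as files...
staticDf.write.mode("overwrite").parquet("/tmp/staging")

// ...then read the same directory back as a streaming DataFrame.
// File streams require an explicit schema.
val streamingDf = spark.readStream
  .schema(staticDf.schema)
  .parquet("/tmp/staging")

// The custom sink can then be used via writeStream (sink name is hypothetical).
val query = streamingDf.writeStream
  .format("my.custom.Sink")
  .option("checkpointLocation", "/tmp/checkpoint")
  .start()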
Hi folks,
What has happened to Tachyon / Alluxio in Spark 2? The docs no longer
mention it.
--
Oleksiy Dyagilev
Most likely you are missing an import statement that enables some Scala
implicits. I haven't used this connector, but it looks like you need
"import com.couchbase.spark._".
--
Oleksiy Dyagilev
On Wed, Sep 7, 2016 at 9:42 AM, Devi P.V wrote:
> I am a newbie in CouchBase. I am trying to write data into
It has 4 categories
a = 1 0 0
b = 0 0 0
c = 0 1 0
d = 0 0 1
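For example, this is roughly how such an encoding is produced with the ml pipeline API (a sketch assuming a SparkSession `spark` and the Spark 1.6/2.0-era OneHotEncoder; which category ends up as the all-zeros reference depends on the index StringIndexer assigns):

import org.apache.spark.ml.feature.{OneHotEncoder, StringIndexer}

val df = spark.createDataFrame(Seq(
  (0, "a"), (1, "b"), (2, "c"), (3, "d")
)).toDF("id", "category")

// Map the string category to a numeric index first.
val indexed = new StringIndexer()
  .setInputCol("category")
  .setOutputCol("categoryIndex")
  .fit(df)
  .transform(df)

// One-hot encode; the last index is dropped by default,
// so 4 categories become vectors of size 3 (one category is all zeros).
val encoder = new OneHotEncoder()
  .setInputCol("categoryIndex")
  .setOutputCol("categoryVec")

encoder.transform(indexed).show()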
--
Oleksiy Dyagilev
On Wed, Sep 7, 2016 at 10:42 AM, Madabhattula Rajesh Kumar <
mrajaf...@gmail.com> wrote:
> Hi,
>
> Any help on above mail use case ?
>
> Regards,
> Rajesh
>
> On Tue, Sep 6, 2016 at 5:40 PM, Madabhattula Rajesh Kumar
Okay, I think I found an answer to my question. Some models (for instance
org.apache.spark.mllib.recommendation.MatrixFactorizationModel) hold RDDs,
so simply serializing these objects will not work.
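For example, the model-specific persistence MLlib provides instead looks roughly like this (the path is illustrative; `model` is a trained MatrixFactorizationModel and `sc` a SparkContext):

import org.apache.spark.mllib.recommendation.MatrixFactorizationModel

// Persist the model (including its RDD-backed factor matrices) to storage.
model.save(sc, "hdfs:///models/als-model")

// Load it back later.
val restored = MatrixFactorizationModel.load(sc, "hdfs:///models/als-model")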
--
Oleksiy Dyagilev
On Tue, Jul 12, 2016 at 5:40 PM, aka.fe2s wrote:
> What is the reason Sp
The local collection is distributed across the cluster when you run any
action http://spark.apache.org/docs/latest/programming-guide.html#actions
due to the laziness of RDDs.
If you want to control how the collection is split into partitions, you
can create your own RDD implementation and implement this splitting logic
yourself.
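A rough sketch of such a custom RDD (all names here are invented; a real implementation would typically store the slice data inside the partitions rather than in the RDD itself):

import scala.reflect.ClassTag
import org.apache.spark.{Partition, SparkContext, TaskContext}
import org.apache.spark.rdd.RDD

case class SlicePartition(index: Int) extends Partition

class MySeqRDD[T: ClassTag](sc: SparkContext, data: Seq[T], numSlices: Int)
  extends RDD[T](sc, Nil) {

  // Decide here how many partitions there are and what each one represents.
  override protected def getPartitions: Array[Partition] =
    Array.tabulate(numSlices)(i => SlicePartition(i): Partition)

  // Produce the elements of one partition; here a naive round-robin split.
  override def compute(split: Partition, context: TaskContext): Iterator[T] =
    data.zipWithIndex
      .collect { case (x, idx) if idx % numSlices == split.index => x }
      .iterator
}

// val rdd = new MySeqRDD(sc, 1 to 10, 3)   // 3 partitions, round-robin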
Correct.
It's desugared into rdd.foreach() by the Scala compiler.
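That is (assuming an rdd in scope):

// What you write:
for (x <- rdd) println(x)

// What the Scala compiler turns it into:
rdd.foreach(x => println(x))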
--
Oleksiy Dyagilev
On Tue, Jul 12, 2016 at 6:58 PM, philipghu wrote:
> Hi,
>
> I'm new to Spark and Scala as well. I understand that we can use foreach to
> apply a function to each element of an RDD, like rdd.foreach(x => println(x))
What is the reason Spark has individual implementations of read/write
routines for every model in mllib and ml (the Saveable and MLWritable trait
implementations)?
Wouldn't a generic implementation via the Java serialization mechanism work?
I would like to use it to store the models in a custom storage backend.
--
Oleksiy Dyagilev
Nick, what is your use-case?
On Thu, Mar 24, 2016 at 11:55 PM, Marco Colombo wrote:
> You can persist off-heap, for example with tachyon, now called Alluxio.
> Take a look at off-heap persistence
>
> Regards
>
>
> On Thursday, March 24, 2016, Holden Karau wrote:
>
>> Even checkpoint() is maybe
I guess it's because this example is stateless, so it outputs counts only
for the given batch RDD. Take a look at the stateful word count example,
StatefulNetworkWordCount.scala.
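Its core is roughly this (a sketch; it assumes words is a DStream[String] and that a checkpoint directory has been set, which updateStateByKey requires):

// ssc.checkpoint("/tmp/checkpoint")  // required by updateStateByKey

val updateFunc = (newValues: Seq[Int], runningCount: Option[Int]) =>
  Some(newValues.sum + runningCount.getOrElse(0))

val runningCounts = words
  .map(word => (word, 1))
  .updateStateByKey[Int](updateFunc)   // running total across all batches

runningCounts.print()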
On Wed, Sep 24, 2014 at 4:29 AM, SK wrote:
>
> I execute it as follows:
>
> $SPARK_HOME/bin/spark-submit --master --class
> org.apache.spark
Hi,
I'm looking for available online ML algorithms (ones that improve the model
with new streaming data). The only one I found is linear regression.
Is there anything else implemented as part of MLlib?
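For reference, the streaming linear regression I found is used roughly like this (a sketch; the stream names and feature count below are made up):

import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.{LabeledPoint, StreamingLinearRegressionWithSGD}

// trainingStream, testStream: DStream[LabeledPoint]; 3 features is arbitrary.
val numFeatures = 3
val model = new StreamingLinearRegressionWithSGD()
  .setInitialWeights(Vectors.zeros(numFeatures))

model.trainOn(trainingStream)   // the model keeps updating as new batches arrive
model.predictOnValues(testStream.map(lp => (lp.label, lp.features))).print()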
Thanks, Oleksiy.