In my usage of MLlib's LDA, I have noticed that repeated invocations of
LDAModel.transform() result in the duplication of a matrix derived from the
model's topic matrix. Because this derived matrix can be quite large
(imagine hundreds of topics and a vocabulary size in the tens or hundreds of
thousands)
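For context, here is a minimal sketch of the usage pattern being described, using the DataFrame-based org.apache.spark.ml.clustering.LDA API; the corpus, k, and variable names below are placeholders, not taken from the original post.

import org.apache.spark.ml.clustering.LDA
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("lda-transform-sketch").getOrCreate()
import spark.implicits._

// Tiny stand-in corpus of (id, term-count vector); a real vocabulary would be far larger.
val docs = Seq(
  (0L, Vectors.dense(1.0, 0.0, 2.0)),
  (1L, Vectors.dense(0.0, 3.0, 1.0))
).toDF("id", "features")

val model = new LDA().setK(2).setMaxIter(5).fit(docs)

// Each transform() call works from a matrix derived from model.topicsMatrix
// (roughly k x vocabSize), which is where the reported duplication would hurt
// once k is in the hundreds and the vocabulary is in the tens of thousands.
val out1 = model.transform(docs)
val out2 = model.transform(docs)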
I have an "unnamed" JSON array stored in a *column*.
The format is the following:
Column name: news
Data:
[
  {
    "source": "source1",
    "name": "News site1"
  },
  {
    "source": "source2",
    "name": "News site2"
  }
]
Ideally, I'd like to parse it as:
news ARRAY<STRUCT<source: STRING, name: STRING>>
I've tr
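In case it helps illustrate the goal, here is a hedged sketch of parsing such a column with from_json and an explicit schema; the sample data and column setup are assumptions based on the snippet above.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.from_json
import org.apache.spark.sql.types._

val spark = SparkSession.builder().appName("parse-news-column").getOrCreate()
import spark.implicits._

// Assumed input: the JSON array stored as a plain string in a column named "news".
val df = Seq(
  """[{"source":"source1","name":"News site1"},{"source":"source2","name":"News site2"}]"""
).toDF("news")

// Schema matching news ARRAY<STRUCT<source: STRING, name: STRING>>.
val newsSchema = ArrayType(StructType(Seq(
  StructField("source", StringType),
  StructField("name", StringType)
)))

val parsed = df.withColumn("news", from_json($"news", newsSchema))
parsed.printSchema()  // news: array<struct<source:string,name:string>>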
Hey, I am working with the Spark source code. I am printing logs within the code
to understand how hadoopRDD works. I want to print a timestamp when an
executor first reads the textFile RDD (the input source, an HDFS URL). I
tried to print some logs in executor.scala but they do not display on the
ru
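This is not exactly what the post asks (which is about editing executor.scala itself), but as an alternative, a timestamp can be logged from inside the tasks that read the file; the path and names below are hypothetical.

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("first-read-timestamp").getOrCreate()
val sc = spark.sparkContext

// Hypothetical HDFS input path.
val lines = sc.textFile("hdfs:///tmp/input.txt")

val stamped = lines.mapPartitionsWithIndex { (partition, iter) =>
  // Runs on the executor when the partition is computed; note that this output
  // goes to the executor's stdout/stderr logs, not to the driver console.
  println(s"[${System.currentTimeMillis()}] partition $partition starting to read")
  iter
}
stamped.count()  // trigger the read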
Also, Holden Karau conducts pull request reviews and shows how you can
contribute to this community-driven project. Attend one of her live PR sessions.
Cheers
Jules
Sent from my iPhone
Pardon the dumb thumb typos :)
> On Feb 22, 2019, at 7:16 AM, Pooja Agrawal wrote:
Hi,
I am new to Spark and want to start contributing to Apache Spark to learn
more about it.
I found this JIRA, "Standardized Join Types for DataFrames", which I
feel could be a good starter task for me. I wanted to confirm whether this is a
relevant/actionable task and whether I can start working on it.
Hi!
We have dynamic allocation enabled for our regular jobs, and sometimes they fail
with java.util.concurrent.TimeoutException: Futures timed out after [300
seconds]. It seems like the Spark driver starts the broadcast just before the job
has received any executors from YARN, and if it takes more than 300 seconds to
get them, the job fails with this exception.
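For reference, a hedged sketch of the settings commonly involved in this situation; whether the 300-second limit here really comes from spark.sql.broadcastTimeout is an assumption, since the original message is cut off, and the values below are only illustrative.

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("broadcast-timeout-sketch")
  .config("spark.dynamicAllocation.enabled", "true")
  // Hypothetical value: keep a few executors around so a broadcast is not
  // attempted before any executor has registered.
  .config("spark.dynamicAllocation.minExecutors", "2")
  // Default is 300 seconds, which matches the timeout in the exception above.
  .config("spark.sql.broadcastTimeout", "600")
  .getOrCreate()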