For JDBC to work, start spark-submit with the appropriate JDBC driver jars
(using --jars); the driver will then be available on the executors.
For acquiring connections, create a singleton connection per executor. I
think the DataFrame JDBC reader (sqlContext.read.jdbc) already takes care
of this.
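For reference, a minimal sketch of that setup, assuming the 1.x-style sqlContext mentioned above; the URL, table name, and credentials are placeholders:

// Ship the driver with: spark-submit --jars /path/to/mysql-connector-java.jar ...
import java.util.Properties

val props = new Properties()
props.setProperty("user", "dbuser")          // placeholder credentials
props.setProperty("password", "dbpass")
props.setProperty("driver", "com.mysql.jdbc.Driver")

// read.jdbc opens its own connections on the executors as partitions are read,
// so no explicit connection management is needed here.
val df = sqlContext.read.jdbc(
  "jdbc:mysql://db-host:3306/mydb",          // placeholder URL
  "my_table",                                // placeholder table
  props)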
So it worked quite well with a coalesce; I was able to find a solution to
my problem: although it does not directly control the executor, a good
workaround was to assign the desired partition to a specific stream through
the assign strategy and coalesce to a single partition, then repeat the
same process for th
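A sketch of the workaround described above, assuming the spark-streaming-kafka-0-10 integration; the topic, partition, consumer settings, and the `ssc` streaming context are placeholders:

import org.apache.kafka.common.TopicPartition
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.streaming.kafka010._

val kafkaParams = Map[String, Object](
  "bootstrap.servers"  -> "broker:9092",                 // placeholder broker
  "key.deserializer"   -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id"           -> "single-partition-consumer")

// Pin the stream to exactly one Kafka partition via the Assign strategy.
val onePartition = Seq(new TopicPartition("my_topic", 0)) // placeholder topic/partition

val stream = KafkaUtils.createDirectStream[String, String](
  ssc,
  LocationStrategies.PreferConsistent,
  ConsumerStrategies.Assign[String, String](onePartition, kafkaParams))

stream.foreachRDD { rdd =>
  val single = rdd.coalesce(1)   // all records of that Kafka partition in one task
  // process `single` here, then repeat the same steps for the next partition
}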
Hello all,
I have a couple of spark streaming questions. Thanks.
1. In the case of stateful operations, the data is, by default,
persisted in memory.
Does "in memory" mean MEMORY_ONLY? When is it removed from memory?
2. I do not see any documentation for spark.cleaner.ttl. Is this no
l
see:
https://issues.apache.org/jira/browse/SPARK-18122
On Tue, Mar 21, 2017 at 1:13 PM, Ashic Mahtab wrote:
> I'm trying to easily create custom encoders for case classes having
> "unfriendly" fields. I could just kryo the whole thing, but would like to
> at least have a few fields in the schema
I am using Spark to dump data from MySQL into HDFS.
The way I am doing this is by creating a Spark DataFrame with the metadata
of the different MySQL tables to dump from multiple MySQL hosts, and then
running a map over that DataFrame to dump each MySQL table's data into HDFS
inside the executor.
The re
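For comparison, a hedged sketch of driving the same dump from the driver rather than from inside executor code; `spark`, `metadataDf`, the TableMeta fields, the credentials, and the HDFS path are all assumptions:

import spark.implicits._

case class TableMeta(jdbcUrl: String, table: String)   // hypothetical metadata schema

val tables: Array[TableMeta] = metadataDf.as[TableMeta].collect()

tables.foreach { t =>
  spark.read
    .format("jdbc")
    .option("url", t.jdbcUrl)
    .option("dbtable", t.table)
    .option("user", "dbuser")                // placeholder credentials
    .option("password", "dbpass")
    .load()
    .write
    .mode("overwrite")
    .parquet(s"hdfs:///dumps/${t.table}")    // placeholder HDFS location
}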
What is your use case? I am sure there must be a better way to solve it.
On Wed, Mar 22, 2017 at 9:34 AM, Shashank Mandil
wrote:
> Hi All,
>
> I am using spark in a yarn cluster mode.
> When I run a yarn application it creates multiple executors on the hadoop
> datanodes for processing.
>
> I
Hi All,
I am using spark in a yarn cluster mode.
When I run a yarn application it creates multiple executors on the hadoop
datanodes for processing.
Is it possible for me to create a local Spark context (master=local) on
these executors, so that I can get a Spark context there?
Theoretically since eac
Hi guys,
I want to transform a Row using NamedExpression.
Below is the code snippet that I am using:
def apply(dataFrame: DataFrame,
          selectExpressions: java.util.List[String]): RDD[UnsafeRow] = {
  val exprArray = selectExpressions.map(s =>
    Column(SqlParser.parseExpression(s)).named
  )
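In case it helps, a hedged alternative sketch that avoids the internal SqlParser/Column.named combination by letting selectExpr parse and analyze the expressions, then taking the plan's RDD of InternalRow (which at runtime is typically UnsafeRow); the method name is made up:

import scala.collection.JavaConverters._
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.catalyst.InternalRow

def applyExpressions(dataFrame: DataFrame,
                     selectExpressions: java.util.List[String]): RDD[InternalRow] = {
  // selectExpr runs the expressions through the SQL parser and analyzer for us.
  val projected = dataFrame.selectExpr(selectExpressions.asScala: _*)
  // toRdd exposes the rows produced by the executed physical plan.
  projected.queryExecution.toRdd
}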
Hi All,
I have a Spark DataFrame which has 992 rows in it.
When I run a map on this DataFrame, I expect the map to be applied to all
992 rows.
Since the mapper runs on executors across the cluster, I did a distributed
count of the number of rows the mapper is actually run on.
dataframe.map(r =>
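The snippet above is cut off in the archive; a minimal sketch of one way to do such a distributed count with a LongAccumulator (Spark 2.x API; names are illustrative, and using an action like foreach avoids the double-counting that retried transformations can cause):

val rowsSeen = spark.sparkContext.longAccumulator("rowsSeenByMapper")

dataframe.foreach { r =>
  rowsSeen.add(1)
  // ... per-row work would go here ...
}

println(s"mapper ran over ${rowsSeen.value} rows")   // expected: 992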
Hi,
In the context of ugly data, I am trying to find an efficient way to parse
a Kafka stream of CSV lines into a clean data model and route the lines in
error to a specific topic.
Generally I do this:
1. First a map to split my lines on the separator character (";")
2. Then a filter where I put all m
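A hedged sketch of one way to do the split-and-route in a single pass, parsing each line into an Either and routing each side; the Record fields, the two-field layout, and the `stream` DStream of raw CSV lines are assumptions:

case class Record(id: String, amount: Double)   // hypothetical clean data model

def parse(line: String): Either[String, Record] = {
  val fields = line.split(";", -1)
  if (fields.length == 2)
    scala.util.Try(Record(fields(0), fields(1).toDouble))
      .toOption
      .toRight(line)        // parse failure: keep the raw line as the error
  else
    Left(line)              // wrong number of fields: raw line goes to the error side
}

stream.foreachRDD { rdd =>
  val parsed = rdd.map(parse).cache()
  val good = parsed.collect { case Right(r)  => r }
  val bad  = parsed.collect { case Left(raw) => raw }
  // write `good` to the clean sink and `bad` to the error topic here
  parsed.unpersist()
}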
Thanks TD.
On Tue, Mar 14, 2017 at 4:37 PM, Tathagata Das wrote:
> This setting allows multiple spark jobs generated through multiple
> foreachRDD to run concurrently, even if they are across batches. So output
> op2 from batch X, can run concurrently with op1 of batch X+1
> This is not safe bec
I'm trying to easily create custom encoders for case classes having
"unfriendly" fields. I could just kryo the whole thing, but would like to at
least have a few fields in the schema instead of one binary blob. For example,
case class MyClass(id: UUID, items: Map[String, Double], name: String)
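Until SPARK-18122 (linked earlier in this thread) lands, one hedged workaround is to mirror the class with encoder-friendly field types and convert at the boundary; the mirror class and conversion helpers below are hypothetical, and a SparkSession named `spark` is assumed:

import java.util.UUID

// Hypothetical encoder-friendly mirror: the UUID is stored as a String so the
// derived product encoder can handle every field instead of one kryo blob.
case class MyClassRow(id: String, items: Map[String, Double], name: String)

object MyClassRow {
  def from(m: MyClass): MyClassRow = MyClassRow(m.id.toString, m.items, m.name)
  def to(r: MyClassRow): MyClass   = MyClass(UUID.fromString(r.id), r.items, r.name)
}

import spark.implicits._
val data = Seq(MyClass(UUID.randomUUID(), Map("a" -> 1.0), "x"))
val ds = spark.createDataset(data.map(MyClassRow.from))   // full schema, no binary column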
You could create a one-time job that processes the historical data to match
the updated format.
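A minimal sketch of such a backfill, assuming the outputs are Parquet on HDFS (paths and format are placeholders):

spark.read
  .parquet("hdfs:///jobA/output_old")     // placeholder: old, unpartitioned output
  .write
  .mode("overwrite")
  .partitionBy("col1")
  .parquet("hdfs:///jobA/output")         // placeholder: new partitioned layout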
On Tue, Mar 21, 2017 at 8:53 AM, Aditya Borde wrote:
> Hello,
>
> I'm currently blocked with this issue:
>
> I have job "A" whose output is partitioned by one of the field - "col1"
> Now job "B" reads the
Hello,
I'm currently blocked by this issue:
I have job "A" whose output is partitioned by one of the fields, "col1".
Now job "B" reads the output of job "A".
Here comes the problem: my job "A" output was previously not partitioned
by "col1" (this is a recent change).
But the thing is now, all my
Hi Chetan,
Sadly, you cannot; Spark is configured to ignore null values when writing
JSON. (Check JacksonMessageWriter and find JsonInclude.Include.NON_NULL in
the code.) If you want that functionality, it would be much better to file
the problem in JIRA.
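In the meantime, a hedged workaround sketch: substitute explicit defaults for the nulls before writing, since only null-valued fields are dropped (column names, defaults, and the output path are placeholders):

val withDefaults = df.na.fill(Map(
  "name"  -> "",       // placeholder string column
  "count" -> 0L        // placeholder numeric column
))

withDefaults.write.json("hdfs:///out/json")   // placeholder path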
Best,
Dongjin
On Mon, Mar 20, 201