Hi all,
I have been working with Spark for about 8 months, but I have not fully
learned it through self-study alone, so I would like to take a part-time
role on a project. I believe it would both contribute to my own development
and benefit others. I *do not have any salary* expectations.
Can you help me?
Hi!
I'm facing a classloader problem using Spark 1.5.1.
I use javax.validation and Hibernate Validator annotations on some of my beans:

import javax.validation.Valid;
import org.hibernate.validator.constraints.NotBlank;

@NotBlank
@Valid
private String attribute1;

@Valid
private String attribute2;

When Spark tries to unmarshal these beans (after a remote RDD), …
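In case later readers hit the same thing: one common cause, assuming the
validation jars are on the driver's classpath but never shipped to the
executors, is that the annotation classes simply aren't visible to the
executor classloaders. A minimal sketch of shipping them explicitly (jar
paths and versions below are placeholders, not from the original post):

import org.apache.spark.{SparkConf, SparkContext}

object ValidationJarsExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("validation-classloader-example")
      // Equivalent to `spark-submit --jars ...`: these jars are copied to
      // every executor and added to its classloader.
      .setJars(Seq(
        "/path/to/validation-api-1.1.0.Final.jar",
        "/path/to/hibernate-validator-5.2.4.Final.jar"))
    val sc = new SparkContext(conf)
    // ... beans annotated with @NotBlank/@Valid can now be deserialized
    // on the executors without a ClassNotFoundException.
    sc.stop()
  }
}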
We have built an ML platform based on open-source frameworks like
Hadoop, Spark, and TensorFlow. Now we need to give our product a great
name, and we are eager for everyone's advice.
Any answers will be greatly appreciated.
Thanks.
No, I did not; I thought Spark would take care of that itself since I had
passed in the arguments.
On Thu, Sep 7, 2017 at 9:26 PM, Lukas Bradley
wrote:
> Did you also increase the size of the heap of the Java app that is
> starting Spark?
>
> https://alvinalexander.com/blog/post/java/java-xmx-xms-
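For later readers, a minimal sketch of the distinction being pointed at
here: the heap of the JVM that launches Spark is set with that JVM's own
-Xmx flag, while Spark's driver and executor memory are separate settings.
All values below are placeholders.

// The launching JVM's heap is set when *it* starts, e.g.:
//   java -Xmx4g -cp ... MyLauncherApp
// Spark's memory knobs are configured separately:
import org.apache.spark.sql.SparkSession

object MemoryExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("memory-example") // master/deploy mode supplied by spark-submit
      .config("spark.executor.memory", "4g") // placeholder value
      .getOrCreate()
    // Note: spark.driver.memory must be set before the driver JVM starts
    // (e.g. via spark-submit --driver-memory), not from running code.
    spark.stop()
  }
}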
You don't need multiple SparkSessions to have more than one stream
running; moreover, from a maintenance and reliability perspective,
multiple sessions are not a good idea. A sketch follows below.
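To illustrate, a minimal sketch of two independent streaming queries
sharing a single SparkSession; the socket and rate sources and the console
sink are placeholders:

import org.apache.spark.sql.SparkSession

object TwoStreamsOneSession {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("two-streams").getOrCreate()

    // First query: read lines from a socket, echo to the console.
    val q1 = spark.readStream.format("socket")
      .option("host", "localhost").option("port", 9999).load()
      .writeStream.format("console").start()

    // Second query on the *same* session: built-in rate source.
    val q2 = spark.readStream.format("rate").load()
      .writeStream.format("console").start()

    // Block until either query terminates.
    spark.streams.awaitAnyTermination()
  }
}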
On Thu, Sep 7, 2017 at 2:40 AM, kant kodali wrote:
> Hi All,
>
> I am wondering if it is OK to have multiple SparkSessions in one Spark
> structured
Thanks for your response, Praneeth. We did consider Kafka, but cost was
the only holdback: we might need a larger cluster, and the existing
cluster is on premise while my app is in the cloud, so the same cluster
cannot be used.
But I agree, it does sound like a good alternative.
Regards
Sunita
On
Severity: Medium
Vendor: The Apache Software Foundation
Versions Affected: Apache Spark 1.6.0 until 2.1.1
Description:
In Apache Spark 1.6.0 until 2.1.1, the launcher API performs unsafe
deserialization of data received by its socket. This makes applications
launched programmatically …
Hi all -
Since upgrading to 2.2.0, we've noticed a significant increase in the time
taken by read.parquet(...) operations. The parquet files are being read
from S3. Upon entry at the interactive terminal (pyspark in this case),
the terminal will sit "idle" for several minutes (as many as 10) before
returning:
"17/…
Can you provide a code sample please?
On Fri, Sep 8, 2017 at 5:44 PM, Matthew Anthony wrote:
> Hi all -
>
>
> since upgrading to 2.2.0, we've noticed a significant increase in the time
> taken by read.parquet(...) operations. The parquet files are being read from S3. Upon entry
> at the interactive terminal (pyspark in
The code is as simple as calling `data = spark.read.parquet(address)`. I
can't give you the actual address I'm reading from, for security reasons.
Is there something else I can provide? We're using standard EMR images
with Hive and Spark installed.
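In case it helps narrow things down, a self-contained timing harness along
these lines (sketched in Scala; the s3a path is a placeholder) separates
the initial listing/schema-inference step from the actual scan:

import org.apache.spark.sql.SparkSession

object ParquetReadTiming {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("parquet-read-timing").getOrCreate()

    val t0 = System.nanoTime()
    val df = spark.read.parquet("s3a://my-bucket/path/") // placeholder path
    val t1 = System.nanoTime()
    println(s"read.parquet (file listing + schema inference) took ${(t1 - t0) / 1e9}s")

    val n = df.count()
    val t2 = System.nanoTime()
    println(s"count() of $n rows took ${(t2 - t1) / 1e9}s")
    spark.stop()
  }
}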
On 9/8/17 11:00 AM, Neil Jonkers wrote:
Can
Hi,
According to the thread at https://issues.apache.org/jira/browse/SPARK-11374,
Spark will not resolve the issue of the skip-header option being ignored
when the table is defined in Hive.
But I am unable to see a Spark SQL option for setting up an external
partitioned table.
Does that mean that in this case I have to crea…
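For what it's worth, Spark SQL with Hive support enabled does accept the
usual DDL for an external partitioned table. A sketch with placeholder
schema, partition column, and location; note that, per SPARK-11374, a
skip-header table property would still be ignored:

import org.apache.spark.sql.SparkSession

object ExternalTableExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("external-table-example")
      .enableHiveSupport()
      .getOrCreate()

    // Placeholder schema, partition column, and location.
    spark.sql("""
      CREATE EXTERNAL TABLE IF NOT EXISTS events (
        id BIGINT,
        payload STRING
      )
      PARTITIONED BY (dt STRING)
      ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
      LOCATION 'hdfs:///data/events'
    """)

    // Register partition directories that already exist under the location.
    spark.sql("MSCK REPAIR TABLE events")
    spark.stop()
  }
}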
How can I use one SparkSession to talk to both Kafka and Cassandra let's
say?
On Fri, Sep 8, 2017 at 3:46 AM, Arkadiusz Bicz
wrote:
> You don't need multiple SparkSessions to have more than one stream
> running; moreover, from a maintenance and reliability perspective,
> multiple sessions are not a good idea.
>
> On Thu
On 7 Sep 2017, at 18:36, Mcclintic, Abbi <ab...@amazon.com> wrote:
Thanks all – a couple of notes below.
Generally all our partitions are of equal size (i.e., on a normal day in
this particular case I see 10 equally sized partitions of 2.8 GB). We see
the problem with repartitioning and withou…
Hi,
I am using Spark 1.6.1 and Hadoop YARN 2.7.4.
I want to submit a Spark application to a YARN cluster. However, I found
that the number of vcores assigned to a container/executor is always 1,
even though I set spark.executor.cores=2. I also found that the number of
tasks an executor runs concurrently is 2. So, …
For posterity, I found the root cause and filed a JIRA:
https://issues.apache.org/jira/browse/SPARK-21960. I plan to open a pull
request with the minor fix.
From: Karthik Palaniappan
Sent: Friday, September 1, 2017 9:49 AM
To: Akhil Das
Cc: user@spark.apache.org;
Hi, Alonso.
Thanks! I've read about this but did not quite understand it. Picking out
the topic name of a Kafka message seems like a simple task, but the example
code looks complicated, with redundant info. Why do we need offsetRanges
here, and is there an easier way to achieve this?
Cheers,
Dan
2017
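A possible answer, assuming the kafka-0-10 direct stream integration: each
record there is a Kafka ConsumerRecord, so the topic is available per
record and offsetRanges aren't needed just for this. A sketch with
placeholder broker, group, and topic names:

import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010._

object TopicPerRecord {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("topic-per-record")
    val ssc = new StreamingContext(conf, Seconds(5))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker:9092",  // placeholder
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "example-group")         // placeholder

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("topicA", "topicB"), kafkaParams))

    // The topic name rides along on every record.
    stream.map(record => (record.topic, record.value)).print()

    ssc.start()
    ssc.awaitTermination()
  }
}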
You would set the Kafka topic as your data source and write a custom
output to Cassandra; everything could be contained within your stream.
-Paul
Sent from my iPhone
> On Sep 8, 2017, at 2:52 PM, kant kodali wrote:
>
> How can I use one SparkSession to talk to both Kafka
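A minimal sketch of that shape, assuming Spark 2.4+ (for foreachBatch) and
the DataStax spark-cassandra-connector on the classpath; the broker, topic,
keyspace, table, and checkpoint path are placeholders:

import org.apache.spark.sql.{DataFrame, SparkSession}

object KafkaToCassandra {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("kafka-to-cassandra").getOrCreate()

    // One session, one source: a Kafka stream.
    val kafkaDf = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092")
      .option("subscribe", "events")
      .load()
      .selectExpr("CAST(key AS STRING) AS id", "CAST(value AS STRING) AS payload")

    // Same session, custom output: write each micro-batch to Cassandra.
    val query = kafkaDf.writeStream
      .option("checkpointLocation", "/tmp/kafka-to-cassandra-ckpt")
      .foreachBatch { (batch: DataFrame, _: Long) =>
        batch.write
          .format("org.apache.spark.sql.cassandra")
          .options(Map("keyspace" -> "my_ks", "table" -> "events"))
          .mode("append")
          .save()
      }
      .start()

    query.awaitTermination()
  }
}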
Hi,
Naga has kindly suggested here that I should load the file into an RDD and
get rid of the header. But my partitions have hundreds of files in them,
and opening and processing the files one by one via RDDs is the old way of
working. I think the Spark community has moved on from RDDs, to
DataFrames, to Datasets…
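A sketch of the DataFrame-level route, assuming the files are CSV; the
path is a placeholder:

import org.apache.spark.sql.SparkSession

object HeaderWithDataFrames {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("csv-header-example").getOrCreate()

    // With header=true the CSV reader drops the header line of *each* file
    // it touches, so there is no need to open files one by one via RDDs.
    val df = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("hdfs:///data/landing/dt=*/")  // placeholder partitioned path

    df.show()
    spark.stop()
  }
}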