Hi all,
I have been working with Spark for about 8 months, but I have not fully
learned it through self-study alone, so I would like to take a part-time
role on a project. I believe it would both contribute to my own development
and benefit others. I *do not have any salary* expectations.
Can you help me?
Hi!
I'm facing a classloader problem using Spark 1.5.1.
I use javax.validation and Hibernate Validator annotations on some of my beans:

import javax.validation.Valid;
import org.hibernate.validator.constraints.NotBlank;

@NotBlank
@Valid
private String attribute1;

@Valid
private String attribute2;

When Spark tries to unmarshal these beans (after a remote RDD), …
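In case later readers hit the same thing: one common cause, assuming the
validation jars are on the driver's classpath but never shipped to the
executors, is that the annotation classes simply aren't visible to the
executor classloaders. A minimal sketch of shipping them explicitly (jar
paths and versions below are placeholders, not from the original post):

import org.apache.spark.{SparkConf, SparkContext}

object ValidationJarsExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("validation-classloader-example")
      // Equivalent to `spark-submit --jars ...`: these jars are copied to
      // every executor and added to its classloader.
      .setJars(Seq(
        "/path/to/validation-api-1.1.0.Final.jar",
        "/path/to/hibernate-validator-5.2.4.Final.jar"))
    val sc = new SparkContext(conf)
    // ... beans annotated with @NotBlank/@Valid can now be deserialized
    // on the executors without a ClassNotFoundException.
    sc.stop()
  }
}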
We have built an ML platform based on open-source frameworks like
Hadoop, Spark, and TensorFlow. Now we need to give our product a great
name, and we are eager for everyone's advice.
Any answers will be greatly appreciated.
Thanks.
No, I did not; I thought Spark would take care of that itself since I had
passed in the arguments.
On Thu, Sep 7, 2017 at 9:26 PM, Lukas Bradley
wrote:
> Did you also increase the size of the heap of the Java app that is
> starting Spark?
>
> https://alvinalexander.com/blog/post/java/java-xmx-xms-
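For later readers, a minimal sketch of the distinction being pointed at
here: the heap of the JVM that launches Spark is set with that JVM's own
-Xmx flag, while Spark's driver and executor memory are separate settings.
All values below are placeholders.

// The launching JVM's heap is set when *it* starts, e.g.:
//   java -Xmx4g -cp ... MyLauncherApp
// Spark's memory knobs are configured separately:
import org.apache.spark.sql.SparkSession

object MemoryExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("memory-example") // master/deploy mode supplied by spark-submit
      .config("spark.executor.memory", "4g") // placeholder value
      .getOrCreate()
    // Note: spark.driver.memory must be set before the driver JVM starts
    // (e.g. via spark-submit --driver-memory), not from running code.
    spark.stop()
  }
}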
You don't need multiple SparkSessions to have more than one stream
running; moreover, from a maintenance and reliability perspective,
multiple sessions are not a good idea. A sketch follows below.
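To illustrate, a minimal sketch of two independent streaming queries
sharing a single SparkSession; the socket and rate sources and the console
sink are placeholders:

import org.apache.spark.sql.SparkSession

object TwoStreamsOneSession {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("two-streams").getOrCreate()

    // First query: read lines from a socket, echo to the console.
    val q1 = spark.readStream.format("socket")
      .option("host", "localhost").option("port", 9999).load()
      .writeStream.format("console").start()

    // Second query on the *same* session: built-in rate source.
    val q2 = spark.readStream.format("rate").load()
      .writeStream.format("console").start()

    // Block until either query terminates.
    spark.streams.awaitAnyTermination()
  }
}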
On Thu, Sep 7, 2017 at 2:40 AM, kant kodali wrote:
> Hi All,
>
> I am wondering if it is OK to have multiple SparkSessions in one Spark
> structured
Thanks for your response, Praneeth. We did consider Kafka, but cost was
the only holdback: we might need a larger cluster, and the existing
cluster is on premise while my app is in the cloud, so the same cluster
cannot be used.
But I agree, it does sound like a good alternative.
Regards
Sunita
On
Severity: Medium
Vendor: The Apache Software Foundation
Versions Affected: Apache Spark 1.6.0 until 2.1.1
Description:
In Apache Spark 1.6.0 until 2.1.1, the launcher API performs unsafe
deserialization of data received by its socket. This makes applications
launched programmatically …
Hi all -
Since upgrading to 2.2.0, we've noticed a significant increase in the time
taken by read.parquet(...) operations. The parquet files are being read
from S3. Upon entry at the interactive terminal (pyspark in this case),
the terminal will sit "idle" for several minutes (as many as 10) before
returning:
"17/…
Can you provide a code sample please?
On Fri, Sep 8, 2017 at 5:44 PM, Matthew Anthony wrote:
> Hi all -
>
>
> since upgrading to 2.2.0, we've noticed a significant increase in the time
> taken by read.parquet(...) operations. The parquet files are being read from S3. Upon entry
> at the interactive terminal (pyspark in
The code is as simple as calling `data = spark.read.parquet(address)`. I
can't give you the actual address I'm reading from, for security reasons.
Is there something else I can provide? We're using standard EMR images
with Hive and Spark installed.
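In case it helps narrow things down, a self-contained timing harness along
these lines (sketched in Scala; the s3a path is a placeholder) separates
the initial listing/schema-inference step from the actual scan:

import org.apache.spark.sql.SparkSession

object ParquetReadTiming {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("parquet-read-timing").getOrCreate()

    val t0 = System.nanoTime()
    val df = spark.read.parquet("s3a://my-bucket/path/") // placeholder path
    val t1 = System.nanoTime()
    println(s"read.parquet (file listing + schema inference) took ${(t1 - t0) / 1e9}s")

    val n = df.count()
    val t2 = System.nanoTime()
    println(s"count() of $n rows took ${(t2 - t1) / 1e9}s")
    spark.stop()
  }
}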
On 9/8/17 11:00 AM, Neil Jonkers wrote:
Can
Hi,
According to the thread at https://issues.apache.org/jira/browse/SPARK-11374,
Spark will not resolve the issue of the skip-header option being ignored
when the table is defined in Hive.
But I am unable to see a Spark SQL option for setting up an external
partitioned table.
Does that mean that in this case I have to crea…
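For what it's worth, Spark SQL with Hive support enabled does accept the
usual DDL for an external partitioned table. A sketch with placeholder
schema, partition column, and location; note that, per SPARK-11374, a
skip-header table property would still be ignored:

import org.apache.spark.sql.SparkSession

object ExternalTableExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("external-table-example")
      .enableHiveSupport()
      .getOrCreate()

    // Placeholder schema, partition column, and location.
    spark.sql("""
      CREATE EXTERNAL TABLE IF NOT EXISTS events (
        id BIGINT,
        payload STRING
      )
      PARTITIONED BY (dt STRING)
      ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
      LOCATION 'hdfs:///data/events'
    """)

    // Register partition directories that already exist under the location.
    spark.sql("MSCK REPAIR TABLE events")
    spark.stop()
  }
}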
How can I use one SparkSession to talk to both Kafka and Cassandra let's
say?
On Fri, Sep 8, 2017 at 3:46 AM, Arkadiusz Bicz
wrote:
> You don't need multiple SparkSessions to have more than one stream
> running; moreover, from a maintenance and reliability perspective,
> multiple sessions are not a good idea.
>
> On Thu
On 7 Sep 2017, at 18:36, Mcclintic, Abbi <ab...@amazon.com> wrote:
Thanks all – a couple of notes below.
Generally all our partitions are of equal size (i.e., on a normal day in
this particular case I see 10 equally sized partitions of 2.8 GB). We see
the problem with repartitioning and withou…
Hi,
I am using Spark 1.6.1 and Hadoop YARN 2.7.4.
I want to submit a Spark application to a YARN cluster. However, I found
that the number of vcores assigned to a container/executor is always 1,
even though I set spark.executor.cores=2. I also found that the number of
tasks an executor runs concurrently is 2. So, …
For posterity, I found the root cause and filed a JIRA:
https://issues.apache.org/jira/browse/SPARK-21960. I plan to open a pull
request with the minor fix.
From: Karthik Palaniappan
Sent: Friday, September 1, 2017 9:49 AM
To: Akhil Das
Cc: user@spark.apache.org;
Hi, Alonso.
Thanks! I've read about this but did not quite understand it. Picking out
the topic name of a Kafka message seems like a simple task, but the example
code looks complicated, with redundant info. Why do we need offsetRanges
here, and is there an easier way to achieve this?
Cheers,
Dan
2017
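A possible answer, assuming the kafka-0-10 direct stream integration: each
record there is a Kafka ConsumerRecord, so the topic is available per
record and offsetRanges aren't needed just for this. A sketch with
placeholder broker, group, and topic names:

import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010._

object TopicPerRecord {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("topic-per-record")
    val ssc = new StreamingContext(conf, Seconds(5))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker:9092",  // placeholder
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "example-group")         // placeholder

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("topicA", "topicB"), kafkaParams))

    // The topic name rides along on every record.
    stream.map(record => (record.topic, record.value)).print()

    ssc.start()
    ssc.awaitTermination()
  }
}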
You would set the Kafka topic as your data source and write a custom
output to Cassandra; everything could be contained within your stream.
-Paul
Sent from my iPhone
> On Sep 8, 2017, at 2:52 PM, kant kodali wrote:
>
> How can I use one SparkSession to talk to both Kafka
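A minimal sketch of that shape, assuming Spark 2.4+ (for foreachBatch) and
the DataStax spark-cassandra-connector on the classpath; the broker, topic,
keyspace, table, and checkpoint path are placeholders:

import org.apache.spark.sql.{DataFrame, SparkSession}

object KafkaToCassandra {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("kafka-to-cassandra").getOrCreate()

    // One session, one source: a Kafka stream.
    val kafkaDf = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092")
      .option("subscribe", "events")
      .load()
      .selectExpr("CAST(key AS STRING) AS id", "CAST(value AS STRING) AS payload")

    // Same session, custom output: write each micro-batch to Cassandra.
    val query = kafkaDf.writeStream
      .option("checkpointLocation", "/tmp/kafka-to-cassandra-ckpt")
      .foreachBatch { (batch: DataFrame, _: Long) =>
        batch.write
          .format("org.apache.spark.sql.cassandra")
          .options(Map("keyspace" -> "my_ks", "table" -> "events"))
          .mode("append")
          .save()
      }
      .start()

    query.awaitTermination()
  }
}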
Hi,
Naga has kindly suggested here that I should load the file into an RDD and
get rid of the header. But my partitions have hundreds of files in them,
and opening and processing the files one by one via RDDs is the old way of
working. I think the Spark community has moved on from RDDs, to
DataFrames, to Datasets…
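A sketch of the DataFrame-level route, assuming the files are CSV; the
path is a placeholder:

import org.apache.spark.sql.SparkSession

object HeaderWithDataFrames {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("csv-header-example").getOrCreate()

    // With header=true the CSV reader drops the header line of *each* file
    // it touches, so there is no need to open files one by one via RDDs.
    val df = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("hdfs:///data/landing/dt=*/")  // placeholder partitioned path

    df.show()
    spark.stop()
  }
}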