Hi All,
What is the best way to calculate a correlation matrix?
--
Regards,
Rishi Shah
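One common approach, if this is about Spark DataFrames, is MLlib's
Correlation.corr. A minimal Scala sketch (the sample vectors and the
"features" column name are illustrative, not from the question):

    import org.apache.spark.ml.linalg.{Matrix, Vectors}
    import org.apache.spark.ml.stat.Correlation
    import org.apache.spark.sql.Row

    // Assumes a SparkSession named `spark` is already in scope.
    import spark.implicits._

    // Pack the numeric values of each row into a single vector column,
    // then compute the pairwise Pearson correlation matrix in one pass.
    val data = Seq(
      Vectors.dense(1.0, 0.5, -1.0),
      Vectors.dense(2.0, 1.0, -0.5),
      Vectors.dense(4.0, 1.5, 0.3)
    )
    val df = data.map(Tuple1.apply).toDF("features")

    val Row(corr: Matrix) = Correlation.corr(df, "features", "pearson").head
    println(corr)

For just two columns, df.stat.corr("colA", "colB") returns a single
Pearson coefficient without building the full matrix.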
Thank you, TD. A couple of follow-up questions, please.
1) "It only keeps around the minimal intermediate state data"
How do you define "minimal" here? Is there a configuration property to
control the time or size of a streaming DataFrame's state?
2) I'm not writing anything out to any database or S3. My req
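Not an authoritative answer, but the main mechanism I know of for
bounding that intermediate state is an event-time watermark rather than
a configuration property. A minimal Scala sketch (the source and column
names are illustrative):

    import org.apache.spark.sql.functions.window

    // Assumes a SparkSession named `spark` is in scope.
    import spark.implicits._

    // The watermark lets Spark drop state for events more than 10 minutes
    // older than the newest event time seen so far, keeping the state
    // store bounded instead of growing without limit.
    val counts = spark.readStream
      .format("rate")   // built-in test source, used here for illustration
      .load()
      .withWatermark("timestamp", "10 minutes")
      .groupBy(window($"timestamp", "5 minutes"))
      .count()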
I mean two separate Spark jobs.
On Wed, Aug 28, 2019 at 2:25 PM Subash Prabakar wrote:
> When you say "process", do you mean two separate Spark jobs, or two
> stages within the same Spark code?
>
> Thanks
> Subash
> On Wed, 28 Aug 2019 at 19:06, wrote:
>
>> Take a look at this article
>> https://jaceklaskowski.gitbooks.io/mastering-apache-spark/spark-rdd-caching.html
When you say "process", do you mean two separate Spark jobs, or two
stages within the same Spark code?
Thanks
Subash
On Wed, 28 Aug 2019 at 19:06, wrote:
> Take a look at this article
> https://jaceklaskowski.gitbooks.io/mastering-apache-spark/spark-rdd-caching.html
>
> From: Tzahi File
Take a look at this article
https://jaceklaskowski.gitbooks.io/mastering-apache-spark/spark-rdd-caching.html
From: Tzahi File
Sent: Wednesday, August 28, 2019 5:18 AM
To: user
Subject: Caching tables in spark
Hi,
I'm looking for your advice on a question.
I have 2 different processes that read from the same raw data table
(around 1.5 TB).
Is there a way to read this data once, cache it somehow, and use it in
both processes?
Thanks
--
Tzahi File
Data Engineer
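Since another message in the thread clarifies these are two separate
Spark jobs, a cache() won't help across them: a cached table lives
inside a single application's executors. Within one application,
though, the read-once-use-twice pattern could look like this (table and
column names are illustrative):

    import org.apache.spark.storage.StorageLevel

    // Assumes a SparkSession named `spark` is in scope.
    import spark.implicits._

    // Scan the 1.5 TB table once and persist it. MEMORY_AND_DISK_SER
    // spills partitions that don't fit in memory to local disk rather
    // than failing or recomputing them.
    val raw = spark.table("raw_data")
      .persist(StorageLevel.MEMORY_AND_DISK_SER)

    val resultA = raw.groupBy("event_type").count()   // "process" 1
    val resultB = raw.filter($"amount" > 0)           // "process" 2

Across two separate applications, the usual options are to write the
table out once in a compact format such as Parquet and have both jobs
read that copy, or to put a shared caching layer such as Alluxio in
front of the storage.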
Hey,
When running Spark on Alluxio 1.8.2, I encounter the following exception
in the Alluxio master.log: "alluxio.exception.FileDoesNotExistException:
Path /test-data/_spark_metadata does not exist". What exactly is the
"_spark_metadata" directory used for? And how can I fix this problem?
Thanks.
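As far as I know, _spark_metadata is the commit log kept by Structured
Streaming's file sink under its output directory: it records which files
each batch wrote, and later reads of that directory consult it so they
only see fully committed data. A sketch of the kind of query that
creates it (the paths and streamingDf are illustrative):

    // Writing a stream with the file sink creates <path>/_spark_metadata.
    // The error can mean the directory is being read, or was copied,
    // without that metadata directory alongside the data files.
    val query = streamingDf.writeStream
      .format("parquet")
      .option("path", "alluxio://master:19998/test-data")
      .option("checkpointLocation", "alluxio://master:19998/checkpoints")
      .start()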
Hi,
We are running Spark jobs on an Alluxio cluster which is serving 13
gigabytes of data, 99% of which is in memory. I was hoping to speed up
the Spark jobs by reading the in-memory data in Alluxio, but found the
Alluxio local hit rate is only 1.68%, while the remote hit rate is
98.32%. B
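A guess at one thing worth checking, since the message is cut off here:
whether Spark tasks are actually scheduled on the nodes holding the
Alluxio blocks. Spark's locality wait controls how long the scheduler
holds out for a node-local slot (the value below is illustrative):

    import org.apache.spark.sql.SparkSession

    // Give the scheduler more time to place each task on the node that
    // holds its Alluxio block before settling for a remote read.
    // The default for spark.locality.wait is 3s.
    val spark = SparkSession.builder()
      .appName("alluxio-locality-check")
      .config("spark.locality.wait", "10s")
      .getOrCreate()

Locality also depends on the executor hostnames matching the Alluxio
worker hostnames, so it may be worth verifying that those line up.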
> updated the issue content.
https://stackoverflow.com/questions/57684972/how-to-improve-performance-my-spark-job-here-to-load-data-into-cassandra-table
Thank you.
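Assuming the job uses the Spark Cassandra Connector, as the linked
question's title suggests, the write path and its main throughput knob
look roughly like this (keyspace, table, host, and values are
illustrative, and df stands for the DataFrame being loaded):

    // Connector settings can be placed on the SparkSession config;
    // output.concurrent.writes controls how many batches each task
    // keeps in flight against Cassandra at once.
    val spark = org.apache.spark.sql.SparkSession.builder()
      .config("spark.cassandra.connection.host", "127.0.0.1")
      .config("spark.cassandra.output.concurrent.writes", "10")
      .getOrCreate()

    df.write
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "ks", "table" -> "events"))
      .mode("append")
      .save()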