Re: Strange codegen error for SortMergeJoin in Spark 2.2.1

2018-06-06 Thread Kazuaki Ishizaki
Thank you for reporting this problem. Would it be possible to create a JIRA entry with a small program that can reproduce it? Best Regards, Kazuaki Ishizaki From: Rico Bergmann To: "user@spark.apache.org" Date: 2018/06/05 19:58 Subject: Strange codegen error for SortMer

Re: Apache Spark Structured Streaming - Kafka Streaming - Option to ignore checkpoint

2018-06-06 Thread amihay gonen
If you are using the Kafka direct connect API, it might be committing offsets back to Kafka itself. On Thu, Jun 7, 2018, 4:10, licl wrote: > I met the same issue and I tried to delete the checkpoint dir before the > job , > > but Spark seems to read the correct offset even after the
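The behavior licl describes is consistent with Structured Streaming tracking offsets in its checkpoint rather than (only) in Kafka. A minimal sketch of pinning the starting position explicitly, assuming hypothetical broker, topic, and checkpoint paths (none of these names come from the thread, and the `startingOffsets` option is only honored when no checkpoint exists yet):

```python
# Hypothetical sketch: make Structured Streaming start from a known
# position instead of whatever an old checkpoint recorded.
# Broker, topic, and paths below are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kafka-offsets-sketch").getOrCreate()

stream = (spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "events")
    # "earliest", "latest", or a JSON map of explicit per-partition
    # offsets; this is ignored once a checkpoint for the query exists.
    .option("startingOffsets", "earliest")
    .load())

query = (stream.writeStream
    .format("console")
    # Pointing at a fresh checkpoint directory is what makes Spark
    # re-read startingOffsets instead of resuming prior progress.
    .option("checkpointLocation", "/tmp/new-checkpoint-dir")
    .start())
```

This is a configuration sketch, not a runnable example: it needs a live Kafka broker and a Spark cluster to do anything.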

If there is timestamp type data in DF, Spark 2.3 toPandas is much slower than spark 2.2.

2018-06-06 Thread 李斌松
If there is timestamp-type data in a DataFrame, Spark 2.3's toPandas is much slower than Spark 2.2's.
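For reference, Spark 2.3 introduced an Arrow-based columnar path for `toPandas` (off by default) that is commonly suggested for exactly this kind of slowdown. A hedged sketch, assuming the config key available in 2.3 and an illustrative DataFrame:

```python
# Sketch (assumes Spark 2.3 with pyarrow installed on the driver):
# enable Arrow-based transfer before calling toPandas.
from pyspark.sql import SparkSession
from pyspark.sql.functions import current_timestamp

spark = SparkSession.builder.appName("topandas-arrow-sketch").getOrCreate()

# Arrow-based conversion is disabled by default in Spark 2.3.
spark.conf.set("spark.sql.execution.arrow.enabled", "true")

df = spark.range(1000).withColumn("ts", current_timestamp())
pdf = df.toPandas()  # timestamp columns now go through the Arrow path
```

Whether this fixes the regression the poster saw is untested here; it is the standard first thing to try.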

Pyspark Join and then column select is showing unexpected output

2018-06-06 Thread bis_g
I am not sure if the long hours are getting to me, but I am seeing some unexpected behavior in Spark 2.2.0. I have created a toy example as below: toy_df = spark.createDataFrame([ ['p1','a'], ['p1','b'], ['p1','c'], ['p2','a'], ['p2','b'], ['p2','d']],schema=['patient','drug']) I create another da
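The message is truncated, but "join then column select shows unexpected output" in Spark 2.2 is very often the ambiguous-column problem: after a join on a column name both sides share, selecting by bare name can resolve to the wrong side. A sketch of the usual fix, assuming a second hypothetical DataFrame (`other_df` and its `cost` column are my invention, not from the thread):

```python
# Sketch: disambiguate columns after a join with aliases.
# other_df and its columns are assumed for illustration.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("join-select-sketch").getOrCreate()

toy_df = spark.createDataFrame(
    [['p1', 'a'], ['p1', 'b'], ['p1', 'c'],
     ['p2', 'a'], ['p2', 'b'], ['p2', 'd']],
    schema=['patient', 'drug'])

other_df = spark.createDataFrame(
    [['a', 1.0], ['b', 2.0]], schema=['drug', 'cost'])

# Aliasing both sides makes every later column reference unambiguous,
# avoiding the surprising resolution Spark applies to duplicate names.
joined = (toy_df.alias("t")
          .join(other_df.alias("o"), col("t.drug") == col("o.drug"))
          .select(col("t.patient"), col("t.drug"), col("o.cost")))
```

This requires a running SparkSession, so it is a sketch of the pattern rather than a verified reproduction of the poster's issue.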

Re: Apache Spark Structured Streaming - Kafka Streaming - Option to ignore checkpoint

2018-06-06 Thread licl
I met the same issue and I tried to delete the checkpoint dir before the job, but Spark still reads the correct offset even after the checkpoint dir is deleted. I don't know how Spark does this without the checkpoint's metadata. -- Sent from: http://apache-spark-user-list.1001560.n3.

Spark ML online serving

2018-06-06 Thread Holden Karau
At Spark Summit some folks were talking about model serving and we wanted to collect requirements from the community. -- Twitter: https://twitter.com/holdenkarau

Re: Hive to Oracle using Spark - Type(Date) conversion issue

2018-06-06 Thread spark receiver
Use unix time and write it to Oracle as a NUMBER column type; create a virtual column in the Oracle database for the unix time, like "oracle_time generated always as (to_date('1970010108','YYYYMMDDHH24')+(1/24/60/60)*unixtime)" > On Mar 20, 2018, at 11:08 PM, Gurusamy Thirupathy wrote: > >
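The Oracle expression works because adding a number to an Oracle DATE adds that many days, so (1/24/60/60)*unixtime adds exactly `unixtime` seconds to the epoch; the '08' hour in '1970010108' looks like a fixed timezone offset baked into the base date. A quick pure-Python check of that arithmetic:

```python
from datetime import datetime, timedelta

def oracle_virtual_time(unixtime):
    # Mirrors: to_date('1970010108','YYYYMMDDHH24') + (1/24/60/60)*unixtime
    # Oracle date arithmetic adds days, so 1/24/60/60 of a day is one
    # second, and the product adds `unixtime` seconds to the base date.
    epoch = datetime(1970, 1, 1, 8)  # the 08 hour from the expression
    return epoch + timedelta(days=unixtime / 86400.0)

# 1528243200 s after the UTC epoch is 2018-06-06 00:00:00 UTC,
# so with the 8-hour base offset we get 08:00.
print(oracle_virtual_time(1528243200))  # 2018-06-06 08:00:00
```

The 8-hour base is copied from the quoted expression as-is; whether it is the right offset for the poster's data depends on their timezone handling.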

Re: Dataframe from 1.5G json (non JSONL)

2018-06-06 Thread raksja
It's happening in the executor: # java.lang.OutOfMemoryError: Java heap space # -XX:OnOutOfMemoryError="kill -9 %p" # Executing /bin/sh -c "kill -9 25800"... -- Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

FINAL REMINDER: Apache EU Roadshow 2018 in Berlin next week!

2018-06-06 Thread sharan
Hello Apache Supporters and Enthusiasts This is a final reminder that our Apache EU Roadshow will be held in Berlin next week on 13th and 14th June 2018. We will have 28 different sessions running over 2 days that cover some great topics. So if you are interested in Microservices, Internet of

Re: [SparkLauncher] stateChanged event not received in standalone cluster mode

2018-06-06 Thread Marcelo Vanzin
That feature has not been implemented yet. https://issues.apache.org/jira/browse/SPARK-11033 On Wed, Jun 6, 2018 at 5:18 AM, Behroz Sikander wrote: > I have a client application which launches multiple jobs in Spark Cluster > using SparkLauncher. I am using Standalone cluster mode. Launching jobs

Re: [SparkLauncher] stateChanged event not received in standalone cluster mode

2018-06-06 Thread bsikander
Any help would be appreciated. -- Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

RE: [External] Re: Sorting in Spark on multiple partitions

2018-06-06 Thread Sing, Jasbir
Hi Jorn, We are using Spark 2.2.0 for our development. Below is the code snippet for your reference: var newDf = data.repartition(col("userid")).sortWithinPartitions("sid", "time") newDf.write.format("parquet").saveAsTable("tempData") newDf.coalesce(1).write.format(outputFormat).option("header", "true"
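One likely source of the sorting confusion in this thread: `sortWithinPartitions` orders rows only inside each partition, not globally, and a later `coalesce(1)` just concatenates partitions in partition order. A PySpark rendering of the snippet above (the sample data is invented for illustration):

```python
# Sketch of the Scala snippet in PySpark; sample rows are assumptions.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("sort-partitions-sketch").getOrCreate()

data = spark.createDataFrame(
    [('u1', 's1', 2), ('u1', 's1', 1), ('u2', 's2', 3)],
    schema=['userid', 'sid', 'time'])

# Co-locate each user's rows, then sort only within each partition.
# This yields a per-partition order; a single globally sorted file
# would need orderBy('sid', 'time') instead.
new_df = (data.repartition(col('userid'))
              .sortWithinPartitions('sid', 'time'))
```

This is a sketch of the semantics, not a claim about what the original job should do; it needs a SparkSession to run.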

Re: Reg:- Py4JError in Windows 10 with Spark

2018-06-06 Thread Jay
Are you running this in local mode or cluster mode? If you are running in cluster mode, have you ensured that numpy is present on all nodes? On Tue 5 Jun, 2018, 2:43 AM @Nandan@, wrote: > Hi , > I am getting error :- > > --

Re: Dataframe from 1.5G json (non JSONL)

2018-06-06 Thread Jay
I might have missed it, but can you tell if the OOM is happening in the driver or an executor? Also, it would be good if you could post the actual exception. On Tue 5 Jun, 2018, 1:55 PM Nicolas Paris, wrote: > IMO your json cannot be read in parallel at all; then Spark only offers > you > to play again w

[SparkLauncher] stateChanged event not received in standalone cluster mode

2018-06-06 Thread Behroz Sikander
I have a client application which launches multiple jobs in a Spark cluster using SparkLauncher. I am using *Standalone* *cluster mode*. Launching jobs has worked fine so far; I use launcher.startApplication() to launch. But now I have a requirement to check the state of my driver process. I added a

[Spark Streaming] Distinct Count on unrelated columns

2018-06-06 Thread Aakash Basu
Hi guys, Posted a question (link) on StackOverflow, any help? Thanks, Aakash.