Hi Hemalatha,
You can use time windows; it looks like:
df.groupBy(window('timestamp', '20 seconds', '10 seconds'))
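In Scala, a slightly fuller sketch (assuming a DataFrame `df` with an event-time column named `timestamp`; the 20-second window slides every 10 seconds):

import org.apache.spark.sql.functions.{col, window}

val counts = df
  .groupBy(window(col("timestamp"), "20 seconds", "10 seconds"))
  .count()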
---Original---
From: "Saisai Shao"
Date: 2017/3/1 09:39:58
To: "Hemalatha A";
Cc: "spark users";
Subject: Re: How to use ManualClock with Spark streaming
I don't think using SQLTransformer is a good solution if all operators are combined with SQL.
By the way, if you'd like to get your hands dirty, writing a Transformer in Scala is not hard, and multiple output columns are valid in that case.
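A rough sketch of such a Transformer (the class name, column names, and logic below are made up purely for illustration):

import org.apache.spark.ml.Transformer
import org.apache.spark.ml.param.ParamMap
import org.apache.spark.ml.util.Identifiable
import org.apache.spark.sql.{DataFrame, Dataset}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{DoubleType, StructField, StructType}

// Hypothetical Transformer deriving two output columns from a numeric input column "in".
class TwoOutputTransformer(override val uid: String) extends Transformer {
  def this() = this(Identifiable.randomUID("twoOutput"))

  override def transform(dataset: Dataset[_]): DataFrame =
    dataset.withColumn("out1", col("in") * 2)
           .withColumn("out2", col("in") + 1)

  override def transformSchema(schema: StructType): StructType =
    StructType(schema.fields ++ Seq(
      StructField("out1", DoubleType),
      StructField("out2", DoubleType)))

  override def copy(extra: ParamMap): TwoOutputTransformer = defaultCopy(extra)
}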
On Fri, Mar 17, 2017 at 9:10 PM, Yanbo Liang wrote:
> Hi Adrian,
>
> Did you tr
Hi Jinhong,
Did you call `setRegParam`? `regParam` is 0.0 by default.
Both `elasticNetParam` and `regParam` are required if regularization is needed:
val regParamL1 = $(elasticNetParam) * $(regParam)
val regParamL2 = (1.0 - $(elasticNetParam)) * $(regParam)
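For example (values are illustrative; regParam sets the overall strength and elasticNetParam mixes L1 vs. L2):

import org.apache.spark.ml.classification.LogisticRegression

val lr = new LogisticRegression()
  .setRegParam(0.1)         // overall regularization strength
  .setElasticNetParam(1.0)  // 1.0 = pure L1, 0.0 = pure L2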
On Mon, Mar 20, 2017 at 6:31 PM, Yanbo Liang
Hello everybody,
I configured a simple standalone cluster with a few machines and I am trying
to submit a very simple job just to test the cluster.
My laptop is the client and one of the workers; my server hosts the
master and the second worker.
If I submit my job just executing the Scala code
Hi Sam,
A great way to contribute to Spark is to help answer user questions on the
user@spark.apache.org mailing list or on StackOverflow.
2017-03-20 11:50 GMT+08:00 Nick Pentreath :
> If you have experience and interest in Python then PySpark is a good area
> to look into.
>
> Yes, adding things
This issue on Stack Overflow may help:
https://stackoverflow.com/questions/42641573/why-does-memory-usage-of-spark-worker-increases-with-time/42642233#42642233
Closing the loop on this --
It appears we were just hitting some other problem related to S3A/S3,
likely that the temporary directory used by the S3A Hadoop file system
implementation for buffering data during upload either was full or had the
wrong permissions.
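For anyone hitting the same thing, a sketch of pointing the S3A buffer directory at a disk with enough space and the right permissions (the path below is just a placeholder):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  // fs.s3a.buffer.dir is where S3A stages data locally before uploading to S3
  .config("spark.hadoop.fs.s3a.buffer.dir", "/mnt/big-disk/s3a-tmp")
  .getOrCreate()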
On Thu, Mar 16, 2017 at 6:03 PM
I have a Spark job that processes incremental data and partitions it by
customer id. Some customers have very little data, and I have another job
that takes a previous period's data and combines it. However, the job runs
serially and I'd basically like to run the function on every partition
simultaneously.
You want spark.streaming.kafka.maxRatePerPartition for the direct stream.
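Something like this (values are illustrative; tune the per-partition rate to your topic's throughput):

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.streaming.backpressure.enabled", "true")
  .set("spark.streaming.kafka.maxRatePerPartition", "1000") // records per second, per Kafka partition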
On Sat, Mar 18, 2017 at 3:37 PM, Mal Edwin wrote:
>
> Hi,
> You can enable backpressure to handle this.
>
> spark.streaming.backpressure.enabled
> spark.streaming.receiver.maxRate
>
> Thanks,
> Edwin
>
> On Mar 18, 2017, 12
Do you want a sparse model where most of the coefficients are zero? If
so, using L1 regularization leads to sparsity. The
LogisticRegressionModel coefficients vector's size is still equal to the
number of features, but you can get the non-zero elements manually. Actually,
it would be a sparse vector.
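A sketch of pulling out the non-zero weights, assuming a fitted binary model `lrModel`:

val nonZero = lrModel.coefficients.toArray.zipWithIndex
  .collect { case (weight, idx) if weight != 0.0 => (idx, weight) } // (feature index, weight)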
foreachPartition is an action but runs on each worker, which means you won't
see any output on the driver.
mapPartitions is a transformation, which is lazy and won't do anything until
an action is called.
Which one is better depends on the specific use case. To output something (like a
print on a single machine) you could r
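A minimal sketch of the difference, assuming an RDD `rdd` of strings:

rdd.foreachPartition(iter => iter.foreach(println))         // action: output shows up in the executor logs
val sizes = rdd.mapPartitions(iter => Iterator(iter.size))  // transformation: nothing runs yet
sizes.collect().foreach(println)                            // this action triggers it and prints on the driver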
Exactly.
On Sat, Mar 11, 2017 at 1:35 PM, Dongjin Lee wrote:
> Hello Chetan,
>
> Could you post some code? If I understood correctly, you are trying to
> save JSON like:
>
> {
> "first_name": "Dongjin",
> "last_name: null
> }
>
> not in omitted form, like:
>
> {
> "first_name": "Dongjin"
> }