Re: Python vs. Scala

2017-09-06 Thread Conconscious
Just run this test yourself and check the results; during the run, also check a worker with top. Python: import random def inside(p): x, y = random.random(), random.random() return x * x + y * y < 1 def estimate_pi(num_samples): count = sc.parallelize(xrange(0, num_samples)).filter…
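
For comparison, a minimal Scala sketch of the same Monte Carlo Pi estimate (assuming an existing SparkContext named sc, as in the Python snippet above):

    import scala.util.Random

    // True if a random point in the unit square falls inside the quarter circle.
    def inside(i: Int): Boolean = {
      val x = Random.nextDouble()
      val y = Random.nextDouble()
      x * x + y * y < 1
    }

    // Monte Carlo estimate of Pi: 4 * (fraction of points inside the quarter circle).
    def estimatePi(numSamples: Int): Double = {
      val count = sc.parallelize(0 until numSamples).filter(inside).count()
      4.0 * count / numSamples
    }

    println(estimatePi(10000000))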

Will an input event older than watermark be dropped?

2017-09-06 Thread 张万新
Hi, I've been investigating watermarks for some time. According to the guide, if we specify a watermark on an event-time column, the watermark will be used to drop old state data. Then, taking a window-based count for example, if an event whose time is older than the watermark arrives, it will simply be dropped…
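
For reference, a minimal sketch of the kind of watermarked, window-based count being discussed (events is assumed to be a streaming Dataset; the eventTime and word column names are illustrative, not from the original mail):

    import org.apache.spark.sql.functions.{col, window}

    // State for windows that fall entirely behind the watermark is dropped, so a
    // late event older than the watermark may be discarded rather than counted.
    val windowedCounts = events
      .withWatermark("eventTime", "10 minutes")
      .groupBy(window(col("eventTime"), "5 minutes"), col("word"))
      .count()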

Multiple Kafka topics processing in Spark 2.2

2017-09-06 Thread Dan Dong
Hi, all, I have one issue about how to process multiple Kafka topics in a Spark 2.x program. My question is: how do I get the topic name from a message received from Kafka? E.g.: … val messages = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder]( ssc, k…

Re: Multiple Kafka topics processing in Spark 2.2

2017-09-06 Thread Alonso Isidoro Roman
Hi, reading the official doc, I think you can do it this way: import org.apache.spark.streaming.kafka._ val directKafkaStream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder]( ssc, ka…
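
With the kafka-0-10 integration in Spark 2.x, each ConsumerRecord carries its topic, so a hedged sketch of getting the topic name per message (ssc is an existing StreamingContext; broker, group and topic names are placeholders) might look like:

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "localhost:9092",            // placeholder
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "example-group"                       // placeholder
    )
    val topics = Array("topicA", "topicB")                // placeholder topic names

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](topics, kafkaParams))

    // Each record knows which topic it came from.
    val topicAndValue = stream.map(record => (record.topic, record.value))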

Spark Structured Streaming and compacted topic in Kafka

2017-09-06 Thread Olivier Girardot
Hi everyone, I'm aware of the issue regarding the direct stream 0.10 consumer in Spark and compacted topics (cf. https://issues.apache.org/jira/browse/SPARK-17147). Is there any chance that the Spark Structured Streaming Kafka source is compatible with compacted topics? Regards, -- Olivier Girardot
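
For context, a minimal sketch of the Structured Streaming Kafka source the question refers to (broker and topic names are placeholders; whether it behaves correctly on a compacted topic is exactly the open question here):

    val df = spark
      .readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")  // placeholder
      .option("subscribe", "some-compacted-topic")          // placeholder
      .load()
      .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")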

Non-time-based windows are not supported on streaming DataFrames/Datasets;;

2017-09-06 Thread kant kodali
Hi all, I get the following exception when I run the query below, and I'm not sure what the cause is: "Non-time-based windows are not supported on streaming DataFrames/Datasets;" My TIME_STAMP field is of string type. dataset .sqlContext() .sql("select count(distinct ID), sum(AMOUNT) from (s…
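
Since TIME_STAMP is a string, one common way to make the window time-based is to cast it to a real timestamp first. A minimal sketch, where dataset stands for the streaming Dataset from the original mail and the event_time column name and format pattern are assumptions:

    import org.apache.spark.sql.functions.{col, to_timestamp, window}

    // Turn the string column into a proper timestamp so a time-based window applies.
    val withEventTime = dataset
      .withColumn("event_time", to_timestamp(col("TIME_STAMP"), "yyyy-MM-dd HH:mm:ss"))

    val counts = withEventTime
      .withWatermark("event_time", "10 minutes")
      .groupBy(window(col("event_time"), "5 minutes"))
      .count()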

spark metrics prefix in Graphite is duplicated

2017-09-06 Thread Mikhailau, Alex
Hi guys, when I set up my EMR cluster with Spark I add "*.sink.graphite.prefix": "$env.$namespace.$team.$app" to metrics.properties. The cluster comes up with the correct metrics.properties. Then I simply add a step to EMR with spark-submit, without any metrics namespace parameter. In my Graphite, Spark…

[Meetup] Apache Spark and Ignite for IoT scenarios

2017-09-06 Thread Denis Magda
Folks, those who are craving mind food this weekend, come over to the meetup - Santa Clara, Sept 9, 9:30 AM: https://www.meetup.com/datariders/events/242523245/?a=socialmedia — Denis

[ANNOUNCE] Apache Bahir 2.2.0 Released

2017-09-06 Thread Luciano Resende
Apache Bahir provides extensions to multiple distributed analytic platforms, extending their reach with a diversity of streaming connectors and SQL data sources. The Apache Bahir community is pleased to announce the release of Apache Bahir 2.2.0, which provides the following extensions for Apache Spark…

Is it OK to have multiple SparkSessions in one Spark Structured Streaming app?

2017-09-06 Thread kant kodali
Hi all, I am wondering whether it is OK to have multiple SparkSessions in one Spark Structured Streaming app. Basically, I want to create 1) a SparkSession for reading from Kafka and 2) another SparkSession for storing the mutations of a DataFrame/Dataset to a persistent table as I get the mutations f…
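
For what it's worth, a minimal sketch of how a second session is usually obtained: SparkSession.newSession() shares the same SparkContext but keeps its own SQL configuration, temporary views and UDF registrations (whether two sessions are actually needed here is a separate question; the app name is a placeholder):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("structured-streaming-app")   // placeholder app name
      .getOrCreate()

    // Second session sharing the underlying SparkContext, with isolated SQL state.
    val spark2 = spark.newSession()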

sessionState could not be accessed in spark-shell command line

2017-09-06 Thread ChenJun Zou
Hi, when I use spark-shell to get the logical plan of a SQL query, an error occurs: scala> spark.sessionState :30: error: lazy value sessionState in class SparkSession cannot be accessed in org.apache.spark.sql.SparkSession spark.sessionState ^ But if I use spark-submit to access the…

CSV write to S3 failing silently with partial completion

2017-09-06 Thread abbim
Hi all, my team has been experiencing a recurring, unpredictable bug where only a partial write to CSV in S3 on one partition of our Dataset is performed. For example, in a Dataset of 10 partitions written to CSV in S3, we might see 9 of the partitions as 2.8 GB in size, but one of them as 1.6 GB. H…

Re: sessionState could not be accessed in spark-shell command line

2017-09-06 Thread ChenJun Zou
I use spark-2.1.1. 2017-09-07 14:00 GMT+08:00 sujith chacko : > Hi, > may I know which version of Spark you are using? In 2.2 I tried with the > below query in spark-shell for viewing the logical plan and it's working > fine: > > spark.sql("explain extended select * from table1") > > The above qu…
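
As a sketch of the workaround suggested above (table1 is a placeholder table name), the plan can also be inspected without touching sessionState:

    // Extended plan (parsed, analyzed, optimized, physical) via SQL:
    spark.sql("explain extended select * from table1").show(false)

    // Or directly on a Dataset/DataFrame:
    spark.table("table1").explain(true)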

Re: sessionState could not be accessed in spark-shell command line

2017-09-06 Thread ChenJun Zou
Thanks, my mistake. 2017-09-07 14:21 GMT+08:00 sujith chacko : > If your intention is to just view the logical plan in spark-shell then I > think you can follow the query which I mentioned in the previous mail. In > Spark 2.1.0 sessionState is a private member which you cannot access. > > Thanks. >

Re: sessionState could not be accessed in spark-shell command line

2017-09-06 Thread ChenJun Zou
I examined the code and found that the lazy val was added recently, in 2.2.0. 2017-09-07 14:34 GMT+08:00 ChenJun Zou : > thanks, > my mistake > > 2017-09-07 14:21 GMT+08:00 sujith chacko : > >> If your intention is to just view the logical plan in spark shell then I >> think you can follow the query which…