Re: Urgent: Seeking Guidance on Kafka Slow Consumer and Data Skew Problem

2023-09-20 Thread Gowtham S
em, please feel free to share them. Looking forward to hearing from others who might have encountered similar issues. Thanks and regards, Gowtham S On Tue, 19 Sept 2023 at 17:23, Karthick wrote: > Subject: Seeking Guidance on Kafka Slow Consumer and Data Skew Problem > > Dear Spark C

unsubscribe

2023-05-09 Thread Balakumar iyer S

Got Error Creating permanent view in Postgresql through Pyspark code

2023-01-04 Thread Vajiha Begum S A
I have tried to create a permanent view in the Postgresql DB through Pyspark code, but I have received the below error message. Kindly help me to create a permanent view table in the database. How shall I create a permanent view using Pyspark code? Please do reply. *Error Message:* *Exception has occurred
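
A minimal sketch of one way to do this (the poster uses PySpark; shown here in Scala, with all connection details assumed): Spark's JDBC writer can only create and append to tables, so the CREATE VIEW DDL has to be issued over a plain JDBC connection.

    import java.sql.DriverManager

    // Assumed connection details; Spark itself does not create database views,
    // so the permanent view is created with plain DDL against Postgres.
    val url  = "jdbc:postgresql://localhost:5432/mydb"
    val conn = DriverManager.getConnection(url, "user", "password")
    try {
      val stmt = conn.createStatement()
      // Unlike Spark's createOrReplaceTempView, this view persists in the database itself.
      stmt.executeUpdate("CREATE OR REPLACE VIEW my_view AS SELECT id, name FROM my_table")
    } finally {
      conn.close()
    }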

Error using SPARK with Rapid GPU

2022-11-30 Thread Vajiha Begum S A
Hi, I'm using an Ubuntu system with the NVIDIA Quadro K1200 with 20GB GPU memory. Installed: CUDF 22.10.0 jar file, Rapid 4 Spark 2.12-22.10.0 jar file, CUDA Toolkit 11.8.0 Linux version, JAVA 8. I'm running only a single server; Master is localhost. I'm trying to run pyspark code through spark submi

Error - using Spark with GPU

2022-11-30 Thread Vajiha Begum S A
spark-submit /home/mwadmin/Documents/test.py 22/11/30 14:59:32 WARN Utils: Your hostname, mwadmin-HP-Z440-Workstation resolves to a loopback address: 127.0.1.1; using ***.***.**.** instead (on interface eno1) 22/11/30 14:59:32 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address U

Unable to use GPU with pyspark in windows

2022-11-23 Thread Vajiha Begum S A
Hi Sean Owen, I'm using a Windows system with the NVIDIA Quadro K1200: GPU memory 20GB, Intel Core 8-core. Installed: CUDAF 0.14 jar file, Rapid 4 Spark 2.12-22.10.0 jar file, CUDA Toolkit 11.8.0 Windows version. Also installed WSL 2.0 (since I'm using a Windows system). I'm running only a single ser

Pyspark ML model Save Error

2022-11-16 Thread Vajiha Begum S A
Hi, This is Vajiha, Senior Research Analyst. I'm working on Predictive Analysis with Pyspark ML models. It's quite good working with the features of Spark in Python, though I'm having issues saving the Pyspark trained ML models. I have read many articles, Stack Overflow and Spark forum comments and

Creating Custom Broadcast Join

2022-09-01 Thread Murali S
Hi, I wanted to broadcast a Dataframe to all executors and do an operation similar to a join, but one that might return a variable number of rows (rather than the rows in each partition) and could use multiple rows to produce one row. I am trying to create a custom join operator for this use case. It would be great i

Re: Implementing circuit breaker pattern in Spark

2022-02-16 Thread S
also continue processing? > > On Wed, Feb 16, 2022 at 7:58 AM S wrote: > >> Retries have been already implemented. The question is how to stop the >> spark job by having an executor JVM send a signal to the driver JVM. e.g. I >> have a microbatch of 30 messages; 10 in eac

Re: Implementing circuit breaker pattern in Spark

2022-02-16 Thread S
's what you want? you actually want > to retry the failed attempts, not just avoid calling the microservice. > > On Wed, Feb 16, 2022 at 3:18 AM S wrote: > >> Hi, >> >> We have a spark job that calls a microservice in the lambda function of >> the flatmap transfo

Implementing circuit breaker pattern in Spark

2022-02-16 Thread S
Hi, We have a spark job that calls a microservice in the lambda function of the flatMap transformation: it passes the inbound element to this microservice and returns either the transformed value or "None" from the microservice as the output of this flatMap transform. Of course the
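
A minimal sketch of the pattern under discussion, with callMicroservice as a hypothetical stand-in for the poster's HTTP call: executors cannot signal the driver mid-batch, but an accumulator's value becomes visible on the driver once an action completes, which gives a crude breaker.

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder.appName("breaker-sketch").master("local[*]").getOrCreate()

    // Hypothetical stand-in for the user's microservice call; returns None on failure.
    def callMicroservice(s: String): Option[String] = scala.util.Try(s.toUpperCase).toOption

    val failures = spark.sparkContext.longAccumulator("microserviceFailures")

    val out = spark.sparkContext.parallelize(Seq("a", "b", "c")).flatMap { elem =>
      callMicroservice(elem) match {
        case Some(v) => Iterator(v)
        case None    => failures.add(1); Iterator.empty  // recorded on the executor
      }
    }
    out.count()  // accumulator values reach the driver only after an action runs
    if (failures.value > 0) {
      // driver-side decision point: stop the streaming context / skip the next batch
    }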

Spark Structured Streaming Continuous Trigger on multiple sinks

2021-08-25 Thread S
Hello, I have a structured streaming job that needs to be able to write to multiple sinks. We are using *Continuous* Trigger *and not* *Microbatch* Trigger. 1. When we use the foreach method using: *dataset1.writeStream.foreach(kafka ForEachWriter logic).trigger(ContinuousMode).start().awaitTermi
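
A sketch of the two-query shape being described, with broker, topic and checkpoint paths assumed; note that each started query consumes the source independently, and continuous mode only supports a limited set of sinks (Kafka, console, memory).

    import org.apache.spark.sql.streaming.Trigger

    // dataset1 is the poster's streaming Dataset and spark the active SparkSession (both assumed)
    val q1 = dataset1.writeStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("topic", "sink1")
      .option("checkpointLocation", "/tmp/chk1")
      .trigger(Trigger.Continuous("1 second"))
      .start()

    val q2 = dataset1.writeStream
      .format("console")
      .option("checkpointLocation", "/tmp/chk2")
      .trigger(Trigger.Continuous("1 second"))
      .start()

    spark.streams.awaitAnyTermination()  // rather than awaitTermination() on the first query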

Rotating expired kubernetes oauth token (SPARK-27997)

2021-08-06 Thread Alfred S.
Hello team, I’ve been wondering how do kubernetes’ users solve the issue with the expiring oauth token[0]. I prepared a patch which allows a user to provide a class that implements OAuthTokenProvider via configuration. Should I send a PR into the master branch? [0] https://issues.apache.org/ji

Re: Spark Structured Streaming

2021-05-31 Thread S
ion of data or any other property which may arise > from relying on this email's technical content is explicitly disclaimed. > The author will in no case be liable for any monetary damages arising from > such loss, damage or destruction. > > > > > On Mon, 31 May 2021 a

Spark Structured Streaming

2021-05-31 Thread S
Hi, I am using Structured Streaming on Azure HDInsight. The version is 2.4.6. I am trying to understand the microbatch mode - default and fixed intervals. Does the fixed-interval microbatch follow something similar to the receiver-based model, where records keep getting pulled and stored into blocks f
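
For reference, the two trigger modes in question, sketched against an assumed streaming DataFrame df: unlike the old receiver model, the micro-batch engine does not pre-store records in blocks; each batch reads its own offset range from the source at planning time.

    import org.apache.spark.sql.streaming.Trigger

    // Default: the next micro-batch is planned as soon as the previous one finishes.
    df.writeStream.format("console").option("checkpointLocation", "/tmp/chk-a").start()

    // Fixed interval: a batch is planned every 30 seconds; records arriving in
    // between simply wait in the source (e.g. Kafka) until the next batch reads them.
    df.writeStream
      .format("console")
      .option("checkpointLocation", "/tmp/chk-b")
      .trigger(Trigger.ProcessingTime("30 seconds"))
      .start()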

Unsubscribe

2020-12-10 Thread Przemysław S. Gliniecki
unsubscribe

how can i write spark addListener metric to kafka

2020-06-09 Thread a s
Hi guys, I am building a structured streaming app for Google Analytics data. I want to capture the number of rows read and processed; I am able to see it in the log. How can I send it to Kafka? Thanks, Alis

Re: Spark Security

2020-06-01 Thread Wilbert S.
Hello, My hard drive has about 80 GB of space left on it, and the RAM is about 12GB. I am not sure the size of the .tsv file, but it will most likely be around 30 GB. Thanks, Wilbert Seoane On Fri, May 29, 2020 at 5:03 PM Anwar AliKhan wrote: > What is the size of your .tsv file sir ?

Re: How to import PySpark into Jupyter

2020-04-10 Thread Akchhaya S
Hello Yasir, You need to check your 'PYTHONPATH' environment variable. For Windows, if I do a "pip install", the package is installed in "lib\site-packages" under the Python folder. If I "print (sys.path)", I see "lib\site-packages" as one of the entries, and I can expect "import " to work. Find

unsubscribe

2020-01-17 Thread Bruno S. de Barros
- To unsubscribe e-mail: user-unsubscr...@spark.apache.org

unsubscribe

2020-01-06 Thread Bruno S. de Barros
  unsubscribe   - To unsubscribe e-mail: user-unsubscr...@spark.apache.org

unsubscribe

2020-01-05 Thread Bruno S. de Barros
- To unsubscribe e-mail: user-unsubscr...@spark.apache.org

Understanding life cycle of RpcEndpoint: CoarseGrainedExecutorBackend

2019-12-18 Thread S
ages/requests? Is it referring to the "set of tasks assigned to this particular RpcEndpoint" from a stage of a spark RDD on its individual partitions?* *Q3: If the receive method is indeed called multiple times through the course of a spark job where each request refers to the set of t

Re: Spark 2.3 Dataframe GroupBy operation throws IllegalArgumentException on Large dataset

2019-07-23 Thread Balakumar iyer S
hat an exception happened while writing out > the orc file, not what that underlying exception is, there should be at > least one more caused by under the one you included. > > Thanks, > > Bobby > > On Mon, Jul 22, 2019 at 5:58 AM Balakumar iyer S > wrote: > >> Hi

Spark 2.3 Dataframe GroupBy operation throws IllegalArgumentException on Large dataset

2019-07-22 Thread Balakumar iyer S
Hi, I am trying to perform a group by followed by an aggregate collect_set operation on a two-column data set with schema (LeftData int, RightData int). code snippet val wind_2 = dframe.groupBy("LeftData").agg(collect_set(array("RightData"))) wind_2.write.mode(SaveMode.Append).format("orc
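
A runnable reconstruction of the snippet (output path assumed); worth noting that collect_set gathers every distinct value of a group into executor memory, so heavily skewed keys are a common reason this only fails on large data.

    import org.apache.spark.sql.{SaveMode, SparkSession}
    import org.apache.spark.sql.functions.{array, collect_set}

    val spark = SparkSession.builder.appName("groupby-collect-set").master("local[*]").getOrCreate()
    import spark.implicits._

    val dframe = Seq((178111256, 107125374), (178111256, 107148618)).toDF("LeftData", "RightData")

    // collect_set(array(...)) builds a set of single-element arrays per key;
    // each group's whole set must fit in executor memory.
    val wind_2 = dframe.groupBy("LeftData").agg(collect_set(array($"RightData")))
    wind_2.write.mode(SaveMode.Append).format("orc").save("/tmp/wind_2")  // path assumed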

spark error when initializing spark session in java

2019-05-10 Thread Serena S Yuan
Hi, When I run the following code within a bigger function there is an error SparkConf sparkConf = new SparkConf().setAppName("ContactListenerExample").setMaster("local[2]").set("spark.executor.memory","1g"); SparkContext sc = new SparkContext(sparkConf); Here is the error: java.lang.NumberFo

The following Java MR code works for small dataset but throws (ArrayIndexOutOfBounds) error for large dataset

2019-05-09 Thread Balakumar iyer S
Hi All, I am trying to read an orc file and perform a groupBy operation on it, but when I run it on a large data set we are facing the following error message. Input format of INPUT DATA |178111256| 107125374| |178111256| 107148618| |178111256| 107175361| |178111256| 107189910| and we are tr

error when running decisiontree in java

2019-05-03 Thread Serena S Yuan
Hi, I integrated the Apache Spark decision tree classifier in a Java program that reads real-time data into an array called 'vals' and then runs the code: Vector v = Vectors.dense(vals); LabeledPoint pos = new LabeledPoint(0.0, v); SparkConf sparkConf = new SparkConf().setAppName("ContactListenerE

An alternative logic to collaborative filtering works fine but we are facing run time issues in executing the job

2019-04-16 Thread Balakumar iyer S
Hi, While running the following spark code in the cluster with the following configuration, it is spread into 3 job IDs. CLUSTER CONFIGURATION: 3-node cluster; NODE 1 - 64GB, 16 cores; NODE 2 - 64GB, 16 cores; NODE 3 - 64GB, 16 cores. At Job ID 2 the job is stuck at stage 51 of 254 and then it starts ut

Qn about decision tree apache spark java

2019-04-04 Thread Serena S Yuan
Hi, I am trying to use Apache Spark's decision tree classifier. I am trying to implement the method found in the classification example at https://spark.apache.org/docs/1.5.1/ml-decision-tree.html. I found the dataset at https://github.com/apache/spark/blob/master/data/mllib/sample_libsv

BLAS library class def not found error

2019-03-28 Thread Serena S Yuan
Hi, I was using the apache spark machine learning library in java (posted this issue at https://stackoverflow.com/questions/55367722/apache-spark-in-java-machine-learning-com-github-fommil-netlib-f2jblas-dscalf?noredirect=1#comment97464462_55367722 ), and I had an error while trying to train the

Fwd: BeakerX 1.0 released

2018-07-05 Thread s...@draves.org
We are pleased to announce the release of BeakerX 1.0. BeakerX is a collection of kernels and extensions to the Jupyter interactive computing environment. It provides JVM support, Spark cluster support, polyglot programming, interactive plots, tables, forms, publishing, and m

Re: [announce] BeakerX supports Scala+Spark in Jupyter

2018-06-07 Thread s...@draves.org
> operations. Then: > > >- what is the relationship between the %%spark magic and the toree >kernel? >- how does the %%spark magic get applied to that other Cell 3? > > thanks! > > 2018-06-07 16:33 GMT-07:00 s...@draves.org : > >> We are pleased to anno

[announce] BeakerX supports Scala+Spark in Jupyter

2018-06-07 Thread s...@draves.org
We are pleased to announce release 0.19.0 of BeakerX, a collection of extensions and kernels for Jupyter and Jupyter Lab. BeakerX now features Scala+Spark integration including GUI configuration, status, progress, interrupt, and interactive tables. We are very interested in y

proxy on spark UI

2017-06-27 Thread Soheila S.
Hi all, I am using Hadoop 2.6.5 and Spark 2.1.0, and run a job using spark-submit with master set to "yarn". When Spark starts, I can load the Spark UI page on port 4040 but no job is shown in the page. After the following logs (registering application master on yarn), the Spark UI is not accessible any

Re: running spark program on intellij connecting to remote master for cluster

2017-05-10 Thread s t
spark job from my local workstation to a remote cluster using the SparkLauncher class, but I didn't actually have SPARK_HOME set or the spark-submit script on my local machine yet, so the submit was failing. I think the error I was getting was that SPARK_HOME environment variable was not set, tho

running spark program on intellij connecting to remote master for cluster

2017-05-10 Thread s t
Hello, I am trying to run spark code from my laptop with IntelliJ. I have a cluster of 2 nodes and a master. When I start the program from IntelliJ it gets errors about some missing classes. I am aware that some jars need to be distributed to the workers but do not know if it is possible programati

heap overflow within seconds : pyspark kinesis stream with Spark 2.1.0

2017-04-22 Thread s t
Hi, I hope I am missing a very simple point to be stuck with this kind of error. http://stackoverflow.com/questions/43560807/pyspark-streaming-from-kinesis-kills-heap Regards, Serkan

Parameter in FlatMap function

2017-04-14 Thread Soheila S.
Hello all, Can someone help me solve the following fundamental problem? I have a JavaRDD and, as a flatMap method, I call a new instance of a class which implements FlatMapFunction. This class has a constructor method and a call method. In the constructor method, I set the values for the "List" variabl
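
A sketch of the same parameterisation in Scala (the thread uses the Java API, where a constructor argument plays this role): any variable referenced inside the flatMap closure is serialised and shipped to the executors along with it.

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder.appName("flatmap-param").master("local[*]").getOrCreate()

    val stopWords = List("the", "a", "an")  // assumed example of the "List" being passed in

    val lines = spark.sparkContext.parallelize(Seq("the quick fox", "a lazy dog"))
    // stopWords is captured by the closure, the Scala analogue of the Java constructor field
    val tokens = lines.flatMap(_.split(" ").filterNot(stopWords.contains))
    tokens.collect().foreach(println)  // quick, fox, lazy, dog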

Text

2017-01-27 Thread Soheila S.
Hi All, I read a text file using sparkContext.textFile(filename), assign it to an RDD, process the RDD (replace some words) and finally write it to a text file using rdd.saveAsTextFile(output). Is there any way to be sure the order of the sentences will not be changed? I need to have the same
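
One way to be sure, sketched with an assumed replacement: map is a narrow transformation, so per-partition order is already preserved and the part files are numbered, but indexing and sorting makes the guarantee explicit.

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder.appName("ordered-replace").master("local[*]").getOrCreate()

    val processed = spark.sparkContext.textFile("input.txt")  // path assumed
      .zipWithIndex()                                         // remember the original line number
      .map { case (line, idx) => (idx, line.replace("foo", "bar")) }  // replacement assumed
      .sortByKey()                                            // enforce the original order
      .values

    processed.coalesce(1).saveAsTextFile("output")            // one part file, original order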

How to tune number of tasks

2017-01-26 Thread Soheila S.
Hi all, Please tell me how I can tune the number of output partitions. I run my spark job on my local machine with 8 cores and the input data is 6.5GB. It creates 193 tasks and puts the output into 193 partitions. How can I change the number of tasks and, consequently, the number of output files? Best, Soheil
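
A sketch of the two usual knobs (paths and counts assumed): the number of output files equals the number of partitions of the final RDD, so either set the default parallelism up front or re-partition just before writing.

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf().setAppName("tune-partitions").setMaster("local[8]")
      .set("spark.default.parallelism", "8")      // default partition count for shuffles
    val sc = new SparkContext(conf)

    val data = sc.textFile("input.txt")           // partition count here follows the input splits
    val result = data.map(_.toUpperCase)          // placeholder transformation

    result.coalesce(8).saveAsTextFile("out-8")    // merge down to 8 files without a shuffle
    // result.repartition(8) would shuffle, but gives evenly sized partitions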

failed to launch org.apache.spark.deploy.master.Master

2017-01-12 Thread Soheila S.
Hi, I have executed my spark job using spark-submit on my local machine and on a cluster. Now I want to try using HDFS. I mean, put the data (text file) on hdfs and read from there, execute the jar file and finally write the output to hdfs. I got this error after running the job: *failed to launch or

Aw: Re: Re: Spark Streaming prediction

2017-01-02 Thread Daniela S
(e.g. 180 minutes). I would of course like to use these values, and the missing ones (values for the next 24 hours, one value per minute) should be predicted. Thank you in advance. Regards, Daniela Sent: Monday, 02 January 2017, 22:30 From: "Marco Mistroni" To:

Aw: Re: Spark Streaming prediction

2017-01-02 Thread Daniela S
, 02 January 2017, 21:07 From: "Marco Mistroni" To: "Daniela S" Cc: User Subject: Re: Spark Streaming prediction Hi, you might want to have a look at the Regression ML algorithm and integrate it in your SparkStreaming application; I'm sure someone on the list has

Spark Streaming prediction

2017-01-02 Thread Daniela S
Hi, I am trying to solve the following problem with Spark Streaming. I receive timestamped events from Kafka. Each event refers to a device and contains values for every minute of the next 2 to 3 hours. What I would like to do is to predict the minute values for the next 24 hours. So I would li

Spark subscribe

2016-12-22 Thread pradeep s
Hi, Can you please add me to the spark subscription list. Regards, Pradeep S

Re: Spark job server pros and cons

2016-12-09 Thread Shak S
Spark Job Server (SJS) gives you the ability to have your spark job as a service. It has features like caching RDDs, publishing REST APIs to submit your job, and named RDDs. For more info, refer to https://github.com/spark-jobserver/spark-jobserver. Internally SJS too uses the same spark job submit so it u

KMedoids in Spark Java

2016-12-08 Thread Shak S
Is there any example implementing KMedoids clustering in Spark and Java? I searched the Spark API; it looks like Spark has not yet implemented KMedoids. Any example or inputs will be appreciated. Thanks.

filter RDD by variable

2016-12-07 Thread Soheila S.
Hi, I am new to Spark and have a question about the first steps of learning Spark. How can I filter an RDD using a String variable (for example words[i]), instead of a fixed one like "Error"? Thanks a lot in advance. Soheila
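
A minimal sketch (sample data assumed): filter accepts any predicate, so a String variable is captured by the closure exactly like a literal.

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder.appName("filter-by-variable").master("local[*]").getOrCreate()

    val rdd = spark.sparkContext.parallelize(Seq("Error: disk full", "OK", "Warning: low memory"))

    val words = Array("Error", "Warning")
    val target = words(0)                              // any String variable, not just a literal
    val filtered = rdd.filter(line => line.contains(target))
    filtered.collect().foreach(println)                // Error: disk full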

Spark Streaming - join streaming and static data

2016-12-06 Thread Daniela S
Hi, I have some questions regarding Spark Streaming. I receive a stream of JSON messages from Kafka. The messages consist of a timestamp and an ID.

timestamp            ID
2016-12-06 13:00     1
2016-12-06 13:40     5
...

In a database I have values for each ID:

ID       m
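
A sketch of the usual DStream-era answer (this 2016 thread predates Structured Streaming joins), with the stream and lookup shapes assumed: join each micro-batch against the static RDD inside transform.

    // idStream: DStream[(Int, String)] of (ID, timestamp) from Kafka -- assumed shape
    // staticLookup: the per-ID values loaded once from the database (sample data assumed)
    val staticLookup = ssc.sparkContext.parallelize(Seq((1, 100), (5, 250)))

    val joined = idStream.transform { batchRdd =>
      batchRdd.join(staticLookup)        // (ID, (timestamp, value)) per micro-batch
    }
    joined.print()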

Unsubscribe

2016-12-03 Thread S Malligarjunan
Unsubscribe Thanks and Regards, Malligarjunan S.

Re: Unsubscribe

2016-12-03 Thread S Malligarjunan
Unsubscribe Thanks and Regards, Malligarjunan S. On Saturday, 3 December 2016, 20:42, Sivakumar S wrote: Unsubscribe

Unsubscribe

2016-12-03 Thread Sivakumar S
Unsubscribe

Unsubscribe

2016-11-30 Thread Sivakumar S

Re: Happy Diwali to those forum members who celebrate this great festival

2016-10-30 Thread Sivakumaran S
Thank you Dr Mich :) Regards Sivakumaran S > On 30-Oct-2016, at 4:07 PM, Mich Talebzadeh wrote: > > Enjoy the festive season. > > Regards, > > Dr Mich Talebzadeh > > LinkedIn > https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6z

Re: Restful WS for Spark

2016-09-30 Thread gobi s
Hi All, sample spark project which uses REST. http://techgobi.blogspot.in/2016/09/bigdata-sample-project.html On Fri, Sep 30, 2016 at 11:39 PM, Vadim Semenov wrote: > There're two REST job servers that work with spark: > > https://github.com/spark-jobserver/spark-jobserver > > https://github.c

Re: Question about executor memory setting

2016-09-29 Thread mohan s
Hi, Kindly go through the below link. It explains Spark memory allocation well. https://www.slideshare.net/cloudera/top-5-mistakes-to-avoid-when-writing-apache-spark-applications?from_m_app=ios Regards, Mohan S > On 28-Sep-2016, at 7:57 AM, Dogtail L wrote: > > Hi all, > > May I a

Spark word count program , need help on integration

2016-09-12 Thread gobi s
Hi, I am new to Spark. I want to develop a word count app and deploy it in local mode. From outside, I want to trigger the program, get the word count output, and show it in the UI. I need help with integrating Spark with the outside: i) How to trigger the Spark app from the j2ee app

Re: Scala Vs Python

2016-09-02 Thread Sivakumaran S
Whatever benefits you may accrue from the rapid prototyping and coding in Python, it will be offset against the time taken to convert it to run inside the JVM. This of course depends on the complexity of the DAG. I guess it is a matter of language preference. Regards, Sivakumaran S > On

Re: How to convert List into json object / json Array

2016-08-30 Thread Sivakumaran S
Look at scala.util.parsing.json or the Jackson library for json manipulation. Also read http://spark.apache.org/docs/latest/sql-programming-guide.html#json-datasets Regards, Sivakumaran S

Re: Design patterns involving Spark

2016-08-28 Thread Sivakumaran S
Spark fits best for processing. But depending on the use case, you could expand the scope of Spark to moving data using the native connectors. The only thing that Spark is not is storage. Connectors are available for most storage options though. Regards, Sivakumaran S > On 28-Aug-2016, at 6

Re: Dynamically change executors settings

2016-08-26 Thread linguin . m . s
Hi, No, currently you can't change the setting. // maropu On 2016/08/27 11:40, Vadim Semenov wrote: > Hi spark users, > > I wonder if it's possible to change executors settings on-the-fly. > I have the following use-case: I have a lot of non-splittable skewed files in > a custom format that

Re: quick question

2016-08-25 Thread Sivakumaran S
requirements may vary. This may help too (http://www.oracle.com/webfolder/technetwork/tutorials/obe/java/HomeWebsocket/WebsocketHome.html#section7) Regards, Sivakumaran S

Re: quick question

2016-08-25 Thread Sivakumaran S
driver code in Python? The link Kevin has sent should start you off. Regards, Sivakumaran > On 25-Aug-2016, at 11:53 AM, kant kodali wrote: > > yes for now it will be Spark Streaming Job but later it may change. > > > > > > On Thu, Aug 25, 2016 2:37 AM, Sivaku

Re: quick question

2016-08-25 Thread Sivakumaran S
Is this a Spark Streaming job? Regards, Sivakumaran S > @Sivakumaran when you say create a web socket object in your spark code I > assume you meant a spark "task" opening websocket > connection from one of the worker machines to some node.js server in that > case th

Re: quick question

2016-08-24 Thread Sivakumaran S
help? Sivakumaran S > On 25-Aug-2016, at 6:30 AM, kant kodali wrote: > > so I would need to open a websocket connection from spark worker machine to > where? > > > > > > On Wed, Aug 24, 2016 8:51 PM, Kevin Mellott kevin.r.mell...@gmail.com

Re: Spark streaming not processing messages from partitioned topics

2016-08-10 Thread Sivakumaran S
> Does the topic have partitions? Which version of Spark are you using? > > On Wed, Aug 10, 2016 at 2:38 AM, Sivakumaran S wrote: > Hi, > > Here is a working example I did. > > HTH > > Regards, > > Sivakumaran S >

Re: Spark streaming not processing messages from partitioned topics

2016-08-09 Thread Sivakumaran S
Hi, Here is a working example I did. HTH Regards, Sivakumaran S

val topics = "test"
val brokers = "localhost:9092"
val topicsSet = topics.split(",").toSet
val sparkConf = new SparkConf().setAppName("KafkaWeatherCalc").setMaster("local")
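
A hedged completion of the truncated snippet above, using the Kafka 0.8 direct stream API that matches the 2016 era of this thread (batch interval and processing logic assumed):

    import kafka.serializer.StringDecoder
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    val topics = "test"
    val brokers = "localhost:9092"
    val topicsSet = topics.split(",").toSet
    val sparkConf = new SparkConf().setAppName("KafkaWeatherCalc").setMaster("local[2]")
    val ssc = new StreamingContext(sparkConf, Seconds(10))  // batch interval assumed

    val kafkaParams = Map[String, String]("metadata.broker.list" -> brokers)
    val messages = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, topicsSet)

    messages.map(_._2).print()  // the message payloads, per batch
    ssc.start()
    ssc.awaitTermination()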

Re: Have I done everything correctly when subscribing to Spark User List

2016-08-08 Thread Sivakumaran S
Does it have anything to do with the fact that the mail address is displayed as user @spark.apache.org? There is a space before ‘@’. This is as received in my mail client. Sivakumaran > On 08-Aug-2016, at 7:42 PM, Chris Mattmann wrote: > > Weird! > > > > > > O

Re: Machine learning question (suing spark)- removing redundant factors while doing clustering

2016-08-08 Thread Sivakumaran S
Not an expert here, but the first step would be to devote some time and identify which of these 112 factors are actually causative. Some domain knowledge of the data may be required. Then, you can start off with PCA. HTH, Regards, Sivakumaran S > On 08-Aug-2016, at 3:01 PM, Tony Lane wr

Re: Help testing the Spark Extensions for the Apache Bahir 2.0.0 release

2016-08-07 Thread Sivakumaran S
Hi, How can I help? Regards, Sivakumaran S > On 06-Aug-2016, at 6:18 PM, Luciano Resende wrote: > > Apache Bahir is voting its 2.0.0 release based on Apache Spark 2.0.0. > > https://www.mail-archive.com/dev@bahir.apache.org/msg00312.html

Re: Visualization of data analysed using spark

2016-07-31 Thread Sivakumaran S
Hi Tony, If your requirement is browser-based plotting (real time or otherwise), you can load the data and display it in a browser using D3. Since D3 has very low-level plotting routines, you can look at C3 (provided by www.pubnub.com) or Rickshaw (https://github.com/shutterstock/rickshaw

Re: Potential Change in Kafka's Partition Assignment Semantics when Subscription Changes

2016-07-25 Thread Vahid S Hashemian
Sorry, meant to ask if any Apache Spark user would be affected. --Vahid From: Vahid S Hashemian/Silicon Valley/IBM@IBMUS To: user@spark.apache.org, d...@spark.apache.org Date: 07/25/2016 05:21 PM Subject: Potential Change in Kafka's Partition Assignment Semantics

Potential Change in Kafka's Partition Assignment Semantics when Subscription Changes

2016-07-25 Thread Vahid S Hashemian
Hello, We have started a KIP under the Kafka project that proposes a fix for an inconsistency in how partition assignments are currently handled in Kafka when the consumer changes subscription. Note that this applies to new consumer only. The KIP can be found here: https://cwiki.apache.org/con

Re: Is spark-submit a single point of failure?

2016-07-22 Thread Sivakumaran S
Thanks Cody :) Regards, Sivakumaran > On 22-Jul-2016, at 2:57 PM, Cody Koeninger wrote: > > http://spark.apache.org/docs/latest/submitting-applications.html > > look at cluster mode, supervise > > On Fri, Jul 22, 2016 at 8:46 AM, Sivakumaran S wrote: >> H

Is spark-submit a single point of failure?

2016-07-22 Thread Sivakumaran S
fails and has to be restarted. Is there any way to obviate this? Is my understanding correct that the spark-submit in its current form is a Single Point of Vulnerability, much akin to the NameNode in HDFS? regards Sivakumaran S

Re: Send real-time alert using Spark

2016-07-12 Thread Sivakumaran S
What language are you coding in? Use a mail client library to send out a custom mail to the required recipient. If you want to send an alert to a mobile, you may have to install a GSM card in the machine and then use it to send an SMS. HTH, Regards, Sivakumaran > On 12-Jul-2016, at 3:35 PM, P

Re: Question on Spark shell

2016-07-11 Thread Sivakumaran S
put starting the application on the console. You > are not seeing any output? > > On Mon, 11 Jul 2016 at 11:55 Sivakumaran S wrote: > I am running a spark streaming application using Scala in the IntelliJ IDE. I > can see the Spark output in t

Re: Question on Spark shell

2016-07-11 Thread Sivakumaran S
play with straight > away. The output is printed to the console. > > On Mon, 11 Jul 2016 at 11:47 Sivakumaran S wrote: > Hello, > > Is there a way to start the spark server with the log output piped to screen? > I am currently running s

Question on Spark shell

2016-07-11 Thread Sivakumaran S
Hello, Is there a way to start the spark server with the log output piped to screen? I am currently running spark in the standalone mode on a single machine. Regards, Sivakumaran - To unsubscribe e-mail: user-unsubscr...@spa

Re: problem extracting map from json

2016-07-07 Thread Sivakumaran S
Hi Michal, Will an example help? import scala.util.parsing.json._ // Requires scala-parser-combinators because it is no longer part of core scala val wbJSON = JSON.parseFull(weatherBox) // wbJSON is a JSON object now // Depending on the structure, now traverse through the object val listW
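
A small end-to-end sketch of the same approach, with the payload assumed:

    import scala.util.parsing.json.JSON  // needs the scala-parser-combinators artifact

    val weatherBox = """{"city": "Oslo", "temp": 21.5}"""    // assumed sample payload

    JSON.parseFull(weatherBox) match {
      case Some(m: Map[String, Any] @unchecked) =>
        println(m("city"))                                   // Oslo
        println(m("temp"))                                   // 21.5
      case _ => println("not a JSON object")
    }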

Re: Multiple aggregations over streaming dataframes

2016-07-07 Thread Sivakumaran S
> probably rewrite the query in such a way that it does aggregation in one pass > but that would obfuscate the purpose of the various stages. > > On 7 July 2016 at 12:55, "Sivakumaran S" wrote: > Hi Arnauld, > > Sorry for the

Re: Multiple aggregations over streaming dataframes

2016-07-07 Thread Sivakumaran S
Hi Arnauld, Sorry for the doubt, but what exactly is multiple aggregation? What is the use case? Regards, Sivakumaran > On 07-Jul-2016, at 11:18 AM, Arnaud Bailly wrote: > > Hello, > > I understand multiple aggregations over streaming dataframes is not currently > supported in Spark 2.0.

Re: Python to Scala

2016-06-18 Thread Sivakumaran S
If you can identify a suitable java example in the spark directory, you can use that as a template and convert it to scala code using http://javatoscala.com/ Siva > On 18-Jun-2016, at 6:27 AM, Aakash Basu wrote: > > I don't have a sound knowledge in Python and on the

Re: choice of RDD function

2016-06-16 Thread Sivakumaran S
ection":8.50031} In my Spark app, I have set the batch duration as 60 seconds. Now, as per the 1.6.1 documentation, "Spark SQL can automatically infer the schema of a JSON dataset and load it as a DataFrame. This conversion can be done using SQLContext.read.json() on either an RDD of

Re: choice of RDD function

2016-06-16 Thread Sivakumaran S
s://medium.com/@jaceklaskowski/ > Mastering Apache Spark http://bit.ly/mastering-apache-spark > Follow me at https://twitter.com/jaceklaskowski > > > On Wed, Jun 15, 2016 at 11:55 PM, Sivakumaran S wrote: >> Cody, >> >> Are you referring to the val lines = messages.map(_

Re: choice of RDD function

2016-06-15 Thread Sivakumaran S
Cody, Are you referring to the val lines = messages.map(_._2)? Regards, Siva > On 15-Jun-2016, at 10:32 PM, Cody Koeninger wrote: > > Doesn't that result in consuming each RDD twice, in order to infer the > json schema? > > On Wed, Jun 15, 2016 at 11:19 AM, Siva

ERROR TaskResultGetter: Exception while getting task result java.io.IOException: java.lang.ClassNotFoundException: scala.Some

2016-06-15 Thread S Sarkar
4.0" ) resolvers += "Akka Repository" at "http://repo.akka.io/releases/"; I am getting TaskResultGetter error with ClassNotFoundException for scala.Some . Can I please get some help how to fix it? Thanks, S. Sarkar -- View this message in context: http://apache-spar

Re: choice of RDD function

2016-06-15 Thread Sivakumaran S
> Mastering Apache Spark http://bit.ly/mastering-apache-spark > Follow me at https://twitter.com/jaceklaskowski > > > On Wed, Jun 15, 2016 at 5:03 PM, Sivakumaran S wrote: >> Thanks Jacek, >> >> Job completed!! :) Just used data frames and sql query. Very clean and >> functional code. >> >> Siva >> >> On 15-Jun-2016, at 3:10 PM, Jacek Laskowski wrote: >> >> mapWithState >> >>

Re: choice of RDD function

2016-06-15 Thread Sivakumaran S
Thanks Jacek, Job completed!! :) Just used data frames and sql query. Very clean and functional code. Siva > On 15-Jun-2016, at 3:10 PM, Jacek Laskowski wrote: > > mapWithState

choice of RDD function

2016-06-14 Thread Sivakumaran S
e may be more fields added to the json at a later stage. There will be a lot of “id”s at a later stage. Q2. If it can be done using either, which one would render to be more efficient and fast? As of now, the entire set up is in a single laptop. Thanks in advance. Regards, Siva

No of Spark context per jvm

2016-05-09 Thread praveen S
Hi, As far as I know you can create one SparkContext per JVM, but I wanted to confirm whether it's one per JVM or one per classloader. As in: is one SparkContext created per *.war, with all deployments under one Tomcat instance? Regards, Praveen

how to orderBy previous groupBy.count.orderBy

2016-04-29 Thread Brent S. Elmer Ph.D.
I have the following simple example that I can't get to work correctly.

In [1]:
from pyspark.sql import SQLContext, Row
from pyspark.sql.types import StructType, StructField, IntegerType, StringType
from pyspark.sql.functions import asc, desc, sum, count
sqlContext = SQLContext(sc)
error_schema
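
For reference, the shape that works (Scala here; the PySpark calls are the same, and the column names are assumed): groupBy().count() yields an ordinary column named "count", which orderBy can sort on directly.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.desc

    val spark = SparkSession.builder.appName("groupby-orderby").master("local[*]").getOrCreate()
    import spark.implicits._

    val errors = Seq(("E1", 1), ("E2", 2), ("E1", 3)).toDF("error", "id")  // assumed schema

    val counts = errors.groupBy("error").count()   // columns: error, count
    counts.orderBy(desc("count")).show()           // E1 (2 rows) first, then E2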

Re: Java exception when showing join

2016-04-25 Thread Brent S. Elmer Ph.D.
> /usr/local/src/spark/spark-1.6.1-bin-hadoop2.6/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py in __call__(self, *args) > 811 answer = self.gateway_client.send_command(command) > 812 return_value = get_return_value( > --> 813 ans

Re: spark-ec2 hitting yum install issues

2016-04-18 Thread Anusha S
Yes, it does not work manually. I am not really able to do a 'yum search' to find exact package names to try others, but I tried python-pip and it gave the same error. I will post this in the link you pointed out. Thanks! On Thu, Apr 14, 2016 at 6:11 PM, Nicholas Chammas < nicholas.cham...@gmail.com> w

Grouping in Spark Streaming / batch size = time window?

2016-04-11 Thread Daniela S
Hi, I am a newbie in Spark Streaming and have some questions. 1) Is it possible to group a stream in Spark Streaming like in Storm (field grouping)? 2) Could the batch size be used instead of a time window? Thank you in advance. Regards, Daniela
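
On question 2, a sketch of the distinction (source assumed): the batch interval is fixed when the StreamingContext is created, while a window is layered on top of it and must be a multiple of it, so the batch size cannot simply stand in for a window.

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf().setAppName("window-sketch").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(10))        // batch interval: 10s

    val stream = ssc.socketTextStream("localhost", 9999)     // assumed source
    // 60s of data, recomputed every 20s -- both multiples of the 10s batch interval
    val windowed = stream.window(Seconds(60), Seconds(20))
    windowed.count().print()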

Use only latest values

2016-04-09 Thread Daniela S
Hi, I would like to cache values and use only the latest "valid" values to build a sum. In more detail, I receive values from devices periodically. I would like to add up all the valid values each minute. But not every device sends a new value every minute. And as long as there is no new val
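
One possible shape for this, sketched with mapWithState and an assumed DStream of (deviceId, value) pairs: the state keeps only the newest reading per device, and the periodic state snapshot is what gets summed.

    import org.apache.spark.streaming.{State, StateSpec}

    // readings: DStream[(String, Double)] of (deviceId, value) -- assumed to exist;
    // mapWithState also requires ssc.checkpoint(...) to be set.
    val keepLatest = (device: String, value: Option[Double], state: State[Double]) => {
      value.foreach(state.update)  // overwrite with the newest reading, if any arrived
      ()
    }

    val latest = readings.mapWithState(StateSpec.function(keepLatest))
    // stateSnapshots emits every (device, latestValue) each batch, even for quiet devices
    latest.stateSnapshots().map(_._2).reduce(_ + _).print()   // the per-batch sum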

Re: Spark Streaming with Kafka Use Case

2016-02-18 Thread praveen S
Sorry, rephrasing: can this issue be resolved by having a smaller block interval? Regards, Praveen On 18 Feb 2016 21:30, "praveen S" wrote: > Can having a smaller block interval only resolve this? > > Regards, > Praveen > On 18 Feb 2016 21:13, "Cody Koeninger"

Re: Spark Streaming with Kafka Use Case

2016-02-18 Thread praveen S
Can having a smaller block interval only resolve this? Regards, Praveen On 18 Feb 2016 21:13, "Cody Koeninger" wrote: > Backpressure won't help you with the first batch, you'd need > spark.streaming.kafka.maxRatePerPartition > for that > > On Thu, Feb 18

Re: Spark Streaming with Kafka Use Case

2016-02-18 Thread praveen S
Have a look at the spark.streaming.backpressure.enabled property. Regards, Praveen On 18 Feb 2016 00:13, "Abhishek Anand" wrote: > I have a spark streaming application running in production. I am trying to > find a solution for a particular use case when my application has a > downtime of say 5 hour
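
The two properties mentioned in this thread, as they would be set (values assumed): backpressure adapts the ingestion rate from the second batch onward, while maxRatePerPartition also caps the very first batch after a downtime.

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .setAppName("backpressure-sketch")
      .set("spark.streaming.backpressure.enabled", "true")       // adaptive rate control
      .set("spark.streaming.kafka.maxRatePerPartition", "1000")  // records/sec/partition cap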

  1   2   >