Spark program not receiving messages from Cloud Pubsub

2022-08-06 Thread Pramod Biligiri
integration? I came across a library called Apache Bahir, but is a library like that required? The code for my example can be found here: https://github.com/pramodbiligiri/pubsub-spark Pramod Biligiri

Re: Spark program not receiving messages from Cloud Pubsub

2022-08-09 Thread Pramod Biligiri
I was able to get it working. The program needed to instantiate a SparkSession and then wait for a termination signal from the user. In my case I used a StreamingContext - https://spark.apache.org/docs/2.2.0/api/java/org/apache/spark/streaming/StreamingContext.html Pramod Biligiri On Sun, Aug 7, 2022 at 9

SparkSQL query plan to Stage wise breakdown

2015-05-22 Thread Pramod Biligiri
Hi, Is there an easy way to see how a SparkSQL query plan maps to different stages of the generated Spark job? The WebUI is entirely in terms of RDD stages and I'm having a hard time mapping it back to my query. Pramod

Spark Streaming job having issue with Java Flight Recorder (JFR)

2020-02-20 Thread Pramod Biligiri
Hi, Has anyone successfully used Java Flight Recorder (JFR) with Spark Streaming on Oracle Java 8? JFR works for me on batch jobs but not with Streaming. I'm running my streaming job on Amazon EMR. I have enabled Java Flight Recorder (JFR) to profile CPU usage. But at the end of the job, the JFR o

Re: Spark TeraSort source request

2015-04-07 Thread Pramod Biligiri
+1. I would love to have the code for this as well. Pramod On Fri, Apr 3, 2015 at 12:47 PM, Tom wrote: > Hi all, > > As we all know, Spark has set the record for sorting data, as published on: > https://databricks.com/blog/2014/10/10/spark-petabyte-sort.html. > > Here at our group, we would lov

Re: Spark Code to read RCFiles

2015-04-17 Thread Pramod Biligiri
Hi, I remember seeing a similar performance problem with Apache Shark last year when compared to Hive, though that was in a company-specific port of the code. Unfortunately I no longer have access to that code. The problem then was reflection-based class creation in the critical path of reading eac

Does Spark always wait for stragglers to finish running?

2014-09-15 Thread Pramod Biligiri
Hi, I'm running Spark tasks with speculation enabled. I'm noticing that Spark seems to wait in a given stage for all stragglers to finish, even though the speculated alternative might have finished sooner. Is that correct? Is there a way to indicate to Spark not to wait for stragglers to finish?

Re: Does Spark always wait for stragglers to finish running?

2014-09-15 Thread Pramod Biligiri
as without speculation. Pramod On Mon, Sep 15, 2014 at 4:22 PM, Du Li wrote: > There is a parameter spark.speculation that is turned off by default. > Look at the configuration doc: > http://spark.apache.org/docs/latest/configuration.html > > > > From: Pramod Biligiri > D
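
The `spark.speculation` parameter mentioned in the quoted reply is set like any other Spark property. A minimal sketch of a `spark-defaults.conf` fragment, assuming speculation should be switched on; the three tuning values are illustrative assumptions and not recommendations from this thread:

```
# Enable speculative re-launch of slow tasks (off unless set to true)
spark.speculation            true
# How often Spark checks for tasks to speculate (assumed value)
spark.speculation.interval   100ms
# How many times slower than the median a task must be to qualify (assumed value)
spark.speculation.multiplier 1.5
# Fraction of tasks in a stage that must finish before speculation starts (assumed value)
spark.speculation.quantile   0.75
```

The same properties can also be passed per job, for example `spark-submit --conf spark.speculation=true`.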

Spark Code to read RCFiles

2014-09-23 Thread Pramod Biligiri
Hi, I'm trying to read some data in RCFiles using Spark, but can't seem to find a suitable example anywhere. Currently I've written the following bit of code that lets me count() the number of records, but when I try to do a collect() or a map(), it fails with a ConcurrentModificationException. I'm ru

Re: Spark Code to read RCFiles

2014-09-24 Thread Pramod Biligiri
naged by Hive (and thus present in a Hive metastore)? In >> that case, Spark SQL ( >> https://spark.apache.org/docs/latest/sql-programming-guide.html) is the >> easiest way. >> >> Matei >> >> On September 23, 2014 at 2:26:10 PM, Pramod Biligiri ( >> pram