Re: How to run spark connect in kubernetes?

2024-10-02 Thread kant kodali
Please ignore this. It was a DNS issue. On Wed, Oct 2, 2024 at 11:16 AM kant kodali wrote: > Here > <https://stackoverflow.com/questions/79048006/how-to-run-spark-connect-server-in-kubernetes-as-a-spark-driver-in-client-mode> > are more details about my question that I posted i

Re: How to run spark connect in kubernetes?

2024-10-02 Thread kant kodali
Here <https://stackoverflow.com/questions/79048006/how-to-run-spark-connect-server-in-kubernetes-as-a-spark-driver-in-client-mode> are more details about my question that I posted in SO On Tue, Oct 1, 2024 at 11:32 PM kant kodali wrote: > Hi All, > > Is it possible to run a Spark

How to run spark connect in kubernetes?

2024-10-01 Thread kant kodali
Hi All, Is it possible to run a Spark Connect server in Kubernetes while configuring it to communicate with Kubernetes as the cluster manager? If so, is there any example? Thanks

Re: is RocksDB backend available in 3.0 preview?

2020-04-22 Thread kant kodali
this project is > the most widely known implementation of a RocksDB backend state store - > https://github.com/chermenin/spark-states > > On Wed, Apr 22, 2020 at 11:32 AM kant kodali wrote: > >> Hi All, >> >> 1. is RocksDB backend available in 3.0 preview? >> 2. if

is RocksDB backend available in 3.0 preview?

2020-04-21 Thread kant kodali
Hi All, 1. is RocksDB backend available in 3.0 preview? 2. if RocksDB can store intermediate results of a stream-stream join can I run streaming join queries forever? By forever I mean until I run out of disk. Or, to put it another way, can I run the stream-stream join queries for years if necessary (imagine

Re: How does spark sql evaluate case statements?

2020-04-17 Thread kant kodali
Thanks! On Thu, Apr 16, 2020 at 9:57 PM ZHANG Wei wrote: > Are you looking for this: > https://spark.apache.org/docs/2.4.0/api/sql/#when ? > > The code generated will look like this in a `do { ... } while (false)` > loop: > > do { > ${cond.code} > if (!${cond.isNull} && ${cond.value})

How does spark sql evaluate case statements?

2020-04-06 Thread kant kodali
Hi All, I have the following query and I was wondering if Spark SQL evaluates the same condition twice in the case statement below? I did .explain(true) and all I get is a table scan, so I'm not sure if Spark SQL evaluates the same condition twice. If it does, is there a way to return multiple values w
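On the second part of the question, one hedged option for returning multiple values from a single CASE evaluation is to have each branch produce a struct, so the condition is written and evaluated only once. The table `txns` and its `amount` column below are hypothetical:

```sql
-- Hypothetical table `txns` with a numeric `amount` column.
-- Both branches return the same struct type, and the two derived
-- values come back together as r.label / r.value.
SELECT CASE
         WHEN amount > 100 THEN named_struct('label', 'big',   'value', amount * 0.9)
         ELSE                   named_struct('label', 'small', 'value', amount * 1.0)
       END AS r
FROM txns;
```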

Connected components using GraphFrames is significantly slower than GraphX?

2020-02-16 Thread kant kodali
Hi All, Trying to understand why the connected components algorithm runs much slower than the GraphX equivalent? The GraphX code creates 16 stages. GraphFrame graphFrame = GraphFrame.fromEdges(edges); Dataset connectedComponents = graphFrame.connectedComponents().setAlgorithm("graphx").run(); and the

Re: How to programmatically pause and resume Spark/Kafka structured streaming?

2019-08-06 Thread kant kodali
Spark doesn't expose that. On Mon, Aug 5, 2019 at 10:54 PM Gourav Sengupta wrote: > Hi, > > exactly my question, I was also looking for ways to gracefully exit spark > structured streaming. > > > Regards, > Gourav > > On Tue, Aug 6, 2019 at 3:43 AM kant kodali wr

How to programmatically pause and resume Spark/Kafka structured streaming?

2019-08-05 Thread kant kodali
Hi All, I am trying to see if there is a way to pause a spark stream that process data from Kafka such that my application can take some actions while the stream is paused and resume when the application completes those actions. Thanks!
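Structured Streaming does not expose a pause/resume API (as the reply above confirms); one commonly used workaround is to stop the query and later start a new one from the same checkpoint, which preserves the Kafka offsets. A sketch, where the broker, topic, and checkpoint path are placeholders:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.StreamingQuery

// Sketch only: broker address, topic name, and checkpoint path are assumed values.
def startQuery(spark: SparkSession): StreamingQuery = {
  val df = spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092") // placeholder broker
    .option("subscribe", "events")                       // placeholder topic
    .load()
  df.writeStream
    .format("console")
    .option("checkpointLocation", "/tmp/ckpt") // offsets survive a stop/start
    .start()
}

// "Pause": stop the query; committed offsets remain in the checkpoint.
//   query.stop()
// ...application performs its actions while nothing is consuming...
// "Resume": start a new query with the same checkpoint location.
//   val resumed = startQuery(spark)
```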

java_method udf is not visible in the API documentation

2019-06-27 Thread kant kodali
Hi All, I see it here https://spark.apache.org/docs/2.3.1/api/sql/index.html#java_method But I don't see it here https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/functions.html

Re: Does Spark SQL have match_recognize?

2019-05-29 Thread kant kodali
Nope, not at all. On Sun, May 26, 2019 at 8:15 AM yeikel valdes wrote: > Isn't match_recognize just a filter? > > df.filter(predicate)? > > > On Sat, 25 May 2019 12:55:47 -0700 * kanth...@gmail.com > * wrote > > Hi All, > > Does Spark SQL have match_recognize? I am not sure why CEP s

Does Spark SQL have match_recognize?

2019-05-25 Thread kant kodali
Hi All, Does Spark SQL have match_recognize? I am not sure why CEP seems to be neglected. I believe it is one of the most useful concepts in financial applications! Is there a plan to support it? Thanks!

How to control batch size while reading from hdfs files?

2019-03-22 Thread kant kodali
Hi All, What determines the batch size while reading from a file from HDFS? I am trying to read files from HDFS and ingest into Kafka using Spark Structured Streaming 2.3.1. I get an error saying the Kafka batch size is too big and that I need to increase max.request.size. Sure I can increase it but

what is the difference between udf execution and map(someLambda)?

2019-03-17 Thread kant kodali
Hi All, I am wondering what is the difference between UDF execution and map(someLambda)? you can assume someLambda ~= UDF. Any performance difference? Thanks!

Re: Is there a way to validate the syntax of raw spark sql query?

2019-03-05 Thread kant kodali
used. > If there is an issue with your SQL syntax then the method throws below > exception that you can catch > > org.apache.spark.sql.catalyst.parser.ParseException > > > Hope this helps! > > > > Akshay Bhardwaj > +91-97111-33849 > > > On Fri, Mar 1, 2

Is there a way to validate the syntax of raw spark sql query?

2019-03-01 Thread kant kodali
Hi All, Is there a way to validate the syntax of a raw Spark SQL query? For example, is there any isValid API call Spark provides? val query = "select * from table"; if (isValid(query)) { sparkSession.sql(query) } else { log.error("Invalid Syntax") } I tried the follow
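A hedged sketch of the approach suggested in the reply above: run the session's SQL parser and catch `ParseException`. Note this validates syntax only, not that the referenced tables or columns actually exist:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.parser.ParseException

// Sketch: parsePlan builds an unresolved logical plan, so a syntactically
// valid query over a nonexistent table still returns true here.
def isValid(spark: SparkSession, query: String): Boolean =
  try {
    spark.sessionState.sqlParser.parsePlan(query)
    true
  } catch {
    case _: ParseException => false
  }
```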

Re: How to track batch jobs in spark ?

2018-12-06 Thread kant kodali
from output of list command >> yarn application --kill >> >> On Wed, Dec 5, 2018 at 1:42 PM kant kodali wrote: >> >>> Hi All, >>> >>> How to track batch jobs in spark? For example, is there some id or token >>> i can get after I s

How to track batch jobs in spark ?

2018-12-05 Thread kant kodali
Hi All, How to track batch jobs in Spark? For example, is there some id or token I can get after I spawn a batch job and use it to track the progress or to kill the batch job itself? For Streaming, we have StreamingQuery.id() Thanks!
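There is no batch-side equivalent of StreamingQuery.id(), but SparkContext job groups are one commonly used workaround: tag the jobs a batch spawns with a group id, then track or cancel them by that id. A sketch with placeholder names and paths:

```scala
// Assumes an in-scope SparkSession named `spark`; the group id,
// description, and input path below are placeholders.
val groupId = "nightly-batch-42"
spark.sparkContext.setJobGroup(groupId, "nightly aggregation", interruptOnCancel = true)
val rowCount = spark.read.parquet("/data/in").count() // placeholder batch work

// From another thread, the same id can kill everything in the group:
//   spark.sparkContext.cancelJobGroup(groupId)
```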

Re: java vs scala for Apache Spark - is there a performance difference ?

2018-10-29 Thread kant kodali
When most people compare two different programming languages, 99% of the time it seems to boil down to syntactic sugar. Performance-wise, I doubt Scala is ever faster than Java, given that Scala uses the heap more than Java does. I had also written some pointless micro-benchmarking code like (Random String

Does Spark have a plan to move away from sun.misc.Unsafe?

2018-10-24 Thread kant kodali
Hi All, Does Spark have a plan to move away from sun.misc.Unsafe to VarHandles ? I am trying to find a JIRA issue for this? Thanks!

What exactly is Function shipping?

2018-10-16 Thread kant kodali
Hi All, Everyone talks about how easy function shipping is in Scala. I immediately go "wait a minute": isn't it just object serialization and deserialization, which has existed in Java for a long time, or am I missing something profound here? Thanks!

Does spark.streaming.concurrentJobs still exist?

2018-10-09 Thread kant kodali
Does spark.streaming.concurrentJobs still exist? spark.streaming.concurrentJobs (default: 1) is the number of concurrent jobs, i.e. threads in streaming-job-executor thread pool

Spark on YARN not utilizing all the YARN containers available

2018-10-09 Thread kant kodali
Hi All, I am using Spark 2.3.1 and using YARN as a cluster manager. I currently have: 1) 6 YARN containers (executors=6) with 4 executor cores for each container. 2) 6 Kafka partitions from one topic. 3) You can assume every other configuration is set to whatever the default values are. Spawned a

How to do a broadcast join using raw Spark SQL 2.3.1 or 2.3.2?

2018-10-03 Thread kant kodali
Hi All, How to do a broadcast join using raw Spark SQL 2.3.1 or 2.3.2? Thanks
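Since Spark 2.2, raw Spark SQL accepts join-strategy hints in comment form, which covers both 2.3.1 and 2.3.2. A sketch with placeholder table names (`big`, `small`):

```sql
-- BROADCAST, BROADCASTJOIN, and MAPJOIN are equivalent hint names;
-- the hint marks `small` as the side to broadcast.
SELECT /*+ BROADCAST(small) */ big.id, small.label
FROM big
JOIN small ON big.id = small.id;
```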

can Spark 2.4 work on JDK 11?

2018-09-25 Thread kant kodali
Hi All, can Spark 2.4 work on JDK 11? I feel like there are lot of features that are added in JDK 9, 10, 11 that can make deployment process a whole lot better and of course some more syntax sugar similar to Scala. Thanks!

can I model any arbitrary data structure as an RDD?

2018-09-25 Thread kant kodali
Hi All, I am wondering if I can model any arbitrary data structure as an RDD? For example, can I model Red-black trees, Suffix Trees, Radix Trees, Splay Trees, Fibonacci heaps, Tries, Linked Lists, etc. as RDDs? If so, how? To implement a custom RDD I have to implement compute and getPartitions f

Is there any open source framework that converts Cypher to SparkSQL?

2018-09-14 Thread kant kodali
Hi All, Is there any open source framework that converts Cypher to SparkSQL? Thanks!

How do I generate current UTC timestamp in raw spark sql?

2018-08-28 Thread kant kodali
Hi All, How do I generate the current UTC timestamp using Spark SQL? When I do current_timestamp() it is giving me local time. to_utc_timestamp(current_time(), ) takes a timezone as the second parameter and I see no udf that can give me the current timezone. when I do spark.conf.set('spark.sql.sessio
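One approach, assuming Spark 2.2+, is the session time-zone setting that the truncated snippet above appears to be reaching for:

```sql
-- With the session time zone set to UTC, current_timestamp()
-- is rendered in UTC instead of the JVM-local zone.
SET spark.sql.session.timeZone=UTC;
SELECT current_timestamp();
```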

Re: How to Create one DB connection per executor and close it after the job is done?

2018-07-30 Thread kant kodali
in Scala, of which only one instance will be > created on each JVM / Executor. E.g. > > object MyDatabseSingleton { > var dbConn = ??? > } > > On Sat, 28 Jul 2018, 08:34 kant kodali, wrote: > >> Hi All, >> >> I understand creating a connection forEac

How to Create one DB connection per executor and close it after the job is done?

2018-07-27 Thread kant kodali
Hi All, I understand creating a connection forEachPartition but I am wondering can I create one DB connection per executor and close it after the job is done? any sample code would help. you can imagine I am running a simple batch processing application. Thanks!
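A sketch of the per-JVM singleton pattern suggested in the reply above; the JDBC URL and credentials are placeholders. Closing the connection "after the job is done" is the awkward part, since executors outlive individual jobs; a JVM shutdown hook is one common workaround:

```scala
import java.sql.{Connection, DriverManager}

// One instance per JVM, i.e. per executor; initialized lazily on first use.
object ExecutorConnection {
  lazy val conn: Connection = {
    val c = DriverManager.getConnection("jdbc:postgresql://db:5432/app", "user", "secret")
    // There is no per-job hook on executors, so close when the JVM exits.
    sys.addShutdownHook(c.close())
    c
  }
}

// Inside foreachPartition, every task on the same executor reuses it:
//   df.foreachPartition { rows =>
//     rows.foreach { r => /* use ExecutorConnection.conn */ }
//   }
```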

Re: [Structured Streaming] Avoiding multiple streaming queries

2018-07-24 Thread kant kodali
essages to kafka, might be easier on you to > just explode the rows and let Spark do the rest for you. > > > > *From: *kant kodali > *Date: *Tuesday, July 24, 2018 at 1:04 PM > *To: *Silvio Fiorito > *Cc: *Arun Mahadevan , chandan prakash < > chandanbaran...@gmail.c

Re: [Structured Streaming] Avoiding multiple streaming queries

2018-07-24 Thread kant kodali
Batch method that will give you a > DataFrame and let you write to the sink as you wish. > > > > *From: *kant kodali > *Date: *Monday, July 23, 2018 at 4:43 AM > *To: *Arun Mahadevan > *Cc: *chandan prakash , Tathagata Das < > tathagata.das1...@gmail.com>, "ymaha

Re: [Structured Streaming] Avoiding multiple streaming queries

2018-07-23 Thread kant kodali
understand each row has a topic column but can we write one row to multiple topics? On Thu, Jul 12, 2018 at 11:00 AM, Arun Mahadevan wrote: > What I meant was the number of partitions cannot be varied with > ForeachWriter v/s if you were to write to each sink using independent > queries. Maybe t

Re: Do GraphFrames support streaming?

2018-07-15 Thread kant kodali
ackend) then when the next stream arrive join them - create > graph and store the next stream together with the existing stream on disk > etc. > > On 14. Jul 2018, at 17:19, kant kodali wrote: > > The question now would be can it be done in streaming fashion? Are you > talking about

Re: Can I specify watermark using raw sql alone?

2018-07-14 Thread kant kodali
I don't see a withWatermark UDF to use it in Raw sql. I am currently using Spark 2.3.1 On Sat, Jul 14, 2018 at 4:19 PM, kant kodali wrote: > Hi All, > > Can I specify watermark using raw sql alone? other words without using > .withWatermark from > Dataset API. > > Thanks! >

Can I specify watermark using raw sql alone?

2018-07-14 Thread kant kodali
Hi All, Can I specify watermark using raw sql alone? other words without using .withWatermark from Dataset API. Thanks!

Re: Do GraphFrames support streaming?

2018-07-14 Thread kant kodali
for > incremental graph updates. > > On 14. Jul 2018, at 15:59, kant kodali wrote: > > "You want to update incrementally an existing graph and run incrementally > a graph algorithm suitable for this - you have to implement yourself as > far as I am aware" > > I

Re: Do GraphFrames support streaming?

2018-07-14 Thread kant kodali
; fully rerun an algorithms -> look at Janusgraph > You want to update incrementally an existing graph and run incrementally a > graph algorithm suitable for this - you have to implement yourself as far > as I am aware > > > On 29. Apr 2018, at 11:43, kant kodali wrote: > > > > Do GraphFrames support streaming? >

Re: spark-shell gets stuck in ACCEPTED state forever when ran in YARN client mode.

2018-07-09 Thread kant kodali
w why the node automatically is going to unhealthy state and INFO logs don't tell me why. On Sun, Jul 8, 2018 at 7:36 PM, kant kodali wrote: > @yohann Thanks for shining some light! It is making more sense now. > > I think you are correct when you stated: "Your application mas

Re: spark-shell gets stuck in ACCEPTED state forever when ran in YARN client mode.

2018-07-08 Thread kant kodali
numbers are not part of resource manager logs? On Sun, Jul 8, 2018 at 8:09 AM, kant kodali wrote: > yarn.scheduler.capacity.maximum-am-resource-percent by default is set to > 0.1 and I tried changing it to 1.0 and still no luck. same problem > persists. The master here is yarn and I ju

Re: spark-shell gets stuck in ACCEPTED state forever when ran in YARN client mode.

2018-07-08 Thread kant kodali
llowed to provide. > You might take a look at https://hadoop.apache.org/ > docs/r2.7.3/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html and > search for maximum-am-resource-percent. > > Regards, > > *Yohann Jardin* > Le 7/8/2018 à 4:40 PM, kant kodali a écrit : > > Hi

Re: spark-shell gets stuck in ACCEPTED state forever when ran in YARN client mode.

2018-07-08 Thread kant kodali
Hi, It's on a local MacBook Pro that has 16GB RAM, a 512GB disk, and 8 vCPUs! I am not running any code since I can't even spawn spark-shell with YARN as master as described in my previous email. I just want to run a simple word count using YARN as master. Thanks! Below is the resource manager l

spark-shell gets stuck in ACCEPTED state forever when ran in YARN client mode.

2018-07-08 Thread kant kodali
Hi All, I am trying to run a simple word count using YARN as a cluster manager. I am currently using Spark 2.3.1 and Apache Hadoop 2.7.3. When I spawn spark-shell like below it gets stuck in ACCEPTED state forever. ./bin/spark-shell --master yarn --deploy-mode client I set my log4j.propertie
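The replies above in this thread discuss `yarn.scheduler.capacity.maximum-am-resource-percent`, which caps the share of cluster resources that ApplicationMasters may consume (0.1 by default); on a single small machine that cap can leave no room for the AM container. As a capacity-scheduler.xml fragment, with the 1.0 value taken from the experiment described above:

```xml
<!-- capacity-scheduler.xml: allow ApplicationMasters to use up to 100%
     of the queue's resources (default is 0.1). -->
<property>
  <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
  <value>1.0</value>
</property>
```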

Re: Error while doing stream-stream inner join (java.util.ConcurrentModificationException: KafkaConsumer is not safe for multi-threaded access)

2018-07-03 Thread kant kodali
> > Best Regards, > Ryan > > On Mon, Jul 2, 2018 at 3:56 AM, kant kodali wrote: > >> Hi All, >> >> I get the below error quite often when I do an stream-stream inner join >> on two data frames. After running through several experiments stream-stream >&

Error while doing stream-stream inner join (java.util.ConcurrentModificationException: KafkaConsumer is not safe for multi-threaded access)

2018-07-02 Thread kant kodali
Hi All, I get the below error quite often when I do a stream-stream inner join on two data frames. After running through several experiments, stream-stream joins don't look stable enough for production yet. Any advice on this? Thanks! java.util.ConcurrentModificationException: KafkaConsumer is no

Re: Does Spark Structured Streaming have a JDBC sink or Do I need to use ForEachWriter?

2018-06-21 Thread kant kodali
://github.com/GaalDornick/spark/blob/master/ > sql/core/src/main/scala/org/apache/spark/sql/execution/ > streaming/JdbcSink.scala > > https://github.com/GaalDornick/spark/blob/master/ > sql/core/src/main/scala/org/apache/spark/sql/execution/ > streaming/JDBCSinkLog.scala > > &

Does Spark Structured Streaming have a JDBC sink or Do I need to use ForEachWriter?

2018-06-20 Thread kant kodali
Hi All, Does Spark Structured Streaming have a JDBC sink or Do I need to use ForEachWriter? I see the following code in this link and I can see that database name can be passed in the connection string, however, I w

are spark stream-stream joins in update mode targeted for 2.4?

2018-06-18 Thread kant kodali
Hi All, Are spark stream-stream joins in update mode targeted for 2.4? Thanks!

is there a way to parse and modify raw spark sql query?

2018-06-05 Thread kant kodali
Hi All, is there a way to parse and modify raw spark sql query? For example, given the following query spark.sql("select hello from view") I want to modify the query or logical plan such that I can get the result equivalent to the below query. spark.sql("select foo, hello from view") Any samp

is there a way to create a static dataframe inside mapGroups?

2018-06-04 Thread kant kodali
Hi All, Is there a way to create a static dataframe inside mapGroups? given that mapGroups gives Iterator of rows. I just want to take that iterator and populate a static dataframe so I can run raw sql queries on the static dataframe. Thanks!

is it possible to create one KafkaDirectStream (Dstream) per topic?

2018-05-20 Thread kant kodali
Hi All, I have 5 Kafka topics and I am wondering if is even possible to create one KafkaDirectStream (Dstream) per topic within the same JVM i.e using only one sparkcontext? Thanks!

What to consider when implementing a custom streaming sink?

2018-05-15 Thread kant kodali
Hi All, I am trying to implement a custom sink and I have few questions mainly on output modes. 1) How does spark let the sink know that a new row is an update of an existing row? does it look at all the values of all columns of the new row and an existing row for an equality match or does it com

I cannot use spark 2.3.0 and kafka 0.9?

2018-05-04 Thread kant kodali
Hi All, This link seems to suggest I cant use Spark 2.3.0 and Kafka 0.9 broker. is that correct? https://spark.apache.org/docs/latest/streaming-kafka-integration.html Thanks!

Re: question on collect_list or say aggregations in general in structured streaming 2.3.0

2018-05-04 Thread kant kodali
:55 AM, Arun Mahadevan wrote: > I think you need to group by a window (tumbling) and define watermarks > (put a very low watermark or even 0) to discard the state. Here the window > duration becomes your logical batch. > > - Arun > > From: kant kodali > Date: Thursday, May

Re: question on collect_list or say aggregations in general in structured streaming 2.3.0

2018-05-03 Thread kant kodali
spark SQL so not using flatMapGroupsWithState. And if that is not available then is it fair to say there is no declarative way to do stateless aggregations? On Thu, May 3, 2018 at 1:24 AM, kant kodali wrote: > Hi All, > > I was under an assumption that one needs to run groupBy(window(...)

question on collect_list or say aggregations in general in structured streaming 2.3.0

2018-05-03 Thread kant kodali
Hi All, I was under an assumption that one needs to run groupBy(window(...)) to run any stateful operations, but it looks like that is not the case, since any aggregation like the query "select count(*) from some_view" is also stateful since it stores the result of the count from the previous batch. Likew

what is the query language used for graphX?

2018-05-02 Thread kant kodali
Hi All, what is the query language used for graphX? are there any plans to introduce gremlin or is that idea being dropped and go with Spark SQL? Thanks!

is there a minOffsetsTrigger in spark structured streaming 2.3.0?

2018-04-29 Thread kant kodali
Hi All, just like maxOffsetsTrigger is there a minOffsetsTrigger in spark structured streaming 2.3.0? Thanks!

Re: A naive ML question

2018-04-29 Thread kant kodali
atplotlib > does this). > > On Sat, 28 Apr 2018 at 21:34, kant kodali wrote: > >> Hi, >> >> I mean a transaction goes typically goes through different states like >> STARTED, PENDING, CANCELLED, COMPLETED, SETTLED etc... >> >> Thanks, >> ka

Do GraphFrames support streaming?

2018-04-29 Thread kant kodali
Do GraphFrames support streaming?

Re: A naive ML question

2018-04-28 Thread kant kodali
action at a certain point of time. Do you mean how a financial > product evolved over time given a set of a transactions? > > > On 28. Apr 2018, at 12:46, kant kodali wrote: > > > > Hi All, > > > > I have a bunch of financial transactional data and I was wonderin

A naive ML question

2018-04-28 Thread kant kodali
Hi All, I have a bunch of financial transactional data and I was wondering if there is any ML model that can give me a graph structure for this data? In other words, show how a transaction has evolved over time? Any suggestions or references would help. Thanks!

Re: is it ok to make I/O calls in UDF? other words is it a standard practice ?

2018-04-23 Thread kant kodali
n be applied for large datasets but may be for small lookup files. Thanks > > On Mon, Apr 23, 2018 at 4:28 PM kant kodali wrote: > >> Hi All, >> >> Is it ok to make I/O calls in UDF? other words is it a standard practice? >> >> Thanks! >> >

is it ok to make I/O calls in UDF? other words is it a standard practice ?

2018-04-23 Thread kant kodali
Hi All, Is it ok to make I/O calls in a UDF? In other words, is it a standard practice? Thanks!

Re: can we use mapGroupsWithState in raw sql?

2018-04-18 Thread kant kodali
전 9:43, Arun Mahadevan 님이 작성: > >> The below expr might work: >> >> df.groupBy($"id").agg(max(struct($"amount", >> $"my_timestamp")).as("data")).select($"id", $"data.*") >> >> >> Thanks, >>

Re: can we use mapGroupsWithState in raw sql?

2018-04-18 Thread kant kodali
omething like.. > > stream.groupBy($"id").max("amount").writeStream. > outputMode(“complete”/“update")…. > > Unless the “stream” is already a grouped stream, in which case the above > would not work since the support for multiple aggregate operations is not >

Re: can we use mapGroupsWithState in raw sql?

2018-04-17 Thread kant kodali
e SQL language structure. > > On Mon, Apr 16, 2018 at 7:34 PM, kant kodali wrote: > >> Hi All, >> >> can we use mapGroupsWithState in raw SQL? or is it in the roadmap? >> >> Thanks! >> >> >> >

can we use mapGroupsWithState in raw sql?

2018-04-16 Thread kant kodali
Hi All, can we use mapGroupsWithState in raw SQL? or is it in the roadmap? Thanks!

How to select the max row for every group in spark structured streaming 2.3.0 without using order by or mapGroupsWithState?

2018-04-15 Thread kant kodali
How to select the max row for every group in spark structured streaming 2.3.0 without using order by or mapGroupsWithState? *Input:* id | amount | my_timestamp --- 1 | 5 | 2018-04-01T01:00:00.000Z 1 | 10 | 2018-04-01T01:10:00.000Z 2
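The expression suggested in the replies above can be sketched as follows. It assumes a DataFrame `df` with the id/amount/my_timestamp schema shown and an in-scope SparkSession named `spark`:

```scala
import org.apache.spark.sql.functions.{max, struct}
import spark.implicits._ // for the $"..." column syntax

// max over a struct compares fields left to right, so this keeps, per id,
// the row with the largest amount (ties broken by my_timestamp) without
// an ORDER BY, which streaming aggregations do not allow.
val latest = df.groupBy($"id")
  .agg(max(struct($"amount", $"my_timestamp")).as("data"))
  .select($"id", $"data.*")
```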

Re: Do partition by and order by work only in the stateful case?

2018-04-14 Thread kant kodali
got it! Thanks. On Thu, Apr 12, 2018 at 7:53 PM, Tathagata Das wrote: > The traditional SQL windows with `over` is not supported in streaming. > Only time-based windows, that is, `window("timestamp", "10 minutes")` is > supported in streaming. > > On Thu, A

when can we expect multiple aggregations to be supported in spark structured streaming?

2018-04-14 Thread kant kodali
Hi All, when can we expect multiple aggregations to be supported in spark structured streaming? For example, id | amount | my_timestamp -- 1 | 5 | 2018-04-01T01:00:00.000Z 1 | 10 | 2018-04-01T01:10:00.000Z 2 | 20

Do partition by and order by work only in the stateful case?

2018-04-12 Thread kant kodali
Hi All, Do partition by and order by work only in the stateful case? For example: select row_number() over (partition by id order by timestamp) from table gives me *SEVERE: Exception occured while submitting the query: java.lang.RuntimeException: org.apache.spark.sql.AnalysisException: Non-time
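As the reply above notes, streaming supports only time-based windows, not SQL `OVER` windows. In raw SQL that looks roughly like the following, with a hypothetical `events` view:

```sql
-- A 10-minute tumbling window replaces the OVER (...) construct;
-- `events` and its columns are placeholder names.
SELECT id, window(my_timestamp, '10 minutes') AS w, count(*) AS cnt
FROM events
GROUP BY id, window(my_timestamp, '10 minutes');
```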

Re: is there a way of register python UDF using java API?

2018-04-02 Thread kant kodali
UserDefinedPythonFunction(java.lang.String name, org.apache.spark.api.python.PythonFunction func, org.apache.spark.sql.types.DataType dataType, int pythonEvalType, boolean udfDeterministic) { /* compiled code */ } On Sun, Apr 1, 2018 at 3:46 PM, kant kodali wrote: > Hi All, > > All of our spark code i

is there a way of register python UDF using java API?

2018-04-01 Thread kant kodali
Hi All, All of our spark code is in Java wondering if there a way to register python UDF's using java API such that the registered UDF's can be used using raw spark SQL. If there is any other way to achieve this goal please suggest! Thanks

Re: Does Spark run on Java 10?

2018-04-01 Thread kant kodali
de note, if some coming version of Scala 2.11 becomes full Java > 9/10 compliant it could work. > > Hope, this helps. > > Thanks, > Muthu > > On Sun, Apr 1, 2018 at 6:57 AM, kant kodali wrote: > >> Hi All, >> >> Does anybody got Spark running on Java 10? >> >> Thanks! >> >> >> >

Does Spark run on Java 10?

2018-04-01 Thread kant kodali
Hi All, Does anybody got Spark running on Java 10? Thanks!

Re: What do I need to set to see the number of records and processing time for each batch in SPARK UI?

2018-03-26 Thread kant kodali
For example in this blog <https://databricks.com/blog/2015/07/08/new-visualizations-for-understanding-apache-spark-streaming-applications.html> post. Looking at figure 1 and figure 2 I wonder What I need to do to see those graphs in spark 2.3.0? On Mon, Mar 26, 2018 at 7:10 AM, kant kodali

What do I need to set to see the number of records and processing time for each batch in SPARK UI?

2018-03-26 Thread kant kodali
Hi All, I am using spark 2.3.0 and I am wondering what I need to set to see the number of records and processing time for each batch in the Spark UI? The default UI doesn't seem to show this. Thanks!

Re: Is there a mutable dataframe spark structured streaming 2.3.0?

2018-03-23 Thread kant kodali
rnally if this is the case? Thanks! On Thu, Mar 22, 2018 at 7:48 AM, kant kodali wrote: > Thanks all! > > On Thu, Mar 22, 2018 at 2:08 AM, Jorge Machado wrote: > >> DataFrames are not mutable. >> >> Jorge Machado >> >> >> On 22 Mar 2018, at 10:07

Re: Is there a mutable dataframe spark structured streaming 2.3.0?

2018-03-22 Thread kant kodali
chain with "*Multiple Kafka Spark Streaming Dataframe Join query*" as > subject, TD and Chris has cleared my doubts, it would help you too. > > Thanks, > Aakash. > > On Thu, Mar 22, 2018 at 7:50 AM, kant kodali wrote: > >> Hi All, >> >> Is there a mutable da

Is there a mutable dataframe spark structured streaming 2.3.0?

2018-03-21 Thread kant kodali
Hi All, Is there a mutable dataframe in spark structured streaming 2.3.0? I am currently reading from Kafka, and if I cannot parse the messages that I get from Kafka I want to write them to, say, some "dead_queue" topic. I wonder what is the best way to do this? Thanks!

Re: select count * doesnt seem to respect update mode in Kafka Structured Streaming?

2018-03-20 Thread kant kodali
eric mechanism into the > Streaming DataSource V2 so that the engine can do admission control on the > amount of data returned in a source independent way. > > On Tue, Mar 20, 2018 at 2:58 PM, kant kodali wrote: > >> I am using spark 2.3.0 and Kafka 0.10.2.0 so I assume structure

Re: select count * doesnt seem to respect update mode in Kafka Structured Streaming?

2018-03-20 Thread kant kodali
; may be what you’re looking for: > >- spark.streaming.backpressure.enabled >- spark.streaming.backpressure.initialRate >- spark.streaming.receiver.maxRate > - spark.streaming.kafka.maxRatePerPartition > > ​ > > On Mon, Mar 19, 2018 at 5:27 PM, kant kodali

Re: select count * doesnt seem to respect update mode in Kafka Structured Streaming?

2018-03-19 Thread kant kodali
trigger time, you will > notice warnings that it is 'falling behind' (I forget the exact verbiage, > but something to the effect of the calculation took XX time and is falling > behind). In that case, it will immediately check kafka for new messages and > begin processing the n

select count * doesnt seem to respect update mode in Kafka Structured Streaming?

2018-03-19 Thread kant kodali
Hi All, I have 10 million records in my Kafka and I am just trying to spark.sql("select count(*) from kafka_view"). I am reading from Kafka and writing to Kafka. My writeStream is set to "update" mode and a trigger interval of one second (Trigger.ProcessingTime(1000)). I expect the counts to be prin

is it possible to use Spark 2.3.0 along with Kafka 0.9.0.1?

2018-03-16 Thread kant kodali
Hi All, is it possible to use Spark 2.3.0 along with Kafka 0.9.0.1? Thanks, kant

Re: How to run spark shell using YARN

2018-03-14 Thread kant kodali
Do I need to set SPARK_DIST_CLASSPATH or SPARK_CLASSPATH ? The latest version of spark (2.3) only has SPARK_CLASSPATH. On Wed, Mar 14, 2018 at 11:37 AM, kant kodali wrote: > Hi, > > I am not using emr. And yes I restarted several times. > > On Wed, Mar 14, 2018 at 6:35 AM, A

Re: How to run spark shell using YARN

2018-03-14 Thread kant kodali
mazon.com/premiumsupport/knowledge- > center/restart-service-emr/ > > > > Femi > > > > *From: *kant kodali > *Date: *Wednesday, March 14, 2018 at 6:16 AM > *To: *Femi Anthony > *Cc: *vermanurag , "user @spark" < > user@spark.apache.org> > *Subjec

Re: How to run spark shell using YARN

2018-03-14 Thread kant kodali
). My mapred-site.xml is empty. Do I even need this? if so, what should I set it to? On Wed, Mar 14, 2018 at 2:46 AM, Femi Anthony wrote: > What's the hardware configuration of the box you're running on i.e. how > much memory does it have ? > > Femi > > On Wed, Mar

Re: How to run spark shell using YARN

2018-03-14 Thread kant kodali
by >> updating the following property: >> >> >> yarn.nodemanager.resource.memory-mb >> 8192 >> >> >> https://stackoverflow.com/questions/45687607/waiting-for-am- >> container-to-be-allocated-launched-and-register-with-rm >> >&

Re: How to run spark shell using YARN

2018-03-14 Thread kant kodali
, Mar 14, 2018 at 12:12 AM, kant kodali wrote: > any idea? > > On Wed, Mar 14, 2018 at 12:12 AM, kant kodali wrote: > >> I set core-site.xml, hdfs-site.xml, yarn-site.xml as per this website >> <https://dwbi.org/etl/bigdata/183-setup-hadoop-cluster> and these are >&g

Re: How to run spark shell using YARN

2018-03-14 Thread kant kodali
any idea? On Wed, Mar 14, 2018 at 12:12 AM, kant kodali wrote: > I set core-site.xml, hdfs-site.xml, yarn-site.xml as per this website > <https://dwbi.org/etl/bigdata/183-setup-hadoop-cluster> and these are the > only three files I changed Do I need to set or change anyth

Re: How to run spark shell using YARN

2018-03-14 Thread kant kodali
I set core-site.xml, hdfs-site.xml, yarn-site.xml as per this website and these are the only three files I changed. Do I need to set or change anything in mapred-site.xml (as of now I have not touched mapred-site.xml)? When I do yarn -node -l

Re: How to run spark shell using YARN

2018-03-12 Thread kant kodali
Mon, Mar 12, 2018 at 4:42 PM, kant kodali wrote: > > Hi All, > > > > I am trying to use YARN for the very first time. I believe I configured > all > > the resource manager and name node fine. And then I run the below command > > > > ./spark-shell --master y

How to run spark shell using YARN

2018-03-12 Thread kant kodali
Hi All, I am trying to use YARN for the very first time. I believe I configured all the resource manager and name node fine. And then I run the below command ./spark-shell --master yarn --deploy-mode client *I get the below output and it hangs there forever *(I had been waiting over 10 minutes)
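The replies above in this thread point at NodeManager memory as the usual cause of a hung ACCEPTED state, quoting `yarn.nodemanager.resource.memory-mb` set to 8192. As a yarn-site.xml fragment:

```xml
<!-- yarn-site.xml: memory (MB) the NodeManager may hand out to containers.
     The 8192 value is the one quoted in the replies above; size it to
     leave room for the ApplicationMaster plus at least one executor. -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>8192</value>
</property>
```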

Re: what is the right syntax for self joins in Spark 2.3.0 ?

2018-03-10 Thread kant kodali
uite a large range of use cases, including > multiple cascading joins. > > TD > > > > On Thu, Mar 8, 2018 at 9:18 AM, Gourav Sengupta > wrote: > >> super interesting. >> >> On Wed, Mar 7, 2018 at 11:44 AM, kant kodali wrote: >> >>

Re: what is the right syntax for self joins in Spark 2.3.0 ?

2018-03-07 Thread kant kodali
append mode? Anyways the moment it is in master I am ready to test so JIRA tickets on this would help to keep track. please let me know. Thanks! On Tue, Mar 6, 2018 at 9:16 PM, kant kodali wrote: > Sorry I meant Spark 2.4 in my previous email > > On Tue, Mar 6, 2018 at 9:15 PM, kan

Re: what is the right syntax for self joins in Spark 2.3.0 ?

2018-03-06 Thread kant kodali
Sorry I meant Spark 2.4 in my previous email On Tue, Mar 6, 2018 at 9:15 PM, kant kodali wrote: > Hi TD, > > I agree I think we are better off either with a full fix or no fix. I am > ok with the complete fix being available in master or some branch. I guess > the solution fo

Re: what is the right syntax for self joins in Spark 2.3.0 ?

2018-03-06 Thread kant kodali
all but a small fraction of self-join > queries. That small fraction can produce incorrect results, and part 2 > avoids that. > > So for 2.3.1, we can enable self-joins by merging only part 1, but it can > give wrong results in some cases. I think that is strictly worse than no >
