Yes, you can certainly use spark streaming, but reading from the original
source table may still be time consuming and resource intensive.
Having some context on the RDBMS platform, the data sizes/volumes involved, and the
tolerable lag (between changes being created and them being processed by Spark) would help.
Just curious - is this HttpSink your own custom sink or Dropwizard
configuration?
If it is your own custom code, I would suggest looking at/trying out the Dropwizard-based metrics support.
See
http://spark.apache.org/docs/latest/monitoring.html#metrics
https://metrics.dropwizard.io/4.0.0/
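For reference, here is a minimal metrics.properties sketch (based on the template shipped with Spark; the Graphite host/port below are placeholders):

# conf/metrics.properties - report all metrics to the console every 10 seconds
*.sink.console.class=org.apache.spark.metrics.sink.ConsoleSink
*.sink.console.period=10
*.sink.console.unit=seconds
# or push them to Graphite instead
#*.sink.graphite.class=org.apache.spark.metrics.sink.GraphiteSink
#*.sink.graphite.host=graphite.example.com
#*.sink.graphite.port=2003

You then point Spark at it with --conf spark.metrics.conf=/path/to/metrics.properties.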
Also, from what I know, the metrics
See if https://spark.apache.org/docs/latest/monitoring.html helps.
Essentially, whether you are running an app via spark-shell or spark-submit
(local, Spark standalone cluster, YARN, Kubernetes, Mesos), the driver will provide a UI
on port 4040.
You can monitor via the UI and via a REST API.
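For example, from Scala/spark-shell you can pull the same information out of the REST endpoint (a rough sketch; the port is the default 4040 and the application id is whatever the first call returns):

import scala.io.Source

// list the applications known to this driver's UI
val apps = Source.fromURL("http://localhost:4040/api/v1/applications").mkString
println(apps)

// then, e.g., per-stage metrics for an application id taken from the JSON above
// val stages = Source.fromURL("http://localhost:4040/api/v1/applications/<app-id>/stages").mkString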
E.g. running
still excessive.
From: Vitaliy Pisarev
Date: Thursday, November 15, 2018 at 1:58 PM
To: "Thakrar, Jayesh"
Cc: Shahbaz , user , David
Markovitz
Subject: Re: How to address seemingly low core utilization on a spark workload?
Small update, my initial estimate was incorrect. I have on
save.
From: Vitaliy Pisarev
Date: Thursday, November 15, 2018 at 1:03 PM
To: Shahbaz
Cc: "Thakrar, Jayesh" , user
, "dudu.markov...@microsoft.com"
Subject: Re: How to address seemingly low core utilization on a spark workload?
Agree, and I will try it. One clarification t
ittle work.
Question is what can I do about it.
On Thu, Nov 15, 2018 at 5:29 PM Thakrar, Jayesh <jthak...@conversantmedia.com> wrote:
Can you shed more light on what kind of processing you are doing?
One common pattern that I have seen for active core/executor utilization
dropping to zero is while reading ORC data and the driver seems (I think) to be
doing schema validation.
In my case I would have hundreds of thousands of ORC files.
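One way to reduce that overhead (just an illustration, not necessarily what was done in the case above) is to supply the schema explicitly so the reader does not have to infer/validate it across all those files:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._

val spark = SparkSession.builder.appName("orc-read").getOrCreate()

// hypothetical schema and path - replace with your own
val orcSchema = StructType(Seq(
  StructField("id", LongType),
  StructField("event_time", TimestampType),
  StructField("payload", StringType)))

// with an explicit schema, Spark skips schema inference over the ORC footers
val df = spark.read.schema(orcSchema).orc("/data/events/*.orc")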
Not sure I get what you mean….
I ran the query that you had – and don’t get the same hash as you.
From: Gokula Krishnan D
Date: Friday, September 28, 2018 at 10:40 AM
To: "Thakrar, Jayesh"
Cc: user
Subject: Re: [Spark SQL] why spark sql hash() are returns the same hash value
thoug
Cannot reproduce your situation.
Can you share Spark version?
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.2.0
      /_/
Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_92)
Type in expressions to have them evaluated.
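If it helps to compare notes, here is a quick way to check in spark-shell (illustrative values only):

import org.apache.spark.sql.functions.hash

// hash() is Murmur3-based with a fixed seed, so a given literal should
// produce the same value on any machine running the same Spark version
spark.sql("SELECT hash('ABC') AS h1, hash('abc') AS h2").show()

// the same function is available on DataFrames/Datasets
spark.range(3).select($"id", hash($"id")).show()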
Disclaimer - I use Spark with Scala and not Python.
But I am guessing that Jorn's reference to modularization is to ensure that you
do the processing inside methods/functions and call those methods sequentially.
I believe that as long as an RDD/dataset variable is in scope, its memory may
not be released.
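A rough sketch of what that modularization could look like (names and paths are made up):

import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

// keeping each step inside a method means the intermediate DataFrame
// references go out of scope when the method returns and become eligible
// for garbage collection (and you can unpersist explicitly if you cached)
def loadAndClean(spark: SparkSession, path: String): DataFrame = {
  val raw = spark.read.parquet(path)
  raw.filter(col("status") === "ok")
}

def aggregate(df: DataFrame): DataFrame =
  df.groupBy(col("customer_id")).count()

val spark = SparkSession.builder.appName("modular-example").getOrCreate()
val result = aggregate(loadAndClean(spark, "/data/input"))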
21, 2018 at 10:20 PM
To: ayan guha
Cc: "Thakrar, Jayesh" , user
Subject: Re: How to skip nonexistent file when read files with spark?
Thanks ayan,
Also I have tried this method, the most tricky thing is that dataframe union
method must be based on same structure schema, while on my
Probably you can do some preprocessing/checking of the paths before you attempt
to read them via Spark.
Whether it is local or hdfs filesystem, you can try to check for existence and
other details by using the "FileSystem.globStatus" method from the Hadoop API.
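Something along these lines (a sketch; the glob pattern and format are just examples):

import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("glob-check").getOrCreate()

// expand the glob against the underlying filesystem (local or HDFS) first,
// keeping only the paths that actually exist
val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
val pattern = new Path("/data/input/2018-05-*/part-*")
val existing = Option(fs.globStatus(pattern))
  .map(_.map(_.getPath.toString).toSeq)
  .getOrElse(Seq.empty)

// read only what is there, so a missing path does not fail the whole job
val df = if (existing.nonEmpty) spark.read.parquet(existing: _*)
         else spark.emptyDataFrame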
From: JF Chen
Date: Sunday, May 20,
And here's some more info on Spark Metrics
https://www.slideshare.net/JayeshThakrar/apache-bigdata2017sparkprofiling
From: Maximiliano Felice
Date: Monday, January 8, 2018 at 8:14 AM
To: Irtiza Ali
Cc:
Subject: Re: Spark Monitoring using Jolokia
Hi!
I don't know very much about them, but I'
You can also get the metrics from the Spark application events log file.
See https://www.slideshare.net/JayeshThakrar/apache-bigdata2017sparkprofiling
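A small sketch of that approach (assumes event logging is enabled and the log file is uncompressed; the directory and application id below are placeholders):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder.appName("eventlog-read").getOrCreate()

// the app being analyzed should have been submitted with, e.g.:
//   --conf spark.eventLog.enabled=true --conf spark.eventLog.dir=/tmp/spark-events
// the event log is one JSON event per line, so Spark itself can read it back
val events = spark.read.json("/tmp/spark-events/app-12345")
events.groupBy(col("Event")).count().show()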
From: "Qiao, Richard"
Date: Monday, December 4, 2017 at 6:09 PM
To: Nick Dimiduk , "user@spark.apache.org"
Subject: Re: Access to Applications
What you have is sequential code and hence sequential processing.
Also Spark/Scala are not parallel programming languages.
But even if they were, statements are executed sequentially unless you exploit
the parallel/concurrent execution features.
Anyway, see if this works:
val (RDD1, RDD2) = (JavaFunc
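If the two pieces really are independent, something along these lines (a sketch, not a drop-in replacement for your code) lets the two actions run concurrently:

import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("concurrent-jobs").getOrCreate()

// two independent computations (placeholders for your real logic)
val df1 = spark.range(0, 1000000)
val df2 = spark.range(0, 2000000)

// each Future submits its Spark action from a separate thread,
// so the scheduler can run the two jobs concurrently
val f1 = Future { df1.count() }
val f2 = Future { df2.count() }

val (c1, c2) = (Await.result(f1, Duration.Inf), Await.result(f2, Duration.Inf))
println(s"counts: $c1, $c2")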
Could this be due to https://issues.apache.org/jira/browse/HIVE-6 ?
From: Patrik Medvedev
Date: Monday, June 12, 2017 at 2:31 AM
To: Jörn Franke , vaquar khan
Cc: Jean Georges Perrin , User
Subject: Re: [Spark JDBC] Does spark support read from remote Hive server via
JDBC
Hello,
All secu
Roy - can you check if you have HADOOP_CONF_DIR and YARN_CONF_DIR set to the
directory containing the HDFS and YARN configuration files?
From: Sandeep Nemuri
Date: Monday, March 27, 2017 at 9:44 AM
To: Saisai Shao
Cc: Yong Zhang , ", Roy" , user
Subject: Re: spark-submit config via file
You
To use the old memory management,
you may explicitly enable `spark.memory.useLegacyMode` (not recommended).
On Mon, Feb 13, 2017 at 11:23 PM, Thakrar, Jayesh <jthak...@conversantmedia.com> wrote:
Nancy,
As your log output indicated, your executor exceeded the 11 GB memory limit.
While you might want to address the root cause/data volume as suggested by Jon,
you can do an immediate test by changing your command as follows:
spark-shell --master yarn --deploy-mode client --driver-memory 16G
--num-execut
Ben,
Also look at Phoenix (Apache project) which provides a better (one of the best)
SQL/JDBC layer on top of HBase.
http://phoenix.apache.org/
Cheers,
Jayesh
From: vincent gromakowski
Date: Monday, October 17, 2016 at 1:53 PM
To: Benjamin Kim
Cc: Michael Segel , Jörn Franke
, Mich Talebzad
Yes, iterating over a dataframe and making changes is not uncommon.
Of course, RDDs, dataframes and datasets are immutable, but there is some
optimization in the optimizer that can potentially help dampen the
effect/impact of creating a new RDD, DF or DS.
Also, the use-case you cited is similar
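Purely as an illustration (the column names are made up), an iterative set of changes is usually expressed as a chain of new DataFrames, and the optimizer collapses the projections into a single plan:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, upper}

val spark = SparkSession.builder.appName("iterative-df").getOrCreate()
import spark.implicits._

val base = Seq(("a", 1), ("b", 2)).toDF("name", "value")

// each step returns a new (immutable) DataFrame; the chained projections
// are collapsed by the optimizer, so the per-iteration cost is mostly planning
val columnsToUpper = Seq("name")
val result = columnsToUpper.foldLeft(base) { (df, c) =>
  df.withColumn(c, upper(col(c)))
}
result.show()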