DirectFileOutputCommitter in Spark 2.3.1

2018-09-19 Thread Priya Ch
Hello Team, I am trying to write a DataSet as a parquet file in Append mode, partitioned by a few columns. However, since the job is time-consuming, I would like to enable DirectFileOutputCommitter (i.e. bypassing the writes to the temporary folder). The version of Spark I am using is 2.3.1. Can someone p
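
A minimal sketch of the Append-mode partitioned write described above. DirectParquetOutputCommitter was removed in Spark 2.0, so this sketch instead sets the v2 FileOutputCommitter algorithm, a commonly used way to reduce the cost of the temporary-folder rename; the paths and partition columns are assumptions, not from the original post.

import org.apache.spark.sql.SparkSession

object ParquetAppendSketch {
  def main(args: Array[String]): Unit = {
    // The v2 commit algorithm skips the extra rename out of the temporary folder.
    val spark = SparkSession.builder()
      .appName("ParquetAppendSketch")
      .config("spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version", "2")
      .getOrCreate()

    val ds = spark.read.parquet("/data/input")   // stands in for the original DataSet
    ds.write
      .mode("append")
      .partitionBy("year", "month")              // hypothetical partition columns
      .parquet("/data/output")                   // hypothetical output path
  }
}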

Video analytics on Spark

2016-09-09 Thread Priya Ch
Hi All, I have video surveillance data and this needs to be processed in Spark. I am looking into the Spark + OpenCV combination. How do I load .mp4 video into an RDD? Can we do this directly, or does the video need to be converted to a SequenceFile? Thanks, Padma CH
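
A minimal sketch of one way to get raw .mp4 bytes into an RDD without converting to SequenceFiles first; the HDFS path is an assumption, and the OpenCV frame-decoding step is only indicated, not implemented.

import org.apache.spark.{SparkConf, SparkContext}

object VideoIngestSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("VideoIngestSketch"))
    // Read each .mp4 as one binary blob; decoding frames with OpenCV/JavaCV
    // would happen inside the map and is omitted here.
    val videos = sc.binaryFiles("hdfs:///surveillance/*.mp4")
    val sizes = videos.map { case (path, stream) =>
      val bytes = stream.toArray()
      (path, bytes.length)
    }
    sizes.collect().foreach(println)
  }
}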

Re: Send real-time alert using Spark

2016-07-12 Thread Priya Ch
wouldn't necessarily "use spark" to send the alert. Spark is in an > important sense one library among many. You can have your application use > any other library available for your language to send the alert. > > Marcin > > On Tue, Jul 12, 2016 at 9:25 AM, Priya Ch >

Send real-time alert using Spark

2016-07-12 Thread Priya Ch
Hi All, I am building a real-time anomaly detection system in which I am using k-means to detect anomalies. Now, in order to send an alert to a mobile device or by email, how do I send it using Spark itself? Thanks, Padma CH
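
A minimal sketch of how an alert could be sent from a streaming job. Spark itself has no alerting API, so the alert goes out through an ordinary JVM call; the sendEmailAlert helper, the socket input and the fixed threshold are all assumptions standing in for a real mail/SMS library and the k-means distance check.

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object AlertSketch {
  // Hypothetical helper: any plain JVM mail/SMS library could be called here.
  def sendEmailAlert(msg: String): Unit = println(s"ALERT: $msg")

  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("AlertSketch"), Seconds(10))
    val scores = ssc.socketTextStream("localhost", 9999).map(_.toDouble)  // assumed input
    // Assumed rule: anything above a fixed threshold counts as an anomaly.
    scores.filter(_ > 100.0).foreachRDD { rdd =>
      rdd.collect().foreach(v => sendEmailAlert(s"Anomalous value: $v"))
    }
    ssc.start()
    ssc.awaitTermination()
  }
}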

Re: Spark Task failure with File segment length as negative

2016-07-06 Thread Priya Ch
Has anyone resolved this? Thanks, Padma CH On Wed, Jun 22, 2016 at 4:39 PM, Priya Ch wrote: > Hi All, > > I am running a Spark application with 1.8 TB of data (which is stored in Hive > table format). I am reading the data using HiveContext and processing it. > The cluster ha

Spark Task failure with File segment length as negative

2016-06-22 Thread Priya Ch
Hi All, I am running a Spark application with 1.8 TB of data (which is stored in Hive table format). I am reading the data using HiveContext and processing it. The cluster has 5 nodes in total, 25 cores per machine and 250 GB per node. I am launching the application with 25 executors with 5 cores each
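
A minimal sketch of the read path described above, using the Spark 1.x HiveContext API; the table name is an assumption, since the original post does not name the tables.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object HiveReadSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("HiveReadSketch"))
    val hiveContext = new HiveContext(sc)
    // Hypothetical table name; processing of the resulting DataFrame is omitted.
    val df = hiveContext.sql("SELECT * FROM events")
    println(df.count())
  }
}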

Spark Job Execution halts during shuffle...

2016-05-26 Thread Priya Ch
Hello Team, I am trying to join 2 RDDs, where one is of size 800 MB and the other is 190 MB. During the join step, my job halts and I don't see progress in the execution. This is the message I see on the console - INFO spark.MapOutputTrackerMasterEndPoint: Asked to send map output locations
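
A sketch of one common way to sidestep a shuffling join when one side is small: broadcast it and join map-side. The placeholder data and key/value types are assumptions standing in for the 800 MB and 190 MB RDDs.

import org.apache.spark.{SparkConf, SparkContext}

object BroadcastJoinSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("BroadcastJoinSketch"))
    val large = sc.parallelize(Seq((1, "a"), (2, "b")))   // stands in for the 800 MB RDD
    val small = sc.parallelize(Seq((1, "x"), (2, "y")))   // stands in for the 190 MB RDD
    // Broadcast the small side and look up keys map-side, so no shuffle is needed.
    val smallMap = sc.broadcast(small.collectAsMap())
    val joined = large.flatMap { case (k, v) =>
      smallMap.value.get(k).map(sv => (k, (v, sv)))
    }
    joined.collect().foreach(println)
  }
}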

Cartesian join on RDDs taking too much time

2016-05-25 Thread Priya Ch
Hi All, I have two RDDs A and B, where A is of size 30 MB and B is of size 7 MB; A.cartesian(B) is taking too much time. Is there any bottleneck in the cartesian operation? I am using Spark version 1.6.0. Regards, Padma Ch
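
A sketch of an alternative when one side of the cartesian product is small: broadcast B and expand the pairs in a flatMap, avoiding the shuffle that cartesian() performs. The placeholder data stands in for A and B.

import org.apache.spark.{SparkConf, SparkContext}

object CartesianSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("CartesianSketch"))
    val rddA = sc.parallelize(1 to 1000)           // stands in for the 30 MB RDD A
    val rddB = sc.parallelize(Seq("x", "y", "z"))  // stands in for the 7 MB RDD B
    // Broadcast the small side and build the cross product locally per element of A.
    val bSmall = sc.broadcast(rddB.collect())
    val pairs = rddA.flatMap(a => bSmall.value.map(b => (a, b)))
    println(pairs.count())
  }
}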

Re: java.io.FileNotFoundException(Too many open files) in Spark streaming

2016-01-06 Thread Priya Ch
which would convey the same. On Wed, Jan 6, 2016 at 8:19 PM, Annabel Melongo wrote: > Priya, > > It would be helpful if you put the entire trace log along with your code > to help determine the root cause of the error. > > Thanks > > > On Wednesday, Januar

Re: java.io.FileNotFoundException(Too many open files) in Spark streaming

2016-01-06 Thread Priya Ch
f" on > one of the spark executors (perhaps run it in a for loop, writing the > output to separate files) until it fails and see which files are being > opened, if there's anything that seems to be taking up a clear majority > that might key you in on the culprit. > > O

Re: java.io.FileNotFoundException(Too many open files) in Spark streaming

2016-01-05 Thread Priya Ch
iles" > exception. > > > On Tuesday, January 5, 2016 8:03 AM, Priya Ch < > learnings.chitt...@gmail.com> wrote: > > > Can some one throw light on this ? > > Regards, > Padma Ch > > On Mon, Dec 28, 2015 at 3:59 PM, Priya Ch > wrote: > > Chris

Re: passing SparkContext as parameter

2015-09-21 Thread Priya Ch
ep 21, 2015 at 3:06 PM, Petr Novak wrote: > add @transient? > > On Mon, Sep 21, 2015 at 11:27 AM, Priya Ch > wrote: > >> Hello All, >> >> How can i pass sparkContext as a parameter to a method in an object. >> Because passing sparkContext is giving me Ta
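
A minimal sketch of the @transient suggestion quoted above: keep the SparkContext out of any closure that gets serialized to executors by marking the field @transient and only using it on the driver to build the RDD lineage. The class and method names are assumptions.

import org.apache.spark.SparkContext

// The context is never shipped with tasks; only the closures inside the
// transformations are serialized.
class Pipeline(@transient val sc: SparkContext) extends Serializable {
  def wordCount(path: String) =
    sc.textFile(path)                 // driver-side call; sc itself never leaves the driver
      .flatMap(_.split("\\s+"))
      .map((_, 1))
      .reduceByKey(_ + _)
}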

Re: Spark Streaming..Exception

2015-09-14 Thread Priya Ch
; true. What is the possible solution for this? Is this a bug in Spark 1.3.0? Would changing the scheduling mode to standalone or Mesos mode work fine? Please share your views on this. On Sat, Sep 12, 2015 at 11:04 PM, Priya Ch wrote: > Hello All, > > When I push messages into

Spark Streaming..Exception

2015-09-12 Thread Priya Ch
Hello All, When I push messages into Kafka and read them into the streaming application, I see the following exception - I am running the application on YARN and am nowhere broadcasting the message within the application. I am simply reading a message, parsing it, populating fields in a class and then prin

Fwd: Writing streaming data to cassandra creates duplicates

2015-08-04 Thread Priya Ch
combine the messages with the same primary key. > > Hope that helps. > > Greetings, > > Juan > > > 2015-07-30 10:50 GMT+02:00 Priya Ch : > >> Hi All, >> >> Can someone throw insights on this ? >> >> On Wed, Jul 29, 2015 at 8:29 AM, Priya

Re: Writing streaming data to cassandra creates duplicates

2015-07-30 Thread Priya Ch
Hi All, Can someone throw insights on this ? On Wed, Jul 29, 2015 at 8:29 AM, Priya Ch wrote: > > > Hi TD, > > Thanks for the info. I have the scenario like this. > > I am reading the data from kafka topic. Let's say kafka has 3 partitions > for the topic. I

Fwd: Writing streaming data to cassandra creates duplicates

2015-07-28 Thread Priya Ch
s will guard against multiple attempts to > run the task that inserts into Cassandra. > > See > http://spark.apache.org/docs/latest/streaming-programming-guide.html#semantics-of-output-operations > > TD > > On Sun, Jul 26, 2015 at 11:19 AM, Priya Ch > wrote: > >>

Writing streaming data to cassandra creates duplicates

2015-07-26 Thread Priya Ch
Hi All, I have a problem when writing streaming data to Cassandra. Our existing product is on Oracle DB, in which, while writing data, locks are maintained such that duplicates in the DB are avoided. But as Spark has a parallel processing architecture, if more than 1 thread is trying to write the same d
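
A sketch of one way to make the write idempotent, in line with the replies above: de-duplicate on the primary key before saving, so repeated task attempts upsert the same row in Cassandra. The spark-cassandra-connector is assumed to be on the classpath, and the keyspace, table and column names are assumptions.

import org.apache.spark.{SparkConf, SparkContext}
import com.datastax.spark.connector._   // spark-cassandra-connector, assumed available

// Hypothetical record type whose `id` field matches the table's primary key.
case class Event(id: String, value: Double)

object DedupWriteSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("DedupWriteSketch"))
    val events = sc.parallelize(Seq(Event("a", 1.0), Event("a", 2.0), Event("b", 3.0)))
    events
      .map(e => (e.id, e))
      .reduceByKey((_, latest) => latest)   // keep one record per primary key
      .values
      .saveToCassandra("my_keyspace", "events")   // Cassandra writes are upserts by key
  }
}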

Spark streaming with Kafka - couldn't find KafkaUtils

2015-04-04 Thread Priya Ch
Hi All, I configured a Kafka cluster on a single node and I have a streaming application which reads data from a Kafka topic using KafkaUtils. When I execute the code in local mode from the IDE, the application runs fine. But when I submit the same application to the Spark cluster in standalone mode, I end up with
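
For context, the usual cause is that the spark-streaming-kafka artifact is on the IDE classpath but is not bundled into the jar submitted to the standalone cluster. A sketch of a build.sbt fragment (the 1.3.0 version is an assumption matching the era of this thread), with the application jar packaged by something like sbt-assembly:

// build.sbt fragment (sketch): keep core Spark "provided", but bundle the
// Kafka integration into the application jar so executors can find KafkaUtils.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-streaming"       % "1.3.0" % "provided",
  "org.apache.spark" %% "spark-streaming-kafka" % "1.3.0"
)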

Spark exception when sending message to akka actor

2014-12-22 Thread Priya Ch
Hi All, I have Akka remote actors running on 2 nodes. I submitted the Spark application from node1. In the Spark code, in one of the RDDs, I am sending a message to the actor running on node1. My Spark code is as follows: class ActorClient extends Actor with Serializable { import context._ val curre

1 GB file processing...task doesn't launch on all the nodes...Unseen exception

2014-11-14 Thread Priya Ch
Hi All, We have set up a 2-node cluster (NODE-DSRV05 and NODE-DSRV02); each node has 32 GB RAM, 1 TB hard disk capacity and 8 CPU cores. We have set up HDFS, which has 2 TB capacity, and the block size is 256 MB. When we try to process a 1 GB file on Spark, we see the following exception 14/11/

Default spark.deploy.recoveryMode

2014-10-14 Thread Priya Ch
Hi Spark users/experts, In the Spark source code (Master.scala & Worker.scala), when registering the worker with the master, I see the usage of *persistenceEngine*. When we don't specify spark.deploy.recoveryMode explicitly, what is the default value used? This recovery mode is used to persist and re

Fwd: Breeze Library usage in Spark

2014-10-03 Thread Priya Ch
the classpath? In Spark >> 1.0, we use breeze 0.7, and in Spark 1.1 we use 0.9. If the breeze >> version you used is different from the one comes with Spark, you might >> see class not found. -Xiangrui >> >> On Fri, Oct 3, 2014 at 4:22 AM, Priya Ch >> wrote: >&

Breeze Library usage in Spark

2014-10-03 Thread Priya Ch
Hi Team, When I am trying to use DenseMatrix from the breeze library in Spark, it throws the following error: java.lang.NoClassDefFoundError: breeze/storage/Zero. Can someone help me with this? Thanks, Padma Ch
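
A minimal sketch that exercises exactly the code path in the error (DenseMatrix.zeros needs breeze.storage.Zero at runtime). As the reply in the forwarded thread above notes, the error usually means the breeze version on the classpath differs from the one bundled with Spark's MLlib, so this sketch assumes only the Spark-provided breeze is used.

import breeze.linalg.DenseMatrix

object BreezeSketch {
  def main(args: Array[String]): Unit = {
    // Relies on the breeze version shipped with Spark's MLlib rather than a
    // separately added (and possibly mismatched) breeze artifact.
    val m = DenseMatrix.zeros[Double](2, 3)   // needs breeze.storage.Zero at runtime
    m(0, 0) = 1.0
    println(m)
  }
}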

spark.local.dir and spark.worker.dir not used

2014-09-23 Thread Priya Ch
Hi, I am using Spark 1.0.0. In my Spark code I am trying to persist an RDD to disk as rdd.persist(DISK_ONLY). But unfortunately I couldn't find the location where the RDD has been written to disk. I specified SPARK_LOCAL_DIRS and SPARK_WORKER_DIR to point to some other location rather than using the default /
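
A minimal sketch of the DISK_ONLY persist described above, with spark.local.dir set programmatically; note that SPARK_LOCAL_DIRS, when set on the workers, takes precedence over this property. The scratch path is an assumption.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object DiskPersistSketch {
  def main(args: Array[String]): Unit = {
    // spark.local.dir is where DISK_ONLY blocks and shuffle files land on each executor.
    val conf = new SparkConf()
      .setAppName("DiskPersistSketch")
      .set("spark.local.dir", "/data/spark-scratch")   // hypothetical scratch path
    val sc = new SparkContext(conf)
    val rdd = sc.parallelize(1 to 1000000).persist(StorageLevel.DISK_ONLY)
    println(rdd.count())   // materializes the blocks under the local dir
  }
}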

Subscription request for developer community

2014-06-12 Thread Priya Ch
Please accept the request