to surface the problem.
Can someone review the code and tell me if I am doing something wrong?
regards
Sunita
for failed tasks were done, and the other tasks completed.
You can set it to a higher or lower value depending on how many more tasks
you have and how long they take to complete.
regards
Sunita
On Fri, Nov 13, 2015 at 4:50 PM, Ted Yu wrote:
> I searched the code base and looked at:
> https://spark
(UnsupportedOperationChecker.scala:297)
regards
Sunita
On Mon, Sep 18, 2017 at 10:15 AM, Michael Armbrust
wrote:
> You specify the schema when loading a dataframe by calling
> spark.read.schema(...)...
>
> On Tue, Sep 12, 2017 at 4:50 PM, Sunita Arvind
> wrote:
>
>> Hi Micha
usecase.
Is there a way to change the owner of files written by Spark?
regards
Sunita
> Le 13 sept. 2017 01:51, "Sunita Arvind" a écrit :
>
> Hi Michael,
>
> I am wondering what I am doing wrong. I get an error like:
>
> Exception in thread "main" java.lang.IllegalArgumentException: Schema
> must be specified when creating a streaming source D
spark.sql.streaming.StreamingQueryManager.startQuery(StreamingQueryManager.scala:278)
at
org.apache.spark.sql.streaming.DataStreamWriter.start(DataStreamWriter.scala:282)
at
org.apache.spark.sql.streaming.DataStreamWriter.start(DataStreamWriter.scala:222)
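For anyone hitting the same error: one common cause is a file-based streaming source without an explicit schema, which is what Michael's suggestion above addresses. A minimal sketch (assuming a JSON file source; the path and field names are illustrative, not from the original code):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._

val spark = SparkSession.builder().appName("schema-example").getOrCreate()

// Hypothetical schema; replace with the actual fields of the source
val schema = StructType(Seq(
  StructField("id", StringType, nullable = true),
  StructField("ts", LongType, nullable = true)
))

// Supplying the schema explicitly avoids "Schema must be specified when
// creating a streaming source" for file-based sources
val stream = spark.readStream
  .schema(schema)
  .json("/path/to/input")   // hypothetical input path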
While running on the EMR cluster all paths p
Thanks for your response, Praneeth. We did consider Kafka; however, cost was
the only holdback factor, as we might need a larger cluster, and the existing
cluster is on premise while my app is in the cloud, so the same cluster cannot
be used.
But I agree it does sound like a good alternative.
Regards
Sunita
Thanks for your response Michael
Will try it out.
Regards
Sunita
On Wed, Aug 23, 2017 at 2:30 PM Michael Armbrust
wrote:
> If you use structured streaming and the file sink, you can have a
> subsequent stream read using the file source. This will maintain exactly
> once processing.
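A rough sketch of the file-sink / file-source handoff described above (inputStream, the paths, the checkpoint location, and stage1Schema are illustrative assumptions, not from the original thread):

// Stage 1: write the stream out with the file sink
val stage1Query = inputStream.writeStream
  .format("parquet")
  .option("path", "/data/stage1")               // hypothetical output path
  .option("checkpointLocation", "/chk/stage1")  // hypothetical checkpoint dir
  .start()

// Stage 2: a second query reads the same directory with the file source;
// the file source needs an explicit schema
val stage2Input = spark.readStream
  .schema(stage1Schema)      // assumed to be defined to match stage 1's output
  .parquet("/data/stage1")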
to be error prone. When either of the jobs
gets delayed due to bursts or any error/exception, this could lead to huge
data losses and non-deterministic behavior. What are the other alternatives to
this?
Appreciate any guidance in this regard.
regards
Sunita Koppar
parquet with null in the numeric fields.
Is there a workaround for it? I need to be able to allow null values
for numeric fields.
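One possible workaround, if the schema is applied explicitly, is to declare the numeric fields as nullable. A sketch (the field names and types are illustrative):

import org.apache.spark.sql.types._

// Declaring the numeric columns as nullable lets null values flow through to Parquet
val schema = StructType(Seq(
  StructField("id", StringType, nullable = false),
  StructField("amount", DoubleType, nullable = true),    // nullable numeric field
  StructField("quantity", IntegerType, nullable = true)
))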
Thanks in advance.
regards
Sunita
re I am not going overboard or overlooking a
potential issue.
regards
Sunita
On Tue, Oct 25, 2016 at 2:38 PM, Sunita Arvind
wrote:
> The error in the file I just shared is here:
>
> val partitionOffsetPath:String = topicDirs.consumerOffsetDir + "/" +
> partition._2(0); -
Thanks for the response Sean. I have seen the NPE on similar issues very
consistently and assumed that could be the reason :) Thanks for clarifying.
regards
Sunita
On Tue, Oct 25, 2016 at 10:11 PM, Sean Owen wrote:
> This usage is fine, because you are only using the HiveContext locally
u can create the dataframe in main, register it as a table, and run
the queries in the main method itself. You don't need to coalesce or run
the method within foreach.
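For example, a minimal sketch of that pattern (the source path, table name, and query are illustrative; sqlContext is assumed to be created in main):

// Build the dataframe once in main(), register it, and query it directly
val df = sqlContext.read.json("/path/to/input")    // hypothetical source
df.registerTempTable("events")
val result = sqlContext.sql("SELECT key, count(*) AS cnt FROM events GROUP BY key")
result.show()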
Regards
Sunita
On Tuesday, October 25, 2016, Ajay Chander wrote:
>
> Jeff, Thanks for your response. I see below e
eeper")
df.saveAsParquetFile(conf.getString("ParquetOutputPath")+offsetSaved)
LogHandler.log.info("Created the parquet file")
}
Thanks
Sunita
On Tue, Oct 25, 2016 at 2:11 PM, Sunita Arvind
wrote:
> Attached is the edited code. Am I heading in right direc
Sunita
On Tue, Oct 25, 2016 at 1:52 PM, Sunita Arvind
wrote:
> Thanks for confirming Cody.
> To get to use the library, I had to do:
>
> val offsetsStore = new ZooKeeperOffsetsStore(conf.getString("zkHosts"),
> "/consumers/topics/"+ topics + "/0")
>
nt the library to pick up all the partitions for a topic without me
specifying the path. Is that possible out of the box, or do I need to tweak it?
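One way to avoid hard-coding the partition path would be to list the partition ids from ZooKeeper and build one store per partition. A sketch, assuming the ZkClient dependency is already on the classpath and that ZooKeeperOffsetsStore takes (zkHosts, path) as in the snippet above; the paths follow Kafka's standard ZK layout:

import org.I0Itec.zkclient.ZkClient
import scala.collection.JavaConverters._

val zkClient = new ZkClient(conf.getString("zkHosts"))
// Kafka keeps one child node per partition under /brokers/topics/<topic>/partitions
val partitionIds = zkClient.getChildren(s"/brokers/topics/$topics/partitions").asScala

val offsetStores = partitionIds.map { p =>
  p -> new ZooKeeperOffsetsStore(conf.getString("zkHosts"), s"/consumers/topics/$topics/$p")
}.toMap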
regards
Sunita
On Tue, Oct 25, 2016 at 12:08 PM, Cody Koeninger wrote:
> You are correct that you shouldn't have to worry about broker id.
>
> I'm
Just re-read the Kafka architecture. Something that slipped my mind is that it
is leader based, so the topic/partition pair is the same across all the brokers
and we do not need to consider the brokerId while storing offsets. Still
exploring the rest of the items.
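In other words, the offset ranges exposed by the direct stream should be enough on their own. A sketch against the Spark 1.x Kafka direct API (stream is the direct DStream; saveOffset is a hypothetical helper):

import org.apache.spark.streaming.kafka.{HasOffsetRanges, OffsetRange}

stream.foreachRDD { rdd =>
  val ranges: Array[OffsetRange] = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  ranges.foreach { r =>
    // topic + partition uniquely identify the offset; no brokerId is needed
    saveOffset(s"${r.topic}/${r.partition}", r.untilOffset)   // hypothetical helper
  }
}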
regards
Sunita
On Tue, Oct 25, 2016 at 11:09 AM
are not considering brokerIds while
storing offsets, and OffsetRanges probably does not have it either; it
can only provide the topic, partition, and from/until offsets.
I am probably missing something very basic, and the library probably works well
by itself. Can someone (Cody?) explain?
Cody, Thanks a lot f
Hello Experts,
Is there a way to get spark to write to elasticsearch asynchronously?
Below are the details
http://stackoverflow.com/questions/39624538/spark-savetoes-asynchronously
regards
Sunita
interesting
observation is that bringing the executor memory down to 5 GB with the executor
memoryOverhead at 768 MB showed significant performance gains. What are the
other associated settings?
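For reference, the two settings above expressed as conf entries (a sketch; spark.yarn.executor.memoryOverhead is the YARN-mode name in Spark 1.x and is given in MB):

sparkConf.set("spark.executor.memory", "5g")
sparkConf.set("spark.yarn.executor.memoryOverhead", "768")   // MB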
regards
Sunita
Thank you for your inputs. Will test it out and share my findings
On Thursday, July 14, 2016, CosminC wrote:
> Didn't have the time to investigate much further, but the one thing that
> popped out is that partitioning was no longer working on 1.6.1. This would
> definitely explain the 2x perfo
I am facing the same issue. Upgrading to Spark 1.6 is causing a huge performance
loss. Were you able to solve this issue? I am also attempting the memory settings
mentioned in
http://spark.apache.org/docs/latest/configuration.html#memory-management
but it's not making a lot of difference. Appreciate your inputs.
also trying to figure out if I can use the
(iterator: Iterator[(K, Seq[V], Option[S])]) variant, but haven't figured it out yet.
Appreciate any suggestions in this regard.
regards
Sunita
P.S.:
I am aware of mapWithState, but I am not on the latest version as of now.
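For what it's worth, the iterator-based variant referred to above is the updateStateByKey overload that takes a per-partition update function. A sketch (the key/value types, update logic, and partitioner choice are illustrative; pairDStream and ssc are assumed to exist):

import org.apache.spark.HashPartitioner

// Iterator-based update function: called once per partition instead of once per key
val updateFunc = (iter: Iterator[(String, Seq[Int], Option[Long])]) =>
  iter.map { case (key, newValues, state) =>
    (key, state.getOrElse(0L) + newValues.sum)
  }

val stateDStream = pairDStream.updateStateByKey(
  updateFunc,
  new HashPartitioner(ssc.sparkContext.defaultParallelism),
  true)   // rememberPartitioner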
distribution data sets. Mentioning it here for the
benefit of anyone else stumbling upon the same issue.
regards
Sunita
On Wed, Jun 22, 2016 at 8:20 PM, Sunita Arvind
wrote:
> Hello Experts,
>
> I am getting this error repeatedly:
>
> 16/06/23 03:06:59 ERROR streaming.StreamingContext:
r.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 38 more
16/06/23 11:09:53 INFO SparkContext: Invoking stop() from shutdown hook
I've tried Kafka versions 0.8.2.0, 0.8.2.2, and 0.9.0.0. With 0.9.0.0 the
processing hangs much sooner.
Can someone help with this error?
reg
c.awaitTermination()
}
}
}
I also tried putting all the initialization directly in main (not using
method calls for initializeSpark and createDataStreamFromKafka), and also
not putting it in foreach, creating a single Spark and streaming context instead.
However, the error persists.
Appreciate any help.
regards
Sunita
do I need to have a HiveContext in order to see
the tables registered via the Spark application through JDBC?
regards
Sunita
Thanks for the clarification, Michael, and good luck with Spark 2.0. It
really looks promising.
I am especially interested in the ad-hoc queries aspect. Probably that is what
is being referred to as Continuous SQL in the slides. What is the timeframe
for the availability of this functionality?
regards
Sunita
2.1 or later only
regards
Sunita
On Fri, May 6, 2016 at 1:06 PM, Michael Malak
wrote:
> At first glance, it looks like the only streaming data sources available
> out of the box from the github master branch are
> https://github.com/apache/spark/blob/master/sql/core/src/main/scala/o
ensure it works for our use cases.
Can someone point me to relevant material for this?
regards
Sunita
ed:
date1 is 2005-07-18 00:00:00, format is org.joda.time.format.DateTimeFormatter@28d101f3
date2 is 20150719, format is org.joda.time.format.DateTimeFormatter@5e411af2
Within 10 years
FromDT = 2005-07-18 00:00:00, ToDT = 20150719, within10years = true, actual number
of years i
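For context, a sketch of computing whole years between the two dates with Joda-Time, using formats matching the strings in the log above; whether exactly 10 years should still count as "within" is the comparison to pin down:

import org.joda.time.Years
import org.joda.time.format.DateTimeFormat

val fmt1 = DateTimeFormat.forPattern("yyyy-MM-dd HH:mm:ss")
val fmt2 = DateTimeFormat.forPattern("yyyyMMdd")

val fromDt = fmt1.parseDateTime("2005-07-18 00:00:00")
val toDt   = fmt2.parseDateTime("20150719")

// Whole years between the two instants: 2005-07-18 to 2015-07-19 gives 10
val years = Years.yearsBetween(fromDt, toDt).getYears
val within10Years = years <= 10   // use < 10 if exactly 10 years should be excluded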
e("rdd1.key".attr === "rdd2.key".attr))
-
DSL Style execution plan --> HashOuterJoin [education#18],
[i1_education_cust_demo#29], LeftOuter, None
Exchange (HashPartitioning [educ
ot of effort for us to try this approach and weigh the
performance, as we need to register the output as tables to proceed with using
them. Hence I would appreciate inputs from the community before proceeding.
Regards
Sunita Koppar
I was able to resolve this by adding rdd.collect() after every stage. This
forced RDD evaluation and helped avoid the choke point.
regards
Sunita Koppar
On Sat, Jan 17, 2015 at 12:56 PM, Sunita Arvind
wrote:
> Hi,
>
> My spark jobs suddenly started getting hung and here is the debu
names. The Spark SQL wiki has good examples for this. It looks easier
to manage to me than your solution below.
I agree with you that when there are a lot of columns, calling
row.getString() even once is not convenient.
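A small sketch of selecting by field name instead of positional getString calls (the table and column names are illustrative; sqlContext and schemaRdd are assumed):

// Register the SchemaRDD and select by column name in SQL,
// rather than calling row.getString(i) with positional indexes
schemaRdd.registerTempTable("customers")
val names = sqlContext.sql("SELECT first_name, last_name FROM customers")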
Regards
Sunita
On Tuesday, January 20, 2015, Night Wolf wrote:
> In Spark
")
sparkConf.set("spark.driver.memory","512m")
sparkConf.set("spark.executor.memory","1g")
sparkConf.set("spark.driver.maxResultSize","1g")
Please note: in Eclipse as well as at the sbt prompt, the program kept throwing
StackOverflowError. Increasing Xss to 5 MB eliminated the problem.
Could this be something unrelated to memory? The SchemaRDDs have close to
400 columns and hence I am using StructType(StructField) and performing
applySchema.
My code cannot be shared right now. If required, I will edit it and post.
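Since the code itself can't be shared, here is a stripped-down sketch of the pattern in question (the column names, types, delimiter, and rawLines RDD are placeholders, not from the original code; on pre-1.3 releases the type classes live in a different package):

import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

// Build the ~400-column schema programmatically rather than writing it out by hand
val columnNames: Seq[String] = Seq("col1", "col2", "col3")   // placeholder names
val schema = StructType(columnNames.map(name => StructField(name, StringType, nullable = true)))

// rawLines is assumed to be an RDD[String]; split each line into a Row
val rowRdd = rawLines.map(line => Row(line.split("\\|", -1): _*))

val schemaRdd = sqlContext.applySchema(rowRdd, schema)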
regards
Sunita
apache.spark.sql.SQLContext.createSchemaRDD(SQLContext.scala:94)
at croevss.StageJoin$.vsswf(StageJoin.scala:162)
at croevss.StageJoin$.main(StageJoin.scala:41)
at croevss.StageJoin.main(StageJoin.scala)
regards
Sunita Koppar
trapper.scala)
regards
Sunita
On Tue, Nov 25, 2014 at 11:47 PM, Sameer Farooqui
wrote:
> Hi Sunita,
>
> This gitbook may also be useful for you to get Spark running in local mode
> on your Windows machine:
> http://blueplastic.gitbooks.io/how-to-light-your-spark-on-a-stick/content/
>
ine.
Appreciate your help.
regards
Sunita
Thanks for the clarification Ankur
Appreciate it.
Regards
Sunita
On Monday, August 25, 2014, Ankur Dave wrote:
> At 2014-08-25 11:23:37 -0700, Sunita Arvind wrote:
> > Does this "We introduce GraphX, which combines the advantages of both
> > data-parallel and gr
fault-tolerance." mean that
GraphX makes the typical RDBMS operations possible even when the data is
persisted in a GDBMS and not viceversa?
regards
Sunita
/get-started-with-spark-deploy-spark-server-and-compute-pi-from-your-web-browser/
>
> Romain
>
> On Tue, Jun 24, 2014 at 9:04 AM, Sunita Arvind wrote:
>
>> Hello Experts,
>>
>> I am attempting to integrate Spark Editor with
Hello Experts,
I am attempting to integrate the Spark Editor with Hue on CDH 5.0.1. I have the
Spark installation built manually from the sources for Spark 1.0.0. I am
able to integrate it with Cloudera Manager.
Background:
---
We have a 3 node VM cluster with CDH5.0.1
We required spa