Hi Users,
I am new to Spark and have written a flow. When we first deployed our code it
was completing jobs in 4-5 minutes, but now it is taking 20+ minutes with
almost the same set of data. Can you please help me figure out the reason?
--
Thanks and Regards,
Saurav Sinha
Contact: 9742879062
--
Sbt asked for a bigger initial heap than the host had space for. It is a
JVM error you can and should search for first. You will need more memory.
On Mon, Sep 21, 2015, 2:11 AM Aaroncq4 <475715...@qq.com> wrote:
> When I used “sbt/sbt assembly" to compile spark code of spark-1.5.0,I got a
> probl
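A minimal sketch of one way to give sbt more headroom, assuming the launcher
script honors SBT_OPTS as the standard sbt script does (the exact sizes are
assumptions; tune them to your host):

    # shrink the initial heap, cap the max heap
    export SBT_OPTS="-Xms512m -Xmx2g"
    build/sbt assembly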
Unless I am mistaken, in a group-by operation Spark spills to disk when the
values for a key don't fit in memory.
Thanks,
Aniket
On Mon, Sep 21, 2015 at 10:43 AM Huy Banh wrote:
> Hi,
>
> If your input format is user -> comment, then you could:
>
> val comments = sc.parallelize(List(("u1", "one tw
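For what it's worth, a minimal sketch of why that matters here (the sample
data is made up): groupByKey materializes every value for a key in one place,
which is what can force the spill, while reduceByKey combines map-side first.

    val pairs = sc.parallelize(Seq(("a", 1), ("a", 2), ("b", 3)))

    // gathers all values per key before summing; large keys may spill
    val grouped = pairs.groupByKey().mapValues(_.sum)

    // combines values map-side first, so far less data is shuffled
    val reduced = pairs.reduceByKey(_ + _)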
You likely need to add the Cassandra connector JAR to spark.jars so it is
available to the executors.
Hope this helps,
Sim
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Class-cast-exception-Spark-1-5-tp24732p24753.html
Sent from the Apache Spark User List
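A minimal sketch of what Sim's suggestion could look like (the jar path and
app name are hypothetical; the connector version must match your Spark and
Cassandra versions):

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("cassandra-app")
      .set("spark.jars", "/path/to/spark-cassandra-connector-assembly.jar")
    val sc = new SparkContext(conf)

Passing the jar on the command line with spark-submit --jars achieves the
same thing.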
Anyone?
On Sun, Sep 20, 2015 at 7:49 AM, Eyal Altshuler
wrote:
> I allocated almost 6GB of RAM to the ubuntu virtual machine and got the
> same problem.
> I will go over this post and try to zoom in on the java vm settings.
>
> meanwhile - can someone with a working ubuntu machine specify
Hi,
What is the purpose of the taskBinary for a ShuffleMapTask? What does it
contain, and how is it useful? Is it the representation of all the RDD
operations that will be applied to the partition the task will be processing?
(In the case below, the task will process stage 0, partition 0.)
If it is
Hi,
If your input format is user -> comment, then you could:
val comments = sc.parallelize(List(("u1", "one two one"), ("u2", "three four three")))

val wordCounts = comments.
  // emit ((user, word), 1) for every word in every comment
  flatMap { case (user, comment) =>
    for (word <- comment.split(" ")) yield ((user, word), 1) }.
  // sum the 1s per (user, word) pair
  reduceByKey(_ + _)
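On that sample, wordCounts.collect() should give something like
((u1,one),2), ((u1,two),1), ((u2,three),2), ((u2,four),1), if I'm reading it
right.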
Have you seen this thread:
http://search-hadoop.com/m/q3RTtVJJ3I15OJ251
Cheers
On Sun, Sep 20, 2015 at 6:11 PM, Aaroncq4 <475715...@qq.com> wrote:
> When I used "sbt/sbt assembly" to compile the Spark 1.5.0 code, I got a
> problem and I did not know why. It says:
>
> NOTE: The sbt/sbt s
When I used "sbt/sbt assembly" to compile the Spark 1.5.0 code, I got a
problem and I did not know why. It says:
NOTE: The sbt/sbt script has been relocated to build/sbt.
Please update references to point to the new location.
Invoking 'build/sbt assembly' now ...
Using /usr/
We do - but I don't think it is feasible to duplicate every single
algorithm in DF and in RDD.
The only way for this to work is to make one underlying implementation work
for both. Right now DataFrame knows how to serialize individual elements
well and can manage memory that way -- the RDD API doe
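As an aside, a minimal sketch of the DataFrame-side broadcast join under
discussion (the table and column names are hypothetical; broadcast() lives in
org.apache.spark.sql.functions as of 1.5):

    import org.apache.spark.sql.functions.broadcast

    val users = sqlContext.table("users")
    val countries = sqlContext.table("countries")  // small enough to broadcast
    val joined = users.join(broadcast(countries), "country_code")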
sorry that was a typo. i meant to say:
why do we have these features (broadcast join and sort-merge join) in
DataFrame but not in RDD?
they don't seem specific to structured data analysis to me.
thanks! koert
On Sun, Sep 20, 2015 at 2:46 PM, Koert Kuipers wrote:
> why dont we want these (broa
why dont we want these (broadcast join and sort-merge join) in DataFrame
but not in RDD?
they dont seem specific to structured data analysis to me.
On Sun, Sep 20, 2015 at 2:41 AM, Rishitesh Mishra
wrote:
> Got it..thnx Reynold..
> On 20 Sep 2015 07:08, "Reynold Xin" wrote:
>
>> The RDDs thems
Thanks all,
Using external storage seems to be the best solution for now.
Btw, has anyone heard about the following Spark streaming module from Intel?
https://github.com/Intel-bigdata/spark-streamingsql
It seems to allow us to query a Spark stream on the fly; however, it hasn't
been updated for 9 months,
Having to restructure my queries isn't a very satisfactory solution,
unfortunately.
I did notice that if I implement the CatalystScan interface instead, then
the filters DO get passed in, but the column identifiers would need to be
translated somewhat to be usable, so that's another option. Unfortu
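For reference, a sketch of the two scan traits as I understand them in 1.5
(approximate; see org.apache.spark.sql.sources for the real definitions). The
Attribute column identifiers are what would need translating back to plain
names:

    import org.apache.spark.rdd.RDD
    import org.apache.spark.sql.Row
    import org.apache.spark.sql.catalyst.expressions.{Attribute, Expression}
    import org.apache.spark.sql.sources.Filter

    // only simple, translatable Filters arrive, but columns are plain names
    trait PrunedFilteredScan {
      def buildScan(requiredColumns: Array[String], filters: Array[Filter]): RDD[Row]
    }

    // full Catalyst Expressions arrive, but columns come as Attributes
    trait CatalystScan {
      def buildScan(requiredColumns: Seq[Attribute], filters: Seq[Expression]): RDD[Row]
    }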
val topics="first"
shouldn't it be val topics = Set("first") ?
On Sun, Sep 20, 2015 at 1:01 PM, Petr Novak wrote:
> val topics="first"
>
> shouldn't it be val topics = Set("first") ?
>
> On Sat, Sep 19, 2015 at 10:07 PM, kali.tumm...@gmail.com <
> kali.tumm...@gmail.com> wrote:
>
>> Hi ,
>>
>>
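For context, the direct stream API takes a Set of topic names, not a String;
a minimal sketch, assuming the Kafka 0.8 direct API and an existing
StreamingContext ssc (the broker address is a placeholder):

    import kafka.serializer.StringDecoder
    import org.apache.spark.streaming.kafka.KafkaUtils

    val kafkaParams = Map("metadata.broker.list" -> "localhost:9092")
    val topics = Set("first")  // a Set, not a String
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, topics)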
Hi Michal,
yes, it is logged there twice; it can be seen in the attached log in one of
the previous posts with more details:
15/09/17 23:06:37 INFO StreamingContext: Invoking
stop(stopGracefully=false) from shutdown hook
15/09/17 23:06:37 INFO StreamingContext: Invoking
stop(stopGracefully=false) from shut
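If the goal is a graceful stop instead of the abrupt one from the shutdown
hook, there is a setting for that in Spark 1.4+; a sketch, assuming a
standard SparkConf setup:

    val conf = new SparkConf()
      .set("spark.streaming.stopGracefullyOnShutdown", "true")  // default false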