Got it, thanks Reynold.
On 20 Sep 2015 07:08, "Reynold Xin" wrote:
> The RDDs themselves are not materialized, but the implementations can
> materialize.
>
> E.g. in cogroup (which is used by RDD.join), it materializes all the data
> during grouping.
>
> In SQL/DataFrame join, depending on the joi
I think generally the way forward would be to put aggregate statistics into
external storage (e.g. HBase); it should not have much influence on
latency. You will probably need it anyway if you need to store historical
information. With respect to deltas: always a tricky topic. You may want to work
wit
Hi Thuy,
You can check RDD.lookup(). For it to be fast, the RDD needs a known
partitioner and, of course, should be cached in memory. Or you may consider a
distributed cache such as Ehcache or AWS ElastiCache.
I think external storage is an option, too, especially NoSQL databases:
they can handle updates at high speed, at c
I allocated almost 6GB of RAM to the Ubuntu virtual machine and got the
same problem.
I will go over this post and try to zoom in on the Java VM settings.
Meanwhile, can someone with a working Ubuntu machine share their JVM
settings?
Thanks,
Eyal
On Sat, Sep 19, 2015 at 7:49 PM, Ted Yu w
Hi,
I am trying to build a data generator that feeds a streaming application.
This data generator just reads a file and sends its lines through a socket.
I get no errors in the logs, and the benchmark below always prints
"Received 0 records". Am I doing something wrong?
object MyDataGenerator {
Thanks Adrian and Jorn for the answers.
Yes, you're right, there are a lot of things I need to consider if I want to
use Spark for my app.
I still have a few concerns/questions based on your information:
1/ I need to combine the trading stream with the tick stream; I am planning to use
Kafka for that.
If I am usi
Hi Richard,
I am not sure how to support user-defined types. But regarding your second
question, you can use a workaround as follows.
Suppose you have a struct a and want to filter on a.c with a.c > X. You can
define an alias C for a.c, add an extra column C to the schema of the relation,
and
The RDDs themselves are not materialized, but the implementations can
materialize.
E.g. in cogroup (which is used by RDD.join), it materializes all the data
during grouping.
In SQL/DataFrame join, depending on the join:
1. For broadcast join, only the smaller side is materialized in memory as a
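
To illustrate the idea of materializing only one side of a join, here is a minimal sketch in plain Scala collections. It is not Spark's implementation: the object name BroadcastStyleJoin and the choice of an in-memory hash map for the smaller side are assumptions made for this illustration. The larger side is consumed through an iterator and is never held in memory as a whole.

```scala
// Illustration only: plain Scala collections, not Spark's actual join code.
// BroadcastStyleJoin is a hypothetical name used for this sketch.
object BroadcastStyleJoin {
  // Materializes `small` into an in-memory hash map, then streams `large`
  // through an iterator, emitting one joined row per matching pair.
  def join[K, A, B](large: Iterator[(K, A)], small: Seq[(K, B)]): Iterator[(K, (A, B))] = {
    // The only fully materialized structure: key -> all values on the small side.
    val table: Map[K, Seq[B]] =
      small.groupBy(_._1).map { case (k, rows) => k -> rows.map(_._2) }
    // Rows of the large side are looked up one at a time; unmatched rows are dropped.
    large.flatMap { case (k, a) =>
      table.getOrElse(k, Seq.empty[B]).map(b => (k, (a, b)))
    }
  }
}
```

Because the result is itself an iterator, nothing on the large side is read until the caller pulls rows from it.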
I defined my own relation (extending BaseRelation) and implemented the
PrunedFilteredScan interface, but discovered that if the column referenced
in a WHERE = clause is a user-defined type or a field of a struct column,
then Spark SQL passes NO filters to the PrunedFilteredScan.buildScan
method, re
Hi All,
I figured it out: I forgot to specify the master as local[2]; at least two
threads are required.
package com.examples
/**
* Created by kalit_000 on 19/09/2015.
*/
import org.apache.spark._
import org.apache.spark.SparkContext._
import org.apache.spark.sql.SQLContext
import org.apache.spark.SparkConf
imp
Hi All,
I am unable to see the output printed in the console. Can anyone
help?
package com.examples
/**
* Created by kalit_000 on 19/09/2015.
*/
import org.apache.spark._
import org.apache.spark.SparkContext._
import org.apache.spark.sql.SQLContext
import org.apache.spark.SparkConf
impo
Hi,
I am trying to develop the same code in IntelliJ IDEA and I am having the same
issue. Is there any workaround?
Error in IntelliJ: cannot resolve symbol createDirectStream
import kafka.serializer.StringDecoder
import org.apache.spark._
import org.apache.spark.SparkContext._
import org.apache.spark.
Hi Reynold,
Can you please elaborate on this? I thought an RDD also only opens an
iterator. Does it get materialized for joins?
Rishi
On Saturday, September 19, 2015, Reynold Xin wrote:
> Yes for RDD -- both are materialized. No for DataFrame/SQL - one side
> streams.
>
>
> On Thu, Sep 17, 2015 at
You can still provide properties through the docker container by putting
configuration in the conf directory, but we try to pass through all properties
submitted from the driver's spark-submit, which I believe will override
the defaults.
Is this not what you are seeing?
Tim
> On Sep 19, 2015, a
Please read this article:
http://blogs.vmware.com/apps/2011/06/taking-a-closer-look-at-sizing-the-java-process.html
Can you increase the memory given to the Ubuntu virtual machine?
Cheers
On Sat, Sep 19, 2015 at 9:30 AM, Eyal Altshuler
wrote:
> Hi,
>
> I allocate 4GB for the ubuntu virtual ma
Hi,
I allocated 4GB to the Ubuntu virtual machine. How can I check the
maximum memory available to a JVM process?
Regarding the thread: I see it's related to building on Windows.
Thanks,
Eyal
On Sat, Sep 19, 2015 at 6:54 PM, Ted Yu wrote:
> See also this thread:
>
> https://bukkit.org/threads/
Using the Scala API, you can first group by user and then use combineByKey.
Thanks,
Aniket
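
One way to read the suggestion above is a per-user word count built from the three functions that RDD.combineByKey takes (createCombiner, mergeValue, mergeCombiners). The sketch below is plain Scala with no Spark dependency; the object name UserWordCount, the helper names, and the (user, tweet) pair layout are assumptions made for illustration. On an RDD[(String, String)] of (user, tweet) pairs, the same three functions could be passed to combineByKey; here a local fold stands in for that call.

```scala
// Plain Scala sketch (no Spark dependency). The three functions have the
// shapes RDD.combineByKey expects: createCombiner, mergeValue, mergeCombiners.
// All names here are hypothetical and chosen for this illustration.
object UserWordCount {
  type Counts = Map[String, Int]

  // Splits a tweet into lowercase words, dropping empty tokens.
  def words(tweet: String): Seq[String] =
    tweet.toLowerCase.split("\\s+").filter(_.nonEmpty).toSeq

  // createCombiner: start a per-user count map from the first tweet seen.
  def createCombiner(tweet: String): Counts =
    mergeValue(Map.empty[String, Int], tweet)

  // mergeValue: fold one more tweet into an existing count map.
  def mergeValue(acc: Counts, tweet: String): Counts =
    words(tweet).foldLeft(acc)((m, w) => m.updated(w, m.getOrElse(w, 0) + 1))

  // mergeCombiners: merge two partial count maps for the same user
  // (in Spark this would combine results from different partitions).
  def mergeCombiners(a: Counts, b: Counts): Counts =
    b.foldLeft(a) { case (m, (w, n)) => m.updated(w, m.getOrElse(w, 0) + n) }

  // Local stand-in for combineByKey on a single partition of (user, tweet) pairs.
  def countByUser(tweets: Seq[(String, String)]): Map[String, Counts] =
    tweets.foldLeft(Map.empty[String, Counts]) { case (m, (user, tweet)) =>
      val updated = m.get(user).map(mergeValue(_, tweet)).getOrElse(createCombiner(tweet))
      m.updated(user, updated)
    }
}
```

Keying by user and keeping a word-count map as the combiner avoids collecting every tweet of a user in one place before counting, which a plain group-then-count would do.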
On Sat, Sep 19, 2015, 6:41 PM kali.tumm...@gmail.com
wrote:
> Hi All,
> I would like to achieve the output below using Spark. I managed to write it
> in Hive and call it from Spark, but not in plain Spark (Scala),
The assumption is that the executor has no default properties set in its
environment through the docker container. Correct me if I'm wrong, but any
properties which are unset in the SparkContext will come from the
environment of the executor, will they not?
Thanks,
- Alan
On Sat, Sep 19, 2015 at 1:09
See also this thread:
https://bukkit.org/threads/complex-craftbukkit-server-and-java-problem-could-not-reserve-enough-space-for-object-heap.155192/
Cheers
On Sat, Sep 19, 2015 at 8:51 AM, Aniket Bhatnagar <
aniket.bhatna...@gmail.com> wrote:
> Hi Eyal
>
> Can you check if your Ubuntu VM has enou
Hi Eyal,
Can you check if your Ubuntu VM has enough RAM allocated to run a JVM with a
3 GB heap?
thanks,
Aniket
On Sat, Sep 19, 2015, 9:09 PM Eyal Altshuler
wrote:
> Hi,
>
> I had configured the MAVEN_OPTS environment variable the same as you wrote.
> My java version is 1.7.0_75.
> I didn't customize
Hi,
I had configured the MAVEN_OPTS environment variable the same as you wrote.
My java version is 1.7.0_75.
I didn't customize the JVM heap size specifically. Is there any additional
configuration I have to set besides the MAVEN_OPTS configuration?
Thanks,
Eyal
On Sat, Sep 19, 2015 at 5:29 PM, T
Can you tell us how you configured the JVM heap size?
Which version of Java are you using?
When I build Spark, I do the following:
export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
Cheers
On Sat, Sep 19, 2015 at 5:31 AM, Eyal Altshuler
wrote:
> Hi,
> Trying to b
Hi All,
I would like to achieve the output below using Spark. I managed to write it
in Hive and call it from Spark, but not in plain Spark (Scala). How do I group
word counts by a particular user (column), for example?
Imagine users and their tweets: I want to do a word count per user
name.
Input:-
I was searching the 1.5.0 docs for the Docker on Mesos capabilities and
just found that you CAN run it this way. Are there any user posts, blog posts,
etc. on why and how you'd do this?
Basically, at first I was questioning why you'd run Spark in a docker
container, i.e., if you run with tarballed e
Hi,
Trying to build spark in my ubuntu virtual machine, I am getting the
following error:
"Error occurred during initialization of VM
Could not reserve enough space for object heap
Error: could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit".
I have
If you want to let your users query their portfolios, then you may
want to think about storing the current state of the portfolios in
HBase/Phoenix; alternatively, a cluster of relational databases can make
sense. For the rest you may use Spark.
On Sat, Sep 19, 2015 at 4:43, Thúy Hằng Lê
I guess I need a bit more clarification: what kind of assumptions was the
dispatcher making?
Tim
On Thu, Sep 17, 2015 at 10:18 PM, Alan Braithwaite
wrote:
> Hi Tim,
>
> Thanks for the follow up. It's not so much that I expect the executor to
> inherit the configuration of the dispatcher as I*
yarn-client still runs the executor tasks on the cluster; the main difference
is where the driver job runs.
Thanks,
Ewan
-- Original message--
From: shahab
Date: Fri, 18 Sep 2015 13:11
To: Aniket Bhatnagar;
Cc: user@spark.apache.org;
Subject:Re: Zeppelin on Yarn : org.apache.spar