I ran the PySpark streaming example queue_streaming.py, but ran into the following error. Does anyone know what might be wrong? Thanks
ERROR [2017-08-02 08:29:20,023] ({Stop-StreamingContext}
Logging.scala[logError]:91) - Cannot connect to Python process. It's
probably dead. Stopping StreamingContext
Dataset.describe only calculates statistics for numerical columns, not for categorical columns. R's summary method can also calculate statistics for numerical data, which is very useful for exploratory data analysis. Just wondering, is there any API for categorical column statistics as well, or is
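In the meantime, here is a minimal sketch of a possible workaround, assuming Spark 2.x; the DataFrame and column names below are made-up placeholders, not from the original mail: describe() for the numeric columns, and a groupBy/count frequency table for a categorical column.

// A minimal sketch of a possible workaround; the DataFrame and column names
// are placeholders, not from the original mail.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.desc

val spark = SparkSession.builder().appName("describe-sketch").master("local[2]").getOrCreate()
import spark.implicits._

val df = Seq((1, "a"), (2, "a"), (3, "b")).toDF("value", "category")

df.describe("value").show()                                   // count, mean, stddev, min, max for the numeric column
df.groupBy("category").count().orderBy(desc("count")).show()  // frequency table for the categorical column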
E.g., I have a custom class A (not a case class), and I'd like to use it as Dataset[A]. I guess I need to implement an Encoder for it, but I didn't find any example of that. Is there any documentation for it? Thanks
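I'm not aware of an official document either; the sketch below shows the two generic encoders I know of, under the assumption that A is a plain serializable class (the class and field names here are hypothetical): Encoders.kryo stores the object as one opaque binary column, while Encoders.bean gives typed columns but needs bean-style getters/setters and a no-arg constructor.

// A sketch under the assumption that A is a plain serializable class; the
// class and field names are hypothetical, not from the original mail.
import scala.beans.BeanProperty
import org.apache.spark.sql.{Encoders, SparkSession}

class A(@BeanProperty var name: String, @BeanProperty var value: Int) extends Serializable {
  def this() = this("", 0)   // no-arg constructor required by Encoders.bean
}

val spark = SparkSession.builder().appName("encoder-sketch").master("local[2]").getOrCreate()

// Option 1: kryo encoder, works for any serializable class, but the data is
// stored as a single opaque binary column.
val kryoDs = spark.createDataset(Seq(new A("x", 1)))(Encoders.kryo[A])

// Option 2: bean encoder, produces real typed columns (name, value), but the
// class must follow JavaBean conventions.
val beanDs = spark.createDataset(Seq(new A("x", 1)))(Encoders.bean(classOf[A]))
beanDs.show()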
Here's my screenshot; stages 19 and 20 have a one-to-one relationship. They're the only child/parent pair. From my understanding, the shuffle write of stage 19 should be the same as the shuffle read of stage 20, but here they differ slightly. Is there any reason for that? Thanks.
[image: inline image]
Runner @
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/PythonRunner.scala
>
> On Tue, Oct 13, 2015 at 7:50 PM, canan chen wrote:
>
>> I look at the source code of spark, but didn't find where python program
>> is started in pytho
I looked at the Spark source code but didn't find where the Python program is started.
It seems spark-submit will call PythonGatewayServer, but where is the Python program actually started?
Thanks
, 2015 at 10:39 PM, canan chen wrote:
> Yes, I followed the guide in this doc and ran it in Mesos client mode.
>
> On Tue, Sep 8, 2015 at 6:31 PM, Akhil Das
> wrote:
>
>> In which mode are you submitting your application? (coarse-grained or
>> fine-grained(default)).
pache.org/docs/latest/running-on-mesos.html#using-a-mesos-master-url
>
> Thanks
> Best Regards
>
> On Tue, Sep 8, 2015 at 12:54 PM, canan chen wrote:
>
>> Hi all,
>>
>> I try to run spark on mesos, but it looks like I can not allocate
>> resources from mesos. I
Hi all,
I tried to run Spark on Mesos, but it looks like I cannot allocate resources from Mesos. I am not an expert on Mesos, but from the Mesos log it seems Spark always declines the offers from Mesos. Not sure what's wrong; maybe some configuration change is needed. Here's the Mesos master log:
I0908 15:0
k/tree/master/core/src/main/scala/org/apache/spark/deploy/rest
> ); currently I don't think there's a document addressing this part. Also, this
> REST API is only used by SparkSubmit at the moment, not a public API as far as I know.
>
> Thanks
> Jerry
>
>
> On Mon, Aug 31, 2015 at 4
I mean the Spark built-in REST API.
On Mon, Aug 31, 2015 at 3:09 PM, Akhil Das
wrote:
> Check Spark Jobserver
> <https://github.com/spark-jobserver/spark-jobserver>
>
> Thanks
> Best Regards
>
> On Mon, Aug 31, 2015 at 8:54 AM, canan chen wrote:
>
>> I fou
ocally then the `spark.history.fs.logDirectory`
> will happen to point to `spark.eventLog.dir`, but the use case it provides
> is broader than that.
>
> -Andrew
>
> 2015-08-19 5:13 GMT-07:00 canan chen :
>
>> Anyone know about this ? Or do I miss something here ?
>>
>
Does anyone know about this? Or am I missing something here?
On Fri, Aug 7, 2015 at 4:20 PM, canan chen wrote:
> Is there any reason that the history server uses a different property for the
> event log directory? Thanks
>
> http://search-hadoop.com/m/q3RTtdZv0d1btRHl/Spark+build+module&subj=Building+Spark+Building+just+one+module+
>
>
>
> > On Aug 19, 2015, at 1:44 AM, canan chen wrote:
> >
> > I want to work on one jira, but it is not easy to do unit test, because
> it involves di
I want to work on one JIRA, but it is not easy to unit test because it involves different components, especially the UI. Building Spark is pretty slow, and I don't want to rebuild it every time I test a code change. I am wondering how other people do this. Is there any experience you can share? Thanks
--num-executors only works in YARN mode. In standalone mode, I have to set --total-executor-cores and --executor-cores instead. Isn't that a bit unintuitive? Is there any reason for it?
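For what it's worth, here is a sketch (values are placeholders) of how the same thing ends up being expressed in standalone mode through configuration; the executor count is implied by the two core settings rather than set directly.

// A sketch with placeholder values: in standalone mode the number of
// executors falls out of the core settings, roughly cores.max / executor.cores.
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("standalone-cores-sketch")
  .setMaster("spark://master:7077")     // hypothetical standalone master URL
  .set("spark.cores.max", "8")          // same as --total-executor-cores 8
  .set("spark.executor.cores", "2")     // same as --executor-cores 2
  .set("spark.executor.memory", "2g")   // roughly 8 / 2 = 4 executors

val sc = new SparkContext(conf)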
LContext
>
> or just create a new SQLContext from a SparkContext.
>
> -Andrew
>
> 2015-08-15 20:33 GMT-07:00 canan chen :
>
>> I am not sure other people's spark debugging environment ( I mean for the
>> master branch) , Anyone can share his experience ?
>>
I am not sure what other people's Spark debugging environment looks like (I mean for the master branch). Can anyone share their experience?
On Sun, Aug 16, 2015 at 10:40 AM, canan chen wrote:
> I import the spark source code to intellij, and want to run SparkPi in
> intellij, but meet the foll
I imported the Spark source code into IntelliJ and want to run SparkPi in IntelliJ, but I hit the following weird compilation error. I googled it, and sbt clean doesn't work for me. I am not sure whether anyone else has hit this issue; any help is appreciated.
Error:scalac:
while compiling:
/
I imported the Spark project into IntelliJ and tried to run SparkPi in IntelliJ, but it failed with a compilation error:
Error:scalac:
while compiling:
/Users/werere/github/spark/sql/core/src/main/scala/org/apache/spark/sql/test/TestSQLContext.scala
during phase: jvm
library version: ver
Does anyone know about this? Thanks
On Fri, Aug 7, 2015 at 4:20 PM, canan chen wrote:
> Is there any reason that the history server uses a different property for the
> event log directory? Thanks
>
Is there any reason that the history server uses a different property for the event log directory? Thanks
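For later readers, here is how I understand the two properties fit together, as a spark-defaults.conf sketch (the HDFS path is a placeholder): the running application writes its event log to spark.eventLog.dir, while the history server reads from spark.history.fs.logDirectory, so they are usually pointed at the same location even though they are separate settings.

# Sketch of spark-defaults.conf; the path is a placeholder.
spark.eventLog.enabled           true
spark.eventLog.dir               hdfs://namenode/shared/spark-logs
spark.history.fs.logDirectory    hdfs://namenode/shared/spark-logs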
> On Wednesday, July 29, 2015, canan chen wrote:
>
>> Anyone know how to set log level in spark-submit ? Thanks
>>
>
Does anyone know how to set the log level in spark-submit? Thanks
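One option that I know works (a sketch, with the level as a placeholder) is to set the level programmatically right after the SparkContext is created; another common route is shipping a custom log4j.properties with spark-submit, which the sketch below does not cover.

// A sketch: set the log level programmatically; "WARN" is just an example,
// other valid values include ALL, DEBUG, INFO, ERROR and OFF.
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("log-level-sketch").setMaster("local[2]"))
sc.setLogLevel("WARN")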
It works for me with the following code. Could you share your code?
val data = sc.parallelize(List(1, 2, 3))
data.saveAsTextFile("file:Users/chen/Temp/c")
On Thu, Jul 9, 2015 at 4:05 AM, spok20nn wrote:
> Getting an exception when writing an RDD to local disk using the following function
>
> s
Many places refer to RDD lineage; I'd like to know exactly what it refers to. My understanding is that it means the RDD dependencies and the intermediate MapOutput info in MapOutputTracker. Correct me if I am wrong.
Thanks
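One concrete way to look at it is RDD.toDebugString, which prints the dependency chain Spark keeps for an RDD; a small sketch:

// A small sketch: toDebugString prints the dependency (lineage) chain of an RDD.
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("lineage-sketch").setMaster("local[2]"))
val counts = sc.parallelize(1 to 100, 4)
  .map(x => (x % 10, 1))
  .reduceByKey(_ + _)

println(counts.toDebugString)   // ShuffledRDD <- MapPartitionsRDD <- ParallelCollectionRDD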
I don't think there is YARN-related stuff to access in Spark. Spark doesn't depend on YARN.
BTW, why do you want the YARN application ID?
On Mon, Jun 22, 2015 at 11:45 PM, roy wrote:
> Hi,
>
> Is there a way to get Yarn application ID inside spark application, when
> running spark Job on YARN
Why do you want it to wait to start until all the resources are ready? Making it start as early as possible should let it complete earlier and increase resource utilization.
On Tue, Jun 23, 2015 at 10:34 PM, Arun Luthra wrote:
> Sometimes if my Hortonworks yarn-enabled cluster is fairly busy, Spark
I don't think this is the right question. Spark can be deployed on different cluster manager frameworks like standalone, YARN and Mesos. Spark can't run without such a cluster manager, which means Spark depends on the cluster manager framework.
And the data management layer is the upstream
One example is when you'd like to set up a JDBC connection for each partition and share that connection across the records.
mapPartitions is much like the mapper paradigm in MapReduce: in a MapReduce mapper, you have a setup method to do any initialization before processing the split.
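A rough sketch of that pattern (the JDBC URL, table and query are made-up placeholders, and the driver is assumed to be on the classpath):

// Open one JDBC connection per partition and reuse it for every record in
// that partition, similar to the setup() method of a MapReduce mapper.
// All connection details here are hypothetical.
import java.sql.DriverManager
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("map-partitions-sketch").setMaster("local[2]"))
val ids = sc.parallelize(1 to 100, 4)

val names = ids.mapPartitions { records =>
  val conn = DriverManager.getConnection("jdbc:postgresql://dbhost/mydb")   // once per partition
  val stmt = conn.prepareStatement("SELECT name FROM users WHERE id = ?")
  val out = records.map { id =>
    stmt.setInt(1, id)
    val rs = stmt.executeQuery()
    val name = if (rs.next()) rs.getString(1) else ""
    rs.close()
    name
  }.toList                       // materialize before closing the connection
  stmt.close()
  conn.close()
  out.iterator
}

names.take(5).foreach(println)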
Check the available resources you have (CPU cores and memory) on the master web UI.
The log you see means the job can't get any resources.
On Wed, Jun 24, 2015 at 5:03 AM, Nizan Grauer wrote:
> I'm having 30G per machine
>
> This is the first (and only) job I'm trying to submit. So it's weird that
ast.
>
>
> Best
> Ayan
>
> On Wed, Jun 17, 2015 at 10:21 PM, Mark Tse wrote:
>
>> I think
>> https://spark.apache.org/docs/latest/programming-guide.html#rdd-persistence
>> might shed some light on the behaviour you’re seeing.
>>
>>
>>
>>
Here's a simple Spark example where I call RDD#count twice. The first count invokes 2 stages, but the second one needs only 1 stage. It seems the first stage is cached. Is that true? Is there any flag to control whether the intermediate stage is cached?
val data = sc.parallelize(1 to 10, 2).m
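The snippet above is cut off, so here is a minimal sketch of the kind of job I mean; from what I can tell, the second count() reuses the shuffle files written by the first one, so its map stage is shown as skipped in the UI rather than being recomputed.

// A minimal sketch (not the original, truncated snippet): calling count()
// twice on a shuffled RDD. The second job reuses the shuffle output of the
// first, so its map stage appears as "skipped" in the web UI.
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("count-twice-sketch").setMaster("local[2]"))
val data = sc.parallelize(1 to 10, 2).map(x => (x % 2, x)).reduceByKey(_ + _)

println(data.count())   // 2 stages: the shuffle map stage plus the result stage
println(data.count())   // only the result stage runs; the map stage is skipped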
Maybe someone has asked this question before. I have a compilation issue when compiling Spark SQL. I found a couple of posts on Stack Overflow, but they didn't work for me. Does anyone have experience with this? Thanks
http://stackoverflow.com/questions/26788367/quasiquotes-in-intellij-14
Error:
etween
> these reducers tasks since each shuffle will consume a lot of memory ?
>
> On Tue, May 26, 2015 at 7:27 PM, Evo Eftimov
> wrote:
>
> the link you sent says multiple executors per node
>
> Worker is just a daemon process launching Executors / JVMs so it can execute
&
executors I want in the code ?
On Tue, May 26, 2015 at 5:57 PM, Arush Kharbanda wrote:
> I believe you would be restricted by the number of cores you have in your
> cluster. Having a worker running without a core is useless.
>
> On Tue, May 26, 2015 at 3:04 PM, canan chen wrote:
&
>
>
> Original message
> From: Arush Kharbanda
> Date:2015/05/26 10:55 (GMT+00:00)
> To: canan chen
> Cc: Evo Eftimov ,user@spark.apache.org
> Subject: Re: How does spark manage the memory of executor with multiple
> tasks
>
> Hi Evo,
>
> Worker is the
n, the number of executor is not fixed, will change
> dynamically according to the load.
>
> Thanks
> Jerry
>
> 2015-05-27 14:44 GMT+08:00 canan chen :
>
>> It seems the executor number is fixed for the standalone mode, not sure
>> other modes.
>>
>
>
It seems the executor number is fixed for standalone mode; I'm not sure about other modes.
In Spark standalone mode, there will be one executor per worker. I am wondering how many executors I can acquire when I submit an app. Is it greedy mode (as many as I can acquire)?
ances as there is available in the Executor aka JVM Heap
>
>
>
> From: canan chen [mailto:ccn...@gmail.com]
> Sent: Tuesday, May 26, 2015 9:30 AM
> To: Evo Eftimov
> Cc: user@spark.apache.org
> Subject: Re: How does spark manage the memory of executor with multipl
dard
> concepts familiar to every Java, Scala etc developer
>
>
>
> From: canan chen [mailto:ccn...@gmail.com]
> Sent: Tuesday, May 26, 2015 9:02 AM
> To: user@spark.apache.org
> Subject: How does spark manage the memory of executor with multiple
> tasks
>
>
>
Since Spark can run multiple tasks in one executor, I am curious how Spark manages memory across these tasks. Say one executor has 1GB of memory; if this executor can run 10 tasks simultaneously, then each task can consume 100MB on average. Do I understand it correctly? It do