Could you check the Spark app's YARN log and the Livy log?
Chetan Khatri wrote on Thu, May 10, 2018 at 4:18 AM:
> All,
>
> I am running on Hortonworks HDP Hadoop with Livy and Spark 2.2.0. When I
> run the same Spark job using spark-submit, it succeeds and all
> transformations complete.
>
> When
It supports Python 3.5, and IIRC Spark also supports Python 3.6.
Irving Duran wrote on Thu, May 10, 2018 at 9:08 PM:
> Does Spark now support Python 3.5, or is it just 3.4.x?
>
> https://spark.apache.org/docs/latest/rdd-programming-guide.html
>
> Thank You,
>
> Irving Duran
>
I don't think it is possible to have less than 1 core for the AM; this is due
to YARN, not Spark.
The resources used by the AM, compared to the number of executors, should be
small and acceptable. If you do want to save more resources, I would suggest
using yarn cluster mode, where the driver and the AM run in the same container.
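For reference, a minimal spark-submit sketch for yarn cluster mode (the script
name and resource numbers below are placeholders, not taken from this thread):

    spark-submit \
      --master yarn \
      --deploy-mode cluster \
      --num-executors 4 \
      --executor-cores 2 \
      --executor-memory 4g \
      my_app.py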
Hi there,
From the Apache Spark Streaming website (see links below):
- the batch interval is set when a Spark StreamingContext is constructed
(see example (a) quoted below, and the sketch after the links)
- the StreamingContext is available in both older and newer Spark versions
(v1.6, v2.2 to v2.3.0) (see
https://spark.
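For point (a), a minimal PySpark sketch of how the batch interval is passed to
the StreamingContext constructor; the app name, the 10-second interval and the
socket source are placeholders, not taken from the quoted example:

    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext("local[2]", "BatchIntervalExample")
    ssc = StreamingContext(sc, 10)   # batch interval: 10 seconds
    lines = ssc.socketTextStream("localhost", 9999)   # placeholder source
    lines.pprint()
    ssc.start()
    ssc.awaitTermination()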
My data looks like this:
{
  "ts2" : "2018/05/01 00:02:50.041",
  "serviceGroupId" : "123",
  "userId" : "avv-0",
  "stream" : "",
  "lastUserActivity" : "00:02:50",
  "lastUserActivityCount" : "0"
}
{
  "ts2" : "2018/05/01 00:09:02.079",
  "serviceGroupId" : "123",
  "userId" : "avv-0",
  "strea
Yeah, it depends on what you want to do with that timeseries data. We at
Datadog process trillions of points daily using Spark. I cannot really go
into what exactly we do with the data, but just saying that Spark can
handle the volume, scale well and be fault-tolerant, albeit everything I
said com
Hi,
I have a Spark Streaming application running on YARN that consumes from a JMS
source. I have checkpointing and the WAL enabled to ensure zero data loss.
However, when I suddenly kill my application and restart it, sometimes it
recovers the data from the WAL but sometimes it doesn't! In a
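A minimal sketch of the recovery pattern being described, assuming a driver
that rebuilds the StreamingContext from the checkpoint via getOrCreate; the
paths, app name, batch interval and the socket stand-in for the JMS receiver
are all placeholders:

    from pyspark import SparkConf, SparkContext
    from pyspark.streaming import StreamingContext

    CHECKPOINT_DIR = "hdfs:///checkpoints/jms-app"   # placeholder path

    def create_context():
        conf = (SparkConf()
                .setAppName("jms-consumer")          # placeholder app name
                # the receiver write-ahead log must be enabled explicitly
                .set("spark.streaming.receiver.writeAheadLog.enable", "true"))
        sc = SparkContext(conf=conf)
        ssc = StreamingContext(sc, 30)               # placeholder batch interval
        ssc.checkpoint(CHECKPOINT_DIR)
        # stand-in source; the real app would create its JMS receiver stream here
        lines = ssc.socketTextStream("localhost", 9999)
        lines.count().pprint()
        return ssc

    # after a restart, rebuild from the checkpoint if present, otherwise start fresh
    ssc = StreamingContext.getOrCreate(CHECKPOINT_DIR, create_context)
    ssc.start()
    ssc.awaitTermination()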
Dear all,
I am fitting a very trivial GMM with 2-10 components on 100 samples and
5 features in pyspark and observe some of the log-likelihoods being
positive (see below). I don't understand how this is possible. Is this a
bug or intended behaviour? Furthermore, for different seeds,
sometim
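A minimal PySpark sketch of the kind of fit being described, using pyspark.ml's
GaussianMixture on randomly generated data (the stand-in dataset, column names
and seed below are placeholders, not the poster's data):

    import random
    from pyspark.sql import SparkSession
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.clustering import GaussianMixture

    spark = SparkSession.builder.appName("GMMExample").getOrCreate()

    # 100 samples, 5 features, drawn from a standard normal as a stand-in dataset
    rows = [tuple(random.gauss(0.0, 1.0) for _ in range(5)) for _ in range(100)]
    df = spark.createDataFrame(rows, ["f1", "f2", "f3", "f4", "f5"])

    assembler = VectorAssembler(inputCols=["f1", "f2", "f3", "f4", "f5"],
                                outputCol="features")
    data = assembler.transform(df)

    for k in (2, 5, 10):
        model = GaussianMixture(k=k, seed=42, featuresCol="features").fit(data)
        # total log-likelihood of the fitted model, from the training summary
        print(k, model.summary.logLikelihood)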
There is not one answer to this.
It really depends on what kind of time series analysis you do with the data and
what time series database you are using. Then it also depends on what ETL you
need to do.
You also seem to need to join data - is it with existing data of the same type,
or do you join com