Hi,
Yes, it's the "sum of values for all tasks" (it's based on TaskMetrics
which are accumulators behind the scenes).
Why "it appears that value isnt much of help while debugging?" ?
Best regards,
Jacek Laskowski
https://medium.com/@jaceklaskowski/
Mastering Apache Spark 2.0 https://bit.ly/ma
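To make the "sum of values for all tasks" behaviour concrete, here is a minimal
sketch using a user-level accumulator (Spark 2.x API; the names and numbers are
only for illustration, not the internal TaskMetrics themselves):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("accumulator-demo").getOrCreate()
// Each task adds to the accumulator; .value on the driver is the sum over all tasks.
val recordsSeen = spark.sparkContext.longAccumulator("recordsSeen")
spark.sparkContext.parallelize(1 to 1000, numSlices = 8).foreach(_ => recordsSeen.add(1))
println(recordsSeen.value)  // 1000, summed across the 8 tasks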
You define "getNewColumnName" as method, which requires the class/object
holding it has to be serializable.
>From the stack trace, it looks like this method defined in
>ProductDimensionSFFConverterRealApp, but it is not serializable.
In fact, your method only uses String and Boolean, which are
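A minimal sketch of the failure and one way out of it (the class and method
names come from the stack trace; the bodies are assumed):

// The method lives in a non-serializable class, so the closure below
// captures `this` and the whole instance must be shipped to the executors.
class ProductDimensionSFFConverterRealApp {   // not Serializable
  def getNewColumnName(name: String, upper: Boolean): String =
    if (upper) name.toUpperCase else name.toLowerCase

  def run(df: org.apache.spark.sql.DataFrame): Unit = {
    // Throws "Task not serializable" because the lambda references `this`.
    df.rdd.map(r => getNewColumnName(r.getString(0), upper = true)).count()
  }
}

// One fix: move the logic into something serializable, e.g. a small object
// (or a local function value), so only that piece travels with the task.
object ColumnNaming extends Serializable {
  def getNewColumnName(name: String, upper: Boolean): String =
    if (upper) name.toUpperCase else name.toLowerCase
}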
If you only need group-bys within the same hierarchy, you can group by at the
lowest level and cache the result, then use the cached DataFrame to derive the
higher levels; Spark will scan the original table only once and reuse the
cache for the rest, as in the sketch below.
val df_base = sqlContext.sql("sele
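The (truncated) snippet above presumably aggregates once at the lowest level;
a sketch of the pattern, with the column names assumed from the question:

import org.apache.spark.sql.functions.sum

// Aggregate once at the lowest level of the hierarchy and cache it.
val df_base = sqlContext.sql(
  "select col1, col2, col3, count(*) as cnt from table group by col1, col2, col3")
df_base.cache()

// Higher-level rollups are derived from the cached result instead of
// re-scanning the parquet table.
val df12 = df_base.groupBy("col1", "col2").agg(sum("cnt").as("cnt"))
val df13 = df_base.groupBy("col1", "col3").agg(sum("cnt").as("cnt"))

This works because both coarser groupings can be derived by summing the counts
of the finer (col1, col2, col3) grouping.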
Try grouping sets.
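Grouping sets compute several groupings in a single scan of the table; a sketch
of what that could look like for the queries above (Spark 2.x SQL syntax, column
names assumed):

val grouped = sqlContext.sql("""
  select col1, col2, col3, count(*) as cnt
  from table
  group by col1, col2, col3
  grouping sets ((col1, col2), (col1, col3))
""")

Each output row belongs to one grouping set; you can tell them apart by which
columns are null, or with the grouping_id() function.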
On Sun, Feb 19, 2017 at 8:23 AM, Patrick wrote:
> Hi,
>
> I have read 5 columns from parquet into a data frame. My queries on the
> parquet table are of the type below:
>
> val df1 = sqlContext.sql("select col1, col2, count(*) from table group by
> col1, col2")
> val df2 = sqlContext.sql(s
Hi,
I have read 5 columns from parquet into a data frame. My queries on the
parquet table are of the type below:
val df1 = sqlContext.sql("select col1, col2, count(*) from table group by
col1, col2")
val df2 = sqlContext.sql("select col1, col3, count(*) from table group by
col1, col3")
val df3 = sqlContext.sql(sel
For now I have added this to log4j.properties:
log4j.logger.org.apache.parquet=ERROR
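If the warnings also come through the older, unprefixed logger name, the
companion entry can be set as well; the log4j.properties.template that ships
with Spark carries both lines, roughly:

log4j.logger.org.apache.parquet=ERROR
log4j.logger.parquet=ERROR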
2017-02-18 11:50 GMT-08:00 Stephen Boesch :
> The following JIRA mentions that a fix made to read parquet 1.6.2 into 2.X
> STILL leaves an "avalanche" of warnings:
>
>
> https://issues.apache.org/jira/browse/SP
The following JIRA mentions that a fix made to read parquet 1.6.2 into
2.X STILL leaves an "avalanche" of warnings:
https://issues.apache.org/jira/browse/SPARK-17993
Here is the text inside one of the last comments before it was merged:
I have built the code from the PR and it indeed succeed
Spark has partition discovery if your data is laid out in a
parquet-friendly directory structure:
http://spark.apache.org/docs/latest/sql-programming-guide.html#partition-discovery
You can also use wildcards to get subdirectories (I'm using spark 1.6 here)
>>
data2 = sqlContext.read.load("/my/data
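For illustration (the paths and partition columns here are made up): with a
layout like /my/data/year=2016/month=01/part-*.parquet, partition discovery
turns the directory names into columns, and a glob narrows the input:

// Partition discovery: year and month become columns of the DataFrame.
val all = sqlContext.read.parquet("/my/data")

// A wildcard selects only some subdirectories; passing basePath keeps the
// partition columns discoverable.
val y2016 = sqlContext.read.option("basePath", "/my/data").parquet("/my/data/year=2016/month=*")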
Hi, kodali.
SPARK_WORKER_CORES is meant for the cluster resource manager; see
http://spark.apache.org/docs/latest/cluster-overview.html if you are interested.
For standalone mode,
you should use the following 3 arguments to allocate resources for normal
Spark tasks (an example submit command follows the list):
- --executor-memory
- --executor-
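Assuming the three flags meant here are --executor-memory, --executor-cores
and --total-executor-cores, a standalone submit might look like this (master
host, class and jar are placeholders):

spark-submit \
  --master spark://<master-host>:7077 \
  --executor-memory 4G \
  --executor-cores 2 \
  --total-executor-cores 8 \
  --class com.example.MyApp my-app.jar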
Hi All,
I am able to run my simple Spark job that reads and writes to S3 locally, but
when I move it to the cluster I get the cast exception below. The environment
is using Spark 2.0.1. Please help out if anyone has faced this kind of issue
already.
02/18 10:35:23 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0,
ip-172-31