Hi,
Yes, it's the "sum of values for all tasks" (it's based on TaskMetrics
which are accumulators behind the scenes).
Why "it appears that value isnt much of help while debugging?" ?
Best regards,
Jacek Laskowski
https://medium.com/@jaceklaskowski/
Mastering Apache Spark 2.0 https://bit.ly/ma
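To make the "sum of values for all tasks" behaviour concrete, here is a minimal
sketch using a user-level accumulator (Spark 2.x API; the names and numbers are
only for illustration, not the internal TaskMetrics themselves):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("accumulator-demo").getOrCreate()
// Each task adds to the accumulator; .value on the driver is the sum over all tasks.
val recordsSeen = spark.sparkContext.longAccumulator("recordsSeen")
spark.sparkContext.parallelize(1 to 1000, numSlices = 8).foreach(_ => recordsSeen.add(1))
println(recordsSeen.value)  // 1000, summed across the 8 tasks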
You define "getNewColumnName" as method, which requires the class/object
holding it has to be serializable.
>From the stack trace, it looks like this method defined in
>ProductDimensionSFFConverterRealApp, but it is not serializable.
In fact, your method only uses String and Boolean, which are
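A minimal sketch of the failure and one way out of it (the class and method
names come from the stack trace; the bodies are assumed):

// The method lives in a non-serializable class, so the closure below
// captures `this` and the whole instance must be shipped to the executors.
class ProductDimensionSFFConverterRealApp {   // not Serializable
  def getNewColumnName(name: String, upper: Boolean): String =
    if (upper) name.toUpperCase else name.toLowerCase

  def run(df: org.apache.spark.sql.DataFrame): Unit = {
    // Throws "Task not serializable" because the lambda references `this`.
    df.rdd.map(r => getNewColumnName(r.getString(0), upper = true)).count()
  }
}

// One fix: move the logic into something serializable, e.g. a small object
// (or a local function value), so only that piece travels with the task.
object ColumnNaming extends Serializable {
  def getNewColumnName(name: String, upper: Boolean): String =
    if (upper) name.toUpperCase else name.toLowerCase
}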
If you only need group-bys within the same hierarchy, you can group by at the
lowest level and cache the result, then use the cached DataFrame to derive the
higher levels; Spark will scan the original table only once and reuse the
cache for the rest, as in the sketch below.
val df_base = sqlContext.sql("sele
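The (truncated) snippet above presumably aggregates once at the lowest level;
a sketch of the pattern, with the column names assumed from the question:

import org.apache.spark.sql.functions.sum

// Aggregate once at the lowest level of the hierarchy and cache it.
val df_base = sqlContext.sql(
  "select col1, col2, col3, count(*) as cnt from table group by col1, col2, col3")
df_base.cache()

// Higher-level rollups are derived from the cached result instead of
// re-scanning the parquet table.
val df12 = df_base.groupBy("col1", "col2").agg(sum("cnt").as("cnt"))
val df13 = df_base.groupBy("col1", "col3").agg(sum("cnt").as("cnt"))

This works because both coarser groupings can be derived by summing the counts
of the finer (col1, col2, col3) grouping.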
Try grouping sets.
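Grouping sets compute several groupings in a single scan of the table; a sketch
of what that could look like for the queries above (Spark 2.x SQL syntax, column
names assumed):

val grouped = sqlContext.sql("""
  select col1, col2, col3, count(*) as cnt
  from table
  group by col1, col2, col3
  grouping sets ((col1, col2), (col1, col3))
""")

Each output row belongs to one grouping set; you can tell them apart by which
columns are null, or with the grouping_id() function.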
On Sun, Feb 19, 2017 at 8:23 AM, Patrick wrote:
> Hi,
>
> I have read 5 columns from parquet into a data frame. My queries on the
> parquet table are of the type below:
>
> val df1 = sqlContext.sql("select col1, col2, count(*) from table group by
> col1, col2")
> val df2 = sqlContext.sql(s
Hi,
I have read 5 columns from parquet into a data frame. My queries on the
parquet table are of the type below:
val df1 = sqlContext.sql("select col1, col2, count(*) from table group by
col1, col2")
val df2 = sqlContext.sql("select col1, col3, count(*) from table group by
col1, col3")
val df3 = sqlContext.sql(sel
For now I have added this to log4j.properties:
log4j.logger.org.apache.parquet=ERROR
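If the warnings also come through the older, unprefixed logger name, the
companion entry can be set as well; the log4j.properties.template that ships
with Spark carries both lines, roughly:

log4j.logger.org.apache.parquet=ERROR
log4j.logger.parquet=ERROR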
2017-02-18 11:50 GMT-08:00 Stephen Boesch :
> The following JIRA mentions that a fix made to read parquet 1.6.2 into 2.X
> STILL leaves an "avalanche" of warnings:
>
>
> https://issues.apache.org/jira/browse/SP
The following JIRA mentions that a fix made to read parquet 1.6.2 into
2.X STILL leaves an "avalanche" of warnings:
https://issues.apache.org/jira/browse/SPARK-17993
Here is the text inside one of the last comments before it was merged:
I have built the code from the PR and it indeed succeed
Spark has partition discovery if your data is laid out in a
parquet-friendly directory structure:
http://spark.apache.org/docs/latest/sql-programming-guide.html#partition-discovery
You can also use wildcards to get subdirectories (I'm using spark 1.6 here)
>>
data2 = sqlContext.read.load("/my/data
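For illustration (the paths and partition columns here are made up): with a
layout like /my/data/year=2016/month=01/part-*.parquet, partition discovery
turns the directory names into columns, and a glob narrows the input:

// Partition discovery: year and month become columns of the DataFrame.
val all = sqlContext.read.parquet("/my/data")

// A wildcard selects only some subdirectories; passing basePath keeps the
// partition columns discoverable.
val y2016 = sqlContext.read.option("basePath", "/my/data").parquet("/my/data/year=2016/month=*")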
Hi, kodali.
SPARK_WORKER_CORES is meant for the cluster resource manager; see
http://spark.apache.org/docs/latest/cluster-overview.html if you are interested.
For standalone mode,
you should use the following 3 arguments to allocate resources for normal
Spark tasks (an example submit command follows the list):
- --executor-memory
- --executor-
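Assuming the three flags meant here are --executor-memory, --executor-cores
and --total-executor-cores, a standalone submit might look like this (master
host, class and jar are placeholders):

spark-submit \
  --master spark://<master-host>:7077 \
  --executor-memory 4G \
  --executor-cores 2 \
  --total-executor-cores 8 \
  --class com.example.MyApp my-app.jar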
Hi All,
I am able to run my simple Spark job that reads and writes to S3 locally, but
when I move it to the cluster I get the cast exception below. The environment
is using Spark 2.0.1. Please help out if anyone has faced this kind of issue
already.
02/18 10:35:23 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0,
ip-172-31