Aggregator (Spark 2.0) skips aggregation if zero() returns null

2016-06-26 Thread Amit Sela
Sometimes, the BUF for the aggregator may depend on the actual input, and while this passes the responsibility to handle null in merge/reduce to the developer, it sounds fine to me if he is the one who put null in zero() anyway. Now, it seems that the aggregation is skipped entirely when zero() =
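A minimal sketch of the pattern being described, with hypothetical names and the 1.6-style API (the encoder overrides required by the final 2.0 Aggregator API are omitted): an aggregator whose zero() returns null because there is no natural empty buffer before the first input.

import org.apache.spark.sql.expressions.Aggregator

// Hypothetical aggregator: keeps the lexicographically smallest string.
// zero() returns null because there is no natural "empty" string to start from.
object MinString extends Aggregator[String, String, String] {
  def zero: String = null
  def reduce(b: String, a: String): String =
    if (b == null) a else if (a < b) a else b
  def merge(b1: String, b2: String): String =
    if (b1 == null) b2 else if (b2 == null) b1 else if (b1 < b2) b1 else b2
  def finish(b: String): String = b
}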

Running of Continuous Aggregation example

2016-06-26 Thread Chang Lim
Has anyone been able to run the code in "The Future of Real-Time in Spark", slide 24: "Continuous Aggregation"? Specifically, the line: stream("jdbc:mysql//..."). Using the Spark 2.0 preview build, I am getting the error when writi

Re: Aggregator (Spark 2.0) skips aggregation if zero() returns null

2016-06-26 Thread Takeshi Yamamuro
Hi, This behaviour seems to be expected because you must ensure `b + zero() = b`. In your case, `b + null = null` breaks this rule. This is the same with v1.6.1. See: https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/expressions/Aggregator.scala#L57 // maropu
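For illustration only, one way to satisfy the identity law `b + zero() = b` for the same kind of aggregator is to use an Option buffer instead of null (a sketch, hypothetical names):

// Using None as zero preserves the identity law: merge(b, zero) == b.
object MinStringSafe extends Aggregator[String, Option[String], String] {
  def zero: Option[String] = None
  def reduce(b: Option[String], a: String): Option[String] =
    Some(b.fold(a)(x => if (a < x) a else x))
  def merge(b1: Option[String], b2: Option[String]): Option[String] = (b1, b2) match {
    case (Some(x), Some(y)) => Some(if (x < y) x else y)
    case _                  => b1.orElse(b2)
  }
  def finish(b: Option[String]): String = b.orNull
}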

Re: Aggregator (Spark 2.0) skips aggregation if zero() returns null

2016-06-26 Thread Amit Sela
Not sure what the rule is in the case of `b + null = null`, but the same code works perfectly in 1.6.1, just tried it. On Sun, Jun 26, 2016 at 1:24 PM Takeshi Yamamuro wrote: > Hi, > > This behaviour seems to be expected because you must ensure `b + zero() = > b`. > In your case, `b + null = nul

Re: Aggregator (Spark 2.0) skips aggregation if zero() returns null

2016-06-26 Thread Takeshi Yamamuro
Whatever it is, this is expected; if an initial value is null, Spark codegen removes all the aggregates. See: https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala#L199 // maropu On Sun, Jun 26, 2016 at 7:46 PM, Amit Sela

Re: Aggregator (Spark 2.0) skips aggregation if zero() returns null

2016-06-26 Thread Amit Sela
This "if (value == null)" condition you point to exists in 1.6 branch as well, so that's probably not the reason. On Sun, Jun 26, 2016 at 1:53 PM Takeshi Yamamuro wrote: > Whatever it is, this is expected; if an initial value is null, spark > codegen removes all the aggregates. > See: > https://

RE: Logging trait in Spark 2.0

2016-06-26 Thread Paolo Patierno
Yes ... the same here ... I'd like to know the best way to add logging in a custom receiver for Spark Streaming 2.0. Paolo Patierno, Senior Software Engineer (IoT) @ Red Hat, Microsoft MVP on Windows Embedded & IoT, Microsoft Azure Advisor. Twitter: @ppatierno Linkedin: paolopatierno Blog: DevEx
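One common workaround, sketched here rather than quoted from the thread: since org.apache.spark.Logging is no longer public in 2.0, log through slf4j directly inside the receiver (class and messages below are hypothetical):

import org.slf4j.LoggerFactory
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.receiver.Receiver

// Hypothetical custom receiver that logs via slf4j instead of the removed Logging trait.
class MyReceiver extends Receiver[String](StorageLevel.MEMORY_ONLY) {
  @transient private lazy val log = LoggerFactory.getLogger(getClass)

  def onStart(): Unit = {
    log.info("Receiver starting")
    // start a thread here that calls store(...) with received records
  }

  def onStop(): Unit = log.info("Receiver stopping")
}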

Re: Aggregator (Spark 2.0) skips aggregation if zero() returns null

2016-06-26 Thread Takeshi Yamamuro
No, TypedAggregateExpression that uses Aggregator#zero is different between v2.0 and v1.6. v2.0: https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TypedAggregateExpression.scala#L91 v1.6: https://github.com/apache/spark/blob/branch-1.6/sql/

add multiple columns

2016-06-26 Thread pseudo oduesp
Hi, how can I add multiple columns to a data frame? withColumn allows adding one column, but when I have multiple, do I have to loop on each column? Thanks

alter table with hive context

2016-06-26 Thread pseudo oduesp
Hi, how can I alter a table by adding new columns to it in HiveContext?

Re: add multiple columns

2016-06-26 Thread ndjido
Hi guy! I'm afraid you have to loop... The update of the Logical Plan is getting faster on Spark. Cheers, Ardo. Sent from my iPhone > On 26 Jun 2016, at 14:20, pseudo oduesp wrote: > > Hi, how can I add multiple columns to a data frame? > > withColumn allows adding one column, but when you h
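A sketch of the loop Ardo describes, folding withColumn over a list of new columns (the column names and expressions here are hypothetical):

import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions._

// Hypothetical new columns: name -> expression
val newCols: Seq[(String, Column)] = Seq(
  "amount_usd" -> col("amount") * 1.1,
  "year"       -> year(col("date"))
)

// Fold withColumn over the list instead of writing one call per column.
def addColumns(df: DataFrame): DataFrame =
  newCols.foldLeft(df) { case (acc, (name, expr)) => acc.withColumn(name, expr) }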

Re: add multiple columns

2016-06-26 Thread ayan guha
Can you share an example? You may want to write a SQL statement to add the columns? On Sun, Jun 26, 2016 at 11:02 PM, wrote: > Hi guy! > > I'm afraid you have to loop... The update of the Logical Plan is getting > faster on Spark. > > Cheers, > Ardo. > > Sent from my iPhone > > > On 26 Jun 2016, at 1

Re: alter table with hive context

2016-06-26 Thread Mich Talebzadeh
-- create the HiveContext scala> val HiveContext = new org.apache.spark.sql.hive.HiveContext(sc) HiveContext: org.apache.spark.sql.hive.HiveContext = org.apache.spark.sql.hive.HiveContext@6387fb09 -- use the test database scala> HiveContext.sql("use test") res8: org.apache.spark.sql.DataFrame
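Continuing that session, the alter itself can be issued as a Hive SQL statement through the same context (table and column names below are hypothetical):

// Add new columns to an existing Hive table.
HiveContext.sql("ALTER TABLE test.mytable ADD COLUMNS (new_col1 STRING, new_col2 INT)")
// Verify the new schema.
HiveContext.sql("DESCRIBE test.mytable").show()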

Running Java-Based Implementation of StreamingKMeans in Spark

2016-06-26 Thread Biplob Biswas
Hi, Something is wrong with my Spark subscription so I can't see the responses properly on Nabble, so I subscribed from a different id; hopefully it is solved and I am putting my question again here. I implemented the StreamingKMeans example provided on the Spark website, but in Java. The full imp
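For reference, the Scala shape of the streaming k-means setup the example is based on (the poster's version is in Java; directories and parameters below are illustrative):

import org.apache.spark.mllib.clustering.StreamingKMeans
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.streaming.{Seconds, StreamingContext}

val ssc = new StreamingContext(sc, Seconds(10))

// Training and test streams of Vectors parsed from text files (hypothetical directories).
val trainingData = ssc.textFileStream("/tmp/training").map(Vectors.parse)
val testData     = ssc.textFileStream("/tmp/test").map(Vectors.parse)

val model = new StreamingKMeans()
  .setK(3)
  .setDecayFactor(1.0)
  .setRandomCenters(2, 0.0)   // data dimension, initial center weight

model.trainOn(trainingData)
model.predictOn(testData).print()

ssc.start()
ssc.awaitTermination()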

What is the explanation of "ConvertToUnsafe" in "Physical Plan"

2016-06-26 Thread Mich Talebzadeh
Hi, In Spark's Physical Plan, what is the explanation for ConvertToUnsafe? Example: scala> sorted.filter($"prod_id" === 13).explain == Physical Plan == Filter (prod_id#10L = 13) +- Sort [prod_id#10L ASC,cust_id#11L ASC,time_id#12 ASC,channel_id#13L ASC,promo_id#14L ASC], true, 0 +- ConvertToUns

Spark 1.6.1: Unexpected partition behavior?

2016-06-26 Thread Randy Gelhausen
val enriched_web_logs = sqlContext.sql(""" select web_logs.datetime, web_logs.node as app_host, source_ip, b.node as source_host, log from web_logs left outer join (select distinct node, address from nodes) b on source_ip = address """) enriched_web_logs.coalesce(1).write.format("parquet").mode("o

Re: Spark 1.6.1: Unexpected partition behavior?

2016-06-26 Thread Randy Gelhausen
Sorry, please ignore the above. I now see I called coalesce on a different reference than the one I used to register the table. On Sun, Jun 26, 2016 at 6:34 PM, Randy Gelhausen wrote: > > val enriched_web_logs = sqlContext.sql(""" > select web_logs.datetime, web_logs.node as app_host, source_ip, b.no
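For readers who hit the same thing: coalesce returns a new DataFrame rather than changing the original, so the coalesced reference is the one that must be written or registered (a sketch, hypothetical table and path):

// coalesce returns a new DataFrame; the original keeps its partitioning.
val enriched  = sqlContext.sql("select * from web_logs")
val coalesced = enriched.coalesce(1)

// Write (or register) the coalesced reference, not the original one.
coalesced.write.format("parquet").mode("overwrite").save("/tmp/enriched_web_logs")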

Re: Spark 2.0 Streaming and Event Time

2016-06-26 Thread Chang Lim
Here is an update to my question: Tathagata Das (Jun 9, to me): Event time is part of the windowed aggregation API. See my slides: https://www.slideshare.net/mobile/databricks/a-deep-dive-into-structured-streaming Let me know if it helps you to find it. Keeping it short as I am o
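The windowed-aggregation API being referred to looks roughly like this in 2.0 structured streaming (a sketch under the assumption of a socket source with timestamps; names are illustrative):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.window

val spark = SparkSession.builder.appName("windowed-counts").getOrCreate()
import spark.implicits._

// Socket source that attaches an ingest timestamp to each line.
val lines = spark.readStream
  .format("socket")
  .option("host", "localhost")
  .option("port", 9999)
  .option("includeTimestamp", true)
  .load()

// Event-time aggregation: count records per 10-minute window of the timestamp column.
val windowedCounts = lines
  .groupBy(window($"timestamp", "10 minutes"))
  .count()

val query = windowedCounts.writeStream
  .outputMode("complete")
  .format("console")
  .start()
query.awaitTermination()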

Re: problem running spark with yarn-client not using spark-submit

2016-06-26 Thread sychungd
Hi, Thanks for the reply. It's a Java web service that resides in a JBoss container. HY Chung Best regards, S.Y. Chung 鍾學毅 F14MITD Taiwan Semiconductor Manufacturing Company, Ltd. Tel: 06-5056688 Ext: 734-6325

Re: problem running spark with yarn-client not using spark-submit

2016-06-26 Thread Saisai Shao
It means several jars are missing in the YARN container environment. If you want to submit your application some other way besides spark-submit, you have to take care of all the environment setup yourself. Since we don't know the implementation of your Java web service, it is hard to provide

How to convert a Random Forest model built in R to a similar model in Spark

2016-06-26 Thread Neha Mehta
Hi All, Requesting help with the problem mentioned in the mail below. I have an existing random forest model in R which needs to be deployed on Spark. I am trying to recreate the model in Spark but am facing the problem mentioned below. Thanks, Neha On Jun 24, 2016 5:10 PM, wrote: > > Hi Sun, > > I am try
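Without the details of the R model, a generic spark.ml random-forest pipeline sketch that porting efforts typically start from (the input DataFrame, column names, and parameters are hypothetical):

import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.RandomForestClassifier
import org.apache.spark.ml.feature.{StringIndexer, VectorAssembler}

// Hypothetical input: a DataFrame `df` with numeric feature columns f1..f3 and a string label.
val assembler = new VectorAssembler()
  .setInputCols(Array("f1", "f2", "f3"))
  .setOutputCol("features")

val labelIndexer = new StringIndexer()
  .setInputCol("label")
  .setOutputCol("indexedLabel")

val rf = new RandomForestClassifier()
  .setLabelCol("indexedLabel")
  .setFeaturesCol("features")
  .setNumTrees(100)

val model = new Pipeline().setStages(Array(assembler, labelIndexer, rf)).fit(df)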

Unsubscribe

2016-06-26 Thread kchen
Unsubscribe

Difference between Dataframe and RDD Persisting

2016-06-26 Thread Brandon White
What is the difference between persisting a DataFrame and an RDD? When I persist my RDD, the UI says it takes 50G or more of memory. When I persist my DataFrame, the UI says it takes 9G or less of memory. Does the DataFrame not persist the actual content? Is it better / faster to persist an RDD when

Re: Spark Thrift Server Concurrency

2016-06-26 Thread Prabhu Joseph
Spark Thrift Server is started with ./sbin/start-thriftserver.sh --master yarn-client --hiveconf hive.server2.thrift.port=10001 --num-executors 4 --executor-cores 2 --executor-memory 4G --conf spark.scheduler.mode=FAIR 20 parallel queries like the one below are executed: select distinct val2 from philips1 whe

Re: Difference between Dataframe and RDD Persisting

2016-06-26 Thread Jörn Franke
DataFrame uses a more efficient binary representation to store and persist data. You should go for that one in most cases. RDD is slower. > On 27 Jun 2016, at 07:54, Brandon White wrote: > > What is the difference between persisting a DataFrame and an RDD? When I > persist my RDD, the UI
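A small sketch of the comparison being discussed: persist the same data once as an RDD and once as a DataFrame, then compare the reported sizes in the UI's Storage tab (the path is hypothetical):

import org.apache.spark.storage.StorageLevel

// RDD persist: rows stored as deserialized Java objects.
val rdd = sc.textFile("/tmp/data.csv")
rdd.persist(StorageLevel.MEMORY_ONLY)
rdd.count()

// DataFrame persist: rows stored in Spark SQL's compact columnar format.
val df = sqlContext.read.text("/tmp/data.csv")
df.persist(StorageLevel.MEMORY_ONLY)
df.count()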