You need to understand how a join works to make sense of this. Logically, a join
takes the cartesian product of the two tables and then keeps only the rows that
satisfy the join condition, in this case the contains UDF. So, let's say you have
Input
Allen Armstrong nishanth hemanth Allen
shivu Armstrong nishanth
shree shivu DeWALT
Replacement
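Because the join condition is an arbitrary contains UDF, Spark cannot use an
equi-join strategy; it has to test every (line, word) pair, which is exactly the
cartesian-product-then-filter behavior described above. A minimal Scala sketch
(the Replacement rows here are made up for illustration):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

val spark = SparkSession.builder.appName("contains-join").getOrCreate()
import spark.implicits._

val input = Seq(
  "Allen Armstrong nishanth hemanth Allen",
  "shivu Armstrong nishanth",
  "shree shivu DeWALT"
).toDF("line")

// Hypothetical replacement table: (word to find, replacement).
val replacement = Seq(("Allen", "A"), ("shivu", "S")).toDF("word", "replace")

// The join condition is a UDF, so Spark must evaluate it for every pair of rows.
val containsWord = udf((line: String, word: String) => line.contains(word))

input.join(replacement, containsWord($"line", $"word")).show(false)
```

Every input line is paired with every replacement row, and only the pairs where
the UDF returns true survive; that is why this kind of join gets expensive as
the tables grow.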
In any distributed application, you scale by splitting execution across
multiple machines. The way Spark does this is by slicing the data into
partitions and spreading them across multiple machines. Logically, an RDD is
exactly that: the data, split up and spread around on multiple machines. When
you perform an operation on an RDD, Spark runs it on each partition in parallel.
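A minimal sketch of the slicing in a Scala shell (the data and the partition
count are arbitrary):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("partitions-demo").getOrCreate()
val sc = spark.sparkContext

// Slice 12 numbers into 4 partitions; on a cluster, each partition can
// live on, and be processed by, a different machine.
val rdd = sc.parallelize(1 to 12, numSlices = 4)

println(rdd.getNumPartitions) // 4

// glom() collects each partition into an array so the slicing is visible.
rdd.glom().collect().foreach(part => println(part.mkString(", ")))
```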
Option A
If you can get all the messages in a session into the same Spark partition,
you can use df.mapPartitions to process the whole partition. This lets you
control the order in which the messages are processed within the partition.
This will work if messages are posted to Kafka keyed by session id, so that all
of a session's messages land in the same Kafka partition, in order; the Spark
Kafka source maps each Kafka partition to one Spark partition.
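A minimal batch-style sketch of Option A, assuming a hypothetical
Message(sessionId, timestamp, body) schema and an assumed Parquet source:
repartitioning by session id keeps a session in one partition, and sorting
within partitions fixes the processing order:

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical message schema; the real fields depend on your topic.
case class Message(sessionId: String, timestamp: Long, body: String)

val spark = SparkSession.builder.appName("session-order").getOrCreate()
import spark.implicits._

// Assumed source for the sketch.
val messages: Dataset[Message] =
  spark.read.parquet("/path/to/messages").as[Message]

val processed = messages
  .repartition($"sessionId")                        // a session never spans two partitions
  .sortWithinPartitions($"sessionId", $"timestamp") // fix in-session order
  .mapPartitions { iter =>
    // Each session's messages now arrive contiguously and in timestamp
    // order, so per-session, order-sensitive logic can run here.
    iter.map(m => m.copy(body = m.body.trim))       // placeholder processing
  }

processed.show()
```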
We have a Structured Streaming application that gets accounts from Kafka into
a streaming data frame. We have a blacklist of accounts stored in S3 and we
want to filter out all the accounts that are blacklisted. So, we are loading
the blacklisted accounts into a batch data frame and joining it with the
streaming data frame.
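A minimal sketch of that setup (bucket, topic, broker address, and column names
are all assumptions); a left_anti stream-static join keeps only the accounts
that do not appear in the blacklist:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("blacklist-filter").getOrCreate()
import spark.implicits._

// Static (batch) data frame: the blacklist, read once from S3.
val blacklist = spark.read
  .option("header", "true")
  .csv("s3a://my-bucket/blacklist/")
  .select($"account")

// Streaming data frame: the account id is assumed to be the Kafka message value.
val accounts = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092")
  .option("subscribe", "accounts")
  .load()
  .select($"value".cast("string").as("account"))

// Drop every streaming row whose account appears in the static blacklist.
val allowed = accounts.join(blacklist, Seq("account"), "left_anti")

allowed.writeStream.format("console").start().awaitTermination()
```

left_anti is the streaming-friendly way to express "NOT IN" here; with the
streaming side on the left and the static side on the right, this join shape
is supported by Structured Streaming.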
What is a good way to make a Structured Streaming application deal with bad
input? Right now, the problem is that bad input kills the Structured
Streaming application. This is highly undesirable, because a Structured
Streaming application has to be always on.
For example, here is a very simple Structured Streaming application that reads
JSON records from Kafka; without handling, a single malformed record that
throws during parsing terminates the whole query.
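A minimal sketch, with an assumed topic, broker address, and schema: from_json
returns null for malformed records instead of throwing, so bad records can be
filtered into a dead-letter sink rather than crashing the query:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types.{LongType, StringType, StructType}

val spark = SparkSession.builder.appName("bad-input").getOrCreate()

// Assumed schema of the expected JSON payload.
val schema = new StructType()
  .add("account", StringType)
  .add("amount", LongType)

val raw = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092")
  .option("subscribe", "events")
  .load()
  .select(col("value").cast("string").as("json"))

// from_json yields null for records that do not parse, instead of throwing.
val parsed = raw.select(col("json"), from_json(col("json"), schema).as("data"))

val good = parsed.filter(col("data").isNotNull).select("data.*")
val bad  = parsed.filter(col("data").isNull).select("json") // dead-letter stream

good.writeStream.format("console").start()
bad.writeStream.format("console").start()
spark.streams.awaitAnyTermination()
```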