Re: Structured Streaming & Enrichment Broadcasts

2019-11-18 Thread Burak Yavuz
If you store the data that you're going to broadcast as a Delta table (see delta.io) and perform a stream-batch join (where your Delta table is the batch side), it will auto-update once the table receives any updates. Best, Burak
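For illustration only, a minimal PySpark sketch of the pattern Burak describes, assuming a Delta table at a hypothetical path /tables/enrichment, a delta-core package on the classpath, and a Kafka source with placeholder column names (none of these come from the original thread):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("stream-delta-join").getOrCreate()

# The "batch" side of the join: a Delta table that other jobs keep updating.
enrichment = spark.read.format("delta").load("/tables/enrichment")

# The streaming side: any structured stream; a Kafka source is used as a placeholder.
events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")
          .option("subscribe", "events")
          .load()
          .selectExpr("CAST(key AS STRING) AS id", "CAST(value AS STRING) AS payload"))

# Stream-batch join: because the batch side is a Delta table, each micro-batch reads
# the table's latest snapshot, so updates show up without restarting the query.
enriched = events.join(enrichment, on="id", how="left")

query = (enriched.writeStream
         .format("console")
         .outputMode("append")
         .start())
query.awaitTermination()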

Re: SparkR integration with Hive 3 spark-r

2019-11-18 Thread Alfredo Marquez
Hello Nicolas, Well the issue is that with Hive 3, Spark gets its own metastore, separate from the Hive 3 metastore. So how do you reconcile this separation of metastores? Can you continue to "enableHivemetastore" and be able to connect to Hive 3? Does this connection take advantage of Hive's L

Re: SparkR integration with Hive 3 spark-r

2019-11-18 Thread Nicolas Paris
Hi Alfredo, my 2 cents: to my knowledge, and from reading the Spark 3 pre-release notes, it will handle Hive metastore 2.3.5 - no mention of a Hive 3 metastore. I made several tests on this in the past [1] and it seems to handle any Hive metastore version. However, Spark cannot read Hive managed tables AKA tra
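As a rough PySpark sketch of the kind of test Nicolas mentions (pointing Spark at an external Hive metastore version through configuration); the version string, metastore URI, and jar setting below are illustrative assumptions, not values from his tests:

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-metastore-check")
         # Version of the Hive metastore client Spark should use (assumed value).
         .config("spark.sql.hive.metastore.version", "2.3.5")
         # Let Spark resolve matching metastore client jars (or point this at a local path).
         .config("spark.sql.hive.metastore.jars", "maven")
         # Thrift URI of the external metastore (placeholder host; often set in hive-site.xml).
         .config("spark.hadoop.hive.metastore.uris", "thrift://metastore-host:9083")
         .enableHiveSupport()
         .getOrCreate())

# External (non-transactional) Hive tables should be listable and readable this way;
# Hive managed (ACID/transactional) tables are the case Spark cannot read.
spark.sql("SHOW DATABASES").show()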

SparkR integration with Hive 3 spark-r

2019-11-18 Thread Alfredo Marquez
Hello, Our company is moving to Hive 3, and they are saying that there is no SparkR implementation in Spark 2.3.x+ that will connect to Hive 3. Is this true? If it is true, will this be addressed in the Spark 3 release? I don't use Python, so losing SparkR to get work done on Hadoop is a huge

Performance of PySpark 2.3.2 on Microsoft Windows

2019-11-18 Thread Wim Van Leuven
Hello, we are writing a lot of data processing pipelines for Spark using PySpark and adding a lot of integration tests. In our enterprise environment, a lot of people run Windows PCs, and we notice that build times are really slow on Windows because of the integration tests. These metrics are
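Not from Wim's mail, but as a hedged illustration of the kind of PySpark integration-test setup being described: a session-scoped pytest fixture that shares one local SparkSession across tests, since per-test session startup and default shuffle settings usually dominate test runtime; names and settings are assumptions for the sketch:

import pytest
from pyspark.sql import SparkSession

@pytest.fixture(scope="session")
def spark():
    # One local SparkSession reused by every test in the run.
    session = (SparkSession.builder
               .master("local[2]")
               .appName("integration-tests")
               .config("spark.sql.shuffle.partitions", "4")  # tiny test data, few partitions
               .getOrCreate())
    yield session
    session.stop()

def test_filter_keeps_expected_rows(spark):
    df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "label"])
    assert df.filter("id > 1").count() == 1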

Structured Streaming & Enrichment Broadcasts

2019-11-18 Thread Bryan Jeffrey
Hello. We're running applications using Spark Streaming. We're going to begin work to move to Structured Streaming. One of our key scenarios is to look up values from an external data source for each record in an incoming stream. In Spark Streaming we currently read the external data, broa
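For context, a rough PySpark sketch of the DStream-era pattern the mail starts to describe (read the external data, broadcast it, and enrich each record, refreshing the broadcast periodically); the source, loader, and refresh interval are illustrative assumptions, not Bryan's code:

import time
from pyspark.sql import SparkSession
from pyspark.streaming import StreamingContext

spark = SparkSession.builder.appName("enrichment-broadcast").getOrCreate()
sc = spark.sparkContext
ssc = StreamingContext(sc, batchDuration=10)

def load_lookup():
    # Hypothetical loader for the external data source, collected to the driver
    # as a small dict keyed by the field used for enrichment.
    rows = spark.read.parquet("/data/lookup").collect()
    return {r["id"]: r["value"] for r in rows}

state = {"bc": sc.broadcast(load_lookup()), "loaded_at": time.time()}

def enrich(rdd):
    # transform() runs on the driver for each batch, so the broadcast can be
    # refreshed periodically without restarting the streaming application.
    if time.time() - state["loaded_at"] > 600:
        state["bc"].unpersist()
        state["bc"] = sc.broadcast(load_lookup())
        state["loaded_at"] = time.time()
    bc = state["bc"]
    return rdd.map(lambda rec: (rec, bc.value.get(rec)))

lines = ssc.socketTextStream("localhost", 9999)  # placeholder stream source
lines.transform(enrich).pprint()

ssc.start()
ssc.awaitTermination()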