It's been a few years (so this approach might be out of date) but here's
what I used for PySpark as part of this SO answer:
https://stackoverflow.com/questions/45717433/stop-structured-streaming-query-gracefully/65708677
```
# Helper method to stop a streaming query
def stop_stream_query(query, wait_
```
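The snippet is cut off above; a fuller sketch of that helper, reconstructed
along the lines of the linked SO answer (the parameter name `wait_secs` and
the exact status checks are assumptions), might look like:
```
import time

def stop_stream_query(query, wait_secs):
    """Stop a streaming query once it goes idle, then wait for termination."""
    while query.isActive:
        msg = query.status['message']
        data_avail = query.status['isDataAvailable']
        trigger_active = query.status['isTriggerActive']
        # Only stop when no data is pending and no trigger is mid-flight
        if not data_avail and not trigger_active and msg != "Initializing sources":
            print('Stopping query...')
            query.stop()
        time.sleep(0.5)

    # Give the stop a chance to fully take effect
    print('Awaiting termination...')
    query.awaitTermination(wait_secs)
```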
Coming in late.. but if I understand correctly, you can simply use the fact
that spark.read (or readStream) will also accept a directory argument. If
you provide a directory, Spark will automagically pull in all the files in
that directory.
"""Reading in multiple files example"""
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.read.json("/path/to/log_dir")  # path/format are illustrative; a directory pulls in every file
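The streaming case works the same way, with the caveat that file-based
readStream needs an explicit schema up front; a minimal sketch (the directory
path and schema are assumptions):
```
from pyspark.sql.types import StructType, StructField, StringType

# File-source streams can't infer a schema by default, so supply one
schema = StructType([StructField("line", StringType(), True)])
stream_df = (spark.readStream
             .schema(schema)
             .json("/path/to/log_dir"))  # picks up new files as they land in the directory
```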
Hi All,
My Google/SO searching is somehow failing on this. I simply want to compute
histograms for a column in a Spark DataFrame.
There are two SO hits on this question:
- https://stackoverflow.com/questions/39154325/pyspark-show-histogram-of-a-data-frame-column
- https://stackoverflow.com/questio
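For reference, a common answer (a sketch, not quoted from the thread; the
DataFrame `df` and column name "value" are assumptions) is to drop down to
the RDD API, which has a built-in histogram method:
```
# Histogram a DataFrame column via the RDD API; 10 evenly spaced buckets
bins, counts = df.select("value").rdd.flatMap(lambda row: row).histogram(10)
# bins holds the 11 bucket boundaries, counts the 10 per-bucket counts
```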
@vermanuraq
Great thanks, just what I needed.. I knew I was missing something simple.
Cheers,
-brian
"ts").cast("timestamp")
>
> On Wed, Aug 30, 2017 at 11:45 AM, Brian Wylie wrote:
>
>> Hi All,
>>
>> I'm using structured streaming in Spark 2.2.
>>
>> I'm using PySpark and I have data (from a Kafka publisher) where the
>>
# Then a writeStream later...
Okay, so all this code works fine (the 'dt' field has exactly what I
want)... but I'll be streaming in a lot of data, so here are the questions:
- Will creating a new DataFrame via withColumn basically kill performance?
- Should I move my UDF into the parsed_data.select(...) part?
- Can my UDF be done by spark.sql directly? (I tried to_timestamp but
  without luck)
Any suggestions/pointers are greatly appreciated.
-Brian Wylie
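The built-in route that the "ts").cast("timestamp") fragment above is
pointing at would look roughly like this; parsed_data and the 'dt'/'ts'
column names come from the thread, the rest is assumption:
```
from pyspark.sql.functions import col

# Replace the Python UDF with a built-in cast: epoch seconds -> timestamp.
# This stays entirely in the JVM/Catalyst, so there is no per-row Python overhead.
parsed_data = parsed_data.withColumn("dt", col("ts").cast("timestamp"))
```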
options.
Cheers and thanks again.
-Brian
On Wed, Aug 23, 2017 at 4:51 PM, Shixiong(Ryan) Zhu wrote:
> You can use `bin/pyspark --packages
> org.apache.spark:spark-sql-kafka-0-10_2.11:2.2.0`
> to start "pyspark". If you want to use "spark-submit", you also need to
> provide your Python file.
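Once pyspark is launched with that package, wiring up the Kafka source looks
roughly like this (the bootstrap server and topic name are placeholders):
```
# Kafka source for Structured Streaming; server/topic are placeholders
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "localhost:9092")
       .option("subscribe", "my_topic")
       .load())
# Kafka delivers key/value as binary; cast value to string before parsing
events = raw.selectExpr("CAST(value AS STRING) AS json_str")
```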
Exception in thread "main" java.lang.IllegalArgumentException: Missing application resource.
    at org.apache.spark.launcher.CommandBuilderUtils.checkArgument(CommandBuilderUtils.java:241)
    at org.apache.spark.launcher.SparkSubmitCommandBuilder.buildSparkSubmitArgs(SparkSubmitCommandBuilder.java:160)
    at org.apache.spark.launcher.SparkSubmitCommandBuilder.buildSparkSubmitCommand(SparkSubmitCommandBuilder.java:274)
    at org.apache.spark.launcher.SparkSubmitCommandBuilder.buildCommand(SparkSubmitCommandBuilder.java:151)
    at org.apache.spark.launcher.Main.main(Main.java:86)
Anyway, all my code/versions/etc are in this notebook:
- https://github.com/Kitware/BroThon/blob/master/notebooks/Bro_to_Spark.ipynb
I'd be tremendously appreciative if some super nice, smart person could
point me in the right direction :)
-Brian Wylie
t to
> read bro logs, rather than a python library. This is likely to have much
> better performance since we can do all of the parsing on the JVM without
> having to flow it through an external python process.
>
> On Tue, Aug 8, 2017 at 9:35 AM, Brian Wylie wrote:
>
>
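For context on that reply, the JVM-side idea could be approximated even
without a dedicated datasource, since Bro/Zeek logs are tab-separated text;
a rough sketch (the path is a placeholder, and note that the '#fields'
header carrying the real column names gets dropped by the comment filter):
```
# Read Bro/Zeek TSV logs directly with Spark, keeping parsing on the JVM.
# The '#'-prefixed metadata lines (including '#fields') are skipped, so
# columns come back as _c0, _c1, ... unless a schema is supplied.
bro_df = (spark.read
          .option("sep", "\t")
          .option("comment", "#")
          .csv("/path/to/bro/conn.log"))
```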
Hi All,
I've read the new information about Structured Streaming in Spark, looks
super great.
Resources that I've looked at:
- https://spark.apache.org/docs/latest/streaming-programming-guide.html
- https://databricks.com/blog/2016/07/28/structured-streaming-in-apache-spark.html
- https://spark.ap