I am trying to replace groupByKey() with reduceByKey(). I am a PySpark and
Python newbie, and I am having a hard time figuring out the lambda function
for the reduceByKey() operation.
Here is the code:

dd = (hive_context.read.orc(orcfile_dir).rdd
      .map(lambda x: (x[0], x))
      .groupByKey(25)
      .take(2))
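In case it helps, here is a minimal sketch of one common replacement, assuming the goal is simply to collect all rows per key: wrap each row in a single-element list so the merge function is associative, then concatenate the lists in reduceByKey.

dd = (hive_context.read.orc(orcfile_dir).rdd
      .map(lambda x: (x[0], [x]))           # key -> single-element list
      .reduceByKey(lambda a, b: a + b, 25)  # concatenate lists per key
      .take(2))

Note that if you only need the values grouped, this buys you little over groupByKey; reduceByKey pays off when the merge function actually shrinks the data (e.g. a sum), because it combines map-side before the shuffle.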
Here I have some code that adds a column that contains a row_number over a
window. It looks somewhat like this:
val sortColumns: List[Column] =
  r.sortFields.map(sf => sf.map(col(_))).getOrElse(List(col(s"defaultSortCol")))
val partitionWindow = Window.partitionBy(s"groupByCol")
val window = partitionWindow.orderBy(sortColumns: _*)
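For anyone following along in Python, here is a rough PySpark sketch of the same pattern, given some DataFrame df; the column names group_col and sort_col are placeholders, not taken from the code above.

from pyspark.sql import functions as F
from pyspark.sql.window import Window

# Hypothetical column names, just to illustrate the pattern.
w = Window.partitionBy("group_col").orderBy("sort_col")
df_with_rn = df.withColumn("row_number", F.row_number().over(w))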
I keep unsubscribing from this list, but continue to receive emails.
On Fri, Aug 3, 2018 at 4:01 AM Christiaan Ras wrote:
> Hi,
>
> I have a use case where I would like to analyze windows of sensor data.
>
> Currently I have a working case where I use Structured Streaming to
> process real-time streams of sensor data.
Hi,
I'm wondering how readStream() and writeStream() work internally.
Let's take a simple example:

df = spark.readStream \
    .format("kafka") \
    .option("kafka.bootstrap.servers", kafka_brokers) \
    .option("subscribe", kafka_topic) \
    .load()
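Roughly: readStream()...load() only builds a lazy description of the streaming source; nothing executes until writeStream()...start() creates a StreamingQuery, which then repeatedly plans and runs micro-batches against the source. A minimal write side to complete the example might look like this (the console sink is chosen purely for illustration):

# Console sink for illustration; any supported sink works the same way.
query = (df.writeStream
         .format("console")
         .outputMode("append")
         .start())
query.awaitTermination()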
Hi,
I have a use case where I would like to analyze windows of sensor data.
Currently I have a working case where I use Structured Streaming to process
real-time streams of sensor data. Now I would like to analyze windows of
sensor data and use classification to predict the class of a whole window.
For instance
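The message is cut off here, but to make the windowing half concrete, here is a hedged PySpark sketch that computes per-window features over a stream, assuming hypothetical columns timestamp and value; a classifier would then consume one feature row per window.

from pyspark.sql import functions as F

# Hypothetical schema: each sensor reading has `timestamp` and `value`.
windowed = (df
    .withWatermark("timestamp", "10 minutes")        # bound state for late data
    .groupBy(F.window("timestamp", "5 minutes"))     # tumbling 5-minute windows
    .agg(F.avg("value").alias("mean_value"),         # per-window features
         F.stddev("value").alias("std_value")))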