Hi
Can someone please take a look at the issue below? Any help is deeply appreciated.
--
Dhruv Kumar
PhD Candidate
Department of Computer Science and Engineering
University of Minnesota
www.dhruvkumar.me
> On Jun 22, 2018, at 13:12, Dhruv Kumar wrote:
Hello there,
I just upgraded to Spark 2.3.1 from Spark 2.2.1, ran my streaming workload,
and got an error (java.lang.AbstractMethodError) I have never seen before; see
the error stack attached in (a) below.
Does anyone know if Spark 2.3.1 works well with the Kafka integration
spark-streaming-kafka-0-10?
this link spar
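A java.lang.AbstractMethodError right after an upgrade is often a sign of mixing jars built against different Spark versions, e.g. a 2.2.x spark-streaming-kafka-0-10 artifact running on a 2.3.1 cluster. That is only an assumption on my part, but worth ruling out; a minimal build.sbt sketch with the versions pinned together:

// build.sbt -- keep the Kafka integration artifact on the same Spark version
val sparkVersion = "2.3.1"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"                 % sparkVersion % "provided",
  "org.apache.spark" %% "spark-streaming"            % sparkVersion % "provided",
  // must match the Spark version running on the cluster
  "org.apache.spark" %% "spark-streaming-kafka-0-10" % sparkVersion
)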
The SQL plan of each micro-batch in the Spark UI (SQL tab) has links to the
actual Spark jobs that ran in the micro-batch. From there you can drill down
into the stage information. I agree that it's not there as a nice per-stream
table as with the Streaming tab, but all the information is present if
The fundamental conceptual difference between the windowing in DStreams and
Structured Streaming is that DStreams window by the arrival time of the record in
Spark (aka processing time), while Structured Streaming windows by event time. If
you want to exactly replicate DStream's processing-time windows in
Structur
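One way to approximate that is to stamp each record with the time it arrives in Spark and window over that column instead of an event-time column. A minimal sketch, assuming a streaming DataFrame named events and spark.implicits._ in scope (the column names are made up for illustration):

import org.apache.spark.sql.functions.{current_timestamp, window, count}

// stamp every record with an arrival-time column, then group by fixed
// windows over it, which is close to what DStream windowing did
val byProcessingTime = events
  .withColumn("arrival_time", current_timestamp())
  .groupBy(window($"arrival_time", "1 minute"))
  .agg(count("*") as "events_per_window")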
Hi,
Let's say I have the following code (it's an example)
df_a = spark.read.json()
df_b = df_a.sample(False, 0.5, 10)
df_c = df_a.sample(False, 0.5, 10)
df_d = df_b.union(df_c)
df_d.count()
Do we have to cache df_a as it is used by df_b and df_c, or will Spark
notice that df_a is used twice in t
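If the intent is to make sure the JSON is only scanned once, the usual way to express that explicitly is to cache df_a before branching. A minimal Scala sketch of the same shape as the example above, assuming a SparkSession named spark (the input path is a placeholder):

// hypothetical input path, for illustration only
val dfA = spark.read.json("/tmp/input.json")

// without a cache, each branch may re-read and re-parse the JSON source
dfA.cache()

val dfB = dfA.sample(false, 0.5, 10)
val dfC = dfA.sample(false, 0.5, 10)

dfB.union(dfC).count()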
Hi all,
I tested this with a Date outside a map and it works fine, so I think the
issue is only with Dates inside Maps. I will create a Jira for this unless
there are objections.
Best regards,
Patrick
On Thu, 28 Jun 2018, 11:53 Patrick McGloin wrote:
> Consider the following test, which will
Consider the following test, which will fail on the final show:
case class UnitTestCaseClassWithDateInsideMap(map: Map[Date, Int])

test("Test a Date as key in a Map") {
  val map = UnitTestCaseClassWithDateInsideMap(Map(Date.valueOf("2018-06-28") -> 1))
  val options = Map("ti
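For anyone wanting to reproduce this outside a test suite, here is a minimal standalone sketch. It is my own reconstruction rather than the original test, and it assumes the failure is triggered simply by encoding a java.sql.Date map key and calling show:

import java.sql.Date
import org.apache.spark.sql.SparkSession

case class UnitTestCaseClassWithDateInsideMap(map: Map[Date, Int])

val spark = SparkSession.builder.master("local[*]").appName("date-map-key").getOrCreate()
import spark.implicits._

val ds = Seq(UnitTestCaseClassWithDateInsideMap(Map(Date.valueOf("2018-06-28") -> 1))).toDS()
ds.show()  // reported to fail when the map key is a Date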
Hi,
In Structured Streaming lingo, "ReduceByKeyAndWindow" would be a window
aggregation with a composite key.
Something like:
stream.groupBy($"key", window($"timestamp", "5 minutes"))
.agg(sum($"value") as "total")
The aggregate could be any supported SQL function.
Is this what you are
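Fleshed out a little, a self-contained sketch of that aggregation with a watermark so state for old windows can be dropped (the source, column names, and thresholds below are illustrative, and spark.implicits._ is assumed to be in scope):

import org.apache.spark.sql.functions.{window, sum}

// `stream` is assumed to be a streaming DataFrame with key, timestamp and value columns
val totals = stream
  .withWatermark("timestamp", "10 minutes")
  .groupBy($"key", window($"timestamp", "5 minutes"))
  .agg(sum($"value") as "total")

totals.writeStream
  .outputMode("update")
  .format("console")
  .start()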
Hi,
I had a few questions regarding the way newAPIHadoopRDD accesses data
from HBase.
1. Does it load all the data from a scan operation directly into memory?
2. According to my understanding, the data is loaded from different regions
to different executors; is that assumption/understanding corre
I did some further investigation.
If I launch a driver in cluster mode with master IPs like
spark://:7077,:7077, then the driver is launched with
both IPs and the -Dspark.master property has both IPs.
But within the logs I see the following; it causes a 20-second delay while
launching each driver:
18/06/
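For reference, a programmatic equivalent of that master list (the host names below are placeholders for the redacted IPs); the same comma-separated spark:// URL is what gets passed to --master on spark-submit:

import org.apache.spark.{SparkConf, SparkContext}

// standalone HA: list every master in one URL; the client tries them in order,
// which is presumably where the per-driver delay comes from when one is unreachable
val conf = new SparkConf()
  .setAppName("multi-master-example")
  .setMaster("spark://master1:7077,master2:7077")

val sc = new SparkContext(conf)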
In Structured Streaming, there's the notion of event-time windowing:
However, this is not quite the same as DStream's windowing operations: in
Structured Streaming, windowing groups the data by fixed time windows, and
every event in a time window is associated with its group:
And in DStreams it j
Thanks.
A workaround I can think of is to rename/move the objects which have been
processed to a different prefix (which is not monitored). But with the
StreamingContext.textFileStream method there doesn't seem to be a way to
know where each record is coming from. Is there another way to do this?
On
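One alternative, if switching to the Structured Streaming file source is an option: input_file_name() records which object each row came from, which textFileStream does not expose. A hedged sketch, assuming a SparkSession named spark and with the bucket/prefix as a placeholder:

import org.apache.spark.sql.functions.input_file_name

// hypothetical monitored prefix
val lines = spark.readStream
  .textFile("s3a://my-bucket/incoming/")
  .withColumn("source_file", input_file_name())

lines.writeStream
  .format("console")
  .start()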