Regards,
Lars Albertsson
Data engineering entrepreneur
www.scling.com, www.mapflat.com
https://twitter.com/lalleal
+46 70 7687109
On Tue, Feb 25, 2020 at 7:46 PM Ruijing Li wrote:
>
> Just wanted to follow up on this. If anyone has any advice, I’d be interested
> in learning more!
>
> On
subject. Slides and video
are linked on this page: http://www.mapflat.com/presentations/
You can find more material in this list of resources:
http://www.mapflat.com/lands/resources/reading-list
Happy testing!
Regards,
Lars Albertsson
On Mon, May 21, 2018 at 2:24 PM, Steve
Validate selected fields instead.
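To make "validate selected fields" concrete, here is a minimal Scala sketch. It assumes a hypothetical output record type `UserEvent` whose `processedAt` and `debugInfo` fields vary between runs. The hypothetical `SelectedFieldsOracle.check` compares only the field the test cares about (`country` per `userId`) and reports mismatches as strings, instead of comparing whole records for equality.

```scala
// Hypothetical output record; processedAt and debugInfo vary between runs.
final case class UserEvent(userId: String, country: String, processedAt: Long, debugInfo: String)

object SelectedFieldsOracle {
  // Checks only the country of each expected user and ignores all other
  // fields. Returns human-readable mismatches; an empty result means success.
  def check(actual: Seq[UserEvent], expectedCountry: Map[String, String]): Seq[String] = {
    val byUser = actual.groupBy(_.userId)
    expectedCountry.toSeq.sortBy(_._1).flatMap { case (user, country) =>
      byUser.get(user) match {
        case None => Seq(s"missing output for user $user")
        case Some(events) =>
          events.filter(_.country != country).map { e =>
            s"user $user: expected country $country, got ${e.country}"
          }
      }
    }
  }
}
```

Leaving volatile or irrelevant fields out of the oracle means the test does not break on harmless changes, such as different timestamps or a new debug column.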
For a longer answer, please search for my previous posts to the user
list, or watch this presentation: https://vimeo.com/192429554
Slides at
http://www.slideshare.net/lallea/test-strategies-for-data-processing-pipelines-67244458
Regards,
Lars Albertsson
do you want to use DI for other reasons?
Lars Albertsson
On Fri, Dec 23, 2016 at 11:56 AM, Chetan Khatri
wrote:
> Hello Commun
subject. There is a video
recording at https://vimeo.com/192429554 and slides at
http://www.slideshare.net/lallea/test-strategies-for-data-processing-pipelines-67244458
You can find more material on test strategies at
http://www.mapflat.com/lands/resources/reading-list/index.html
Lars Albertsson
should be
addressed; if you induce failures, system failures would become part
of normal operations, and real failures risk passing unnoticed.
Regards,
Lars Albertsson
On T
You can find useful discussions in the list archives. I wrote this, which
might help you:
https://www.mail-archive.com/user%40spark.apache.org/msg48032.html
Regards,
Lars Albertsson
On Jun 29, 2016 07:02
ading-list/
Regards,
Lars Albertsson
On Wed, Jul 20, 2016 at 3:47 PM, Sathish Kumaran Vairavelu
wrote:
> If you are using Mesos, then u can use Chronos or Marathon
>
> On
You can use a workflow manager, which gives you tools to handle transient
failures in data pipelines. I suggest either Luigi or Airflow. They provide
DSLs embedded in Python, so if the primitives provided are insufficient, it
is easy to customise Spark tasks with restart logic.
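Luigi and Airflow have retry settings of their own, which you would normally use first. If you need custom restart logic around a Spark task, a minimal sketch could look like the following. `Retry.withRetries` is a hypothetical helper, and which exceptions count as transient is an assumption you supply through `isTransient`.

```scala
import scala.util.{Failure, Success, Try}

object Retry {
  // Runs `task` up to `maxAttempts` times, doubling the sleep between
  // attempts. Only failures accepted by `isTransient` are retried; any
  // other failure is rethrown immediately.
  def withRetries[T](maxAttempts: Int, initialDelayMs: Long, isTransient: Throwable => Boolean)(task: => T): T = {
    require(maxAttempts >= 1, "maxAttempts must be at least 1")

    @annotation.tailrec
    def loop(attempt: Int, delayMs: Long): T =
      Try(task) match {
        case Success(result) => result
        case Failure(e) if attempt < maxAttempts && isTransient(e) =>
          Thread.sleep(delayMs)
          loop(attempt + 1, delayMs * 2)
        case Failure(e) => throw e
      }

    loop(1, initialDelayMs)
  }
}
```

Separating transient from permanent failures matters: retrying a job that fails on bad input only delays the alert.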
Regards,
Lars
integration
test setup that runs smoothly from Gradle/Maven/SBT and also from
IntelliJ.
I hope things are clearer. Let me know if you have further questions.
Regards,
Lars Albertsson
On Thu, Jul 7, 20
integrated with Docker Compose. If you are emitting
database entries, your test oracle will need to poll the database
frequently for the expected records, with a timeout so that failing
tests do not hang.
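A minimal sketch of such a polling oracle, assuming the test can express "the expected records are present" as a probe that returns `Some(result)` once they appear. `Polling.awaitResult` is a hypothetical helper; in a real test the probe would run a database query.

```scala
import java.util.concurrent.TimeoutException

object Polling {
  // Re-evaluates `probe` every `intervalMs` until it yields a value, or
  // throws TimeoutException once `timeoutMs` has elapsed.
  def awaitResult[T](timeoutMs: Long, intervalMs: Long)(probe: => Option[T]): T = {
    val deadline = System.nanoTime() + timeoutMs * 1000000L

    @annotation.tailrec
    def loop(): T = probe match {
      case Some(value) => value
      case None if System.nanoTime() < deadline =>
        Thread.sleep(intervalMs)
        loop()
      case None =>
        throw new TimeoutException(s"expected result did not appear within $timeoutMs ms")
    }

    loop()
  }
}
```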
I hope this is comprehensible. Let me know if you have follow-up questions.
Regards,
Lars Albertsson
On May 18, 2016 20:14, "swetha kasireddy" wrote:
> Hi Lars,
>
> Do you have any examples for the methods that you described for Spark
> batch and Streaming?
>
> Thanks!
>
> On Wed, Mar 30, 20
Hi,
I wrote a longish mail on Spark testing strategy last month, which you
may find useful:
http://mail-archives.apache.org/mod_mbox/spark-user/201603.mbox/browser
Let me know if you have follow-up questions or want assistance.
Regards,
Lars Albertsson
Thanks!
It is on my backlog to write a couple of blog posts on the topic, and
eventually some example code, but I am currently busy with clients.
Thanks for the pointer to Eventually; I was not aware of it. Fast exit on
exception would indeed be a useful addition.
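For illustration, a rough sketch of an `eventually` variant with fast exit, written without ScalaTest. It assumes that a check signals "not yet" by throwing `AssertionError` and treats any other exception (for example a lost database connection) as fatal, so the test fails at once instead of waiting out the timeout. `EventuallyFastExit` is a hypothetical name, not ScalaTest's API.

```scala
object EventuallyFastExit {
  // Re-evaluates `check` until it passes or `timeoutMs` expires.
  def eventually[T](timeoutMs: Long, intervalMs: Long)(check: => T): T = {
    val deadline = System.nanoTime() + timeoutMs * 1000000L

    @annotation.tailrec
    def loop(): T = {
      // Only AssertionError means "not yet"; any other exception propagates
      // immediately out of this try, which gives the fast exit.
      val outcome: Either[AssertionError, T] =
        try Right(check)
        catch { case e: AssertionError => Left(e) }
      outcome match {
        case Right(value) => value
        case Left(_) if System.nanoTime() < deadline =>
          Thread.sleep(intervalMs)
          loop()
        case Left(lastFailure) => throw lastFailure
      }
    }

    loop()
  }
}
```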
Lars Albertsson
ented with some expiration
strategy. :-)
Regards,
Lars Albertsson
On Fri, Mar 25, 2016 at 7:48 AM, Jatin Kumar wrote:
> Hello Lars,
>
> Thanks for your email. I tried exactly what you said and it doesn't perform
> good due
case, the data structures ended up being small, on the
order of tens or hundreds of megabytes. It varies with the use case, but
it is probably a path worth investigating if approximate results are
acceptable.
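To get a feel for the sizes involved: the standard Count-Min Sketch bounds use a width of ceil(e/ε) and a depth of ceil(ln(1/δ)) counters, so that an estimate exceeds the true count by at most ε times the total count with probability at least 1 - δ. The sketch below assumes 8-byte counters, and the parameter values are illustrative rather than the ones from the use case above. With ε = 10⁻⁶ and δ = 10⁻³ it gives 2,718,282 × 7 counters, roughly 152 MB.

```scala
object CmsSizing {
  final case class Dimensions(width: Int, depth: Int, bytes: Long)

  // Standard Count-Min Sketch dimensions for error bound eps (relative to
  // the total count) and failure probability delta.
  def dimensions(eps: Double, delta: Double, bytesPerCounter: Int = 8): Dimensions = {
    require(eps > 0 && eps < 1 && delta > 0 && delta < 1, "eps and delta must be in (0, 1)")
    val width = math.ceil(math.E / eps).toInt
    val depth = math.ceil(math.log(1.0 / delta)).toInt
    Dimensions(width, depth, width.toLong * depth * bytesPerCounter)
  }
}
```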
Regards,
Lars Albertsson
On Wed
different combinations of
time windows by pushing out CMSs and heavy hitters to e.g. Kafka, and
have different stream processors that aggregate different time windows
and push results to Kafka or to lookup tables.
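Count-Min Sketches with the same dimensions and hash functions can be merged by adding their counter tables cell by cell. That property is what allows sketches from short time windows to be combined into longer windows downstream. The class below is a minimal illustration under that assumption, using `MurmurHash3` from the Scala standard library with one seed per row. Heavy-hitter tracking is left out, and for production use an existing implementation such as the one in Algebird is a better choice.

```scala
import scala.util.hashing.MurmurHash3

// Minimal Count-Min Sketch over string keys, for illustration only.
final class CountMinSketch(val width: Int, val depth: Int) {
  require(width > 0 && depth > 0, "width and depth must be positive")
  private val table = Array.ofDim[Long](depth, width)

  // Row index doubles as the hash seed, giving one hash function per row.
  private def bucket(item: String, row: Int): Int = {
    val h = MurmurHash3.stringHash(item, row)
    ((h % width) + width) % width
  }

  def add(item: String, count: Long = 1L): Unit =
    for (row <- 0 until depth) table(row)(bucket(item, row)) += count

  // Never underestimates; may overestimate because of hash collisions.
  def estimate(item: String): Long =
    (0 until depth).map(row => table(row)(bucket(item, row))).min

  // Combines two windows, e.g. two per-minute sketches into one covering both.
  def merge(other: CountMinSketch): CountMinSketch = {
    require(width == other.width && depth == other.depth, "dimensions must match")
    val merged = new CountMinSketch(width, depth)
    for (r <- 0 until depth; c <- 0 until width)
      merged.table(r)(c) = table(r)(c) + other.table(r)(c)
    merged
  }
}
```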
Lars Albertsson
On Tue, Mar
by Ted Dunning and Mikio Braun,
who have given good presentations on the subject.
There are AFAIK two open source implementations of Count-Min Sketch,
one of them in Algebird.
Let me know if anything is unclear.
Good luck, and let us know how it goes.
Regards,
Lars Albertsson
if you want
clarifications or assistance.
Regards,
Lars Albertsson
On Wed, Mar 2, 2016 at 6:54 PM, SRK wrote:
> Hi,
>
> What is a good unit testing framework for Spark batch/streaming jobs? I have
> core spark, spark sql with datafr
you start and
stop the fixture once for each test class, rather than once per test
method, you save a lot of time.
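A framework-free sketch of the idea, assuming a hypothetical `ExpensiveFixture` that stands in for something like a local Spark context or a Kafka broker. The hypothetical `SharedFixtureSuite` starts the fixture once, runs all its tests against it, and stops it in a `finally` block, which is what `beforeAll`/`afterAll` hooks in ScalaTest give you.

```scala
// Hypothetical stand-in for an expensive fixture.
final class ExpensiveFixture {
  var running = false
  def start(): Unit = { running = true }
  def stop(): Unit = { running = false }
}

// One fixture per suite (test class), shared by all test methods.
abstract class SharedFixtureSuite {
  protected def tests: Seq[(String, ExpensiveFixture => Unit)]
  var starts = 0

  // Returns (test name, passed) for each test, in declaration order.
  def runAll(): Seq[(String, Boolean)] = {
    val fixture = new ExpensiveFixture
    fixture.start()
    starts += 1
    try tests.map { case (name, body) => name -> scala.util.Try(body(fixture)).isSuccess }
    finally fixture.stop()
  }
}
```

Since the tests share one fixture, they should avoid depending on state left by other tests, for example by using distinct table or topic names.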
Regards,
Lars Albertsson
On Fri, Feb 26, 2016 at 8:24 AM, Mao, Wei wrote:
> I would argue against making
Let me know if you have follow-up questions, or want assistance.
Regards,
Lars Albertsson
On Tue, Jan 26, 2016 at 10:25 PM, Daniel Schulz
wrote:
> Hi,
>
> We are currently working on a solution architecture to solve IoT workl
the results with the list!
Regards,
Lars Albertsson
On Thu, Oct 22, 2015 at 10:48 PM, Nipun Arora wrote:
> Hi,
> In general in spark stream one can do transformations ( filter, map etc.) or
> output operations (collect, forEach) etc. in an event-driven pardigm... i.e.
> the action
ling? That will require
additional components.
This became a bit of a brain dump on the topic. I hope that it is
useful. Don't hesitate to get back to me if I can help.
Regards,
Lars Albertsson
On Fri, Aug 7, 2015 at 5:43 PM, Vikram Kone wrote:
> Hi,
> I'm looking for open source wo
The snippet at the end worked for me. We run Spark 1.3.x, so
DataFrame.drop is not available to us.
As pointed out by Yana, DataFrame operations typically return a new
DataFrame, so use the result like this:
import com.foo.sparkstuff.DataFrameOps._
...
val df = ...
val prunedDf = df.dropColumns("one_col",
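
A minimal sketch of the enrichment pattern that `import com.foo.sparkstuff.DataFrameOps._` suggests, not the original snippet. An implicit class adds a `dropColumns` method and returns a new value instead of modifying its receiver. To stay runnable without Spark, it operates on a hypothetical immutable `Table` type rather than a DataFrame; a Spark 1.3 version would presumably select the remaining columns instead.

```scala
// Hypothetical immutable table standing in for a DataFrame.
final case class Table(columns: Vector[String], rows: Vector[Vector[Any]])

object TableOps {
  // Importing TableOps._ adds dropColumns to every Table. Like DataFrame
  // operations, it returns a new Table and leaves the original untouched.
  implicit class RichTable(val t: Table) extends AnyVal {
    def dropColumns(names: String*): Table = {
      val keep = t.columns.indices.filterNot(i => names.contains(t.columns(i)))
      Table(keep.map(t.columns).toVector, t.rows.map(row => keep.map(row).toVector))
    }
  }
}
```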