On 7 Jul 2017, at 08:37, Esa Heikkinen <esa.heikki...@student.tut.fi> wrote:
I only want to simulate a very large "network" with up to millions of parallel,
time-synchronized actors (state machines). There is also communication between
actors via some key-value database. I also want th
<jornfra...@gmail.com>
Sent: 20 June 2017 17:12
To: Esa Heikkinen
Cc: user@spark.apache.org
Subject: Re: Using Spark as a simulator
It is fine, but you have to design it so that the generated rows are written in
large blocks for optimal performance.
I have already seen one example where data is generated using Spark; as far as
I know, there is no reason to think it's a bad idea.
You can check the code here. I'm not entirely sure, but I think there is
something there that generates data for the TPC-DS benchmark, and you can
specify how much data you want in
It is fine, but you have to design it so that the generated rows are written in
large blocks for optimal performance.
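A minimal sketch of the "large blocks" idea, in plain Python rather than Spark
so that it runs on its own. The function names (generate_rows, write_in_blocks,
_encode), the CSV layout and the block size are all hypothetical choices for
illustration, not anything taken from this thread or from Spark's API. The
point is only that rows are buffered and handed to the output in a few large
writes instead of one write per row:

```python
import csv
import io
import random


def generate_rows(n_rows, seed):
    """Yield n_rows synthetic (id, value) rows from a seeded generator."""
    rng = random.Random(seed)
    for i in range(n_rows):
        yield (i, rng.random())


def _encode(block):
    """Serialize one block of rows to CSV text (hypothetical output format)."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(block)
    return buf.getvalue()


def write_in_blocks(rows, out, block_size=10_000):
    """Buffer rows and write them to `out` one block at a time.

    Returns the number of write calls, to make the batching visible.
    """
    writes = 0
    block = []
    for row in rows:
        block.append(row)
        if len(block) >= block_size:
            out.write(_encode(block))
            writes += 1
            block = []
    if block:  # flush the final, partially filled block
        out.write(_encode(block))
        writes += 1
    return writes


# Stand-in for a file or a distributed file system stream.
out = io.StringIO()
n_writes = write_in_blocks(generate_rows(25_000, seed=42), out, block_size=10_000)
```

In a real Spark job the same buffering would presumably happen per partition,
and the block size would have to be tuned to the storage system in use.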
The trickiest part of data generation is the conceptual part, such as choosing
the probability distributions, etc.
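To make the distribution point concrete, here is a small sketch in plain
Python. The choice of an exponential distribution (for example for the time
between simulated events), the rate, the seed and the function name
exponential_samples are assumptions made purely for illustration. It draws
samples from an explicitly seeded generator, so that a run can be repeated, and
compares the sample mean with the theoretical mean 1/rate as a basic sanity
check:

```python
import random
import statistics


def exponential_samples(n, rate, seed):
    """Draw n exponentially distributed values from a seeded generator."""
    rng = random.Random(seed)  # explicit seed makes the run reproducible
    return [rng.expovariate(rate) for _ in range(n)]


samples = exponential_samples(100_000, rate=2.0, seed=7)

# The theoretical mean of an exponential distribution is 1 / rate.
mean = statistics.fmean(samples)
expected_mean = 1 / 2.0
```

When generating data in parallel, each worker would need its own seed (or its
own independent stream), otherwise all workers produce the same values.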
You also have to check that you use a good random number generator; for some cases