For example, I have some data with timestamps, marked as category A or B, and
ordered by time. Now I want to calculate each duration from A to B. In a
normal program, I can use a flag to record whether the previous record is
A or B, and then calculate the duration. But in a Spark DataFrame, how do I
do it?
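One way to express that "previous row" flag on a DataFrame is a window function: lag() carries the prior row's values into the current row. A minimal PySpark sketch, assuming hypothetical columns ts (event time) and label (A or B):

import pyspark.sql.functions as F
from pyspark.sql import SparkSession
from pyspark.sql.window import Window

spark = SparkSession.builder.appName("ABDurations").getOrCreate()

# Hypothetical input: one row per event, ordered by event time.
df = spark.createDataFrame(
    [("2018-08-22 10:00:00", "A"),
     ("2018-08-22 10:05:00", "B"),
     ("2018-08-22 11:00:00", "A"),
     ("2018-08-22 11:12:00", "B")],
    ["ts", "label"],
).withColumn("ts", F.col("ts").cast("timestamp"))

# lag() plays the role of the flag bit: each row can see its predecessor.
w = Window.orderBy("ts")  # single unpartitioned window; fine for a sketch
df = (df
      .withColumn("prev_label", F.lag("label").over(w))
      .withColumn("prev_ts", F.lag("ts").over(w)))

# Keep B rows whose immediate predecessor was an A, and take the
# difference of the two epoch-second timestamps as the duration.
(df.filter((F.col("label") == "B") & (F.col("prev_label") == "A"))
   .withColumn("duration_s",
               F.col("ts").cast("long") - F.col("prev_ts").cast("long"))
   .select("prev_ts", "ts", "duration_s")
   .show())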
Hi,
I have some questions about Spark Structured Streaming window output in
Spark 2.3.1. I wrote the application code as follows:
case class DataType(time: Timestamp, value: Long)
val spark = SparkSession
.builder
.appName("StructuredNetworkWordCount")
.master("local[1
You didn't specify which API, but in PySpark you could do:
import pyspark.sql.functions as F
df.groupBy('ID').agg(F.sort_array(F.collect_set('DETAILS')).alias('DETAILS')).show()
+---+------------+
| ID|     DETAILS|
+---+------------+
|  1|[A1, A2, A3]|
|  3|        [B2]|
|  2|        [B1]|
+---+------------+
How do you do it now?
You could use withColumn("newDetails", ...).
jg
> On Aug 22, 2018, at 16:04, msbreuer wrote:
>
> A dataframe with following contents is given:
>
> ID PART DETAILS
> 1  1    A1
> 1  2    A2
> 1  3    A3
> 2  1    B1
> 3  1    C1
>
> Target format should be as follows:
>
>
A dataframe with the following contents is given:
ID PART DETAILS
1  1    A1
1  2    A2
1  3    A3
2  1    B1
3  1    C1
Target format should be as follows:
ID DETAILS
1  A1+A2+A3
2  B1
3  C1
Note: the order of A1-A3 is important.
Currently I am using this alternative:
ID DETAIL_1 DETAIL_2 DETAI
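One way to get the A1+A2+A3 target while preserving the PART order is to collect (PART, DETAILS) structs, sort them, and concatenate. A PySpark sketch based on the sample data above:

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("ConcatDetails").getOrCreate()

df = spark.createDataFrame(
    [(1, 1, "A1"), (1, 2, "A2"), (1, 3, "A3"), (2, 1, "B1"), (3, 1, "C1")],
    ["ID", "PART", "DETAILS"],
)

# Packing (PART, DETAILS) into structs makes sort_array order the
# collected values by PART, so A1, A2, A3 stay in sequence no matter
# how the rows were shuffled during aggregation.
result = (df
          .groupBy("ID")
          .agg(F.sort_array(F.collect_list(F.struct("PART", "DETAILS")))
               .alias("parts"))
          .withColumn("DETAILS", F.concat_ws("+", F.col("parts.DETAILS")))
          .drop("parts"))
result.show()
# +---+--------+
# | ID| DETAILS|
# +---+--------+
# |  1|A1+A2+A3|
# |  2|      B1|
# |  3|      C1|
# +---+--------+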
Hi,
that was just one of the options, and not the first one. Is there any
chance of trying out the other options mentioned? For example, pointing the
shuffle storage area to a location with larger space?
Regards,
Gourav Sengupta
On Wed, Aug 22, 2018 at 11:15 AM Vitaliy Pisarev <
vitaliy.pisa...@
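For what it's worth, "pointing the shuffle storage area to a location with larger space" usually means setting spark.local.dir. A hedged sketch (the paths are hypothetical, and on YARN the NodeManager's local dirs take precedence over this setting):

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("LargerShuffleDir")
         # Comma-separated list; shuffle spill and map-output files land
         # here, so point it at the disks with the most free space.
         .config("spark.local.dir", "/mnt/bigdisk1/spark,/mnt/bigdisk2/spark")
         .getOrCreate())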
The documentation says that 'spark.shuffle.memoryFraction' is deprecated, but
it doesn't say what to use instead. Any idea?
On Wed, Aug 22, 2018 at 9:36 AM, Gourav Sengupta
wrote:
> Hi,
>
> The best part about Spark is that it is showing you which configuration to
> tweak as well. In case you are u
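For what it's worth, spark.shuffle.memoryFraction (like spark.storage.memoryFraction) was superseded by the unified memory manager in Spark 1.6; its knobs are spark.memory.fraction and spark.memory.storageFraction. A minimal sketch (the values shown are the 2.x defaults):

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("UnifiedMemory")
         # Fraction of (heap - 300 MB) shared by execution and storage.
         .config("spark.memory.fraction", "0.6")
         # Portion of that region that storage can keep safe from eviction.
         .config("spark.memory.storageFraction", "0.5")
         .getOrCreate())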
Hi,
When I am doing calculations on, for example, 700 listIDs, only some 50
rows are saved and then I get some random exceptions.
I get the exception below when I try to run calculations on huge data and
save it. Please let me know if you have any suggestions.
Sample code:
I have some