Hi,
The Spark job where we save a dataset to a given table using
exampledataset.saveAsTable() fails with the following error after upgrading
from Spark 2.4 to Spark 3.0:
Exception in thread "main" org.apache.spark.sql.AnalysisException: Cannot
write incompatible data to table '':
- Cannot write nu
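For context, the write path in question is roughly the sketch below (the column,
its target type and the table name are hypothetical). Spark 3.0 defaults to the
stricter ANSI store-assignment policy, so a write that implicitly narrowed types
under 2.4 may now need an explicit cast to the target column types:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import static org.apache.spark.sql.functions.col;

public class SaveToTable {
    // Sketch only: "amount", its target type and the table name are assumptions.
    public static void save(Dataset<Row> exampledataset) {
        exampledataset
                .withColumn("amount", col("amount").cast("decimal(18,2)")) // cast to match the table schema
                .write()
                .mode(SaveMode.Append)
                .saveAsTable("example_table");
    }
}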
Hi,
We have a use case where one record needs to be counted in two different
aggregations.
Say, for example, a credit card transaction "A" that belongs to both the
ATM and crossborder transaction categories.
If I need to take the count of ATM transactions, I need to consider
transaction A. For the count of cro
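One way to let a single record feed several category counts, sketched below under
the assumption that the categories are held in an array<string> column, is to
explode the categories and then group by category:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.explode;

public class CategoryCounts {
    // Sketch: assumes a "categories" column such as ["ATM", "crossborder"] for transaction A.
    public static Dataset<Row> countPerCategory(Dataset<Row> transactions) {
        return transactions
                .withColumn("category", explode(col("categories"))) // one output row per category
                .groupBy("category")
                .count(); // transaction A now contributes to both the ATM and the crossborder counts
    }
}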
Hello,
We have two datasets that have been persisted as follows:
Dataset A:
datasetA.repartition(5,datasetA.col("region"))
.write().mode(saveMode)
.format("parquet")
.partitionBy("region")
.bucketBy(5,"studentId")
.sortBy
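For reference, a complete version of that write might look like the sketch below;
the sort column and the table name are assumptions, and bucketBy output has to go
through saveAsTable rather than a plain path-based save:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;

public class PersistDatasetA {
    // Sketch only: sorting on studentId and the table name "dataset_a" are assumptions.
    public static void persist(Dataset<Row> datasetA, SaveMode saveMode) {
        datasetA.repartition(5, datasetA.col("region"))
                .write()
                .mode(saveMode)
                .format("parquet")
                .partitionBy("region")
                .bucketBy(5, "studentId")
                .sortBy("studentId")
                .saveAsTable("dataset_a"); // bucketed writes require a table-based save
    }
}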
Hello,
I have a requirement where I need to get the total count of rows and the
total count of failedRows, based on a grouping.
The code looks like below:
myDataset.createOrReplaceTempView("temp_view");
Dataset countDataset = sparkSession.sql("Select
column1,column2,column3,column4,column5,column6,co
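One way to get both counts in a single pass over the view, sketched below with
hypothetical grouping columns and a hypothetical "status" flag for failed rows,
is a conditional aggregate next to COUNT(*):

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class GroupedCounts {
    // Sketch: the grouping columns and the failure condition are assumptions.
    public static Dataset<Row> counts(SparkSession sparkSession, Dataset<Row> myDataset) {
        myDataset.createOrReplaceTempView("temp_view");
        return sparkSession.sql(
                "SELECT column1, column2, column3, "
              + "       COUNT(*) AS totalRows, "
              + "       SUM(CASE WHEN status = 'FAILED' THEN 1 ELSE 0 END) AS failedRows "
              + "FROM temp_view "
              + "GROUP BY column1, column2, column3");
    }
}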
I don't think inner join will solve my problem.
*For each row in* paramsDataset, I need to filter myDataset. And then I need
to run a bunch of calculations on the filtered myDataset.
Say, for example, paramsDataset has three employee age ranges, e.g.
20-30, 30-50, 50-60, and the regions USA and Canada.
myDataset
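One alternative to a per-row loop, sketched below with assumed column names
(minAge, maxAge, region on paramsDataset; age, region on myDataset), is a single
range (non-equi) join that tags each data row with the parameter row it falls
under and then aggregates per parameter row:

import org.apache.spark.sql.Column;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import static org.apache.spark.sql.functions.count;
import static org.apache.spark.sql.functions.lit;

public class ParamRangeJoin {
    // Sketch only: all column names are assumptions; swap the count for the real calculations.
    public static Dataset<Row> perParamCounts(Dataset<Row> paramsDataset, Dataset<Row> myDataset) {
        Column rangeAndRegion = myDataset.col("age").geq(paramsDataset.col("minAge"))
                .and(myDataset.col("age").lt(paramsDataset.col("maxAge")))
                .and(myDataset.col("region").equalTo(paramsDataset.col("region")));

        return myDataset.join(paramsDataset, rangeAndRegion)
                .groupBy(paramsDataset.col("minAge"), paramsDataset.col("maxAge"),
                         paramsDataset.col("region"))
                .agg(count(lit(1)).alias("rowCount"));
    }
}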
Hi,
I have one dataset with parameters and another with data that needs to be
filtered based on the first (parameter) dataset.
*Scenario is as follows:*
For each row in the parameter dataset, I need to apply the parameter row to
the second dataset. I will end up having multiple datase
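Taken literally, that per-row application could look like the sketch below
(column names are assumptions): collect the small parameter dataset to the
driver and derive one filtered dataset per parameter row.

import java.util.ArrayList;
import java.util.List;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

public class PerParamFilter {
    // Sketch only: assumes paramsDataset is small enough to collect and has
    // minAge, maxAge and region columns matching myDataset's age and region.
    public static List<Dataset<Row>> filterPerParam(Dataset<Row> paramsDataset, Dataset<Row> myDataset) {
        List<Dataset<Row>> results = new ArrayList<>();
        for (Row param : paramsDataset.collectAsList()) {
            int minAge = param.getInt(param.fieldIndex("minAge"));
            int maxAge = param.getInt(param.fieldIndex("maxAge"));
            String region = param.getString(param.fieldIndex("region"));
            results.add(myDataset.filter(
                    myDataset.col("age").between(minAge, maxAge)
                             .and(myDataset.col("region").equalTo(region))));
        }
        return results;
    }
}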
Thanks for the response, JayeshLalwani. Clearly in my case the issue was with
my approach, not with memory.
The job was taking much longer even for a smaller dataset.
Thanks again!
---
I was able to solve it by writing a Java method (to slice and dice the data)
and invoking that method from Spark's map. This transformed the data much
faster than my previous approach.
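A minimal sketch of that map-based slicing, with assumed field offsets and a
delimited string as the output type, might look like:

import org.apache.spark.api.java.function.MapFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;

public class FixedLengthMapper {
    // Sketch only: the field positions and the pipe-delimited output are assumptions.
    public static Dataset<String> parse(Dataset<String> rawDataset) {
        MapFunction<String, String> sliceRecord = line -> {
            String id = line.substring(0, 10).trim();
            String name = line.substring(10, 40).trim();
            String amount = line.substring(40, 52).trim();
            return String.join("|", id, name, amount);
        };
        return rawDataset.map(sliceRecord, Encoders.STRING());
    }
}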
Thanks geoHeil for the pointer.
---
Hello,
We are running into issues while trying to process fixed-length files using
Spark.
The approach we took is as follows:
1. Read the .bz2 file from HDFS into a dataset using the
spark.read().textFile() API and create a temporary view.
Dataset rawDataset = sparkSession.read().textFile(filePa
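For reference, that read-plus-temp-view step, followed by SQL SUBSTRING slicing,
might look like the sketch below; the file path and the field offsets are
assumptions:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class FixedLengthSqlApproach {
    // Sketch only: the path and the column offsets are assumptions.
    public static Dataset<Row> parse(SparkSession sparkSession) {
        // Spark decompresses .bz2 transparently when reading text.
        Dataset<String> rawDataset =
                sparkSession.read().textFile("hdfs:///data/fixed_length_records.bz2");
        rawDataset.createOrReplaceTempView("raw_records");

        // Slice each fixed-length line into columns with SUBSTRING (1-based offsets).
        return sparkSession.sql(
                "SELECT TRIM(SUBSTRING(value, 1, 10))  AS id, "
              + "       TRIM(SUBSTRING(value, 11, 30)) AS name, "
              + "       TRIM(SUBSTRING(value, 41, 12)) AS amount "
              + "FROM raw_records");
    }
}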