Hi,
The issue below sounds like something that someone here will have experienced...
I have external tables of Parquet files with a Hive table defined on top of the
data. I don't manage or know the details of how the data lands.
Some tables query through Spark without issues.
But for others there is an issue:
So now I have tried to run this function in a ThreadPool. But it doesn't
seem to work.
-- Forwarded message -
From: Sean Owen
Date: Wed, Jul 20, 2022 at 22:43
Subject: Re: Pyspark and multiprocessing
To: Bjørn Jørgensen
I don't think you ever say what doesn
I have 400k JSON files, each between 10 KB and 500 KB in size.
They don't have the same schema, so I have to loop over them one at a time.
This works, but it's very slow. This process takes 5 days!
So now I have tried to run this function in a ThreadPool. But it doesn't
seem to work.
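
One possible way to cut down the per-file loop, offered only as a rough sketch and not as what the poster ran: if many of the files actually share a schema, they could be grouped by their top-level field names first, so that each group is read in one go instead of file by file. The helper names `schema_key` and `group_by_schema` are hypothetical, and the sketch assumes each file holds a single JSON object. The thread pool here only does local file reads; no Spark calls run inside the threads.

```python
import json
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def schema_key(path):
    """Return the sorted top-level field names of one JSON file as a tuple."""
    with open(path, encoding="utf-8") as fh:
        record = json.load(fh)
    # Assumption: the file contains one JSON object (a dict), not an array.
    return tuple(sorted(record))


def group_by_schema(paths, max_workers=8):
    """Map each distinct top-level key set to the list of files sharing it."""
    paths = list(paths)  # materialize so the paths can be iterated twice
    groups = defaultdict(list)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in input order, so zip pairs them correctly.
        for path, key in zip(paths, pool.map(schema_key, paths)):
            groups[key].append(str(path))
    return dict(groups)
```

Each resulting list of paths could then be handed to a single `spark.read.json(...)` call, which accepts a list of paths. Whether this helps depends on how many distinct schemas the 400k files really have, and nested fields would need a deeper fingerprint than the top-level keys used here.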
*St
Hi Users,
Pasting the full stack trace. Could anyone please advise?
The main error toward the end is: NoClassDefFoundError
Attaching the full trace.
Lost task 31.0 in stage 6.0 (TID 235, 10.139.64.16, executor 9):
java.lang.ExceptionInInitializerError
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native
How different? I think quite small variations are to be expected.
On Wed, Jul 20, 2022 at 9:13 AM Roger Wechsler wrote:
> Hi!
>
> We've been using Spark 3.0.1 to train logistic regression models
> with MLlib.
> We've recently upgraded to Spark 3.3.0 without making any other code
> changes and no
Hi!
We've been using Spark 3.0.1 to train logistic regression models with MLlib.
We've recently upgraded to Spark 3.3.0 without making any other code
changes and noticed that the trained models are different as compared to
the ones trained with 3.0.1 and therefore behave differently when used for
The data transformation is all the same.
Sure, linear regression is easy:
https://spark.apache.org/docs/latest/ml-classification-regression.html#linear-regression
These are components that operate on DataFrames.
You'll want to look at VectorAssembler to assemble the feature columns into a single vector column.
There are
Hello, I am using Maven with Spark. After upgrading Scala from 2.11 to 2.12,
I am getting the error below, which I have observed while reading Avro.
I would appreciate any help.
ShuffleMapStage 6 (save at Calling.scala:81) failed in 0.633 s due to Job
aborted due to stage failure: Task 83 in stage 6.0 fail
Morning everyone,
The question may seem too broad, but I will try to summarize as much as possible:
I'm used to working with Spark SQL, DFs and such on a daily basis, easily
grouping, getting extra counters and using functions or UDFs. However, I've
come to a scenario where I need to make some predictions