Can you provide an example string (row) and the expected inferred schema?
Enrico
On 04.06.22 at 18:36, marc nicole wrote:
How do I do just that? I thought we can only use inferSchema when we
first read the dataset, or am I wrong?
On Sat, Jun 4, 2022 at 18:10, Sean Owen <sro...@gmail.com> wrote:
It sounds like you want to interpret the input as strings, do some
processing, then infer the schema. That has nothing to do with
construing the entire row as a string like "Row[foo=bar, baz=1]"
On Sat, Jun 4, 2022 at 10:32 AM marc nicole <mk1853...@gmail.com>
wrote:
Hi Sean,
Thanks. Actually, I have a dataset where I want to infer the schema
after discarding the specific String value "+". I do this
because the column would otherwise be considered StringType, whereas
if I remove that "+" value it will be considered DoubleType, for
example, or something else. Basically, I want to remove "+" from
all dataset rows and then infer the schema.
Here my idea is to filter the rows not equal to "+" for the
target columns (potentially all of them) and then use
spark.read().csv() to read the new filtered dataset with the
option inferSchema which would then yield correct column types.
What do you think?
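The idea above can be illustrated without Spark. The following is a minimal plain-Java sketch, not Spark's actual inference code: the class `ColumnTypeSketch` and its method `inferType` are hypothetical names introduced here, and the only two outcomes modelled are "double" and "string". It shows why a single "+" forces a column to string, and why skipping that placeholder lets the column be treated as numeric.

```java
import java.util.List;

// Hypothetical helper, not part of Spark: mimics the idea of inferring a
// column type only after discarding a placeholder value such as "+".
final class ColumnTypeSketch {

    // Returns "double" if every value other than the placeholder parses as a
    // double, otherwise "string". A column with no usable values stays "string".
    static String inferType(List<String> values, String placeholder) {
        boolean sawValue = false;
        for (String v : values) {
            if (v == null || v.equals(placeholder)) {
                continue; // skip placeholder and missing entries
            }
            sawValue = true;
            try {
                Double.parseDouble(v.trim());
            } catch (NumberFormatException e) {
                return "string"; // one non-numeric value decides the column
            }
        }
        return sawValue ? "double" : "string";
    }
}
```

Calling `ColumnTypeSketch.inferType(List.of("1.5", "+", "2"), "+")` treats "+" as missing and reports a numeric column, while passing a placeholder that never matches leaves "+" in play and the column falls back to string. In Spark itself, it may be worth checking whether the CSV reader's `nullValue` option can mark "+" as missing at read time, so that no separate filtering pass is needed.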
On Sat, Jun 4, 2022 at 15:56, Sean Owen <sro...@gmail.com>
wrote:
I don't think you want to do that. You get a string
representation of structured data without the structure,
at best. This is part of the reason it doesn't work
directly this way.
You can use a UDF to call .toString on the Row, of course,
but, again, what are you really trying to do?
On Sat, Jun 4, 2022 at 7:35 AM marc nicole
<mk1853...@gmail.com> wrote:
Hi,
How to convert a Dataset<Row> to a Dataset<String>?
What I have tried is:
List<String> list = dataset.as(Encoders.STRING()).collectAsList();
Dataset<String> datasetSt = spark.createDataset(list,
Encoders.STRING()); // But this line raises
an org.apache.spark.sql.AnalysisException: Try to map
struct... to Tuple1, but failed as the number of
fields does not line up
All columns are of type String.
How to solve this?