Can you provide an example string (row) and the expected inferred schema?

Enrico


On 04.06.22 at 18:36, marc nicole wrote:
How do I do just that? I thought we can only use inferSchema when we first read the dataset, or am I wrong?
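Schema inference is in fact not limited to the first read from disk: `spark.read().csv` also accepts an in-memory `Dataset<String>` of CSV lines (available since Spark 2.2) and re-runs inference over them. A minimal sketch, where the local master and sample lines are illustrative:

```java
import java.util.Arrays;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class ReInferSchema {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .master("local[*]").appName("re-infer").getOrCreate();

        // CSV lines held in memory; in practice these would come from
        // cleaning a dataset that was already loaded once.
        Dataset<String> csvLines = spark.createDataset(
                Arrays.asList("a,b", "1.5,2.0", "3.5,4.0"), Encoders.STRING());

        // spark.read().csv(Dataset<String>) re-runs schema inference over
        // the in-memory lines, so inference is not tied to the first read.
        Dataset<Row> df = spark.read()
                .option("header", "true")
                .option("inferSchema", "true")
                .csv(csvLines);

        df.printSchema(); // a and b should come back as doubles
        spark.stop();
    }
}
```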

On Sat, Jun 4, 2022 at 18:10, Sean Owen <sro...@gmail.com> wrote:

    It sounds like you want to interpret the input as strings, do some
    processing, then infer the schema. That has nothing to do with
    construing the entire row as a string like "Row[foo=bar, baz=1]"

    On Sat, Jun 4, 2022 at 10:32 AM marc nicole <mk1853...@gmail.com>
    wrote:

        Hi Sean,

        Thanks. Actually I have a dataset where I want to infer the
        schema after discarding the specific String value "+". I do
        this because the column would otherwise be considered
        StringType, while if I remove that "+" value it would be
        considered DoubleType, for example, or something else.
        Basically I want to remove "+" from all dataset rows and then
        infer the schema.
        My idea is to filter the rows to those not equal to "+" in
        the target columns (potentially all of them) and then use
        spark.read().csv() to read the new filtered dataset with the
        inferSchema option, which would then yield the correct column
        types.
        What do you think?
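That plan can be sketched roughly as follows; the sample data, column names, and the naive comma-join used to rebuild CSV lines are all illustrative assumptions, not the original dataset:

```java
import java.util.Arrays;

import org.apache.spark.api.java.function.MapFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.StructType;

import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.lit;
import static org.apache.spark.sql.functions.when;

public class CleanThenInfer {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .master("local[*]").appName("clean-then-infer").getOrCreate();

        // Stand-in for the real dataset: every column is StringType and
        // "+" is the placeholder value to discard.
        Dataset<Row> raw = spark.createDataFrame(
                Arrays.asList(RowFactory.create("1.5", "2"),
                              RowFactory.create("+", "4")),
                new StructType().add("x", "string").add("y", "string"));

        // Replace "+" with an empty field in every column; the CSV reader
        // treats empty fields as null by default (nullValue = "").
        Dataset<Row> cleaned = raw;
        for (String c : raw.columns()) {
            cleaned = cleaned.withColumn(c,
                    when(col(c).equalTo("+"), lit("")).otherwise(col(c)));
        }

        // Re-serialize each row as a CSV line (naive join: assumes no
        // commas or quotes inside values) and let Spark infer types again.
        Dataset<String> lines = cleaned.map(
                (MapFunction<Row, String>) r -> r.mkString(","), Encoders.STRING());
        Dataset<Row> typed = spark.read()
                .option("inferSchema", "true").csv(lines);

        typed.printSchema(); // numeric types instead of plain strings
        spark.stop();
    }
}
```

The round trip through `Dataset<String>` avoids writing the cleaned data to disk just to re-read it with inferSchema.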

        On Sat, Jun 4, 2022 at 15:56, Sean Owen <sro...@gmail.com> wrote:

            I don't think you want to do that. You get a string
            representation of structured data without the structure,
            at best. This is part of the reason it doesn't work
            directly this way.
            You can use a UDF to call .toString on the Row, of course,
            but, again, what are you really trying to do?

            On Sat, Jun 4, 2022 at 7:35 AM marc nicole
            <mk1853...@gmail.com> wrote:

                Hi,
                How to convert a Dataset<Row> to a Dataset<String>?
                What I have tried is:

                List<String> list = dataset.as(Encoders.STRING()).collectAsList();
                Dataset<String> datasetSt = spark.createDataset(list, Encoders.STRING());
                // But this raises an org.apache.spark.sql.AnalysisException:
                // Try to map struct... to Tuple1, but failed as the number
                // of fields does not line up

                The columns are all of type String.
                How can I solve this?
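One way around the exception: `as(Encoders.STRING())` only applies to datasets with a single column, so for a multi-column `Dataset<Row>` each Row has to be mapped to a String explicitly. A sketch with made-up sample data standing in for the real dataset:

```java
import java.util.Arrays;

import org.apache.spark.api.java.function.MapFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.StructType;

public class RowToString {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .master("local[*]").appName("row-to-string").getOrCreate();

        // Two string columns: with more than one column,
        // dataset.as(Encoders.STRING()) fails with the AnalysisException
        // above, since a multi-field Row cannot be encoded as one String.
        Dataset<Row> dataset = spark.createDataFrame(
                Arrays.asList(RowFactory.create("foo", "1"),
                              RowFactory.create("baz", "2")),
                new StructType().add("name", "string").add("value", "string"));

        // Map each Row to its string form instead of re-encoding it.
        Dataset<String> datasetSt = dataset.map(
                (MapFunction<Row, String>) r -> r.mkString(","), Encoders.STRING());

        datasetSt.show(false);
        spark.stop();
    }
}
```

This also stays distributed, avoiding the collectAsList() round trip through the driver.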
