Z is just an example. It could be anything. Basically, anything that's not
in the schema should be filtered out.
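Since z stands in for any unexpected column, one generic way to express this is to keep only records whose fields all appear in the declared schema. A minimal, Spark-free sketch (the row data here is illustrative, not the thread's actual files):

```java
import java.util.*;

public class SchemaFilter {
    // Keep only records whose field names are a subset of the declared schema.
    static List<Map<String, Object>> keepSchemaOnly(
            List<Map<String, Object>> records, Set<String> schemaFields) {
        List<Map<String, Object>> kept = new ArrayList<>();
        for (Map<String, Object> rec : records) {
            if (schemaFields.containsAll(rec.keySet())) {
                kept.add(rec);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Set<String> schema = Set.of("x", "y");
        List<Map<String, Object>> records = List.of(
                Map.of("x", 1L, "y", 2L),           // matches the schema
                Map.of("x", 1L),                     // missing y: still a subset, kept
                Map.of("x", 1L, "y", 2L, "z", 3L));  // extra field z: dropped
        System.out.println(keepSchemaOnly(records, schema).size()); // prints 2
    }
}
```

In Spark itself this check would have to run before or during parsing, since a schema-based read never surfaces the unknown field at all.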
On Tue, 4 Jul 2023, 13:27 Hill Liu, wrote:
> I think you can define the schema with column z and keep only the records
> where z is null.
>
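Simulated without Spark (only the field names x, y, z come from the thread; the row values are made up), that suggestion amounts to: declare z in the schema so it parses as null wherever it is absent, then keep only the rows where z stayed null:

```java
import java.util.*;

public class ZColumnTrick {
    // Keep only rows whose parsed z column is null, i.e. rows that did not
    // carry the unexpected field z in the source data.
    static List<Map<String, Object>> keepZNull(List<Map<String, Object>> rows) {
        List<Map<String, Object>> kept = new ArrayList<>();
        for (Map<String, Object> row : rows) {
            if (row.get("z") == null) kept.add(row);
        }
        return kept;
    }

    public static void main(String[] args) {
        // As parsed with schema (x, y, z): record 5 got z=3, the others z=null.
        List<Map<String, Object>> rows = List.of(
                Map.of("x", 1L, "y", 2L),            // z absent -> parses as null
                Map.of("x", 1L, "y", 2L, "z", 3L));  // record 5, carries z
        System.out.println(keepZNull(rows).size()); // prints 1
    }
}
```

The Spark equivalent would be a filter like `ds.filter(functions.col("z").isNull())`, which only works per known extra column, hence the reply above that z could be anything.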
> On Tue, Jul 4, 2023 at 3:24 PM Shashank Rao wrote:
Yes, dropMalformed does filter out record 4. However, record 5 is not
filtered out.
On Tue, 4 Jul 2023 at 07:41, Vikas Kumar wrote:
> Have you tried dropmalformed option ?
>
> On Mon, Jul 3, 2023, 1:34 PM Shashank Rao wrote:
>
>> Update: Got it working by using the *_corrupt_record* field:
>> ds.filter(functions.col("_corrupt_record").isNull()).collect();
However, I haven't figured out how to ignore record 5.
Any help is appreciated.
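A rough, Spark-free simulation of why that filter catches record 4 but not record 5 (assuming record 4 actually fails parsing, e.g. a type mismatch, while record 5 merely carries an extra field; the data below is invented for illustration):

```java
import java.util.*;

public class CorruptRecordDemo {
    // Simulate permissive JSON parsing against a schema: if a declared field
    // can't be read as a long, the whole row is nulled and the raw text lands
    // in _corrupt_record; fields outside the schema (like z) are just ignored,
    // so such a record still parses cleanly.
    static Map<String, Object> parse(Map<String, Object> raw, Set<String> schema) {
        Map<String, Object> row = new HashMap<>();
        for (String field : schema) {
            Object v = raw.get(field);
            if (v != null && !(v instanceof Long)) { // type mismatch -> corrupt
                for (String f : schema) row.put(f, null);
                row.put("_corrupt_record", raw.toString());
                return row;
            }
            row.put(field, v);
        }
        row.put("_corrupt_record", null);
        return row;
    }

    public static void main(String[] args) {
        Set<String> schema = Set.of("x", "y");
        Map<String, Object> rec4 = parse(Map.of("x", 1L, "y", "oops"), schema);
        Map<String, Object> rec5 = parse(Map.of("x", 1L, "y", 2L, "z", 3L), schema);
        // The isNull filter drops rec4 but keeps rec5, matching the thread.
        System.out.println(rec4.get("_corrupt_record") != null); // prints true
        System.out.println(rec5.get("_corrupt_record") == null); // prints true
    }
}
```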
On Mon, 3 Jul 2023 at 19:24, Shashank Rao wrote:
> Hi all,
> I'm trying to read around 1,000,000 JSONL files present in S3 using Spark:
>
> StructType schema = new StructType().add("x", DataTypes.LongType).add("y",
> DataTypes.LongType);
> Dataset<Row> ds = spark.read().schema(schema).json("path/to/file");
> This gives me a dataset that has record 4 with y=null and record 5 with x
> and y.
> Any help is appreciated.
--
Thanks,
Shashank Rao
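What that schema read is doing can be mimicked without Spark (illustrative values; only the field names x, y, z come from the thread): each raw record is projected onto the schema's columns, so a missing field surfaces as null (record 4) and an unknown field is silently dropped (record 5):

```java
import java.util.*;

public class PermissiveRead {
    // Project a raw record onto the schema columns: absent fields become null,
    // fields outside the schema (like z) are silently discarded.
    static Map<String, Object> project(Map<String, Object> raw, List<String> schema) {
        Map<String, Object> row = new LinkedHashMap<>();
        for (String field : schema) {
            row.put(field, raw.get(field)); // null when the field is absent
        }
        return row;
    }

    public static void main(String[] args) {
        List<String> schema = List.of("x", "y");
        System.out.println(project(Map.of("x", 5L), schema));                   // {x=5, y=null}
        System.out.println(project(Map.of("x", 6L, "y", 7L, "z", 8L), schema)); // {x=6, y=7}
    }
}
```

This is why neither record 4 nor record 5 is rejected by the read itself: both project cleanly onto the schema.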
Please note, modifying the source data is not an option I have. Hence,
I cannot merge multiple small files into a single large file.
Any help is appreciated.
--
Thanks,
Shashank Rao