pp4 has one row, I'm guessing, containing an array of 10 images. You want 10 rows of 1 image each. But just don't do this. Pass the bytes of the image as a flat array<int>, along with width/height/channels, and reshape it on use. It's just easier. That is how the Spark image representation works anyway.
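
A rough sketch of that idea, in case it helps. The helper names (image_to_row, row_to_image, images_to_spark) and the column names are made up for illustration, and I am assuming uint8 images of shape (height, width, channels). I store the pixels as BinaryType rather than array<int>, since that is what Spark's built-in image schema uses for its data field; an ArrayType(IntegerType()) column holding img.ravel().tolist() should work the same way.

```python
import numpy as np


def image_to_row(img, label):
    """Flatten one (H, W, C) uint8 image into plain Python values."""
    img = np.ascontiguousarray(img, dtype=np.uint8)
    height, width, channels = img.shape
    # bytes is one of the Python types Spark accepts for a BinaryType field
    return (img.tobytes(), height, width, channels, int(label))


def row_to_image(data, height, width, channels):
    """Rebuild the (H, W, C) array from the flat bytes at the point of use."""
    return np.frombuffer(data, dtype=np.uint8).reshape(height, width, channels)


def images_to_spark(spark, images, labels):
    """Build a DataFrame with one row per image (hypothetical helper)."""
    # Imported here so the NumPy helpers above work without pyspark installed
    from pyspark.sql.types import (
        BinaryType, IntegerType, StructField, StructType,
    )

    schema = StructType([
        StructField("data", BinaryType(), nullable=False),
        StructField("height", IntegerType(), nullable=False),
        StructField("width", IntegerType(), nullable=False),
        StructField("channels", IntegerType(), nullable=False),
        StructField("label", IntegerType(), nullable=False),
    ])
    rows = [image_to_row(img, lbl) for img, lbl in zip(images, labels)]
    return spark.createDataFrame(rows, schema)
```

On the way back out, something like row_to_image(row["data"], row["height"], row["width"], row["channels"]) gives you the original array again. The schema stays flat, so createDataFrame never has to validate a nested numpy.ndarray.
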

On Thu, Aug 3, 2023 at 8:43 PM second_co...@yahoo.com.INVALID <second_co...@yahoo.com.invalid> wrote:

> Hello Adrian,
>
> here is the snippet
>
> import tensorflow_datasets as tfds
>
> (ds_train, ds_test), ds_info = tfds.load(
>     dataset_name, data_dir='<some path to your storage>', split=["train", "test"],
>     with_info=True, as_supervised=True
> )
>
> schema = StructType([
>     StructField("image", ArrayType(ArrayType(ArrayType(IntegerType()))), nullable=False),
>     StructField("label", IntegerType(), nullable=False)
> ])
> pp4 = spark.createDataFrame(pd.DataFrame(tfds.as_dataframe(ds_train.take(4), ds_info)), schema)
>
> raised error
>
> TypeError: field image: ArrayType(ArrayType(ArrayType(IntegerType(), True), True), True) can not accept object array([[[14, 14, 14],
>         [14, 14, 14],
>         [14, 14, 14],
>         ...,
>         [19, 17, 20],
>         [19, 17, 20],
>         [19, 17, 20]],
>
> On Thursday, August 3, 2023 at 11:34:08 PM GMT+8, Adrian Pop-Tifrea <poptifreaadr...@gmail.com> wrote:
>
> Hello,
>
> can you also please show us how you created the pandas dataframe? I mean,
> how you added the actual data into the dataframe. It would help us for
> reproducing the error.
>
> Thank you,
> Pop-Tifrea Adrian
>
> On Mon, Jul 31, 2023 at 5:03 AM second_co...@yahoo.com <second_co...@yahoo.com> wrote:
>
> i changed to
>
> ArrayType(ArrayType(ArrayType(IntegerType()))) , still get same error
>
> Thank you for responding
>
> On Thursday, July 27, 2023 at 06:58:09 PM GMT+8, Adrian Pop-Tifrea <poptifreaadr...@gmail.com> wrote:
>
> Hello,
>
> when you said your pandas Dataframe has 10 rows, does that mean it
> contains 10 images? Because if that's the case, then you'd want to only use
> 3 layers of ArrayType when you define the schema.
>
> Best regards,
> Adrian
>
> On Thu, Jul 27, 2023, 11:04 second_co...@yahoo.com.INVALID <second_co...@yahoo.com.invalid> wrote:
>
> i have panda dataframe with column 'image' using numpy.ndarray. shape is (500, 333, 3) per image.
> my panda dataframe has 10 rows, thus, shape is (10, 500, 333, 3)
>
> when using spark.createDataFrame(panda_dataframe, schema), i need to
> specify the schema,
>
> schema = StructType([
>     StructField("image", ArrayType(ArrayType(ArrayType(ArrayType(IntegerType())))), nullable=False)
> ])
>
> i get error
>
> raise TypeError(
> TypeError: field image: ArrayType(ArrayType(ArrayType(ArrayType(IntegerType(), True), True), True), True) can not accept object array([[[14, 14, 14],
>
> ...
>
> Can advise how to set schema for image with numpy.ndarray ?