I'm guessing pp4 has one row, containing an array of 10 images. You want
10 rows of 1 image each.
But just don't do this. Pass the bytes of the image as a flat array<int>,
along with width/height/channels, and reshape it on use. It's just easier.
That is how the Spark image representation works anyway.
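
A minimal sketch of what that could look like, assuming each image is an
(H, W, C) uint8 numpy array. The helper names (image_to_record,
record_to_image, to_spark_dataframe) and the column names are made up for
illustration and are not part of any Spark API:

```python
import numpy as np


def image_to_record(image, label):
    """Flatten an (H, W, C) image into plain Python values Spark can ingest."""
    height, width, channels = image.shape
    return {
        "data": image.ravel().tolist(),  # flat list of Python ints
        "height": int(height),
        "width": int(width),
        "channels": int(channels),
        "label": int(label),
    }


def record_to_image(record):
    """Rebuild the (H, W, C) uint8 array from a flattened record."""
    shape = (record["height"], record["width"], record["channels"])
    return np.asarray(record["data"], dtype=np.uint8).reshape(shape)


def to_spark_dataframe(spark, records):
    """Create a DataFrame with one row per image (needs pyspark installed)."""
    from pyspark.sql.types import (
        ArrayType, IntegerType, StructField, StructType,
    )

    schema = StructType([
        StructField("data", ArrayType(IntegerType()), nullable=False),
        StructField("height", IntegerType(), nullable=False),
        StructField("width", IntegerType(), nullable=False),
        StructField("channels", IntegerType(), nullable=False),
        StructField("label", IntegerType(), nullable=False),
    ])
    rows = [
        (r["data"], r["height"], r["width"], r["channels"], r["label"])
        for r in records
    ]
    return spark.createDataFrame(rows, schema)
```

The point is that Spark only ever sees a one-dimensional list of ints plus
three small integers per row, so no nested ArrayType is needed. The reshape
in record_to_image would happen wherever the image is actually used, for
example inside a UDF.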

On Thu, Aug 3, 2023 at 8:43 PM second_co...@yahoo.com.INVALID
<second_co...@yahoo.com.invalid> wrote:

> Hello Adrian,
>
> Here is the snippet:
>
> import tensorflow_datasets as tfds
>
> (ds_train, ds_test), ds_info = tfds.load(
>     dataset_name,
>     data_dir='<some path to your storage>',
>     split=["train", "test"],
>     with_info=True,
>     as_supervised=True,
> )
>
> schema = StructType([
>     StructField("image", ArrayType(ArrayType(ArrayType(IntegerType()))), nullable=False),
>     StructField("label", IntegerType(), nullable=False),
> ])
> pp4 = spark.createDataFrame(
>     pd.DataFrame(tfds.as_dataframe(ds_train.take(4), ds_info)), schema
> )
>
>
>
> raised error
>
> , TypeError: field image: ArrayType(ArrayType(ArrayType(IntegerType(), True), 
> True), True) can not accept object array([[[14, 14, 14],
>         [14, 14, 14],
>         [14, 14, 14],
>         ...,
>         [19, 17, 20],
>         [19, 17, 20],
>         [19, 17, 20]],
>
>
>
>
>
> On Thursday, August 3, 2023 at 11:34:08 PM GMT+8, Adrian Pop-Tifrea <
> poptifreaadr...@gmail.com> wrote:
>
>
> Hello,
>
> can you also please show us how you created the pandas dataframe? I mean,
> how you added the actual data into the dataframe. It would help us for
> reproducing the error.
>
> Thank you,
> Pop-Tifrea Adrian
>
> On Mon, Jul 31, 2023 at 5:03 AM second_co...@yahoo.com <
> second_co...@yahoo.com> wrote:
>
> I changed it to
>
> ArrayType(ArrayType(ArrayType(IntegerType()))) and still get the same error.
>
> Thank you for responding
>
> On Thursday, July 27, 2023 at 06:58:09 PM GMT+8, Adrian Pop-Tifrea <
> poptifreaadr...@gmail.com> wrote:
>
>
> Hello,
>
> when you said your pandas DataFrame has 10 rows, does that mean it
> contains 10 images? Because if that's the case, then you'd want to only use
> 3 layers of ArrayType when you define the schema.
>
> Best regards,
> Adrian
>
>
>
> On Thu, Jul 27, 2023, 11:04 second_co...@yahoo.com.INVALID
> <second_co...@yahoo.com.invalid> wrote:
>
> I have a pandas DataFrame with a column 'image' holding numpy.ndarray
> values. The shape is (500, 333, 3) per image. My pandas DataFrame has 10
> rows, so the overall shape is (10, 500, 333, 3).
>
> When using spark.createDataFrame(pandas_dataframe, schema), I need to
> specify the schema:
>
> schema = StructType([
>     StructField("image", ArrayType(ArrayType(ArrayType(ArrayType(IntegerType())))), nullable=False),
> ])
>
>
> I get this error:
>
> raise TypeError(
> , TypeError: field image: 
> ArrayType(ArrayType(ArrayType(ArrayType(IntegerType(), True), True), True), 
> True) can not accept object array([[[14, 14, 14],
>
> ...
>
> Can you advise how to set the schema for an image stored as a numpy.ndarray?
>
>
>
>
