Hi Peter,

You can use the ImageSchema.readImages API (in org.apache.spark.ml.image) in Spark 2.3 to read images.
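For illustration, here is a minimal PySpark sketch of reading a directory of images with ImageSchema.readImages on Spark 2.3. The helper names (image_columns, read_image_metadata) are made up for this example and are not part of Spark. The call assumes a SparkSession already exists or can be created with getOrCreate. Note also that readImages was deprecated in Spark 2.4 in favor of spark.read.format("image").

```python
# Sub-fields of the single "image" struct column produced by ImageSchema.
IMAGE_FIELDS = ("origin", "height", "width", "nChannels", "mode", "data")


def image_columns(fields=("origin", "height", "width")):
    """Return column paths such as 'image.height' for the requested sub-fields.

    Hypothetical helper for this sketch; it only builds strings.
    """
    unknown = [f for f in fields if f not in IMAGE_FIELDS]
    if unknown:
        raise ValueError("unknown image fields: %s" % ", ".join(unknown))
    return ["image." + f for f in fields]


def read_image_metadata(path, recursive=False):
    """Load the images under `path` and keep only lightweight metadata columns.

    Hypothetical helper; requires PySpark 2.3.x (readImages was removed in 3.0).
    """
    # Imported here so the module can be loaded on a machine without PySpark.
    from pyspark.ml.image import ImageSchema

    # dropImageFailures=True skips files that cannot be decoded as images.
    df = ImageSchema.readImages(path, recursive=recursive, dropImageFailures=True)
    return df.select(*image_columns())
```

Something like read_image_metadata("hdfs:///some/image/dir").show(truncate=False) would then list the origin, height and width of each decoded image; the path here is only a placeholder.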
Background and API docs:

https://databricks.com/blog/2018/12/10/introducing-built-in-image-data-source-in-apache-spark-2-4.html
https://blogs.technet.microsoft.com/machinelearning/2018/03/05/image-data-support-in-apache-spark/
https://spark.apache.org/docs/2.3.0/api/scala/index.html#org.apache.spark.ml.image.ImageSchema$

There's also a Spark package for Spark versions older than 2.3:
https://github.com/Microsoft/spark-images

Thank you,
Ilya

From: Peter Liu <peter.p...@gmail.com>
Sent: Thursday, September 5, 2019 2:13 PM
To: dev <dev@spark.apache.org>; User <u...@spark.apache.org>
Subject: Re: read image or binary files / spark 2.3

Hello experts,

I have a quick question: which API allows me to read image files or binary files (for SparkSession.readStream) from a local/Hadoop file system in Spark 2.3?

I have been browsing the following documentation and googling for it, but didn't find a good example:
https://spark.apache.org/docs/2.3.0/streaming-programming-guide.html
https://spark.apache.org/docs/2.3.0/api/scala/index.html#org.apache.spark.package

Any hint/help would be very much appreciated!

Thanks!
Peter