See https://github.com/apache/spark/pull/22688

+Wenchen, this looks like where the problem was raised. This might have to
be considered as a blocker ...


On Thu, 11 Oct 2018, 2:48 pm assaf.mendelson, <assaf.mendel...@rsa.com>
wrote:

> Hi,
>
> I created a data source that has a writer but WITHOUT a reader. When I try
> to write with it, I get an exception:
> org.apache.spark.sql.AnalysisException: Data source is not readable:
> DefaultSource
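>
> For reference, here is a minimal sketch of the kind of write-only source I
> mean (the class name and package choices are mine, made up for
> illustration; the interfaces are the current DataSourceV2 ones):
>
>   import java.util.Optional
>
>   import org.apache.spark.sql.SaveMode
>   import org.apache.spark.sql.sources.v2.{DataSourceOptions, DataSourceV2, WriteSupport}
>   import org.apache.spark.sql.sources.v2.writer.DataSourceWriter
>   import org.apache.spark.sql.types.StructType
>
>   // A source that supports writing only: it mixes in WriteSupport but not
>   // ReadSupport, which is what makes the save path below fail.
>   class DefaultSource extends DataSourceV2 with WriteSupport {
>     override def createWriter(
>         writeUUID: String,
>         schema: StructType,
>         mode: SaveMode,
>         options: DataSourceOptions): Optional[DataSourceWriter] = {
>       // A real implementation would return a DataSourceWriter here; for
>       // this repro the body does not matter, since the failure happens in
>       // DataSourceV2Relation.create before createWriter is ever called.
>       Optional.empty[DataSourceWriter]()
>     }
>   }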
>
> The reason for this is that when save is called, inside the match on the
> source for the WriteSupport case, we have the following code:
>
> val source = cls.newInstance().asInstanceOf[DataSourceV2]
> source match {
>   case ws: WriteSupport =>
>     val sessionOptions = DataSourceV2Utils.extractSessionConfigs(
>       source,
>       df.sparkSession.sessionState.conf)
>     val options = sessionOptions ++ extraOptions
> --> val relation = DataSourceV2Relation.create(source, options)
>
>     if (mode == SaveMode.Append) {
>       runCommand(df.sparkSession, "save") {
>         AppendData.byName(relation, df.logicalPlan)
>       }
>     } else {
>       val writer = ws.createWriter(
>         UUID.randomUUID.toString, df.logicalPlan.output.toStructType, mode,
>         new DataSourceOptions(options.asJava))
>
>       if (writer.isPresent) {
>         runCommand(df.sparkSession, "save") {
>           WriteToDataSourceV2(writer.get, df.logicalPlan)
>         }
>       }
>     }
>
> but DataSourceV2Relation.create actively creates a reader
> (source.createReader) to extract the schema:
>
> def create(
>     source: DataSourceV2,
>     options: Map[String, String],
>     tableIdent: Option[TableIdentifier] = None,
>     userSpecifiedSchema: Option[StructType] = None): DataSourceV2Relation = {
>   val reader = source.createReader(options, userSpecifiedSchema)
>   val ident = tableIdent.orElse(tableFromOptions(options))
>   DataSourceV2Relation(
>     source, reader.readSchema().toAttributes, options, ident,
>     userSpecifiedSchema)
> }
>
>
> This confuses me a little, for two reasons.
>
> First, the schema is defined by the dataframe itself, not by the data
> source, i.e. it should be extracted from df.schema and not via
> source.createReader.
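>
> A hypothetical sketch of what I mean (not a tested fix; it assumes
> DataSourceV2Relation's constructor takes the same five arguments that
> create passes above):
>
>   // Build the relation from the dataframe's own schema instead of
>   // instantiating a reader just to ask it for a schema.
>   val relation = DataSourceV2Relation(
>     source, df.schema.toAttributes, options, tableIdent = None,
>     userSpecifiedSchema = None)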
>
> Second, I see that the relation is actually only used if the mode is
> SaveMode.Append (by the way, this means that if it is needed at all, it
> should be defined inside the "if"). I am not sure I understand the
> AppendData part, but why would reading from the source be involved in a
> write?
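>
> Something like this is what I have in mind for the scoping (just a sketch
> against the snippet above, with the else branch elided):
>
>   if (mode == SaveMode.Append) {
>     // Only the Append branch needs the relation, so create it here.
>     val relation = DataSourceV2Relation.create(source, options)
>     runCommand(df.sparkSession, "save") {
>       AppendData.byName(relation, df.logicalPlan)
>     }
>   } else {
>     // ... writer-only path as above ...
>   }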
>
> Am I missing something here?
>
> Thanks,
>    Assaf
>
>
>
> --
> Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>
