Unfortunately I can't show exactly the data I'm using, but this is what I'm
seeing:
I have a case class 'Product' that represents a table in our database. I
load that data via sqlContext.read.format("jdbc").options(...).load.as[Product]
and register it as a temp table named 'product'.
For testing, I created a new Dataset that has only 3 records in it:
val ts = sqlContext.sql("select * from product where product_catalog_id in (1, 2, 3)").as[Product]
I also created another Dataset using the same case class and the same data,
but built from a Seq instead:
val ds: Dataset[Product] = Seq(
  Product(Some(1), ...),
  Product(Some(2), ...),
  Product(Some(3), ...)
).toDS
The Spark shell reports exactly the same type for both at this point, but
they don't behave the same way when I self-join them:
ts.as("ts1").joinWith(ts.as("ts2"),
  $"ts1.product_catalog_id" === $"ts2.product_catalog_id")
ds.as("ds1").joinWith(ds.as("ds2"),
  $"ds1.product_catalog_id" === $"ds2.product_catalog_id")
Again, Spark tells me these self-joins return exactly the same type, but
when I call .show on them, only the one created from a Seq works. The one
created by reading from the database throws this error:
org.apache.spark.sql.AnalysisException: cannot resolve
'ts1.product_catalog_id' given input columns: [..., product_catalog_id,
...];
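In case it helps with diagnosis, this is roughly how the two Datasets could be compared in the shell. It is only a sketch: it assumes ts and ds are still defined as above in the same spark-shell session, and I have not included any output from it here.

```scala
// Assumes `ts` (loaded via JDBC) and `ds` (built from a Seq) already exist
// in the current spark-shell session, as defined earlier in this message.

// Compare the column names and types of the two Datasets.
ts.printSchema()
ds.printSchema()

// Print the logical and physical plans for each aliased Dataset, to see
// how the "ts1" / "ds1" aliases show up in the plans.
ts.as("ts1").explain(true)
ds.as("ds1").explain(true)
```

If the two schemas match but the plans differ in how the alias appears, that would at least narrow down where the two diverge.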
Is this a bug? Is there any way to make the Dataset loaded from a table
behave like the one created from a sequence?
Thanks,
Tim