[SS] Writing a test for a possible bug in StateStoreSaveExec with Append output mode?

2017-09-03 Thread Jacek Laskowski
Hi,

I may have found a bug in StateStoreSaveExec with Append output mode,
and would love to prove myself wrong, or to help squash it by
writing a test for the case.

Is there a test for StateStoreSaveExec with Append output mode? If
not, is there a streaming test template that is close to such a test
and that I could use?

Thanks for any help you may offer!

Pozdrawiam,
Jacek Laskowski

https://about.me/JacekLaskowski
Spark Structured Streaming (Apache Spark 2.2+)
https://bit.ly/spark-structured-streaming
Mastering Apache Spark 2 https://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: Spark 2.2.0 - Odd Hive SQL Warnings

2017-09-03 Thread Liang-Chi Hsieh

Hi Don,

There is a new SQL config `spark.sql.hive.caseSensitiveInferenceMode` which
sets the action to take when a case-sensitive schema cannot be read from a
Hive table's properties.

The default setting of this config is `INFER_AND_SAVE`, which infers the
case-sensitive schema from the underlying data files and writes it back to
the table properties. From your description, I think you don't have
permission to write to the Hive table, so you see the warning log.

You can change the setting to `INFER_ONLY`, which infers the schema but
doesn't try to write it back to the table properties, or `NEVER_INFER`, which
falls back to using the case-insensitive metastore schema without inferring.
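For example, the mode can be set when launching the shell (a hypothetical invocation; `INFER_ONLY` avoids the ALTER TABLE attempt, and with it the warning, on tables you can only read):

```shell
# Pick the schema inference mode at launch instead of the
# INFER_AND_SAVE default shipped with Spark 2.2.0.
spark-shell --conf spark.sql.hive.caseSensitiveInferenceMode=INFER_ONLY
```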


Don Drake wrote
> I'm in the process of migrating a few applications from Spark 2.1.1 to
> Spark 2.2.0 and so far the transition has been smooth.  One odd thing is
> that when I query a Hive table that I do not own, but have read access, I
> get a very long WARNING with a stack trace that basically says I do not
> have permission to ALTER the table.
> 
> As you can see, I'm just doing a SELECT on the table. Everything works,
> but this stack trace is a little concerning. Does anyone know what is
> going on?
> 
> 
> I'm using a downloaded binary (spark-2.2.0-bin-hadoop2.6) on CDH 5.10.1.
> 
> Thanks.
> 
> -Don
> 
> -- 
> Donald Drake
> Drake Consulting
> http://www.drakeconsulting.com/
> https://twitter.com/dondrake
> 800-733-2143
> 
> scala> spark.sql("select * from test.my_table")
> 17/09/01 15:40:30 WARN HiveExternalCatalog: Could not alter schema of
> table
>  `test`.`my_table` in a Hive compatible way. Updating Hive metastore in
> Spark SQL specific format.
> java.lang.reflect.InvocationTargetException
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native
> Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at
> org.apache.spark.sql.hive.client.Shim_v0_12.alterTable(HiveShim.scala:399)
> at
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$alterTable$1.apply$mcV$sp(HiveClientImpl.scala:461)
> at
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$alterTable$1.apply(HiveClientImpl.scala:457)
> at
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$alterTable$1.apply(HiveClientImpl.scala:457)
> at
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:290)
> at
> org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:231)
> at
> org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:230)
> at
> org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:273)
> at
> org.apache.spark.sql.hive.client.HiveClientImpl.alterTable(HiveClientImpl.scala:457)
> at
> org.apache.spark.sql.hive.client.HiveClient$class.alterTable(HiveClient.scala:87)
> at
> org.apache.spark.sql.hive.client.HiveClientImpl.alterTable(HiveClientImpl.scala:79)
> at
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableSchema$1.apply$mcV$sp(HiveExternalCatalog.scala:636)
> at
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableSchema$1.apply(HiveExternalCatalog.scala:627)
> at
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableSchema$1.apply(HiveExternalCatalog.scala:627)
> at
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
> at
> org.apache.spark.sql.hive.HiveExternalCatalog.alterTableSchema(HiveExternalCatalog.scala:627)
> at
> org.apache.spark.sql.hive.HiveMetastoreCatalog.updateCatalogSchema(HiveMetastoreCatalog.scala:267)
> at org.apache.spark.sql.hive.HiveMetastoreCatalog.org
> $apache$spark$sql$hive$HiveMetastoreCatalog$$inferIfNeeded(HiveMetastoreCatalog.scala:251)
> at
> org.apache.spark.sql.hive.HiveMetastoreCatalog$$anonfun$6$$anonfun$7.apply(HiveMetastoreCatalog.scala:195)
> at
> org.apache.spark.sql.hive.HiveMetastoreCatalog$$anonfun$6$$anonfun$7.apply(HiveMetastoreCatalog.scala:194)
> at scala.Option.getOrElse(Option.scala:121)
> at
> org.apache.spark.sql.hive.HiveMetastoreCatalog$$anonfun$6.apply(HiveMetastoreCatalog.scala:194)
> at
> org.apache.spark.sql.hive.HiveMetastoreCatalog$$anonfun$6.apply(HiveMetastoreCatalog.scala:187)
> at
> org.apache.spark.sql.hive.HiveMetastoreCatalog.withTableCreationLock(HiveMetastoreCatalog.scala:54)
> at
> org.ap

No rows when reading an Apache Tajo table (Spark SQL)

2017-09-03 Thread Cinyoung Hur
Hi,

I want to read an Apache Tajo table using Spark SQL.

The Tajo JDBC driver is added to spark-shell, but the Tajo table doesn't
return anything.
The Spark code and its result follow.

$ spark-shell --jars tajo-jdbc-0.11.3.jar

scala> val componentDF = spark.sqlContext.load("jdbc", Map(
"url"-> "jdbc:tajo://tajo-master-ip:26002/analysis",
"driver"->"org.apache.tajo.jdbc.TajoDriver",
"dbtable"->"component_usage_2015"
))
scala> componentDF.registerTempTable("components")
scala> val allComponents = sqlContext.sql("select * from components")
scala> allComponents.show(5)


warning: there was one deprecation warning; re-run with -deprecation for
details
componentDF: org.apache.spark.sql.DataFrame =
[analysis.component_usage_2015.gnl_nm_cd: string,
analysis.component_usage_2015.qty: double ... 1 more field]
warning: there was one deprecation warning; re-run with -deprecation for
details
allComponents: org.apache.spark.sql.DataFrame =
[analysis.component_usage_2015.gnl_nm_cd: string,
analysis.component_usage_2015.qty: double ... 1 more field]
+---------------------------------------+---------------------------------+---------------------------------+
|analysis.component_usage_2015.gnl_nm_cd|analysis.component_usage_2015.qty|analysis.component_usage_2015.amt|
+---------------------------------------+---------------------------------+---------------------------------+
+---------------------------------------+---------------------------------+---------------------------------+
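As an aside, the deprecation warnings come from `sqlContext.load`, which was superseded by the DataFrameReader API. A hedged rewrite using the same connection details as above (not verified against a live Tajo server; reading a bounded subquery instead of the bare table name is just a first diagnostic step, and the `t` alias is required by the JDBC source):

```scala
// Non-deprecated equivalent of the sqlContext.load("jdbc", ...) call.
val componentDF = spark.read
  .format("jdbc")
  .option("url", "jdbc:tajo://tajo-master-ip:26002/analysis")
  .option("driver", "org.apache.tajo.jdbc.TajoDriver")
  .option("dbtable", "(select * from component_usage_2015 limit 5) t")
  .load()

componentDF.show()
```

If the subquery also returns no rows, the problem is more likely on the Tajo side (or in how the driver reports results) than in Spark's query execution.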

Regards,
Cinyoung Hur