prashanthpdesai commented on issue #1775:
URL: https://github.com/apache/hudi/issues/1775#issuecomment-654559192
@bhasudha : Hi, I tried with the same packages you mentioned above, but we now see a different kind of error.
Please find the trace below.
**spark-shell --queue queue_q1 --packages org.apache.hudi:hudi-spark-bundle_2.11:0.5.3,org.apache.spark:spark-avro_2.11:2.4.4 --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'**
Warning: Master yarn-client is deprecated since 2.0. Please use master "yarn" with specified deploy mode instead.
The jars for the packages stored in: /home/edzmmprd/.ivy2/jars
:: loading settings :: url = jar:file:/opt/mapr/spark/spark-2.2.1/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
org.apache.hudi#hudi-spark-bundle_2.11 added as a dependency
org.apache.spark#spark-avro_2.11 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
confs: [default]
found org.apache.hudi#hudi-spark-bundle_2.11;0.5.3 in central
found org.apache.spark#spark-avro_2.11;2.4.4 in central
found org.apache.spark#spark-tags_2.11;2.4.4 in central
found org.spark-project.spark#unused;1.0.0 in central
downloading https://repo1.maven.org/maven2/org/apache/hudi/hudi-spark-bundle_2.11/0.5.3/hudi-spark-bundle_2.11-0.5.3.jar ...
    [SUCCESSFUL ] org.apache.hudi#hudi-spark-bundle_2.11;0.5.3!hudi-spark-bundle_2.11.jar (787ms)
downloading https://repo1.maven.org/maven2/org/apache/spark/spark-avro_2.11/2.4.4/spark-avro_2.11-2.4.4.jar ...
    [SUCCESSFUL ] org.apache.spark#spark-avro_2.11;2.4.4!spark-avro_2.11.jar (13ms)
downloading https://repo1.maven.org/maven2/org/apache/spark/spark-tags_2.11/2.4.4/spark-tags_2.11-2.4.4.jar ...
    [SUCCESSFUL ] org.apache.spark#spark-tags_2.11;2.4.4!spark-tags_2.11.jar (6ms)
downloading https://repo1.maven.org/maven2/org/spark-project/spark/unused/1.0.0/unused-1.0.0.jar ...
    [SUCCESSFUL ] org.spark-project.spark#unused;1.0.0!unused.jar (5ms)
:: resolution report :: resolve 2006ms :: artifacts dl 817ms
:: modules in use:
    org.apache.hudi#hudi-spark-bundle_2.11;0.5.3 from central in [default]
    org.apache.spark#spark-avro_2.11;2.4.4 from central in [default]
    org.apache.spark#spark-tags_2.11;2.4.4 from central in [default]
    org.spark-project.spark#unused;1.0.0 from central in [default]
    ---------------------------------------------------------------------
    |                  |            modules            ||   artifacts   |
    |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
    ---------------------------------------------------------------------
    |      default     |   4   |   4   |   4   |   0   ||   4   |   4   |
    ---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent
confs: [default]
4 artifacts copied, 0 already retrieved (20789kB/44ms)
scala> import org.apache.hudi.QuickstartUtils._
import org.apache.hudi.QuickstartUtils._
scala> import scala.collection.JavaConversions._
import scala.collection.JavaConversions._
scala> import org.apache.spark.sql.SaveMode._
import org.apache.spark.sql.SaveMode._
scala> import org.apache.hudi.DataSourceReadOptions._
import org.apache.hudi.DataSourceReadOptions._
scala> import org.apache.hudi.DataSourceWriteOptions._
import org.apache.hudi.DataSourceWriteOptions._
scala> import org.apache.hudi.config.HoodieWriteConfig._
import org.apache.hudi.config.HoodieWriteConfig._
scala> val basepath= "/datalake/uhclake/edz/prd/mcm/mcm_hudi_cow_dedup_fix"
basepath: String = /datalake/uhclake/edz/prd/mcm/mcm_hudi_cow_dedup_fix
scala> spark.read.format("org.apache.hudi").load(basepath + "/*").createOrReplaceTempView("hudi_tab")
scala> val commits = spark.sql("select distinct(_hoodie_commit_time) as commitTime from hudi_tab order by commitTime").map(k => k.getString(0)).take(50)
commits: Array[String] = Array(20200703000922, 20200703002654,
20200703010757, 20200703020715, 20200703030709, 20200703041422, 20200703051419,
20200703060728, 20200703070921, 20200703080801, 20200703090728, 20200703101459,
20200703110839, 20200703120708, 20200703131249, 20200703140738, 20200703151235,
20200703160723, 20200703170659, 20200703181223, 20200703211557, 20200703220646,
20200703231410, 20200704001432, 20200704010736, 20200704020754, 20200704030729,
20200704040836, 20200704050652, 20200704060650, 20200704070749, 20200704080711,
20200704090720, 20200704100722, 20200704110857, 20200704120736, 20200704130712,
20200704140741, 20200704150641, 20200704160732, 20200704170709, 20200704180745,
20200704190843, 20200704200658, 20200704210717, 20200704220739, 20200704230704,
20200705001445...
scala> val beginTime = commits(commits.length - 2)
beginTime: String = 20200705010728
scala> val incrementalDF = spark.read.format("org.apache.hudi").option(QUERY_TYPE_OPT_KEY, QUERY_TYPE_INCREMENTAL_OPT_VAL).option(BEGIN_INSTANTTIME_OPT_KEY, beginTime).load(basepath);
**java.lang.NoSuchMethodError: org.apache.avro.Schema$Field.<init>**(Ljava/lang/String;Lorg/apache/avro/Schema;Ljava/lang/String;Ljava/lang/Object;)V
  at org.apache.hudi.common.util.HoodieAvroUtils.addMetadataFields(HoodieAvroUtils.java:110)
  at org.apache.hudi.common.util.HoodieAvroUtils.createHoodieWriteSchema(HoodieAvroUtils.java:101)
  at org.apache.hudi.IncrementalRelation.<init>(IncrementalRelation.scala:76)
  at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:80)
  at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:47)
  at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:307)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:156)
  ... 60 elided
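
Not part of the trace above, but possibly useful context for anyone hitting the same error: the `Schema.Field(String, Schema, String, Object)` constructor only exists in Avro 1.8.x, so this `NoSuchMethodError` usually means an older Avro (such as the 1.7.x bundled with Spark 2.2.x) is winning on the driver classpath over the 1.8.2 that the Hudi 0.5.3 bundle expects. A minimal diagnostic sketch, run in the same spark-shell session, to check which Avro jar is actually loaded:

```scala
// Purely diagnostic, not from the original report:
// print the jar that supplies org.apache.avro.Schema.Field on the driver classpath.
scala> classOf[org.apache.avro.Schema.Field].getProtectionDomain.getCodeSource.getLocation
// An avro-1.7.x jar here (rather than avro-1.8.2) would explain the missing constructor.
```

If that does point at an Avro 1.7.x jar, possible workarounds are placing a newer Avro ahead of the Spark jars (for example via `spark.driver.extraClassPath`) or launching against a Spark 2.4.x distribution; treat these as suggestions to verify in your environment rather than a confirmed fix.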