Sorry for being so dense, and thank you for your help.

I was using this version:
phoenix-spark-5.0.0-HBase-2.0.jar

because it was the latest one in this repo:
https://mvnrepository.com/artifact/org.apache.phoenix/phoenix-spark
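As far as I can tell, that phoenix-spark 5.0.0 artifact targets Spark 2.x on Scala 2.11, while Spark 3.4.1 ships with Scala 2.12, which would line up with the refArrayOps NoSuchMethodError below. Cross-built Scala artifacts normally encode the Scala line as an `_2.12`-style suffix in the artifact name, and the old jar carries no suffix at all. A rough sketch of checking a jar name for that suffix (the newer jar name here is illustrative, not a guaranteed coordinate):

```python
import re

def scala_suffix(artifact_name):
    """Return the Scala cross-version suffix (e.g. '2.12') embedded in a
    jar/artifact name, or None if the artifact is not cross-built."""
    m = re.search(r'_(2\.1[0-9])\b', artifact_name)
    return m.group(1) if m else None

# Spark 3.4.1 ships Scala 2.12, so a matching connector should carry _2.12:
print(scala_suffix("phoenix5-spark_2.12-6.0.0.jar"))      # 2.12
# The old artifact has no suffix, so nothing marks its Scala line:
print(scala_suffix("phoenix-spark-5.0.0-HBase-2.0.jar"))  # None
```

A connector whose suffix (or documented Scala target) doesn't match the Scala line reported by `spark-submit --version` will fail at runtime with exactly this kind of NoSuchMethodError.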


On Mon, Aug 21, 2023 at 5:07 PM Sean Owen <sro...@gmail.com> wrote:

> It is. But you have a third-party library in here which seems to require a
> different version.
>
> On Mon, Aug 21, 2023, 7:04 PM Kal Stevens <kalgstev...@gmail.com> wrote:
>
>> OK, it was my impression that Scala was packaged with Spark to avoid a
>> mismatch:
>> https://spark.apache.org/downloads.html
>>
>> It looks like Spark 3.4.1 (my version) uses Scala 2.12.
>> How do I specify the Scala version?
>>
>> On Mon, Aug 21, 2023 at 4:47 PM Sean Owen <sro...@gmail.com> wrote:
>>
>>> That's a mismatch between the version of Scala that your library uses and
>>> the version Spark uses.
>>>
>>> On Mon, Aug 21, 2023, 6:46 PM Kal Stevens <kalgstev...@gmail.com> wrote:
>>>
>>>> I am having a hard time figuring out what I am doing wrong here.
>>>> I am not sure if I have an incompatible version of something installed,
>>>> or something else.
>>>> I cannot find anything relevant on Google to figure out what I am
>>>> doing wrong.
>>>> I am using *Spark 3.4.1* and *Python 3.10*.
>>>>
>>>> This is my code to save my DataFrame:
>>>> urls = []
>>>> pull_sitemap_xml(robot, urls)
>>>> df = spark.createDataFrame(data=urls, schema=schema)
>>>> df.write.format("org.apache.phoenix.spark") \
>>>>     .mode("overwrite") \
>>>>     .option("table", "property") \
>>>>     .option("zkUrl", "192.168.1.162:2181") \
>>>>     .save()
>>>>
>>>> urls is an array of maps, containing a "url" and a "last_mod" field.
>>>>
>>>> Here is the error that I am getting
>>>>
>>>> Traceback (most recent call last):
>>>>   File "/home/kal/real-estate/pullhttp/pull_properties.py", line 65, in main
>>>>     .save()
>>>>   File "/hadoop/spark/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 1396, in save
>>>>     self._jwrite.save()
>>>>   File "/hadoop/spark/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1322, in __call__
>>>>     return_value = get_return_value(
>>>>   File "/hadoop/spark/spark/python/lib/pyspark.zip/pyspark/errors/exceptions/captured.py", line 169, in deco
>>>>     return f(*a, **kw)
>>>>   File "/hadoop/spark/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/protocol.py", line 326, in get_return_value
>>>>     raise Py4JJavaError(
>>>> py4j.protocol.Py4JJavaError: An error occurred while calling o636.save.
>>>> : java.lang.NoSuchMethodError: 'scala.collection.mutable.ArrayOps scala.Predef$.refArrayOps(java.lang.Object[])'
>>>>   at org.apache.phoenix.spark.DataFrameFunctions.getFieldArray(DataFrameFunctions.scala:76)
>>>>   at org.apache.phoenix.spark.DataFrameFunctions.saveToPhoenix(DataFrameFunctions.scala:35)
>>>>   at org.apache.phoenix.spark.DataFrameFunctions.saveToPhoenix(DataFrameFunctions.scala:28)
>>>>   at org.apache.phoenix.spark.DefaultSource.createRelation(DefaultSource.scala:47)
>>>>   at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:47)
>>>>   at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:75)
>>>>   at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:73)
>>>>
>>>
