Great, thanks for letting us know. It must have been a Spark bug. I think
that tracks with what Amogh's testing showed. He wasn't able to reproduce
it in 3.4.x, 3.5.x, or the latest in the 3.3.x line, only in 3.3.0.
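
Since the original report came down to a test environment that silently
stayed on Spark 3.3.0, a fail-fast version check at job startup might be
worth having. Below is a minimal sketch in plain Python. The names
AFFECTED_SPARK_VERSIONS, parse_version, is_affected, and assert_not_affected
are hypothetical helpers made up for illustration, and the affected set is
an assumption based only on what was reported in this thread (3.3.0
reproduces, 3.3.4, 3.4.x, and 3.5.x do not).

```python
import re

# Assumption from this thread only: 3.3.0 is the one version that reproduced
# the error. Adjust the set if other versions turn out to be affected.
AFFECTED_SPARK_VERSIONS = {(3, 3, 0)}


def parse_version(version: str) -> tuple:
    """Extract (major, minor, patch) from a string like '3.3.0' or '3.3.4-SNAPSHOT'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"unrecognized Spark version: {version!r}")
    return tuple(int(part) for part in match.groups())


def is_affected(version: str) -> bool:
    """Return True if the version is in the set this thread reports as affected."""
    return parse_version(version) in AFFECTED_SPARK_VERSIONS


def assert_not_affected(version: str) -> None:
    """Raise early so a misconfigured environment is noticed before the job runs."""
    if is_affected(version):
        raise RuntimeError(
            f"Spark {version} is reported to hit the UTF8String-to-Long cast "
            "error; check which Spark version the environment actually uses."
        )
```

With a live PySpark session, the string to pass in would be spark.version,
for example assert_not_affected(spark.version) right after the session is
created.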

On Wed, Dec 20, 2023 at 9:04 AM Harshith Bolar <harshith.bo...@rapido.bike>
wrote:

> There was an issue in my test environment that caused it to always use
> Spark 3.3.0. After fixing it and using 3.3.4, I am unable to reproduce the
> issue anymore! Looks like something changed between Spark 3.3.0 and
> 3.3.4.
>
> Thanks,
> Harshith
>
> On Fri, Dec 15, 2023 at 5:12 PM Harshith Bolar <harshith.bo...@rapido.bike>
> wrote:
>
>> Hi Ryan,
>>
>> We have verified that the issue exists even after upgrading to Spark 3.5
>> and Iceberg 1.4.2. And explicitly casting the columns to String doesn't
>> seem to help either. The error goes away when we downgrade to Iceberg 1.0.0
>> and reappears in 1.1.0. So, we're thinking a bug was possibly introduced in
>> this version.
>>
>> A GitHub issue with more information is open for this -
>> https://github.com/apache/iceberg/issues/8333
>>
>> Possible reason -
>> https://github.com/apache/iceberg/issues/8333#issuecomment-1856243795
>>
>> Thanks,
>> Harshith
>>
>>
>> On Thu, Dec 14, 2023 at 10:19 PM Ryan Blue <b...@tabular.io> wrote:
>>
>>> Thanks for reaching out about this. It does look like a bug somewhere.
>>> I'm curious whether it works in Spark 3.5 if you have a chance to try it
>>> out.
>>>
>>> It looks like the cause is a missing cast or incorrect schema somewhere
>>> in Spark. The data coming in is a Spark string, but for some reason Spark
>>> thinks that it should be a long. Another oddity is that Spark reports that
>>> the schema has an int for the yyyymmdd column.
>>>
>>> I think it is a good sign that the insert works. That means that normal
>>> type coercion is working for CTAS and narrows the problem to the MERGE
>>> path. Can you try adding an explicit cast in your DataFrame before running
>>> the merge? I think that might fix it.
>>>
>>> On Thu, Dec 14, 2023 at 8:02 AM Sabyasachi Nandy
>>> <sabyasachi.na...@rapido.bike> wrote:
>>>
>>>> Hey Folks,
>>>>
>>>> My team members and I have been stuck on this issue for a long time.
>>>> Could you please take a look at it?
>>>> With the snippet given in the following question, you should be able
>>>> to replicate the issue:
>>>> https://stackoverflow.com/questions/77655115/org-apache-spark-unsafe-types-utf8string-cannot-be-cast-to-java-lang-long-except
>>>>
>>>> We will be happy to help with any further details.
>>>>
>>>>
>>>> With regards,
>>>> Sabyasachi
>>>>
>>>> THIS EMAIL COMMUNICATION IS PRIVILEGED AND MAY CONTAIN CONFIDENTIAL
>>>> INFORMATION OF RAPIDO. IF YOU ARE NOT THE INTENDED RECIPIENT, YOU ARE
>>>> HEREBY NOTIFIED THAT YOU HAVE RECEIVED THIS MESSAGE IN ERROR AND ANY
>>>> REVIEW, DISSEMINATION, DISTRIBUTION OR COPYING OF THIS MESSAGE IS STRICTLY
>>>> PROHIBITED. PLEASE NOTIFY US IMMEDIATELY BY EMAIL AND DELETE THE MESSAGE
>>>> FROM YOUR SYSTEM.
>>>>
>>>> NOTHING CONTAINED IN THIS DISCLAIMER SHALL BE CONSTRUED IN ANY WAY TO
>>>> GRANT PERMISSION TO TRANSMIT CONFIDENTIAL INFORMATION OR AS A WAIVER OF ANY
>>>> CONFIDENTIALITY OR PRIVILEGE.
>>>>
>>>> RAPIDO DOES NOT ACCEPT ANY RESPONSIBILITY OR LIABILITY ARISING FROM THE
>>>> USE OF THIS COMMUNICATION. NO REPRESENTATION IS BEING MADE THAT THE
>>>> INFORMATION PRESENTED IS ACCURATE, CURRENT OR COMPLETE AND SUCH INFORMATION
>>>> IS AT ALL TIMES SUBJECT TO CHANGE WITHOUT NOTICE
>>>>
>>>
>>>
>>> --
>>> Ryan Blue
>>> Tabular
>>>
>>
>


-- 
Ryan Blue
Tabular
