Hello,

We are using Apache Iceberg with AWS Glue, and duplicate rows are being inserted into the table even though we have verified that the data being upserted contains no duplicates. We upsert data into the table with a MERGE SQL statement.
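One way to double-check the "no duplicates in the upsert data" assumption is a small check on each batch before it reaches the MERGE. The following is a minimal, hypothetical sketch in plain Python, not our actual job. The column names `id` (merge key) and `last_updated` (ordering column) are assumptions. Besides counting repeated keys, it flags rows whose key is null. In SQL, `NULL = NULL` is not true, so a MERGE whose ON clause compares keys never matches such rows and inserts them on every run, even when the batch is otherwise duplicate-free.

```python
from collections import Counter
from typing import Any, Dict, Iterable, List

# Hypothetical column names: substitute the table's real merge key and ordering column.
KEY = "id"
ORDER_COL = "last_updated"


def find_duplicate_keys(rows: Iterable[Dict[str, Any]], key: str = KEY) -> Dict[Any, int]:
    """Return {key_value: count} for every key that appears more than once."""
    counts = Counter(row[key] for row in rows)
    return {k: n for k, n in counts.items() if n > 1}


def null_key_rows(rows: Iterable[Dict[str, Any]], key: str = KEY) -> List[Dict[str, Any]]:
    """Rows with a null key never satisfy `t.key = s.key`, so MERGE inserts them each time."""
    return [row for row in rows if row[key] is None]


def latest_per_key(
    rows: Iterable[Dict[str, Any]], key: str = KEY, order_col: str = ORDER_COL
) -> List[Dict[str, Any]]:
    """Collapse the batch to the newest row per key; on a tie, the first row seen is kept."""
    latest: Dict[Any, Dict[str, Any]] = {}
    for row in rows:
        current = latest.get(row[key])
        if current is None or row[order_col] > current[order_col]:
            latest[row[key]] = row
    return list(latest.values())
```

Inside a Glue job, the same idea would more likely be expressed in Spark itself, for example with a `row_number()` window partitioned by the key and ordered by the timestamp column, keeping only the first row per key before running the MERGE.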
We also see duplicates in SELECT query results when we query the table with Spark SQL, but when we query the same table with Athena, no duplicates appear. This happens only with a few tables in our database, not all of them. We followed the directions in this blog post for our setup: https://aws.amazon.com/blogs/big-data/implement-a-cdc-based-upsert-in-a-data-lake-using-apache-iceberg-and-aws-glue/. We are currently using Spark 3.3, Scala 2.12, Glue 4.0, and Iceberg 1.0.0.

Any input is appreciated. Thanks!

Shwetha Dharmarajan
Senior Staff Software Engineer
Edelman Financial Engines