Re: Slack invite

2021-07-08 Thread Weston Pace
INFRA-21214[1] was just updated. The invite link is indefinitely disabled due to recent spam accounts. Guests will need to be invited by existing Slack users. I'm not on the slack so I cannot help here but I'm sure someone can. [1] https://issues.apache.org/jira/browse/INFRA-21214 On Thu, Jul 8

Re: Slack invite

2021-07-08 Thread Weston Pace
I added a comment to https://issues.apache.org/jira/browse/INFRA-21214 and then realized that someone also opened https://issues.apache.org/jira/browse/INFRA-22088 yesterday. I'm happy to watch the link and add a reply to this thread when it appears to be working again. On Thu, Jul 8, 2021

Re: Slack invite

2021-07-08 Thread Anjali Norwood
I had the same experience. Please let us know here in the dev group when this is fixed; a bunch of us would like to join. Thanks, Anjali. On Thu, Jul 8, 2021 at 3:52 PM Puneet Zaroo wrote: > I tried joining with a guest account and got an error message: "The email address must match one of the

Re: Slack invite

2021-07-08 Thread Puneet Zaroo
I tried joining with a guest account and got an error message: "The email address must match one of the domains listed below. Please try another email." https://infra.apache.org/slack.html mentions that "Note: When Slack does an update, this URL occasionally stops functioning as it should. If the

Can iceberg write to new files in the table for each micro-batch?

2021-07-08 Thread Peter Giles
Hi all, I have a non-iceberg Spark streaming process that I'm trying to re-engineer, and am running into some trouble making it happen using Iceberg. I think I'm using a fairly common pattern, so I wonder if someone here can give me a tip on how to go about it. I'll try to be concise but give eno

Re: rowGroup:File = 1:1

2021-07-08 Thread Owen O'Malley
As Ryan & Dan said, the trade offs are roughly: bigger parquet row groups & orc stripes: * better compression * fewer read operations * lower file metadata overhead * fewer files to manage smaller row groups/stripes: * better parallelism * lower memory usage Some of the worst performing tables t

Re: GlueCatalog example?

2021-07-08 Thread Greg Hill
We can’t just turn off consistent view because we have to coordinate that change across a bunch of teams that run clusters on our platform. Since it’s only the iceberg metadata files that become inconsistent, we’re trying to move the metadata out of s3 by using the glue catalog but the data file

Re: GlueCatalog example?

2021-07-08 Thread Jack Ye
I think you need to first call setConf and then initialize, mimicking the logic in https://github.com/apache/iceberg/blob/6bcca16c48cd92dc98640130a28f73431e99e336/core/src/main/java/org/apache/iceberg/CatalogUtil.java#L189-L191 which is used by all engines to initialize catalogs. You might be able t

Re: GlueCatalog example?

2021-07-08 Thread Greg Hill
Thanks! Seems I wasn’t too far off then. It’s my understanding that because we’re using EMRFS consistent view, we should not use S3FileIO or the emrfs metadata will get out of sync, but it doesn’t seem like this catalog works with HadoopFileIO so far in my basic testing. I get a NullPointerExcep