We need your help to make the Apache Washington DC Roadshow on Dec 4th a
success.
What do we need most? Speakers!
We're bringing a unique DC flavor to this event by mixing talks about
Apache projects with talks on OSS CyberSecurity, OSS in Government, and
OSS Careers.
That only works assuming that Spark is the only client of the table. It
will be impossible to force an outside user to respect the special metadata
table when reading, so they will still see all of the data in transit.
Additionally, this would force the incoming data to be written only into
new partitions.
So if Spark and the destination datastore are both non-transactional, you
will have to resort to an external mechanism for “transactionality”.
Here are some options for both RDBMS and non-transactional datastore
destinations. For now, assume that Spark is used in batch mode (and not
streaming mode).
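As a concrete example of the RDBMS option: let Spark bulk-write into a
staging table, then have the driver publish it in a single database
transaction, so outside readers see either none or all of the new rows. A
minimal sketch, assuming a JDBC-reachable database with transactional DML;
the URL, input path, and the events/events_staging table names are all
placeholders:

```scala
import java.sql.DriverManager
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("staging-commit").getOrCreate()
val jdbcUrl = "jdbc:postgresql://localhost/mydb" // placeholder

// Phase 1: tasks write (non-transactionally) into the staging table.
val df = spark.read.parquet("/data/incoming")    // placeholder input
df.write
  .format("jdbc")
  .option("url", jdbcUrl)
  .option("dbtable", "events_staging")
  .mode("overwrite")
  .save()

// Phase 2: the driver publishes the whole batch in one transaction.
val conn = DriverManager.getConnection(jdbcUrl)
try {
  conn.setAutoCommit(false)
  val st = conn.createStatement()
  st.execute("INSERT INTO events SELECT * FROM events_staging")
  st.execute("TRUNCATE TABLE events_staging")
  conn.commit()
} catch {
  case e: Exception => conn.rollback(); throw e
} finally {
  conn.close()
}
```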
I'm still not sure how the staging table helps for databases which do not
have such atomicity guarantees. For example, in Cassandra, if you wrote all
of the data temporarily to a staging table, we would still have the same
problem in moving the data from the staging table into the real table.
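To make that concrete, here is what the "move" would look like with the
spark-cassandra-connector (keyspace and table names are placeholders). The
copy is just another distributed, row-by-row write, so there is no single
point at which it becomes atomically visible:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()

// Read what the tasks staged, then copy it into the real table.
val staged = spark.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "ks", "table" -> "events_staging"))
  .load()

staged.write
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "ks", "table" -> "events"))
  .mode("append")
  .save()
// If the job dies mid-copy, "events" is left holding a partial batch, and
// concurrent readers can also observe that partial state while the copy
// runs: exactly the problem the staging table was supposed to solve.
```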
I was trying to enable CBO on one of our jobs (using Spark 2.3.1 with
partitioned parquet data), but it seemed that the rowCount statistics were
being ignored. I found this JIRA, which seems to describe the same issue:
https://issues.apache.org/jira/browse/SPARK-25185, but it has had no
response so far.
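In case it helps anyone reproduce this, the sequence I would expect to be
sufficient looks like the following (run in spark-shell; table and column
names are placeholders), yet the collected rowCount still did not seem to
be used for the partitioned parquet table:

```scala
// Enable the cost-based optimizer (off by default in Spark 2.3.x).
spark.conf.set("spark.sql.cbo.enabled", "true")

// Collect table-level and column-level statistics.
spark.sql("ANALYZE TABLE events COMPUTE STATISTICS")
spark.sql("ANALYZE TABLE events COMPUTE STATISTICS FOR COLUMNS id")

// Inspect what the optimizer actually sees; with CBO in effect, rowCount
// should be populated here instead of being estimated from sizeInBytes.
println(spark.table("events").queryExecution.optimizedPlan.stats)
```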
>Some say it is exactly-once when the output is eventually exactly-once,
whereas others say there should be no side effects, e.g. a consumer
shouldn't see a partial write. I guess 2PC is the former, since some
partitions can commit earlier while other partitions fail to commit for
some time.
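To illustrate the distinction with a toy two-phase-commit sink (purely
illustrative, not a real Spark API): prepare makes each partition's output
durable but invisible, and the per-partition commits are where the
partial-write window opens up:

```scala
// Each partition stages its data durably in prepare(), then makes it
// visible in commit(). The two phases run across independent partitions.
trait PartitionWriter {
  def prepare(): Unit // stage data, invisible to readers
  def commit(): Unit  // make this partition's data visible
}

def writeJob(partitions: Seq[PartitionWriter]): Unit = {
  partitions.foreach(_.prepare()) // phase 1: all partitions stage
  partitions.foreach(_.commit())  // phase 2: not atomic as a group
  // A failure or delay between individual commit() calls is exactly the
  // window in which a reader sees a partial write: "eventually
  // exactly-once" holds, "no visible side effects" does not.
}
```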
Hello,
I recently started studying Spark's memory management system.
More specifically, I want to understand how Spark uses off-heap memory.
Internally, I saw that there are two off-heap memory pools
(offHeapExecutionMemoryPool and offHeapStorageMemoryPool).
How does Spark use each of these pools?
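For context, both pools only get capacity when off-heap memory is
explicitly enabled; a minimal sketch of the relevant configuration (app
name and size are arbitrary):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("offheap-demo")
  .master("local[*]")
  .config("spark.memory.offHeap.enabled", "true") // both pools are empty without this
  .config("spark.memory.offHeap.size", "1g")      // must be > 0 when enabled
  .getOrCreate()

// Inside UnifiedMemoryManager, that 1g is split between
// offHeapStorageMemoryPool and offHeapExecutionMemoryPool according to
// spark.memory.storageFraction, and execution can reclaim unused capacity
// from the storage pool.
```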
Hi,
Thanks all for the comments and discussion regarding the API! It sounds
like the current expectation for database systems is to populate a staging
table in the tasks, and the driver moves that data when commit is called.
That would work for many use cases that our users have with the MongoDB
connector.
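For what it's worth, a hedged sketch of how that could look for MongoDB:
tasks append to a staging collection through the connector, and the
driver-side commit publishes it with a single renameCollection admin
command. The URI and collection names are placeholders, and whether
renameCollection provides the visibility guarantees a given deployment
needs is an assumption to verify against the MongoDB docs:

```scala
import com.mongodb.client.MongoClients
import org.bson.Document
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()
val df = spark.read.parquet("/data/incoming") // placeholder input

// Tasks write into the staging collection via the connector.
df.write
  .format("com.mongodb.spark.sql.DefaultSource")
  .option("uri", "mongodb://localhost/mydb.events_staging")
  .mode("append")
  .save()

// Driver-side "commit": one admin command swaps the staging collection in.
val client = MongoClients.create("mongodb://localhost")
try {
  client.getDatabase("admin").runCommand(
    new Document("renameCollection", "mydb.events_staging")
      .append("to", "mydb.events")
      .append("dropTarget", true)
  )
} finally {
  client.close()
}
```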