Re: Output Committers for S3

2017-06-16 Thread sririshindra
Hi Ryan and Steve, Thanks very much for your reply. I was finally able to get Ryan's repo working for me by changing the output committer to FileOutputCommitter instead of ParquetOutputCommitter in Spark, as Steve suggested. However, it is not working in append mode while saving the data frame.
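
For context, the committer swap discussed here is driven by Spark configuration. The sketch below shows the relevant Spark 2.x-era settings with their stock values; a custom S3 committer class from Ryan's repo would be substituted where noted (the app name is illustrative, and these settings are an assumption about the setup described, not a confirmed reproduction of it):

```scala
import org.apache.spark.sql.SparkSession

// Hedged sketch: steering file output away from the default commit behavior
// when writing to S3. A custom committer class would replace the defaults.
val spark = SparkSession.builder()
  .appName("s3-committer-sketch")
  // v2 commit algorithm does fewer renames, which matters on S3.
  .config("spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version", "2")
  // Speculative tasks interact badly with direct-output committers on S3.
  .config("spark.speculation", "false")
  .getOrCreate()
```

Note that append mode is exactly where custom committers tend to diverge from the default path, which is consistent with the failure reported above.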

Re: Custom Partitioning in Catalyst

2017-06-16 Thread Reynold Xin
Seems like a great idea to do? On Fri, Jun 16, 2017 at 12:03 PM, Russell Spitzer wrote: > I considered adding this to DataSource APIV2 ticket but I didn't want to > be first :P Do you think there will be any issues with opening up the > partitioning as well? > > On Fri, Jun 16, 2017 at 11:58 AM

Re: Custom Partitioning in Catalyst

2017-06-16 Thread Russell Spitzer
I considered adding this to DataSource APIV2 ticket but I didn't want to be first :P Do you think there will be any issues with opening up the partitioning as well? On Fri, Jun 16, 2017 at 11:58 AM Reynold Xin wrote: > Perhaps we should extend the data source API to support that. > > > On Fri, J

Re: Custom Partitioning in Catalyst

2017-06-16 Thread Reynold Xin
Perhaps we should extend the data source API to support that. On Fri, Jun 16, 2017 at 11:37 AM, Russell Spitzer wrote: > I've been trying to work with making Catalyst Cassandra partitioning > aware. There seem to be two major blocks on this. > > The first is that DataSourceScanExec is unable to

Custom Partitioning in Catalyst

2017-06-16 Thread Russell Spitzer
I've been trying to work with making Catalyst Cassandra partitioning aware. There seem to be two major blocks on this. The first is that DataSourceScanExec is unable to learn what the underlying partitioning should be from the BaseRelation it comes from. I'm currently able to get around this by us
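
The gap described here, where DataSourceScanExec cannot learn the underlying partitioning from the BaseRelation, can be pictured with a small sketch. The trait and class names below are invented for illustration and are not an actual Spark API; the idea is that a relation backed by a store like Cassandra could advertise how it already co-locates rows by key, letting the planner skip a shuffle:

```scala
// Hypothetical marker trait (illustrative only): a BaseRelation could mix
// this in to advertise its native partitioning to the planner.
trait KnownPartitioning {
  def numPartitions: Int
  def partitionIdFor(key: Any): Int
}

// Toy relation partitioned by key hash, mimicking token-range ownership
// in a store such as Cassandra.
class HashPartitionedRelation(val numPartitions: Int) extends KnownPartitioning {
  def partitionIdFor(key: Any): Int =
    math.abs(key.hashCode % numPartitions)
}
```

A scan node that saw `KnownPartitioning` on its relation could then report a matching `outputPartitioning`, instead of the default unknown partitioning that forces an exchange.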

Re: structured streaming documentation does not match behavior

2017-06-16 Thread Shixiong(Ryan) Zhu
I created https://issues.apache.org/jira/browse/SPARK-21123. PR is welcome. On Thu, Jun 15, 2017 at 10:55 AM, Shixiong(Ryan) Zhu < shixi...@databricks.com> wrote: > Good catch. These are file source options. Could you submit a PR to fix > the doc? Thanks! > > On Thu, Jun 15, 2017 at 10:46 AM, Men
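
The documentation mismatch concerns where file source options are set. A short sketch of the point (the path and schema are illustrative, and an existing SparkSession named `spark` is assumed): options such as `maxFilesPerTrigger` belong to the file source, so they go on the reader, not on the query or sink.

```scala
import org.apache.spark.sql.types._

// Illustrative schema for the incoming files.
val schema = new StructType().add("id", LongType).add("value", StringType)

// File-source options are set on readStream, where the source is built.
val input = spark.readStream
  .schema(schema)
  .option("maxFilesPerTrigger", "10") // a file-source option
  .parquet("s3a://some-bucket/input") // illustrative path
```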

Re: How does MapWithStateRDD distribute the data

2017-06-16 Thread coolcoolkid
Hello, I have encountered a situation just like what is described above. I am running a Spark Streaming application with 2 executors, 16 cores and 10G of memory per executor, and the input Kafka topic has 64 partitions. My code is like this: Kafka
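
Although the snippet above is cut off, the mapWithState pattern it refers to centers on an update function that Spark Streaming calls once per (key, value) pair. The sketch below writes that function against a minimal stand-in for `org.apache.spark.streaming.State`, so the counting logic can be shown (and exercised) without a cluster; the stand-in class and the running-count behavior are illustrative assumptions, not the poster's actual code:

```scala
// Minimal stand-in for org.apache.spark.streaming.State (illustrative only),
// so the update logic is visible outside Spark.
class State[S](private var value: Option[S]) {
  def exists: Boolean = value.isDefined
  def getOption(): Option[S] = value
  def update(s: S): Unit = { value = Some(s) }
}

// The function mapWithState would invoke for each batch element: fold the
// new value into the running total kept in state, and emit the new total.
def trackCount(key: String, value: Option[Int], state: State[Int]): (String, Int) = {
  val sum = value.getOrElse(0) + state.getOption().getOrElse(0)
  state.update(sum)
  (key, sum)
}
```

In a real job this function would be wrapped in a `StateSpec` and passed to `mapWithState` on the keyed DStream built from the Kafka input.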