Re: Enabling mapreduce.input.fileinputformat.list-status.num-threads in Spark?

2016-01-12 Thread Alex Nastetsky
https://issues.apache.org/jira/browse/SPARK-9926

On Tue, Jan 12, 2016 at 10:55 AM, Alex Nastetsky <alex.nastet...@vervemobile.com> wrote:
> Ran into this need myself. Does Spark have an equivalent of "mapreduce.input.fileinputformat.list-status.num-thr

Re: Enabling mapreduce.input.fileinputformat.list-status.num-threads in Spark?

2016-01-12 Thread Alex Nastetsky
Ran into this need myself. Does Spark have an equivalent of "mapreduce.input.fileinputformat.list-status.num-threads"? Thanks.

On Thu, Jul 23, 2015 at 8:50 PM, Cheolsoo Park wrote:
> Hi,
>
> I am wondering if anyone has successfully enabled
> "mapreduce.input.fileinputformat.list-status.num-t

Re: Sort Merge Join from the filesystem

2015-11-16 Thread Alex Nastetsky
Done, thanks.

On Mon, Nov 9, 2015 at 7:23 PM, Cheng, Hao wrote:
> Yes, we definitely need to think how to handle this case, probably even
> more common than both sorted/partitioned tables case, can you jump to the
> jira and leave comment there?

Re: Sort Merge Join from the filesystem

2015-11-09 Thread Alex Nastetsky
*From:* Reynold Xin [mailto:r...@databricks.com]
*Sent:* Thursday, November 5, 2015 1:36 AM
*To:* Alex Nastetsky
*Cc:* dev@spark.apache.org
*Subject:* Re: Sort Merge Join from the filesystem

It's not supported yet, and not sure if there is a ticket for it. I do

Sort Merge Join from the filesystem

2015-11-04 Thread Alex Nastetsky
(This is kind of a cross-post from the user list.) Does Spark support doing a sort merge join on two datasets on the file system that have already been partitioned the same way, with the same number of partitions, and sorted within each partition, without needing to repartition/sort them again? This fun
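
For readers unfamiliar with the operation being asked about, here is a minimal sketch in plain Python of a merge join over two inputs that are already co-partitioned and sorted by key within each partition. It is not Spark code and does not reflect how Spark implements its join; the names `merge_join` and `join_copartitioned` and the in-memory list-of-lists layout are assumptions made purely for illustration.

```python
from itertools import groupby
from operator import itemgetter


def merge_join(left, right):
    """Inner-join two iterables of (key, value) pairs, each already sorted by key.

    Hypothetical helper for illustration only. No sorting or shuffling happens
    here; the function relies entirely on the inputs being pre-sorted.
    """
    left_groups = groupby(left, key=itemgetter(0))
    right_groups = groupby(right, key=itemgetter(0))
    done = (None, None)  # sentinel returned when a side is exhausted
    lk, lg = next(left_groups, done)
    rk, rg = next(right_groups, done)
    while lg is not None and rg is not None:
        if lk < rk:
            # Left key has no match yet; advance the left side.
            lk, lg = next(left_groups, done)
        elif lk > rk:
            # Right key has no match yet; advance the right side.
            rk, rg = next(right_groups, done)
        else:
            # Keys match: emit every left/right combination for this key.
            right_vals = [v for _, v in rg]
            for _, lv in lg:
                for rv in right_vals:
                    yield (lk, lv, rv)
            lk, lg = next(left_groups, done)
            rk, rg = next(right_groups, done)


def join_copartitioned(left_parts, right_parts):
    """Join partition i of the left input with partition i of the right input.

    Assumes both inputs were partitioned by the same function into the same
    number of partitions, so matching keys always share a partition index.
    """
    if len(left_parts) != len(right_parts):
        raise ValueError("inputs must have the same number of partitions")
    for left, right in zip(left_parts, right_parts):
        yield from merge_join(left, right)


# Made-up example data: two partitions per side, sorted by key within each.
left_parts = [[(1, "a"), (3, "b")], [(2, "c"), (2, "d")]]
right_parts = [[(1, "x"), (1, "y"), (5, "z")], [(2, "w")]]
result = list(join_copartitioned(left_parts, right_parts))
```

Because each partition pair is processed independently and in a single pass, no repartitioning or re-sorting is needed, which is the property the question is about.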