Looks like what I was suggesting doesn't work. :/

On Wed, Nov 11, 2015 at 4:49 PM, Jeff Zhang <zjf...@gmail.com> wrote:
> Yes, that's what I suggest. TextInputFormat supports multiple inputs. So on
> the Spark side, we just need to provide an API for that.
>
> On Thu, Nov 12, 2015 at 8:45 AM, Pradeep Gollakota <pradeep...@gmail.com>
> wrote:
>
>> IIRC, TextInputFormat supports an input path that is a comma separated
>> list. I haven't tried this, but I think you should just be able to do
>> sc.textFile("file1,file2,...")
>>
>> On Wed, Nov 11, 2015 at 4:30 PM, Jeff Zhang <zjf...@gmail.com> wrote:
>>
>>> I know these workarounds, but wouldn't it be more convenient and
>>> straightforward to use SparkContext#textFiles?
>>>
>>> On Thu, Nov 12, 2015 at 2:27 AM, Mark Hamstra <m...@clearstorydata.com>
>>> wrote:
>>>
>>>> For more than a small number of files, you'd be better off using
>>>> SparkContext#union instead of RDD#union. That will avoid building up a
>>>> lengthy lineage.
>>>>
>>>> On Wed, Nov 11, 2015 at 10:21 AM, Jakob Odersky <joder...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hey Jeff,
>>>>> Do you mean reading from multiple text files? In that case, as a
>>>>> workaround, you can use the RDD#union() (or ++) method to concatenate
>>>>> multiple RDDs. For example:
>>>>>
>>>>> val lines1 = sc.textFile("file1")
>>>>> val lines2 = sc.textFile("file2")
>>>>>
>>>>> val rdd = lines1 union lines2
>>>>>
>>>>> regards,
>>>>> --Jakob
>>>>>
>>>>> On 11 November 2015 at 01:20, Jeff Zhang <zjf...@gmail.com> wrote:
>>>>>
>>>>>> Users can use the HDFS glob syntax to read multiple inputs, but
>>>>>> sometimes that is not convenient. Not sure why there's no
>>>>>> SparkContext#textFiles API. It should be easy to implement. I'd
>>>>>> love to create a ticket and contribute it if there's no other
>>>>>> consideration that I'm not aware of.
>>>>>>
>>>>>> --
>>>>>> Best Regards
>>>>>>
>>>>>> Jeff Zhang
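
For reference, here is a minimal sketch of the SparkContext#union workaround Mark describes in the quoted thread. The textFiles helper is hypothetical: Spark has no such method as far as this thread establishes, and the object name, signature, and local master setting are assumptions made only for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object TextFilesSketch {
  // Hypothetical helper, not an existing Spark API: reads each path with
  // sc.textFile and combines the results with one SparkContext#union call.
  def textFiles(sc: SparkContext, paths: Seq[String]): RDD[String] = {
    // One RDD[String] per input path.
    val perFile: Seq[RDD[String]] = paths.map(path => sc.textFile(path))
    // A single union over the whole sequence, rather than folding with
    // RDD#union, which would nest one union inside another per file.
    sc.union(perFile)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local master, purely so the sketch can run standalone.
    val conf = new SparkConf().setAppName("text-files-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // Input paths are taken from the command line.
      val lines = textFiles(sc, args.toSeq)
      println(s"total lines: ${lines.count()}")
    } finally {
      sc.stop()
    }
  }
}
```

Run with the input paths as arguments, e.g. file1 file2 from Jakob's example. Whether a built-in SparkContext#textFiles would be worth adding is the open question of the thread; this only shows what the workaround looks like when wrapped up.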