Re: Wildcard support in input path

2014-06-18 Thread Nicholas Chammas
I wonder if that’s the problem. Is there an equivalent hadoop fs -ls command you can run that returns the same files you want but doesn’t have that month= string? On Wed, Jun 18, 2014 at 12:25 PM, Jianshi Huang wrote: > Hi Nicholas, > > month= is for Hive to auto discover the partitions. It's

Re: Wildcard support in input path

2014-06-18 Thread Jianshi Huang
Hi Nicholas, month= is for Hive to auto discover the partitions. It's part of the url of my files. Jianshi On Wed, Jun 18, 2014 at 11:52 PM, Nicholas Chammas < nicholas.cham...@gmail.com> wrote: > Is that month= syntax something special, or do your files actually have > that string as part of

Re: Wildcard support in input path

2014-06-18 Thread Nicholas Chammas
Is that month= syntax something special, or do your files actually have that string as part of their name? On Wed, Jun 18, 2014 at 2:25 AM, Jianshi Huang wrote: > Hi all, > > Thanks for the reply. I'm using parquetFile as input, is that a problem? > In hadoop fs -ls, the path (hdfs://domain/u

Re: Wildcard support in input path

2014-06-17 Thread Jianshi Huang
Hi all, Thanks for the reply. I'm using parquetFile as input, is that a problem? In hadoop fs -ls, the path (hdfs://domain/user/jianshuang/data/parquet/table/month=2014*) will list all the files. I'll test it again. Jianshi On Wed, Jun 18, 2014 at 2:23 PM, Jianshi Huang wrote: > Hi Andre

Re: Wildcard support in input path

2014-06-17 Thread Jianshi Huang
Hi Andrew, Strangely, my Spark log (1.0.0 compiled against Hadoop 2.4.0) says file not found. I'll try again. Jianshi On Wed, Jun 18, 2014 at 12:36 PM, Andrew Ash wrote: > In Spark you can use the normal globs supported by Hadoop's FileSystem, > which are documented here: > http://hadoo

Re: Wildcard support in input path

2014-06-17 Thread Andrew Ash
In Spark you can use the normal globs supported by Hadoop's FileSystem, which are documented here: http://hadoop.apache.org/docs/r2.3.0/api/org/apache/hadoop/fs/FileSystem.html#globStatus(org.apache.hadoop.fs.Path) On Wed, Jun 18, 2014 at 12:09 AM, MEETHU MATHEW wrote: > Hi Jianshi, > > I have
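
To illustrate the point about globs, here is a minimal local sketch in Python. It uses Python's own glob module on a temporary directory as a stand-in, not Hadoop's FileSystem.globStatus, and the partition directory names and the helper matching_partitions are hypothetical. It only shows that "=" is an ordinary character in a glob and that a trailing * matches the rest of the directory name.

```python
import glob
import os
import tempfile


def matching_partitions(root, pattern):
    """Return the sorted names of directories under root that match pattern."""
    # Escape the root so only the pattern part is treated as a glob.
    hits = glob.glob(os.path.join(glob.escape(root), pattern))
    return sorted(os.path.basename(h) for h in hits if os.path.isdir(h))


def demo():
    # Hypothetical Hive-style partition layout, created in a temp directory.
    with tempfile.TemporaryDirectory() as root:
        for name in ("month=201312", "month=201401", "month=201402"):
            os.makedirs(os.path.join(root, name))
        return matching_partitions(root, "month=2014*")


if __name__ == "__main__":
    print(demo())
```

Whether a given Spark input method expands such a pattern the same way is a separate question that this local sketch does not answer.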

Re: Wildcard support in input path

2014-06-17 Thread Patrick Wendell
These paths get passed directly to the Hadoop FileSystem API and I think they support globbing out of the box. So AFAIK it should just work. On Tue, Jun 17, 2014 at 9:09 PM, MEETHU MATHEW wrote: > Hi Jianshi, > > I have used wild card characters (*) in my program and it worked.. > My code was like

Re: Wildcard support in input path

2014-06-17 Thread MEETHU MATHEW
Hi Jianshi, I have used wildcard characters (*) in my program and it worked. My code was like this: b = sc.textFile("hdfs:///path to file/data_file_2013SEP01*") Thanks & Regards, Meethu M On Wednesday, 18 June 2014 9:29 AM, Jianshi Huang wrote: It would be convenient if Spark's textFi

Wildcard support in input path

2014-06-17 Thread Jianshi Huang
It would be convenient if Spark's textFile, parquetFile, etc. could support paths with wildcards, such as: hdfs://domain/user/jianshuang/data/parquet/table/month=2014* Or is there already a way to do it now? Jianshi -- Jianshi Huang LinkedIn: jianshi Twitter: @jshuang Github & Blog: http://huan