I wonder if that’s the problem. Is there an equivalent hadoop fs -ls
command you can run that returns the same files you want but doesn’t have
that month= string?
On Wed, Jun 18, 2014 at 12:25 PM, Jianshi Huang
wrote:
Hi Nicholas,
month= is for Hive to auto-discover the partitions. It's part of the URL of
my files.
Jianshi
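(Aside for readers of the thread: Hive-style partitioning encodes column=value pairs directly in directory names, which is why month= shows up in the path. A minimal sketch of how a partition value can be recovered from such a directory name; `parse_partition` is a hypothetical helper for illustration, not Hive's actual code:)

```python
# Hive-style partition directories encode "column=value" in the path,
# e.g. .../table/month=2014-01/part-00000.
def parse_partition(dirname):
    # Split on the first "=" only, so values containing "=" survive intact.
    key, _, value = dirname.partition("=")
    return key, value

print(parse_partition("month=2014-01"))  # ('month', '2014-01')
```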
On Wed, Jun 18, 2014 at 11:52 PM, Nicholas Chammas <
nicholas.cham...@gmail.com> wrote:
Is that month= syntax something special, or do your files actually have
that string as part of their name?
On Wed, Jun 18, 2014 at 2:25 AM, Jianshi Huang
wrote:
Hi all,
Thanks for the reply. I'm using parquetFile as input; is that a problem? In
hadoop fs -ls, the path
(hdfs://domain/user/jianshuang/data/parquet/table/month=2014*)
will list all the files.
I'll test it again.
Jianshi
On Wed, Jun 18, 2014 at 2:23 PM, Jianshi Huang
wrote:
Hi Andrew,
Strangely, my Spark (1.0.0 compiled against Hadoop 2.4.0) log says file not
found. I'll try again.
Jianshi
On Wed, Jun 18, 2014 at 12:36 PM, Andrew Ash wrote:
In Spark you can use the normal globs supported by Hadoop's FileSystem,
which are documented here:
http://hadoop.apache.org/docs/r2.3.0/api/org/apache/hadoop/fs/FileSystem.html#globStatus(org.apache.hadoop.fs.Path)
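(Hadoop's globStatus supports shell-style patterns: *, ?, [abc], [a-b], and {a,b} alternation. As a rough illustration of the matching semantics using Python's fnmatch; this is an approximation, since Hadoop's matcher differs in details such as {} alternation, which fnmatch lacks. The directory names are made up for illustration:)

```python
from fnmatch import fnmatch

# Hypothetical partition directories under the table path from the thread
dirs = ["month=2014-01", "month=2014-02", "month=2013-12"]

# Glob analogous to .../month=2014* from the thread
matched = [d for d in dirs if fnmatch(d, "month=2014*")]
print(matched)  # the 2013 directory is filtered out
```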
On Wed, Jun 18, 2014 at 12:09 AM, MEETHU MATHEW
wrote:
These paths get passed directly to the Hadoop FileSystem API, and I
think they support globbing out of the box. So AFAIK it should just
work.
On Tue, Jun 17, 2014 at 9:09 PM, MEETHU MATHEW wrote:
Hi Jianshi,
I have used wildcard characters (*) in my program and it worked.
My code was like this:
b = sc.textFile("hdfs:///path to file/data_file_2013SEP01*")
Thanks & Regards,
Meethu M
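(As a local-filesystem analogue of what that glob does, using Python's glob module rather than Spark; the file names below are made up to mimic the date-stamped names in Meethu's example:)

```python
import glob
import os
import tempfile

# Create a few files mimicking the date-stamped names in the thread
tmp = tempfile.mkdtemp()
for name in ["data_file_2013SEP01_a", "data_file_2013SEP01_b", "data_file_2013AUG31"]:
    open(os.path.join(tmp, name), "w").close()

# Glob analogous to data_file_2013SEP01* in the sc.textFile call above
hits = sorted(glob.glob(os.path.join(tmp, "data_file_2013SEP01*")))
print([os.path.basename(h) for h in hits])
```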
On Wednesday, 18 June 2014 9:29 AM, Jianshi Huang
wrote:
It would be convenient if Spark's textFile, parquetFile, etc. could support
paths with wildcards, such as:
hdfs://domain/user/jianshuang/data/parquet/table/month=2014*
Or is there already a way to do it now?
Jianshi
--
Jianshi Huang
LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huan