Oh, just figured it out:
tabs.map(c => Array(c(167), c(110), c(200)))
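Spelled out as a runnable sketch (the HDFS path here is a placeholder, since
the one in the thread is truncated):

val table = sc.textFile("hdfs://namenode/path/to/data") // placeholder path
val tabs = table.map(_.split("\t"))
// each c is an Array[String], so columns are indexed as c(i), not c._(i)
val projected = tabs.map(c => Array(c(167), c(110), c(200)))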
Thanks for all of the advice, eh?!
On Sun Dec 14 2014 at 1:14:00 PM Yana Kadiyska wrote:
Denny, I am not sure what exception you're observing but I've had luck with
2 things:
val table = sc.textFile("hdfs://")
You can try calling table.first here and you'll see the first line of the
file.
You can also do val debug = table.first.split("\t") which would give you an
array and you can inspect the individual fields.
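A quick sketch of those two checks in the shell, with a placeholder path
standing in for the truncated one above:

val table = sc.textFile("hdfs://namenode/path/to/data") // placeholder path
table.first // shows the first raw line of the file
val debug = table.first.split("\t") // Array[String] of that line's fields
debug.length // verify how many columns the split actually produced
debug(167) // spot-check one of the indices used elsewhere in the thread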
Yes - that works great! Sorry for implying I couldn't. Was just more
flummoxed that I couldn't make the Scala call work on its own. Will
continue to debug ;-)
On Sun, Dec 14, 2014 at 11:39 Michael Armbrust wrote:
> BTW, I cannot use SparkSQL / case right now because my table has 200
> columns (and I'm on Scala 2.10.3)
>
You can still apply the schema programmatically:
http://spark.apache.org/docs/latest/sql-programming-guide.html#programmatically-specifying-the-schema
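For reference, a sketch of that approach against the Spark 1.1/1.2-era API
(applySchema), with hypothetical field names and only three of the 200
columns shown:

import org.apache.spark.sql._

val sqlContext = new SQLContext(sc)

// Build the schema programmatically rather than via a 200-field case class
// (Scala 2.10 case classes are capped at 22 fields).
val schema = StructType(Seq(
  StructField("colA", StringType, nullable = true), // hypothetical names
  StructField("colB", StringType, nullable = true),
  StructField("colC", StringType, nullable = true)))

// Split the input and turn each line into a Row of the selected columns
val tabs = sc.textFile("hdfs://namenode/path/to/data").map(_.split("\t"))
val rowRDD = tabs.map(c => Row(c(167), c(110), c(200)))

// Apply the schema and register the result so it can be queried with SQL
val schemaRDD = sqlContext.applySchema(rowRDD, schema)
schemaRDD.registerTempTable("mytable")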
Getting a bunch of syntax errors. Let me get back with the full statement
and error later today. Thanks for verifying my thinking wasn't out in left
field.
On Sun, Dec 14, 2014 at 08:56 Gerard Maas wrote:
Hi,
I don't get what the problem is. That map to selected columns looks like
the way to go given the context. What's not working?
Kr, Gerard
On Dec 14, 2014 5:17 PM, "Denny Lee" wrote:
I have a large number of files within HDFS on which I would like to do a
group by statement, a la
val table = sc.textFile("hdfs://")
val tabs = table.map(_.split("\t"))
I'm trying to do something similar to
tabs.map(c => (c._(167), c._(110), c._(200))
where I create a new RDD that only has those three columns, but that isn't
working.
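A sketch of what this seems to be aiming at (the path and the group-by key
are placeholders): the split yields an Array[String], so elements are
selected with c(167) rather than c._(167):

val table = sc.textFile("hdfs://namenode/path/to/data") // placeholder path
val tabs = table.map(_.split("\t"))

// project the three columns; c is an Array[String], indexed as c(i)
val projected = tabs.map(c => (c(167), c(110), c(200)))

// a group by on the first projected column (the key choice is hypothetical)
val grouped = projected.groupBy(_._1)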