Re: Limit the # of columns in Spark Scala

2014-12-14 Thread Denny Lee
Oh, just figured it out: tabs.map(c => Array(c(167), c(110), c(200))). Thanks for all of the advice, eh?!

On Sun, Dec 14, 2014 at 1:14 PM Yana Kadiyska wrote:
> Denny, I am not sure what exception you're observing but I've had luck with 2 things:
> val table = sc.textFile("hdfs://")
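For readers following along: the fix hinges on Scala arrays being indexed with the apply method, i.e. c(167), rather than the tuple-style c._(167) from the original question. A minimal, self-contained sketch of the pattern, with no Spark required (the sample lines and the small column indices 0/2/3 are invented for illustration):

```scala
// Plain-Scala sketch of the column-selection fix: split each
// tab-separated line, then pick columns by position using Array's
// apply method -- c(i), not c._(i).
object SelectColumns {
  def main(args: Array[String]): Unit = {
    val lines = Seq(
      "alice\t30\tseattle\tengineer",
      "bob\t25\tportland\tanalyst"
    )
    // Same shape as tabs.map(c => Array(c(167), c(110), c(200))) in the
    // thread, just with small illustrative indices.
    val tabs = lines.map(_.split("\t"))
    val selected = tabs.map(c => Array(c(0), c(2), c(3)))
    selected.foreach(r => println(r.mkString(",")))
  }
}
```

With Spark, `lines` would be the RDD from sc.textFile and the same map calls apply unchanged.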

Re: Limit the # of columns in Spark Scala

2014-12-14 Thread Yana Kadiyska
Denny, I am not sure what exception you're observing but I've had luck with 2 things:

val table = sc.textFile("hdfs://")

You can try calling table.first here and you'll see the first line of the file. You can also do val debug = table.first.split("\t"), which would give you an array you can inspect.
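Yana's suggestion works because RDD.first pulls a single record back to the driver, so you can poke at it interactively without scanning the whole file. A hedged sketch of the same inspection step on a plain string (the sample line is invented; in a Spark shell it would come from table.first):

```scala
// Simulate the debugging step on one line pulled from the file.
// In a real Spark shell this would be: val debug = table.first.split("\t")
object DebugSplit {
  def main(args: Array[String]): Unit = {
    val firstLine = "col0\tcol1\tcol2\tcol3"   // stand-in for table.first
    val debug = firstLine.split("\t")
    println(s"number of columns: ${debug.length}")
    // Print index -> value so you can find the positions you care about.
    debug.zipWithIndex.foreach { case (v, i) => println(s"$i -> $v") }
  }
}
```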

Re: Limit the # of columns in Spark Scala

2014-12-14 Thread Denny Lee
Yes - that works great! Sorry for implying I couldn't. Was just more flummoxed that I couldn't make the Scala call work on its own. Will continue to debug ;-)

On Sun, Dec 14, 2014 at 11:39 Michael Armbrust wrote:
> > BTW, I cannot use SparkSQL / case right now because my table has 200 columns (and I'm on Scala 2.10.3)

Re: Limit the # of columns in Spark Scala

2014-12-14 Thread Michael Armbrust
> BTW, I cannot use SparkSQL / case right now because my table has 200 columns (and I'm on Scala 2.10.3)

You can still apply the schema programmatically:
http://spark.apache.org/docs/latest/sql-programming-guide.html#programmatically-specifying-the-schema
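The constraint Michael is working around: Scala 2.10 case classes max out at 22 fields, so a 200-column table can't be described by one case class. The linked guide's alternative is to build a StructType at runtime. A hedged sketch against the Spark 1.x API of that era (field names are placeholders, the hdfs path is left truncated as in the thread, and in Spark 1.3+ the type classes moved to org.apache.spark.sql.types and applySchema was superseded by createDataFrame):

```scala
// Programmatically specifying the schema (Spark 1.2-era API sketch).
// No 22-field case-class limit applies to a StructType.
import org.apache.spark.sql._

val sqlContext = new SQLContext(sc)

// Imagine 200 names here instead of three placeholders.
val schemaString = "col0 col1 col2"
val schema = StructType(
  schemaString.split(" ").map(name => StructField(name, StringType, nullable = true)))

// Convert each split line into a Row matching the schema.
val rowRDD = sc.textFile("hdfs://")
  .map(_.split("\t"))
  .map(p => Row(p(0), p(1), p(2)))

val table = sqlContext.applySchema(rowRDD, schema)
table.registerTempTable("mytable")
```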

Re: Limit the # of columns in Spark Scala

2014-12-14 Thread Denny Lee
Getting a bunch of syntax errors. Let me get back with the full statement and error later today. Thanks for verifying my thinking wasn't out in left field.

On Sun, Dec 14, 2014 at 08:56 Gerard Maas wrote:
> Hi,
> I don't get what the problem is. That map to selected columns looks like the way to go given the context.

Re: Limit the # of columns in Spark Scala

2014-12-14 Thread Gerard Maas
Hi,

I don't get what the problem is. That map to selected columns looks like the way to go given the context. What's not working?

Kr, Gerard

On Dec 14, 2014 5:17 PM, "Denny Lee" wrote:
> I have a large number of files within HDFS that I would like to do a group by statement ala
> val table = sc.textFile("hdfs://")

Limit the # of columns in Spark Scala

2014-12-14 Thread Denny Lee
I have a large number of files within HDFS that I would like to do a group by statement, ala:

val table = sc.textFile("hdfs://")
val tabs = table.map(_.split("\t"))

I'm trying to do something similar to tabs.map(c => (c._(167), c._(110), c._(200))) where I create a new RDD that only has those columns, but that isn't working.
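Putting the thread together, the end-to-end intent (split, select columns by position, then group) can be sketched on local Scala collections; the data and the small indices are invented stand-ins, and with Spark the Seq would be an RDD from sc.textFile with the same map/groupBy calls available:

```scala
// End-to-end sketch of the pattern discussed in the thread, on local
// collections. Indices 0 and 2 stand in for real positions like 167/110/200.
object GroupBySelected {
  def main(args: Array[String]): Unit = {
    val lines = Seq(
      "alice\t30\tseattle",
      "bob\t25\tseattle",
      "carol\t41\tportland"
    )
    val tabs = lines.map(_.split("\t"))
    // Select columns by position: Array apply, i.e. c(i), not c._(i).
    val pairs = tabs.map(c => (c(2), c(0)))          // (city, name)
    // Group names by city.
    val grouped = pairs.groupBy(_._1).map { case (k, v) => (k, v.map(_._2)) }
    grouped.toSeq.sortBy(_._1).foreach(println)
  }
}
```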