Re: Spark Code to read RCFiles

2015-04-17 Thread Pramod Biligiri
> ObjectInspector];
> var id = idOI.get(idRec);
> var nameOI = fieldRefs(2).getFieldObjectInspector().asInstanceOf[StringObjectInspector];
> var name = nameOI.getPrimitiveJavaObject(nameRec);
> var appOI = fieldRefs(3).getFieldObjectInspector().asInstanceOf[Stri
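
For readers landing on this truncated excerpt, below is a minimal, self-contained sketch of the ObjectInspector pattern the quoted code appears to follow. The column layout (time, id, name, app), the column types, and the use of ColumnarSerDe are assumptions for illustration only; substitute the serde and schema your table actually uses.

import java.util.Properties

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.hive.serde2.columnar.{BytesRefArrayWritable, ColumnarSerDe}
import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector
import org.apache.hadoop.hive.serde2.objectinspector.primitive.{LongObjectInspector, StringObjectInspector}

// Build the serde once (e.g. inside mapPartitions) rather than per row.
// The column names and types here are assumed, not taken from the original job.
def buildSerDe(): (ColumnarSerDe, StructObjectInspector) = {
  val props = new Properties()
  props.setProperty("columns", "time,id,name,app")
  props.setProperty("columns.types", "bigint,bigint,string,string")
  val serde = new ColumnarSerDe()
  serde.initialize(new Configuration(), props)
  (serde, serde.getObjectInspector.asInstanceOf[StructObjectInspector])
}

// Deserialize one RCFile row into typed fields via the struct's ObjectInspectors.
def deserializeRow(serde: ColumnarSerDe, soi: StructObjectInspector,
                   row: BytesRefArrayWritable): (Long, Long, String, String) = {
  val fieldRefs = soi.getAllStructFieldRefs
  val struct = serde.deserialize(row)

  val timeOI = fieldRefs.get(0).getFieldObjectInspector.asInstanceOf[LongObjectInspector]
  val time = timeOI.get(soi.getStructFieldData(struct, fieldRefs.get(0)))

  val idOI = fieldRefs.get(1).getFieldObjectInspector.asInstanceOf[LongObjectInspector]
  val id = idOI.get(soi.getStructFieldData(struct, fieldRefs.get(1)))

  val nameOI = fieldRefs.get(2).getFieldObjectInspector.asInstanceOf[StringObjectInspector]
  val name = nameOI.getPrimitiveJavaObject(soi.getStructFieldData(struct, fieldRefs.get(2)))

  val appOI = fieldRefs.get(3).getFieldObjectInspector.asInstanceOf[StringObjectInspector]
  val app = appOI.getPrimitiveJavaObject(soi.getStructFieldData(struct, fieldRefs.get(3)))

  (time, id, name, app)
}

Initializing the serde once per partition (e.g. inside mapPartitions) avoids paying the setup cost for every row.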

Re: Spark Code to read RCFiles

2015-04-17 Thread gle
time, id, name, app) }

Thanks in advance,
Glenda

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Code-to-read-RCFiles-tp14934p22545.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Spark Code to read RCFiles

2014-09-24 Thread cem
I used the following code as an example to deserialize BytesRefArrayWritable:
http://www.massapi.com/source/hive-0.5.0-dev/src/ql/src/test/org/apache/hadoop/hive/ql/io/TestRCFile.java.html

Best Regards,
Cem.

On Wed, Sep 24, 2014 at 1:34 PM, Pramod Biligiri wrote:
> I'm afraid SparkSQL isn't
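
That test file walks through writing and reading RCFiles and pulling values out column by column. As a rough sketch of the same idea applied to a single row, assuming the column holds a plain UTF-8 string (other types would go through the serde instead):

import org.apache.hadoop.hive.serde2.columnar.{BytesRefArrayWritable, BytesRefWritable}

// Decode one column of an RCFile row as a UTF-8 string.
// BytesRefWritable exposes the backing byte slice via getData/getStart/getLength.
def columnAsString(row: BytesRefArrayWritable, col: Int): String = {
  val ref: BytesRefWritable = row.get(col)
  new String(ref.getData, ref.getStart, ref.getLength, "UTF-8")
}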

Re: Spark Code to read RCFiles

2014-09-24 Thread Pramod Biligiri
I'm afraid SparkSQL isn't an option for my use case, so I need to use the Spark API itself.
I turned off Kryo, and I'm getting a NullPointerException now:

scala> val ref = file.take(1)(0)._2
ref: org.apache.hadoop.hive.serde2.columnar.BytesRefArrayWritable = org.apache.hadoop.hive.serde2.columnar.

Re: Spark Code to read RCFiles

2014-09-24 Thread cem
I was able to read RC files with the following line:

val file: RDD[(LongWritable, BytesRefArrayWritable)] =
  sc.hadoopFile("hdfs://day=2014-08-10/hour=00/",
    classOf[RCFileInputFormat[LongWritable, BytesRefArrayWritable]],
    classOf[LongWritable],
    classOf[BytesRefArrayWritable],
    500)

Try with dis
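
For completeness, here is a sketch of the same call with the imports it relies on; the HDFS path and the 500 minimum-partitions hint are placeholders taken from the message above, not recommendations.

import org.apache.hadoop.hive.ql.io.RCFileInputFormat
import org.apache.hadoop.hive.serde2.columnar.BytesRefArrayWritable
import org.apache.hadoop.io.LongWritable
import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("read-rcfiles"))

// RCFileInputFormat is an old-API (org.apache.hadoop.mapred) InputFormat,
// so sc.hadoopFile (not newAPIHadoopFile) is the right entry point.
val file: RDD[(LongWritable, BytesRefArrayWritable)] =
  sc.hadoopFile("hdfs://day=2014-08-10/hour=00/",   // placeholder path from the message
    classOf[RCFileInputFormat[LongWritable, BytesRefArrayWritable]],
    classOf[LongWritable],
    classOf[BytesRefArrayWritable],
    500)

println(file.count())   // cheap sanity check that the input format and path resolve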

Re: Spark Code to read RCFiles

2014-09-23 Thread Matei Zaharia
Is your file managed by Hive (and thus present in a Hive metastore)? In that case, Spark SQL (https://spark.apache.org/docs/latest/sql-programming-guide.html) is the easiest way.

Matei

On September 23, 2014 at 2:26:10 PM, Pramod Biligiri (pramodbilig...@gmail.com) wrote:
Hi, I'm trying to re
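
As a rough sketch of that route on the Spark versions current at the time (1.x), assuming the RCFile data is already registered as a table in the Hive metastore; the table and column names below are made up for illustration:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

val sc = new SparkContext(new SparkConf().setAppName("rcfile-via-spark-sql"))
val hiveContext = new HiveContext(sc)

// The metastore already records that the table is stored as RCFile,
// so no input format or serde needs to be spelled out here.
val rows = hiveContext.sql("SELECT time, id, name, app FROM my_rcfile_table LIMIT 10")
rows.collect().foreach(println)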

Spark Code to read RCFiles

2014-09-23 Thread Pramod Biligiri
Hi,
I'm trying to read some data in RCFiles using Spark, but can't seem to find a suitable example anywhere. Currently I've written the following bit of code that lets me count() the number of records, but when I try to do a collect() or a map(), it fails with a ConcurrentModificationException. I'm ru
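
One common cause of exactly this symptom (offered as a guess, not a confirmed diagnosis of this job): Hadoop record readers reuse a single Writable instance for every record, so count() succeeds while collect(), which has to serialize the live objects, can observe the buffer being mutated underneath it. Copying each value out before collecting usually sidesteps it. In the sketch below, `file` is assumed to be the RDD[(LongWritable, BytesRefArrayWritable)] from cem's reply above.

import java.util.Arrays

// Copy every column of every row into a fresh byte array so nothing in the
// collected data shares a buffer with the record reader.
val copied = file.map { case (_, row) =>
  (0 until row.size()).map { i =>
    val ref = row.get(i)
    Arrays.copyOfRange(ref.getData, ref.getStart, ref.getStart + ref.getLength)
  }.toArray
}

// collect()/take() now see plain byte arrays rather than reused Writables.
copied.take(5).foreach(cols => println(cols.map(new String(_, "UTF-8")).mkString("\t")))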