> ObjectInspector];
> var id = idOI.get(idRec);
> var nameOI = fieldRefs(2).getFieldObjectInspector().asInstanceOf[StringObjectInspector];
> var name = nameOI.getPrimitiveJavaObject(nameRec);
> var appOI = fieldRefs(3).getFieldObjectInspector().asInstanceOf[StringObjectInspector];
> var app = appOI.getPrimitiveJavaObject(appRec);
> (time, id, name, app)
> }
Thanks in advance,
Glenda
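The quoted snippet is cut off at both ends, so here is a minimal sketch of the complete ColumnarSerDe / ObjectInspector deserialization path it appears to follow. The column names (time, id, name, app) and their types are assumptions inferred from the snippet, not confirmed anywhere in the thread:

import java.util.Properties
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.hive.serde2.columnar.{BytesRefArrayWritable, ColumnarSerDe}
import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector
import org.apache.hadoop.hive.serde2.objectinspector.primitive.{LongObjectInspector, StringObjectInspector}

// Assumed schema: (time bigint, id bigint, name string, app string)
val props = new Properties()
props.setProperty("columns", "time,id,name,app")
props.setProperty("columns.types", "bigint,bigint,string,string")

val serde = new ColumnarSerDe()
serde.initialize(new Configuration(), props)
val soi = serde.getObjectInspector.asInstanceOf[StructObjectInspector]
val fieldRefs = soi.getAllStructFieldRefs

def deserializeRow(raw: BytesRefArrayWritable): (Long, Long, String, String) = {
  val row = serde.deserialize(raw)
  // Pull each field out of the lazy columnar row via its ObjectInspector
  def data(i: Int) = soi.getStructFieldData(row, fieldRefs.get(i))
  def oi(i: Int) = fieldRefs.get(i).getFieldObjectInspector
  val time = oi(0).asInstanceOf[LongObjectInspector].get(data(0))
  val id   = oi(1).asInstanceOf[LongObjectInspector].get(data(1))
  val name = oi(2).asInstanceOf[StringObjectInspector].getPrimitiveJavaObject(data(2))
  val app  = oi(3).asInstanceOf[StringObjectInspector].getPrimitiveJavaObject(data(3))
  (time, id, name, app)
}

The values produced this way are plain Java/Scala objects, so they are safe to hold on to after the reader moves on to the next row.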
I used the following code as an example to deserialize
BytesRefArrayWritable.
http://www.massapi.com/source/hive-0.5.0-dev/src/ql/src/test/org/apache/hadoop/hive/ql/io/TestRCFile.java.html
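In case that link goes stale: the relevant part of that test just walks a row's columns and copies the bytes out. A rough sketch of the same idea in Scala, assuming plain UTF-8 text columns:

import org.apache.hadoop.hive.serde2.columnar.BytesRefArrayWritable

// Decode one RCFile row (a BytesRefArrayWritable) into its column strings.
def columns(row: BytesRefArrayWritable): Seq[String] =
  (0 until row.size()).map { i =>
    val ref = row.get(i)  // BytesRefWritable backing column i
    new String(ref.getData, ref.getStart, ref.getLength, "UTF-8")
  }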
Best Regards,
Cem.
On Wed, Sep 24, 2014 at 1:34 PM, Pramod Biligiri
wrote:
I'm afraid SparkSQL isn't an option for my use case, so I need to use the
Spark API itself.
I turned off Kryo, and I'm getting a NullPointerException now:
scala> val ref = file.take(1)(0)._2
ref: org.apache.hadoop.hive.serde2.columnar.BytesRefArrayWritable =
org.apache.hadoop.hive.serde2.columnar.
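Not sure whether it explains this particular NPE, but one known gotcha with sc.hadoopFile is that Hadoop's RecordReader reuses the same Writable instance for every record (the hadoopFile scaladoc warns about exactly this), so anything handed back by take() or collect() can point at an already-recycled buffer. Copying the values out inside the map avoids it; a sketch, reusing the file RDD from the snippet below:

// Decode each column to a String on the executor side, so nothing that
// reaches the driver still references the reader's reused Writable.
val rows = file.map { case (_, braw) =>
  (0 until braw.size()).map { i =>
    val ref = braw.get(i)
    new String(ref.getData, ref.getStart, ref.getLength, "UTF-8")
  }.toList
}
rows.take(1).foreach(println)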
I was able to read RC files with the following line:
val file: RDD[(LongWritable, BytesRefArrayWritable)] =
  sc.hadoopFile("hdfs://day=2014-08-10/hour=00/",
    classOf[RCFileInputFormat[LongWritable, BytesRefArrayWritable]],
    classOf[LongWritable], classOf[BytesRefArrayWritable], 500)
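For anyone copy-pasting, the imports that line relies on (package names as of Hive 0.x / Spark 1.x) are:

import org.apache.spark.rdd.RDD
import org.apache.hadoop.io.LongWritable
import org.apache.hadoop.hive.ql.io.RCFileInputFormat
import org.apache.hadoop.hive.serde2.columnar.BytesRefArrayWritable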
Try with disabling Kryo serialization.
Is your file managed by Hive (and thus present in a Hive metastore)? In that
case, Spark SQL
(https://spark.apache.org/docs/latest/sql-programming-guide.html) is the
easiest way.
Matei
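For reference, a minimal sketch of that route on Spark 1.x, assuming a metastore table named mytable (name hypothetical) and a Spark build with Hive support:

import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc)
// Spark SQL picks up the RCFile storage format from the metastore,
// so no manual SerDe or ObjectInspector handling is needed.
val rows = hiveContext.sql("SELECT * FROM mytable LIMIT 10")
rows.collect().foreach(println)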
On September 23, 2014 at 2:26:10 PM, Pramod Biligiri (pramodbilig...@gmail.com)
wrote:
Hi,
I'm trying to read some data in RCFiles using Spark, but can't seem to find
a suitable example anywhere. Currently I've written the following bit of
code that lets me count() the no. of records, but when I try to do a
collect() or a map(), it fails with a ConcurrentModificationException. I'm
ru