That’s precisely what I was trying to check. It should have 42577 records in it, because that’s how many lines there were in the text file I read in.
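For what it's worth, the 42577 figure comes straight from the input file. A minimal, Spark-free way to double-check it (just a sketch, assuming the same file.txt path used below):

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.stream.Stream;

    public class LineCountCheck {
        public static void main(String[] args) throws IOException {
            // Count the raw lines of the input file, with no Spark involved.
            try (Stream<String> lines = Files.lines(Paths.get("file.txt"))) {
                System.out.println("Raw line count: " + lines.count()); // expecting 42577
            }
        }
    }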
Here's how the table gets created:

    // Load a text file and convert each line to a JavaBean.
    JavaRDD<String> lines = sc.textFile("file.txt");
    JavaRDD<BERecord> tbBER = lines.map(s -> convertToBER(s));

    // Apply a schema to an RDD of JavaBeans and register it as a table.
    DataFrame schemaBERecords = sqlContext.createDataFrame(tbBER, BERecord.class);
    schemaBERecords.registerTempTable("tbBER");

The BERecord class is a standard JavaBean that implements Serializable, so that shouldn't be the issue. As you said, count() shouldn't fail like this even if the table were empty. I was able to print the schema of the DataFrame just fine with df.printSchema(); I just wanted to see whether the data was populated correctly.
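In case it helps, a sketch of another way to check whether the data made it in, without calling count() (if this throws the same EOFException, the problem isn't specific to count()):

    import org.apache.spark.sql.Row;

    // Pull a few rows back to the driver. Like count(), take() runs a job,
    // so it exercises the same task serialization path.
    Row[] sample = schemaBERecords.take(5);
    for (Row row : sample) {
        System.out.println(row);
    }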
From: Dean Wampler [mailto:deanwamp...@gmail.com]
Sent: Wednesday, April 01, 2015 6:05 PM
To: Ashley Rose
Cc: user@spark.apache.org
Subject: Re: Spark 1.3.0 DataFrame count() method throwing java.io.EOFException

Is it possible "tbBER" is empty? If so, it shouldn't fail like this, of course.

Dean Wampler, Ph.D.
Author: Programming Scala, 2nd Edition (O'Reilly) - http://shop.oreilly.com/product/0636920033073.do
Typesafe - http://typesafe.com
@deanwampler - http://twitter.com/deanwampler
http://polyglotprogramming.com

On Wed, Apr 1, 2015 at 5:57 PM, ARose <ashley.r...@telarix.com> wrote:

Note: I am running Spark on Windows 7 in standalone mode.

In my app, I run the following:

    DataFrame df = sqlContext.sql("SELECT * FROM tbBER");
    System.out.println("Count: " + df.count());

tbBER is registered as a temp table in my SQLContext. When I try to print the number of rows in the DataFrame, the job fails and I get the following error message:

java.io.EOFException
    at java.io.ObjectInputStream$BlockDataInputStream.readFully(ObjectInputStream.java:2747)
    at java.io.ObjectInputStream.readFully(ObjectInputStream.java:1033)
    at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:63)
    at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:101)
    at org.apache.hadoop.io.UTF8.readChars(UTF8.java:216)
    at org.apache.hadoop.io.UTF8.readString(UTF8.java:208)
    at org.apache.hadoop.mapred.FileSplit.readFields(FileSplit.java:87)
    at org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:237)
    at org.apache.hadoop.io.ObjectWritable.readFields(ObjectWritable.java:66)
    at org.apache.spark.SerializableWritable$$anonfun$readObject$1.apply$mcV$sp(SerializableWritable.scala:43)
    at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1137)
    at org.apache.spark.SerializableWritable.readObject(SerializableWritable.scala:39)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:483)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1896)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:68)
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:94)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:185)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

This only happens when I try to call df.count(); the rest runs fine. Is the count() function not supported in standalone mode? The stack trace makes it appear to be Hadoop functionality...

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-1-3-0-DataFrame-count-method-throwing-java-io-EOFException-tp22344.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
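A closing note on the standalone-mode question above: count() is supported in standalone mode. The trace fails while deserializing a Hadoop FileSplit on an executor, and that split comes from textFile()'s underlying HadoopRDD rather than from anything DataFrame-specific. A minimal isolation sketch (assuming the same JavaSparkContext sc and input path as above) is to run an action on the raw RDD before any SQL is involved; if it throws the same EOFException, the DataFrame layer can be ruled out:

    import org.apache.spark.api.java.JavaRDD;

    // Count the raw text RDD directly. This exercises the same Hadoop
    // FileSplit (de)serialization that appears in the stack trace.
    JavaRDD<String> lines = sc.textFile("file.txt");
    System.out.println("Raw RDD count: " + lines.count());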