That’s precisely what I was trying to check. It should have 42577 records in it, because that’s how many lines there were in the text file I read in.
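For what it's worth, the 42577 figure comes straight from the input file. A minimal, Spark-free way to double-check it (just a sketch, assuming the same file.txt path used below):

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.stream.Stream;

    public class LineCountCheck {
        public static void main(String[] args) throws IOException {
            // Count the raw lines of the input file, with no Spark involved.
            try (Stream<String> lines = Files.lines(Paths.get("file.txt"))) {
                System.out.println("Raw line count: " + lines.count()); // expecting 42577
            }
        }
    }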
Here's how the table gets created:

    // Load a text file and convert each line to a JavaBean.
    JavaRDD<String> lines = sc.textFile("file.txt");
    JavaRDD<BERecord> tbBER = lines.map(s -> convertToBER(s));

    // Apply a schema to an RDD of JavaBeans and register it as a table.
    DataFrame schemaBERecords = sqlContext.createDataFrame(tbBER, BERecord.class);
    schemaBERecords.registerTempTable("tbBER");

The BERecord class is a standard JavaBean that implements Serializable, so that shouldn't be the issue. As you said, count() shouldn't fail like this even if the table were empty. I was able to print the schema of the DataFrame just fine with df.printSchema(); I just wanted to see whether the data was populated correctly.
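In case it helps, a sketch of another way to check whether the data made it in, without calling count() (if this throws the same EOFException, the problem isn't specific to count()):

    import org.apache.spark.sql.Row;

    // Pull a few rows back to the driver. Like count(), take() runs a job,
    // so it exercises the same task serialization path.
    Row[] sample = schemaBERecords.take(5);
    for (Row row : sample) {
        System.out.println(row);
    }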
From: Dean Wampler [mailto:deanwamp...@gmail.com]
Sent: Wednesday, April 01, 2015 6:05 PM
To: Ashley Rose
Cc: user@spark.apache.org
Subject: Re: Spark 1.3.0 DataFrame count() method throwing java.io.EOFException

Is it possible "tbBER" is empty? If so, it shouldn't fail like this, of course.

Dean Wampler, Ph.D.
Author: Programming Scala, 2nd Edition (O'Reilly) - http://shop.oreilly.com/product/0636920033073.do
Typesafe - http://typesafe.com
@deanwampler - http://twitter.com/deanwampler
http://polyglotprogramming.com

On Wed, Apr 1, 2015 at 5:57 PM, ARose <ashley.r...@telarix.com> wrote:

Note: I am running Spark on Windows 7 in standalone mode.

In my app, I run the following:

    DataFrame df = sqlContext.sql("SELECT * FROM tbBER");
    System.out.println("Count: " + df.count());

tbBER is registered as a temp table in my SQLContext. When I try to print the number of rows in the DataFrame, the job fails and I get the following error message:

java.io.EOFException
    at java.io.ObjectInputStream$BlockDataInputStream.readFully(ObjectInputStream.java:2747)
    at java.io.ObjectInputStream.readFully(ObjectInputStream.java:1033)
    at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:63)
    at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:101)
    at org.apache.hadoop.io.UTF8.readChars(UTF8.java:216)
    at org.apache.hadoop.io.UTF8.readString(UTF8.java:208)
    at org.apache.hadoop.mapred.FileSplit.readFields(FileSplit.java:87)
    at org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:237)
    at org.apache.hadoop.io.ObjectWritable.readFields(ObjectWritable.java:66)
    at org.apache.spark.SerializableWritable$$anonfun$readObject$1.apply$mcV$sp(SerializableWritable.scala:43)
    at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1137)
    at org.apache.spark.SerializableWritable.readObject(SerializableWritable.scala:39)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:483)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1896)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:68)
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:94)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:185)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

This only happens when I try to call df.count(); the rest runs fine. Is the count() function not supported in standalone mode? The stack trace makes it appear to be Hadoop functionality...

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-1-3-0-DataFrame-count-method-throwing-java-io-EOFException-tp22344.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
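A closing note on the standalone-mode question above: count() is supported in standalone mode. The trace fails while deserializing a Hadoop FileSplit on an executor, and that split comes from textFile()'s underlying HadoopRDD rather than from anything DataFrame-specific. A minimal isolation sketch (assuming the same JavaSparkContext sc and input path as above) is to run an action on the raw RDD before any SQL is involved; if it throws the same EOFException, the DataFrame layer can be ruled out:

    import org.apache.spark.api.java.JavaRDD;

    // Count the raw text RDD directly. This exercises the same Hadoop
    // FileSplit (de)serialization that appears in the stack trace.
    JavaRDD<String> lines = sc.textFile("file.txt");
    System.out.println("Raw RDD count: " + lines.count());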