Victor Ferrer created ZEPPELIN-2496:
---------------------------------------

             Summary: Error listing a HDFS directory with a large number of files
                 Key: ZEPPELIN-2496
                 URL: https://issues.apache.org/jira/browse/ZEPPELIN-2496
             Project: Zeppelin
          Issue Type: Bug
          Components: Interpreters
    Affects Versions: 0.7.1
         Environment: Centos 7 - CDH 5
            Reporter: Victor Ferrer


Hi,

I have noticed an incorrect behavior while using the HDFS (%file) interpreter.
For instance, when I list this directory, I get the correct result:

{noformat}
%file
ls -l /mediation2/kpis/ran/cell3g/temporalAgg=ROP/year=2017/month=5/day=1
{noformat}

{noformat}
-rw-r--r--      3        hdfs   supergroup      89376267        2017-05-03 12:29GMT     /mediation2/kpis/ran/cell3g/temporalAgg=ROP/year=2017/month=5/day=1/part-r-00000-ac4e728a-332e-40ca-b42b-daaf783fe227.snappy.parquet
-rw-r--r--      3        hdfs   supergroup      88585675        2017-05-03 12:29GMT     /mediation2/kpis/ran/cell3g/temporalAgg=ROP/year=2017/month=5/day=1/part-r-00001-ac4e728a-332e-40ca-b42b-daaf783fe227.snappy.parquet
{noformat}

However, when I switch to a bigger directory, I get an error stating that the 
directory could not be found:

{noformat}
%file
ls -l /mediation2/kpis/ran/cell3g/temporalAgg=ROP/year=2017/month=5/day=2
{noformat}

{noformat}
Could not find file or directory:       /mediation2/kpis/ran/cell3g/temporalAgg=ROP/year=2017/month=5/day=2
{noformat}

Digging into the logs, I find this error:

{noformat}
ERROR [2017-05-04 12:05:44,910] ({pool-2-thread-14} HDFSFileInterpreter.java[listAll]:227) - listall: listDir /mediation2/kpis/ran/cell3g/temporalAgg=ROP/year=2017/month=5/day=2
com.google.gson.JsonSyntaxException: java.io.EOFException: End of input at line 1 column 311752
        at com.google.gson.Gson.fromJson(Gson.java:800)
        at com.google.gson.Gson.fromJson(Gson.java:757)
        at com.google.gson.Gson.fromJson(Gson.java:706)
        at com.google.gson.Gson.fromJson(Gson.java:678)
        at org.apache.zeppelin.file.HDFSFileInterpreter.listAll(HDFSFileInterpreter.java:212)
        at org.apache.zeppelin.file.FileInterpreter.interpret(FileInterpreter.java:130)
        at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:95)
        at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:490)
        at org.apache.zeppelin.scheduler.Job.run(Job.java:175)
        at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:139)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.EOFException: End of input at line 1 column 311752
        at com.google.gson.stream.JsonReader.nextNonWhitespace(JsonReader.java:954)
        at com.google.gson.stream.JsonReader.nextInArray(JsonReader.java:677)
        at com.google.gson.stream.JsonReader.peek(JsonReader.java:376)
        at com.google.gson.stream.JsonReader.hasNext(JsonReader.java:349)
        at com.google.gson.internal.bind.ArrayTypeAdapter.read(ArrayTypeAdapter.java:71)
        at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$1.read(ReflectiveTypeAdapterFactory.java:93)
        at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$Adapter.read(ReflectiveTypeAdapterFactory.java:172)
        at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$1.read(ReflectiveTypeAdapterFactory.java:93)
        at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$Adapter.read(ReflectiveTypeAdapterFactory.java:172)
        at com.google.gson.Gson.fromJson(Gson.java:791)
        ... 16 more
ERROR [2017-05-04 12:05:44,911] ({pool-2-thread-14} FileInterpreter.java[interpret]:133) - Error listing files in path /mediation2/kpis/ran/cell3g/temporalAgg=ROP/year=2017/month=5/day=2
org.apache.zeppelin.interpreter.InterpreterException: Could not find file or directory: /mediation2/kpis/ran/cell3g/temporalAgg=ROP/year=2017/month=5/day=2
        at org.apache.zeppelin.file.HDFSFileInterpreter.listAll(HDFSFileInterpreter.java:228)
        at org.apache.zeppelin.file.FileInterpreter.interpret(FileInterpreter.java:130)
        at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:95)
        at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:490)
        at org.apache.zeppelin.scheduler.Job.run(Job.java:175)
        at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:139)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
{noformat}

I understand that the directory might be too big for the underlying REST 
interface (the /webhdfs interface), but "Could not find file or directory" is 
misleading, since the directory does exist. Perhaps a more graceful message 
could be returned, or some partial content.
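
As an illustration of what a more graceful message might look like, here is a minimal sketch in plain Java. It is not Zeppelin code: the class `ListingCheck` and its methods are hypothetical names made up for this example. It assumes the listing body follows the WebHDFS LISTSTATUS layout (entries with a `pathSuffix` field) and that the failure is a body cut off mid-stream, which the EOFException above suggests but does not prove. The balance check is a heuristic, not a JSON validator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of Zeppelin or Gson) for a raw listing body. */
class ListingCheck {

    // Matches only "pathSuffix" values whose closing quote is present,
    // so a name that was cut in half is never reported.
    private static final Pattern PATH_SUFFIX =
        Pattern.compile("\"pathSuffix\"\\s*:\\s*\"((?:[^\"\\\\]|\\\\.)*)\"");

    /** True if a string literal is still open or braces/brackets do not balance. */
    static boolean looksTruncated(String json) {
        int depth = 0;
        boolean inString = false;
        boolean escaped = false;
        for (int i = 0; i < json.length(); i++) {
            char c = json.charAt(i);
            if (inString) {
                if (escaped) {
                    escaped = false;          // character after a backslash
                } else if (c == '\\') {
                    escaped = true;
                } else if (c == '"') {
                    inString = false;
                }
            } else if (c == '"') {
                inString = true;
            } else if (c == '{' || c == '[') {
                depth++;
            } else if (c == '}' || c == ']') {
                depth--;
            }
        }
        return inString || depth != 0;
    }

    /** Names of all fully received entries (still JSON-escaped). */
    static List<String> completeNames(String json) {
        List<String> names = new ArrayList<>();
        Matcher m = PATH_SUFFIX.matcher(json);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    /** Builds a message that separates "response cut off" from "not found". */
    static String describe(String path, String json) {
        int count = completeNames(json).size();
        if (!looksTruncated(json)) {
            return "Listing of " + path + " is complete; entries: " + count;
        }
        return "Listing of " + path + " was cut off after " + json.length()
            + " characters; entries recovered: " + count;
    }

    public static void main(String[] args) {
        String full = "{\"FileStatuses\":{\"FileStatus\":["
            + "{\"pathSuffix\":\"a\"},{\"pathSuffix\":\"b\"}]}}";
        String cut = full.substring(0, full.length() - 8); // simulate a short read
        System.out.println(describe("/demo", full));
        System.out.println(describe("/demo", cut));
    }
}
```

With something along these lines, the interpreter could report that the response was incomplete, and optionally show the entries it did receive, instead of claiming the directory does not exist.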

Cheers,
Victor



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
