Re: Spark Writing to parquet directory : java.io.IOException: Disk quota exceeded
Anybody reply on this?

On Tue, Nov 21, 2017 at 3:36 PM, Chetan Khatri wrote:
>
> Hello Spark Users,
>
> I am getting the error below when I try to write a dataset to a Parquet
> location. I have enough disk space available. The last time I faced the
> same kind of error, it was resolved by increasing the number of cores in
> the job parameters. The current result set is almost 400 GB, with the
> following parameters:
>
> Driver memory: 4g
> Executor memory: 16g
> Executor cores: 12
> Num executors: 8
>
> It is still failing. Any idea whether increasing the executor memory and
> the number of executors could resolve it?
>
> 17/11/21 04:29:37 ERROR storage.DiskBlockObjectWriter: Uncaught exception while reverting partial writes to file /mapr/chetan/local/david.com/tmp/hadoop/nm-local-dir/usercache/david-khurana/appcache/application_1509639363072_10572/blockmgr-008604e6-37cb-421f-8cc5-e94db75684e7/12/temp_shuffle_ae885911-a1ef-404f-9a6a-ded544bb5b3c
> java.io.IOException: Disk quota exceeded
>         at java.io.FileOutputStream.close0(Native Method)
>         at java.io.FileOutputStream.access$000(FileOutputStream.java:53)
>         at java.io.FileOutputStream$1.close(FileOutputStream.java:356)
>         at java.io.FileDescriptor.closeAll(FileDescriptor.java:212)
>         at java.io.FileOutputStream.close(FileOutputStream.java:354)
>         at org.apache.spark.storage.TimeTrackingOutputStream.close(TimeTrackingOutputStream.java:72)
>         at java.io.FilterOutputStream.close(FilterOutputStream.java:159)
>         at net.jpountz.lz4.LZ4BlockOutputStream.close(LZ4BlockOutputStream.java:178)
>         at java.io.FilterOutputStream.close(FilterOutputStream.java:159)
>         at java.io.FilterOutputStream.close(FilterOutputStream.java:159)
>         at org.apache.spark.sql.execution.UnsafeRowSerializerInstance$$anon$2.close(UnsafeRowSerializer.scala:96)
>         at org.apache.spark.storage.DiskBlockObjectWriter$$anonfun$close$2.apply$mcV$sp(DiskBlockObjectWriter.scala:108)
>         at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1316)
>         at org.apache.spark.storage.DiskBlockObjectWriter.close(DiskBlockObjectWriter.scala:107)
>         at org.apache.spark.storage.DiskBlockObjectWriter.revertPartialWritesAndClose(DiskBlockObjectWriter.scala:159)
>         at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.stop(BypassMergeSortShuffleWriter.java:234)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:85)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47)
>         at org.apache.spark.scheduler.Task.run(Task.scala:86)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> 17/11/21 04:29:37 WARN netty.OneWayOutboxMessage: Failed to send one-way RPC.
> java.io.IOException: Failed to connect to /192.168.123.43:58889
>         at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:228)
>         at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:179)
>         at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:197)
>         at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:191)
>         at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:187)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: java.net.ConnectException: Connection refused: /192.168.123.43:58889
>         at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>         at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
>         at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:224)
>         at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:289)
>         at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528)
>         at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
>         at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
>         at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
>         at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
>         ... 1 more
Re: Spark Writing to parquet directory : java.io.IOException: Disk quota exceeded
The error message seems self-explanatory; try to find out what disk quota is in place for your user.

On Wed, Nov 22, 2017 at 8:23 AM, Chetan Khatri wrote:
> Anybody reply on this?
>
> On Tue, Nov 21, 2017 at 3:36 PM, Chetan Khatri <chetan.opensou...@gmail.com> wrote:
>>
>> Hello Spark Users,
>>
>> I am getting the error below when I try to write a dataset to a Parquet
>> location. I have enough disk space available. The last time I faced the
>> same kind of error, it was resolved by increasing the number of cores in
>> the job parameters. The current result set is almost 400 GB, with the
>> following parameters:
>>
>> Driver memory: 4g
>> Executor memory: 16g
>> Executor cores: 12
>> Num executors: 8
>>
>> It is still failing. Any idea whether increasing the executor memory and
>> the number of executors could resolve it?
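The temp_shuffle_* file in the stack trace above sits under the YARN NodeManager's local directory (nm-local-dir), so the quota being exceeded is most likely on that local volume rather than on the Parquet output path. If the cluster allows it, Spark's scratch space can be pointed at a volume without a tight quota. The following is only a minimal sketch, assuming a non-YARN deployment and a hypothetical path /data/scratch/spark-tmp; on YARN, yarn.nodemanager.local-dirs takes precedence over spark.local.dir, so the NodeManager's local-dirs setting would have to be changed instead.

    import org.apache.spark.sql.SparkSession

    // Sketch only: redirect shuffle/spill scratch files to a volume with
    // enough unrestricted space. The directory below is hypothetical.
    val spark = SparkSession.builder()
      .appName("parquet-write")
      .config("spark.local.dir", "/data/scratch/spark-tmp")
      .getOrCreate()

    // The write itself is unchanged; the quota error happens while the
    // shuffle writes its temp_shuffle_* files, before the Parquet output.
    // ds.write.parquet("...")   // hypothetical output path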
Spark.ml roadmap 2.3.0 and beyond
The roadmaps for prior releases, e.g. 1.6, 2.0, 2.1, 2.2, were available:

2.2.0 https://issues.apache.org/jira/browse/SPARK-18813
2.1.0 https://issues.apache.org/jira/browse/SPARK-15581
..

It seems those roadmaps were not available per se for 2.3.0 and later? Is there a different mechanism for that info?

stephenb
SparkSQL does not support CharType
Hi,

When I use a DataFrame with the table schema below, it goes wrong:

val test_schema = StructType(Array(
  StructField("id", IntegerType, false),
  StructField("flag", CharType(1), false),
  StructField("time", DateType, false)));

val df = spark.read.format("com.databricks.spark.csv")
  .schema(test_schema)
  .option("header", "false")
  .option("inferSchema", "false")
  .option("delimiter", ",")
  .load("file:///Users/name/b")

The log is below:

Exception in thread "main" scala.MatchError: CharType(1) (of class org.apache.spark.sql.types.CharType)
    at org.apache.spark.sql.catalyst.encoders.RowEncoder$.org$apache$spark$sql$catalyst$encoders$RowEncoder$$serializerFor(RowEncoder.scala:73)
    at org.apache.spark.sql.catalyst.encoders.RowEncoder$$anonfun$2.apply(RowEncoder.scala:158)
    at org.apache.spark.sql.catalyst.encoders.RowEncoder$$anonfun$2.apply(RowEncoder.scala:157)

Why? Is this a bug? I also found that Spark translates the char type to string when using the create table command:

create table test(flag char(1));
desc test;
flag    string

Regards
Wendy He
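The MatchError comes from RowEncoder having no case for CharType, which is what the stack trace shows; as noted above, Spark's own DDL maps char(1) to string. A sketch of a workaround, assuming fixed-width semantics are not required, is to declare the column as StringType. The CSV options and the file path are carried over from the snippet above:

    import org.apache.spark.sql.types._

    // CharType has no RowEncoder case, hence the MatchError;
    // StringType is the supported equivalent for external schemas.
    val test_schema = StructType(Array(
      StructField("id", IntegerType, false),
      StructField("flag", StringType, false),
      StructField("time", DateType, false)))

    val df = spark.read.format("com.databricks.spark.csv")
      .schema(test_schema)
      .option("header", "false")
      .option("inferSchema", "false")
      .option("delimiter", ",")
      .load("file:///Users/name/b")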