Re: Spark Writing to parquet directory : java.io.IOException: Disk quota exceeded

2017-11-22 Thread Chetan Khatri
Anybody have a reply on this?

On Tue, Nov 21, 2017 at 3:36 PM, Chetan Khatri wrote:

>
> Hello Spark Users,
>
> I am getting the error below when I try to write a dataset to a parquet
> location. I have enough disk space available. Last time I hit the same
> kind of error, it was resolved by increasing the number of cores in the
> job's resource settings. The current result set is almost 400 GB, with
> the resource settings below (sketched as Spark configuration after this
> message):
>
> Driver memory: 4g
> Executor memory: 16g
> Executor cores: 12
> Num executors: 8
>
> It is still failing. Any idea whether increasing executor memory and the
> number of executors could resolve it?
>
>
> 17/11/21 04:29:37 ERROR storage.DiskBlockObjectWriter: Uncaught exception while reverting partial writes to file /mapr/chetan/local/david.com/tmp/hadoop/nm-local-dir/usercache/david-khurana/appcache/application_1509639363072_10572/blockmgr-008604e6-37cb-421f-8cc5-e94db75684e7/12/temp_shuffle_ae885911-a1ef-404f-9a6a-ded544bb5b3c
> java.io.IOException: Disk quota exceeded
>         at java.io.FileOutputStream.close0(Native Method)
>         at java.io.FileOutputStream.access$000(FileOutputStream.java:53)
>         at java.io.FileOutputStream$1.close(FileOutputStream.java:356)
>         at java.io.FileDescriptor.closeAll(FileDescriptor.java:212)
>         at java.io.FileOutputStream.close(FileOutputStream.java:354)
>         at org.apache.spark.storage.TimeTrackingOutputStream.close(TimeTrackingOutputStream.java:72)
>         at java.io.FilterOutputStream.close(FilterOutputStream.java:159)
>         at net.jpountz.lz4.LZ4BlockOutputStream.close(LZ4BlockOutputStream.java:178)
>         at java.io.FilterOutputStream.close(FilterOutputStream.java:159)
>         at java.io.FilterOutputStream.close(FilterOutputStream.java:159)
>         at org.apache.spark.sql.execution.UnsafeRowSerializerInstance$$anon$2.close(UnsafeRowSerializer.scala:96)
>         at org.apache.spark.storage.DiskBlockObjectWriter$$anonfun$close$2.apply$mcV$sp(DiskBlockObjectWriter.scala:108)
>         at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1316)
>         at org.apache.spark.storage.DiskBlockObjectWriter.close(DiskBlockObjectWriter.scala:107)
>         at org.apache.spark.storage.DiskBlockObjectWriter.revertPartialWritesAndClose(DiskBlockObjectWriter.scala:159)
>         at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.stop(BypassMergeSortShuffleWriter.java:234)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:85)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47)
>         at org.apache.spark.scheduler.Task.run(Task.scala:86)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> 17/11/21 04:29:37 WARN netty.OneWayOutboxMessage: Failed to send one-way RPC.
> java.io.IOException: Failed to connect to /192.168.123.43:58889
>         at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:228)
>         at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:179)
>         at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:197)
>         at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:191)
>         at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:187)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: java.net.ConnectException: Connection refused: /192.168.123.43:58889
>         at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>         at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
>         at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:224)
>         at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:289)
>         at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528)
>         at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
>         at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
>         at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
>         at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
>         ... 1 more
>
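[Editor's note] For reference, the resource settings quoted above correspond roughly to the Spark configuration sketch below. This is only a sketch: the app name and input/output paths are hypothetical, and in practice these values are usually passed to spark-submit rather than set in code (driver memory in particular must be set before the driver JVM starts, so setting it programmatically has no effect).

import org.apache.spark.sql.SparkSession

// Sketch of the resource settings described in the message above.
// App name and input/output paths are placeholders, not from the thread.
val spark = SparkSession.builder()
  .appName("parquet-write-repro")
  .config("spark.driver.memory", "4g")      // Driver memory: 4g (normally via --driver-memory)
  .config("spark.executor.memory", "16g")   // Executor memory: 16g
  .config("spark.executor.cores", "12")     // Executor cores: 12
  .config("spark.executor.instances", "8")  // Num executors: 8
  .getOrCreate()

// The failing operation in the thread: writing a large Dataset out as Parquet.
spark.read.parquet("/path/to/input")
  .write.mode("overwrite")
  .parquet("/path/to/output")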


Re: Spark Writing to parquet directory : java.io.IOException: Disk quota exceeded

2017-11-22 Thread Vadim Semenov
The error message seems self-explanatory; try to figure out what disk
quota applies to your user.
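[Editor's note] If the limit is a per-user filesystem quota on the volume backing the NodeManager local dirs (the nm-local-dir path in the trace above), one option, sketched below under the assumption that another volume without a quota exists, is to move Spark's scratch space there. On YARN, the NodeManager's local-dirs setting overrides spark.local.dir, so the change may need to be made in the YARN configuration instead; the path below is a placeholder.

import org.apache.spark.sql.SparkSession

// Sketch: point Spark's shuffle/temp scratch space at a directory that is not
// covered by the per-user disk quota. "/data/spark-scratch" is hypothetical.
// On YARN, yarn.nodemanager.local-dirs takes precedence over spark.local.dir,
// so the equivalent change would go into the NodeManager configuration.
val spark = SparkSession.builder()
  .config("spark.local.dir", "/data/spark-scratch")
  .getOrCreate()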


Spark.ml roadmap 2.3.0 and beyond

2017-11-22 Thread Stephen Boesch
The roadmaps for prior releases, e.g. 1.6, 2.0, 2.1, and 2.2, were available:

2.2.0 https://issues.apache.org/jira/browse/SPARK-18813

2.1.0 https://issues.apache.org/jira/browse/SPARK-15581
..

It seems those roadmaps are not available per se for 2.3.0 and later. Is
there a different mechanism for that info?

stephenb


SparkSQL does not support CharType

2017-11-22 Thread 163
Hi,
When I use a DataFrame with the table schema below, it goes wrong:

import org.apache.spark.sql.types._

val test_schema = StructType(Array(
  StructField("id", IntegerType, false),
  StructField("flag", CharType(1), false),
  StructField("time", DateType, false)))

val df = spark.read.format("com.databricks.spark.csv")
  .schema(test_schema)
  .option("header", "false")
  .option("inferSchema", "false")
  .option("delimiter", ",")
  .load("file:///Users/name/b")

The log is below:
Exception in thread "main" scala.MatchError: CharType(1) (of class org.apache.spark.sql.types.CharType)
        at org.apache.spark.sql.catalyst.encoders.RowEncoder$.org$apache$spark$sql$catalyst$encoders$RowEncoder$$serializerFor(RowEncoder.scala:73)
        at org.apache.spark.sql.catalyst.encoders.RowEncoder$$anonfun$2.apply(RowEncoder.scala:158)
        at org.apache.spark.sql.catalyst.encoders.RowEncoder$$anonfun$2.apply(RowEncoder.scala:157)

Why? Is this a bug?
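[Editor's note] The MatchError comes from RowEncoder, which has no serializer case for CharType, so that type cannot be used directly in a DataFrame schema here. A workaround, sketched below assuming the column only needs to hold one-character strings, is to declare it as StringType; column names and the input path follow the example above.

import org.apache.spark.sql.types._

// Sketch of a workaround: use StringType instead of CharType(1), which the
// row encoder supports.
val test_schema_str = StructType(Array(
  StructField("id", IntegerType, false),
  StructField("flag", StringType, false),   // was CharType(1)
  StructField("time", DateType, false)))

val df = spark.read.format("com.databricks.spark.csv")
  .schema(test_schema_str)
  .option("header", "false")
  .option("inferSchema", "false")
  .option("delimiter", ",")
  .load("file:///Users/name/b")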

But I found that Spark translates the char type to string when using the
create table command:

 create table test(flag char(1));
 desc test;   -- shows: flag  string
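[Editor's note] The same mapping can be checked from a Spark session. This is a sketch assuming a Hive-enabled SparkSession and that the table test does not already exist:

// Sketch: char(1) in the DDL shows up as string in the catalog schema.
// Assumes the session was built with .enableHiveSupport().
spark.sql("CREATE TABLE test (flag CHAR(1))")
spark.sql("DESC test").show()        // flag  string
spark.table("test").printSchema()    // root |-- flag: string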




Regards
Wendy He