Hi VG,

I believe the error message is misleading. I had a similar one with PySpark yesterday after calling a count on a DataFrame, where the real error was an incorrect user-defined function being applied. Please send me some sample code with a trimmed-down version of the data and I'll see if I can reproduce it.

Kind regards
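For what it's worth, the "error shows up at the action, not at the bad function" behaviour is easy to reproduce, because Spark transformations are lazy and only run when an action such as count() or collectAsMap() forces evaluation. A minimal Java sketch (the class name and data are made up for illustration, not taken from the thread):

    import java.util.Arrays;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    public class LazyFailureDemo {
        public static void main(String[] args) {
            JavaSparkContext jsc = new JavaSparkContext("local[2]", "lazy-failure-demo");

            JavaRDD<String> raw = jsc.parallelize(Arrays.asList("1", "2", "oops"));

            // The parsing function is only recorded here; nothing executes yet,
            // because map() is a lazy transformation.
            JavaRDD<Integer> parsed = raw.map(s -> Integer.parseInt(s));

            // The NumberFormatException caused by "oops" only surfaces here, at
            // the action, so the failure is reported against count() even though
            // the real problem is the function passed to map().
            System.out.println(parsed.count());

            jsc.stop();
        }
    }

The same thing happens with a PySpark UDF: a bad UDF applied to a DataFrame typically fails only when count() or another action triggers execution.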
On 23 Jul 2016 4:57 pm, "Pedro Rodriguez" <ski.rodrig...@gmail.com> wrote:

Have you changed spark-env.sh or spark-defaults.conf from the default? It looks like Spark is trying to address local workers based on a network address (e.g. 192.168……) instead of on localhost (localhost, 127.0.0.1, 0.0.0.0, …). Additionally, that network address doesn't resolve correctly. You might also check /etc/hosts to make sure that you don't have anything weird going on.

One last thing to check: are you running Spark within a VM and/or Docker? If networking isn't set up correctly on those, you may also run into trouble. What would be helpful is to know everything about your setup that might affect networking.

—
Pedro Rodriguez
PhD Student in Large-Scale Machine Learning | CU Boulder
Systems Oriented Data Scientist
UC Berkeley AMPLab Alumni

pedrorodriguez.io | 909-353-4423
github.com/EntilZha | LinkedIn
<https://www.linkedin.com/in/pedrorodriguezscience>

On July 23, 2016 at 9:10:31 AM, VG (vlin...@gmail.com) wrote:

Hi Pedro,

Apologies for not adding this earlier. This is running on a local cluster set up as follows:

    JavaSparkContext jsc = new JavaSparkContext("local[2]", "DR");

Any suggestions based on this? The ports are not blocked by a firewall.

Regards,

On Sat, Jul 23, 2016 at 8:35 PM, Pedro Rodriguez <ski.rodrig...@gmail.com> wrote:

> Make sure that you don't have ports firewalled. You don't really give much
> information to work from, but it looks like the master can't access the
> worker nodes for some reason. If you give more information on the cluster,
> networking, etc., it would help.
>
> For example, on AWS you can create a security group which allows all
> traffic to/from itself to itself. If you are using something like ufw on
> Ubuntu then you probably need to know the IP addresses of the worker nodes
> beforehand.
>
> —
> Pedro Rodriguez
> PhD Student in Large-Scale Machine Learning | CU Boulder
> Systems Oriented Data Scientist
> UC Berkeley AMPLab Alumni
>
> pedrorodriguez.io | 909-353-4423
> github.com/EntilZha | LinkedIn
> <https://www.linkedin.com/in/pedrorodriguezscience>
>
> On July 23, 2016 at 7:38:01 AM, VG (vlin...@gmail.com) wrote:
>
> Please suggest if I am doing something wrong or an alternative way of
> doing this.
>
> I have an RDD with two values as follows:
>
>     JavaPairRDD<String, Long> rdd
>
> When I execute rdd.collectAsMap() it always fails with IO exceptions.
> 16/07/23 19:03:58 ERROR RetryingBlockFetcher: Exception while beginning fetch of 1 outstanding blocks
> java.io.IOException: Failed to connect to /192.168.1.3:58179
>     at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:228)
>     at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:179)
>     at org.apache.spark.network.netty.NettyBlockTransferService$$anon$1.createAndStart(NettyBlockTransferService.scala:96)
>     at org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:140)
>     at org.apache.spark.network.shuffle.RetryingBlockFetcher.start(RetryingBlockFetcher.java:120)
>     at org.apache.spark.network.netty.NettyBlockTransferService.fetchBlocks(NettyBlockTransferService.scala:105)
>     at org.apache.spark.network.BlockTransferService.fetchBlockSync(BlockTransferService.scala:92)
>     at org.apache.spark.storage.BlockManager.getRemoteBytes(BlockManager.scala:546)
>     at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:76)
>     at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:57)
>     at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:57)
>     at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1793)
>     at org.apache.spark.scheduler.TaskResultGetter$$anon$2.run(TaskResultGetter.scala:56)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>     at java.lang.Thread.run(Unknown Source)
> Caused by: java.net.ConnectException: Connection timed out: no further information: /192.168.1.3:58179
>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>     at sun.nio.ch.SocketChannelImpl.finishConnect(Unknown Source)
>     at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:224)
>     at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:289)
>     at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528)
>     at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
>     at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
>     at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
>     at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
>     ... 1 more
> 16/07/23 19:03:58 INFO RetryingBlockFetcher: Retrying fetch (1/3) for 1 outstanding blocks after 5000 ms
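Following up on Pedro's spark-env.sh / localhost suggestion above: on a single machine it is sometimes enough to pin the driver to the loopback address so nothing tries to reach the LAN IP shown in the trace (192.168.1.3). A minimal, untested sketch based on VG's local[2] setup; spark.driver.host and SPARK_LOCAL_IP are standard Spark settings, but the 127.0.0.1 value is an assumption about this particular environment:

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;

    public class LoopbackDriverSketch {
        public static void main(String[] args) {
            // Cluster-wide alternative: add
            //   export SPARK_LOCAL_IP=127.0.0.1
            // to conf/spark-env.sh instead of setting it per application.
            SparkConf conf = new SparkConf()
                    .setMaster("local[2]")
                    .setAppName("DR")
                    .set("spark.driver.host", "127.0.0.1"); // advertise loopback, not 192.168.1.3
            JavaSparkContext jsc = new JavaSparkContext(conf);

            // ... build the JavaPairRDD<String, Long> and call collectAsMap() as before ...

            jsc.stop();
        }
    }

If the 192.168.x.x address still shows up in the logs after this, the /etc/hosts and VM/Docker checks above would be the next things to look at.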