You can use the following command to build Spark after applying the pull request:

mvn -DskipTests -Phadoop-2.4 -Pyarn -Phive clean package

Cheers

On Sun, Jun 28, 2015 at 11:43 AM, ÐΞ€ρ@Ҝ (๏̯͡๏) <[email protected]> wrote:

> I see that block support did not make it to spark 1.4 release.
>
> Can you share instructions of building spark with this support for hadoop
> 2.4.x distribution.
>
> appreciate.
>
> On Fri, Jun 26, 2015 at 9:23 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) <[email protected]>
> wrote:
>
>> This is nice. Which version of Spark has this support ? Or do I need to
>> build it.
>> I have never built Spark from git, please share instructions for Hadoop
>> 2.4.x YARN.
>>
>> I am struggling a lot to get a join work between 200G and 2TB datasets. I
>> am constantly getting this exception
>>
>> 1000s of executors are failing with
>>
>> 15/06/26 13:05:28 ERROR storage.ShuffleBlockFetcherIterator: Failed to get block(s) from phxdpehdc9dn2125.stratus.phx.ebay.com:60162
>> java.io.IOException: Failed to connect to executor_host_name/executor_ip_address:60162
>> at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:191)
>> at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:156)
>> at org.apache.spark.network.netty.NettyBlockTransferService$$anon$1.createAndStart(NettyBlockTransferService.scala:78)
>> at org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:140)
>> at org.apache.spark.network.shuffle.RetryingBlockFetcher.access$200(RetryingBlockFetcher.java:43)
>> at org.apache.spark.network.shuffle.RetryingBlockFetcher$1.run(RetryingBlockFetcher.java:170)
>> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> at java.lang.Thread.run(Thread.java:745)
>>
>> On Fri, Jun 26, 2015 at 3:20 PM, Koert Kuipers <[email protected]> wrote:
>>
>>> we went through a similar process, switching from scalding (where
>>> everything just works on large datasets) to spark (where it does not).
>>>
>>> spark can be made to work on very large datasets, it just requires a
>>> little more effort. pay attention to your storage levels (should be
>>> memory-and-disk or disk-only), number of partitions (should be large,
>>> multiple of num executors), and avoid groupByKey
>>>
>>> also see:
>>> https://github.com/tresata/spark-sorted (for avoiding in memory
>>> operations for certain type of reduce operations)
>>> https://github.com/apache/spark/pull/6883 (for blockjoin)
>>>
>>> On Fri, Jun 26, 2015 at 5:48 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) <[email protected]>
>>> wrote:
>>>
>>>> Not far at all. On large data sets everything simply fails with Spark.
>>>> Worst is am not able to figure out the reason of failure, the logs run
>>>> into millions of lines and i do not know the keywords to search for failure
>>>> reason
>>>>
>>>> On Mon, Jun 15, 2015 at 6:52 AM, Night Wolf <[email protected]>
>>>> wrote:
>>>>
>>>>> How far did you get?
>>>>>
>>>>> On Tue, Jun 2, 2015 at 4:02 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> We use Scoobi + MR to perform joins and we particularly use
>>>>>> blockJoin() API of scoobi
>>>>>>
>>>>>> /** Perform an equijoin with another distributed list where this list is considerably smaller
>>>>>>  * than the right (but too large to fit in memory), and where the keys of right may be
>>>>>>  * particularly skewed. */
>>>>>>
>>>>>> def blockJoin[B : WireFormat](right: DList[(K, B)]): DList[(K, (A, B))] =
>>>>>>   Relational.blockJoin(left, right)
>>>>>>
>>>>>> I am trying to do a POC and what Spark join API(s) is recommended to
>>>>>> achieve something similar ?
>>>>>>
>>>>>> Please suggest.
>>>>>>
>>>>>> --
>>>>>> Deepak
>>>>
>>>> --
>>>> Deepak
>>
>> --
>> Deepak
>
> --
> Deepak
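
P.S. On the blockJoin question quoted above: until the pull request is merged, the same idea can be approximated with the existing RDD API. What follows is only a rough sketch of a salted ("block") equijoin as I understand the technique, not the code from the pull request and not Scoobi's implementation. The helper name saltedBlockJoin and the replication parameter are made up for illustration; only flatMap, map and join are real Spark calls.

```scala
import scala.reflect.ClassTag
import scala.util.Random
import org.apache.spark.rdd.RDD

// Hypothetical helper, not part of Spark: a salted ("block") equijoin sketch.
// `replication` is an assumed tuning knob: the number of blocks per key.
def saltedBlockJoin[K: ClassTag, A: ClassTag, B: ClassTag](
    left: RDD[(K, A)],   // the smaller side
    right: RDD[(K, B)],  // the larger, possibly skewed side
    replication: Int): RDD[(K, (A, B))] = {
  require(replication > 0, "replication must be positive")

  // Copy every left record once per block, so each block of a key sees it.
  val leftReplicated: RDD[((K, Int), A)] = left.flatMap { case (k, a) =>
    (0 until replication).map(block => ((k, block), a))
  }

  // Send each right record to one random block, spreading a hot key
  // over `replication` reduce-side groups instead of a single one.
  val rightSalted: RDD[((K, Int), B)] = right.map { case (k, b) =>
    ((k, Random.nextInt(replication)), b)
  }

  // Join on (key, block), then drop the block id again.
  leftReplicated.join(rightSalted).map { case ((k, _), ab) => (k, ab) }
}
```

The trade-off is that the smaller side is shuffled replication times, so this only pays off when the larger side has heavily skewed keys. A call would look like saltedBlockJoin(small, big, 20), where 20 is an arbitrary starting value to tune, not a recommendation.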
