FWIW, CSV has the same problem that renders it immune to naive partitioning.
Consider the following RFC 4180 compliant record:
1,2,"
all,of,these,are,just,one,field
",4,5
Now, it's probably a terrible idea to give a file system awareness of
actual file types, but couldn't HDFS handle this near
both Flink and Spark into one. This eases industry adoption instead.
Thanking you.
With Regards
Sree
On Wednesday, April 29, 2015 3:21 AM, Ewan Higgs wrote:
Hi all,
A quick question about Tungsten. The announcement of the Tungsten
project is on the back of Hadoop Summit in Brussels where some of the
Flink devs were giving talks [1] on how Flink manages memory using byte
arrays and the like to avoid the overhead of all the Java types [2]. Is
there a
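For anyone unfamiliar with the technique being referenced, here is a
rough sketch of the byte-array idea (my own illustration, not Flink or
Tungsten code): records are packed into one flat buffer, so the JVM
sees a single object instead of one boxed object per record.

import java.nio.ByteBuffer

// Pack 1000 (Long key, Long value) records into one buffer: no
// per-record object headers, no per-record GC pressure.
val recordSize = 16 // two 8-byte longs
val buf = ByteBuffer.allocate(recordSize * 1000)
for (i <- 0 until 1000) {
  buf.putLong(i.toLong)     // key
  buf.putLong(i.toLong * 2) // value
}
// Records are addressed by offset arithmetic instead of pointer chasing:
val key   = buf.getLong(42 * recordSize)
val value = buf.getLong(42 * recordSize + 8)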
WIP branch
Date: Wed, 14 Jan 2015 14:33:45 +0100
From: Ewan Higgs
To: dev@spark.apache.org
To add to Sean and Reynold's point:
Please correct me if I'm wrong, but Spark depends on hadoop-common which
also uses jetty in the HttpServer2 code. So even if you remove jetty
from Spark by making it an optional dependency, it will be pulled in by
Hadoop.
So you'll still see that your prog
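If someone does need to keep Jetty off their own classpath, the usual
workaround is an explicit exclusion on the Hadoop dependency. A
build.sbt sketch (the version number is illustrative; under Hadoop 2.x,
Jetty 6 ships under the org.mortbay.jetty group):

// Exclude the Jetty that hadoop-common drags in transitively.
libraryDependencies += "org.apache.hadoop" % "hadoop-common" % "2.6.0" excludeAll (
  ExclusionRule(organization = "org.mortbay.jetty")
)

As noted above, though, anything in hadoop-common that actually
instantiates HttpServer2 will then fail at runtime.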
ing it
there [1]. I put it on the back burner until someone can get back to me
on it.
Yours,
Ewan Higgs
[1]
http://apache-spark-developers-list.1001551.n3.nabble.com/SparkSpark-perf-terasort-WIP-branch-tt10105.html
On 02/02/15 23:26, Kannan Rajah wrote:
Is there a recommended performance test
nd [2]. Then
we should be able to get slurm, pbs, and sge in one shot rather than
implementing some wire formats for RPC.
Thanks,
Ewan Higgs
[1] https://hadoop.apache.org/docs/r1.2.1/hod_scheduler.html
https://github.com/glennklockwood/hpchadoop
http://jaliyacgl.blogspot.be/2008/08/hadoop-as-batc
ystem implementation that overrides the listStatus
method, and then in Hadoop Conf set the fs.file.impl to that.
Shouldn't be too hard. Would you be interested in working on it?
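A minimal sketch of that idea (the class name and wiring are my
assumptions, not existing Spark or Hadoop code):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileStatus, Path, RawLocalFileSystem}

// A local FileSystem whose directory listings come back in name order,
// matching the ordering HDFS gives.
class SortedLocalFileSystem extends RawLocalFileSystem {
  override def listStatus(f: Path): Array[FileStatus] =
    super.listStatus(f).sortBy(_.getPath.getName)
}

// Wire it in through the Hadoop configuration, as suggested above:
val conf = new Configuration()
conf.set("fs.file.impl", classOf[SortedLocalFileSystem].getName)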
On Fri, Jan 16, 2015 at 3:36 PM, Ewan Higgs <ewan.hi...@ugent.be> wrote:
Yes, I am running on
local file system right? HDFS orders the files
based on names, but local file systems often don't. I think that's why
there's a difference.
We might be able to do a sort and order the partitions when we create
an RDD to make this universal, though.
On Fri, Jan 16, 2015 at 8:26 AM,
Hi all,
Quick one: when reading files, is the order of partitions guaranteed
to be preserved? I am finding some weird behaviour where I run
sortByKey() on an RDD (which has 16 byte keys) and write it to disk. If
I open a python shell and run the following:
for part in range(29):
print
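The truncated loop above presumably read each part file back and
checked that the keys were globally ordered. A hypothetical Scala
reconstruction of that kind of check (the directory name, file naming
scheme, and 29-partition count are assumptions from the surrounding
text, not the original script):

import java.nio.file.{Files, Paths}
import scala.math.Ordering.Implicits._

val dir = "sorted-output" // assumed output directory
val firstKeys = (0 until 29).map { part =>
  Files.readAllBytes(Paths.get(dir, f"part-$part%05d"))
    .take(16)            // the 16-byte key of the first record
    .map(_ & 0xff).toSeq // compare bytes as unsigned values
}
// If partition order is preserved, the first key of each successive
// partition should already be ascending:
assert(firstKeys == firstKeys.sorted, "partition order was not preserved")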
Hi all,
I'm trying to build the Spark-perf WIP code but there are some errors to
do with Hadoop APIs. I presume this is because there is some Hadoop
version set and it's referring to that. But I can't seem to find it.
The errors are as follows:
[info] Compiling 15 Scala sources and 2 Java sou
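For what it's worth, sbt-based builds usually pin the Hadoop dependency
somewhere like the fragment below; this is a hypothetical sketch (names
and versions are assumptions, not the actual spark-perf build), but it
shows the kind of setting to look for:

// Pin the Hadoop client the build compiles against; overridable on the
// command line, e.g. sbt -Dhadoop.version=2.6.0 compile
val hadoopVersion = sys.props.getOrElse("hadoop.version", "2.4.0")
libraryDependencies += "org.apache.hadoop" % "hadoop-client" % hadoopVersion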
not be functioning appropriately. If you have trouble with
it, I recommend using the Hadoop version.
Yours,
Ewan
> Thanks,
> Tim
>
>
> On 12/16/14, 12:38 AM, "Ewan Higgs" wrote:
>
>> Hi Tim,
>> run-example is here:
>> https://github.com/ehiggs/spa
Hi Tim,
run-example is here:
https://github.com/ehiggs/spark/blob/terasort/bin/run-example
It should be in the repository that you cloned. So if you were at the
top level of the checkout, run-example would be run as ./bin/run-example.
Yours,
Ewan Higgs
On 12/12/14 01:06, Tim Harsch wrote
great. I think the consensus from last time was that we would
put performance stuff into spark-perf, so it is easy to test different
Spark versions.
On Tue, Nov 11, 2014 at 5:03 AM, Ewan Higgs <ewan.hi...@ugent.be> wrote:
Hi all,
I saw that Reynold Xin had a Terasort e
helped me get
through learning some rudimentary Scala to get this far.
Yours,
Ewan Higgs
[1] https://github.com/apache/spark/pull/1242