Re: unsubscribe

2015-08-02 Thread Akhil Das
LOL Brandon! @ziqiu See http://spark.apache.org/community.html. You need to send an email to user-unsubscr...@spark.apache.org. Thanks Best Regards On Fri, Jul 31, 2015 at 2:06 AM, Brandon White wrote: > https://www.youtube.com/watch?v=JncgoPKklVE > > On Thu, Jul 30, 2015 at 1:30 PM, wrote:

Re: Does Spark Streaming need to list all the files in a directory?

2015-08-02 Thread Akhil Das
I guess it goes through those 500k files the first time and then uses a filter from the next time onward. Thanks Best Regards On Fri, Jul 31, 2015 at 4:39 AM, Tathagata Das
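A minimal sketch of that filter-based approach, assuming a text-file source, Spark 1.x streaming, and an illustrative directory and predicate (none of these come from the thread):

    import org.apache.hadoop.fs.Path
    import org.apache.hadoop.io.{LongWritable, Text}
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val ssc = new StreamingContext(sc, Seconds(60))
    // newFilesOnly = true skips files that already exist when the stream starts;
    // the filter lets Spark ignore paths known to be irrelevant
    val lines = ssc.fileStream[LongWritable, Text, TextInputFormat](
      "hdfs:///incoming/logs",
      (path: Path) => !path.getName.startsWith("."),
      newFilesOnly = true
    ).map(_._2.toString)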

Re: About memory leak in spark 1.4.1

2015-08-02 Thread Barak Gitsis
Hi, reducing spark.storage.memoryFraction did the trick for me. The heap doesn't get filled because it is reserved. My reasoning is: I give the executor all the memory I can give it, so that makes it a boundary. From here I try to make the best use of memory I can. storage.memoryFraction is in a sense us
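For reference, a minimal sketch of setting that fraction programmatically; the values here are only illustrative:

    import org.apache.spark.{SparkConf, SparkContext}

    // illustrative numbers: fix the executor heap, then shrink the slice
    // reserved for cached RDDs so more of the heap is left for execution
    val conf = new SparkConf()
      .set("spark.executor.memory", "8g")
      .set("spark.storage.memoryFraction", "0.3")   // Spark 1.x default is 0.6
    val sc = new SparkContext(conf)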

Re: Encryption on RDDs or in-memory/cache on Apache Spark

2015-08-02 Thread Akhil Das
Currently RDDs are not encrypted. I think you can go ahead and open a JIRA to add this feature, and maybe in a future release it could be added. Thanks Best Regards On Fri, Jul 31, 2015 at 1:47 PM, Matthew O'Reilly wrote: > Hi, > > I am currently working on the latest version of Apache Spark (1.4

Re: About memory leak in spark 1.4.1

2015-08-02 Thread Sea
Hi Barak, It is OK with spark 1.3.0; the problem is with spark 1.4.1. I don't think spark.storage.memoryFraction will make any difference, because that is still inside the heap. ------ Original ------ From: "Barak Gitsis"; Date: Sun, Aug 2, 2015, 4:11; To:

Re: About memory leak in spark 1.4.1

2015-08-02 Thread Ted Yu
http://spark.apache.org/docs/latest/tuning.html does mention spark.storage.memoryFraction in two places. One is under the Cache Size Tuning section. FYI On Sun, Aug 2, 2015 at 2:16 AM, Sea <261810...@qq.com> wrote: > Hi, Barak > It is ok with spark 1.3.0, the problem is with spark 1.4.1. > I

Re: Encryption on RDDs or in-memory/cache on Apache Spark

2015-08-02 Thread Jörn Franke
I think your use case can already be implemented with HDFS encryption and/or SealedObject, if you are looking for something like Altibase. If you create a JIRA you may want to set the bar a little bit higher and propose something like MIT CryptDB: https://css.csail.mit.edu/cryptdb/ On Fri, Jul 31, 2015 at 10:17, Mat
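As a rough illustration of the SealedObject route (a minimal sketch only; key management and cipher/mode choices are deliberately left out):

    import javax.crypto.{Cipher, KeyGenerator, SealedObject}

    // generate a throwaway AES key and seal a serializable record with it
    val key = KeyGenerator.getInstance("AES").generateKey()
    val cipher = Cipher.getInstance("AES")
    cipher.init(Cipher.ENCRYPT_MODE, key)
    val sealed = new SealedObject("sensitive record", cipher)

    // an RDD would hold SealedObject instances; unseal only where needed
    val recovered = sealed.getObject(key).asInstanceOf[String]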

Re: About memory leak in spark 1.4.1

2015-08-02 Thread Sea
spark.storage.memoryFraction applies to heap memory, but in my situation the memory used is more than the heap! Does anyone else use spark 1.4.1 in production? ------ Original ------ From: "Ted Yu"; Date: Sun, Aug 2, 2015, 5:45; To: "Sea" <261810...@qq.co

Re: About memory leak in spark 1.4.1

2015-08-02 Thread Barak Gitsis
Spark uses a lot more than heap memory; it is the expected behavior. In 1.4, off-heap memory usage is supposed to grow in comparison to 1.3. Better to use as little memory as you can for the heap, and since you are not fully utilizing it already, it is safe for you to reduce it. memoryFraction helps you optimize
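Purely as an illustration of that shape (the numbers are made up, and spark.yarn.executor.memoryOverhead only applies on YARN):

    bin/spark-submit \
      --executor-memory 24g \
      --conf spark.storage.memoryFraction=0.2 \
      --conf spark.yarn.executor.memoryOverhead=4096 \
      your-app.jar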

spark no output

2015-08-02 Thread Pa Rö
Hi community, I have run my k-means spark application on 1 million data points. The program works, but no output is generated in HDFS. When it runs on 10,000 points, an output is written. Maybe someone has an idea? Best regards, Paul

Re: spark no output

2015-08-02 Thread Ted Yu
Can you provide some more detail:
- the release of Spark you're using
- were you running in standalone or YARN cluster mode
- have you checked the driver log?
Cheers On Sun, Aug 2, 2015 at 7:04 AM, Pa Rö wrote: > hi community, > > i have run my k-means spark application on 1million data points. the > progra

Re: spark no output

2015-08-02 Thread Connor Zanin
I agree with Ted. Could you please post the log file? On Aug 2, 2015 10:13 AM, "Ted Yu" wrote: > Can you provide some more detail: > > release of Spark you're using > were you running in standalone or YARN cluster mode > have you checked driver log ? > > Cheers > > On Sun, Aug 2, 2015 at 7:04 AM,

Re: TCP/IP speedup

2015-08-02 Thread Michael Segel
This may seem like a silly question… but in following Mark's link, the presentation talks about the TPC-DS benchmark. Here's my question… which benchmark results? If you go over to the TPC.org website, they have no TPC-DS benchmarks listed (either audited or unaudited). So

Re: How to increase parallelism of a Spark cluster?

2015-08-02 Thread Sujit Pal
No one has any ideas? Is there some more information I should provide? I am looking for ways to increase the parallelism among workers. Currently I just see the number of simultaneous connections to Solr equal to the number of workers. My number of partitions is (2.5x) larger than the number of workers,
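For what it's worth, a minimal sketch of spreading the Solr calls across partitions so every executor core can run one concurrently (inputRdd and the partition count are made up for illustration):

    // illustrative: keep at least as many partitions as total executor cores,
    // so every core can run a Solr-querying task at the same time
    val queries = inputRdd.repartition(40)
    queries.foreachPartition { docs =>
      // build one Solr client per partition and reuse it for every record
      docs.foreach(doc => ())   // issue the Solr request for each doc here
    }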

Re: How to increase parallelism of a Spark cluster?

2015-08-02 Thread Igor Berman
What kind of cluster? How many cores on each worker? Is there a config for the HTTP Solr client? I remember the standard HttpClient has a limit per route/host. On Aug 2, 2015 8:17 PM, "Sujit Pal" wrote: > No one has any ideas? > > Is there some more information I should provide? > > I am looking for ways to
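If that limit is the bottleneck, a sketch of raising it, assuming Apache HttpClient 4.3+ (the numbers are arbitrary):

    import org.apache.http.impl.client.HttpClients

    // the default pool allows only a handful of connections per route,
    // which can serialize requests that were meant to run in parallel
    val pooledClient = HttpClients.custom()
      .setMaxConnPerRoute(32)
      .setMaxConnTotal(128)
      .build()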

how to ignore MatchError then processing a large json file in spark-sql

2015-08-02 Thread fuellee lee
I'm trying to process a bunch of large JSON log files with Spark, but it fails every time with `scala.MatchError`, whether I give it a schema or not. I just want to skip lines that do not match the schema, but I can't find how in the Spark docs. I know I could write a JSON parser and map it over the JSON file RDD c
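A rough sketch of one possible workaround, assuming json4s (which Spark bundles) for a cheap validity check and an illustrative HDFS path; note this only drops lines that are not valid JSON at all, and rows that parse but conflict with the inferred schema would still need separate handling:

    import scala.util.Try
    import org.json4s._
    import org.json4s.jackson.JsonMethods._

    // keep only lines that parse as JSON, then let Spark SQL infer the schema
    val raw = sc.textFile("hdfs:///logs/*.json")
    val wellFormed = raw.filter(line => Try(parse(line)).isSuccess)
    val df = sqlContext.read.json(wellFormed)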

Re: How to increase parallelism of a Spark cluster?

2015-08-02 Thread Sujit Pal
Hi Igor, The cluster is a Databricks Spark cluster. It consists of 1 master + 4 workers; each worker has 60GB RAM and 4 CPUs. The original mail has some more details (also, the reference to HttpSolrClient in there should be HttpSolrServer, sorry about that, a mistake while writing the email). Th

RE: How to increase parallelism of a Spark cluster?

2015-08-02 Thread Silvio Fiorito
Can you share the transformations up to the foreachPartition? From: Sujit Pal Sent: 8/2/2015 4:42 PM To: Igor Berman Cc: user Subject: Re: How to increase parallelism of a

Re: How to increase parallelism of a Spark cluster?

2015-08-02 Thread Igor Berman
So how many cores do you configure per node? Do you have something like a --total-executor-cores or maybe a --num-executors config (I'm not sure what kind of cluster the Databricks platform provides; if it's standalone then the first option should be used)? If you have 4 cores in total, then even though you have 4
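For a standalone master, those knobs are spark-submit flags; a purely illustrative invocation (the master URL and values are assumptions):

    bin/spark-submit \
      --master spark://master:7077 \
      --total-executor-cores 16 \
      --executor-memory 48g \
      your-app.jar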

Re: TCP/IP speedup

2015-08-02 Thread Steve Loughran
On 1 Aug 2015, at 18:26, Ruslan Dautkhanov <dautkha...@gmail.com> wrote: If your network is bandwidth-bound, you'll see that setting jumbo frames (MTU 9000) may increase bandwidth by up to ~20%. http://docs.hortonworks.com/HDP2Alpha/index.htm#Hardware_Recommendations_for_Hadoop.htm "Enabling Jum
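For reference, on Linux the interface MTU can usually be raised like this (the interface name is an assumption, and every NIC and switch on the path must support jumbo frames end to end):

    ip link set dev eth0 mtu 9000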

Re: How to increase parallelism of a Spark cluster?

2015-08-02 Thread Steve Loughran
On 2 Aug 2015, at 13:42, Sujit Pal <sujitatgt...@gmail.com> wrote: There is no additional configuration on the external Solr host from my code; I am using the default HttpClient provided by HttpSolrServer. According to the Javadocs, you can pass in an HttpClient object as well. Is there
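A minimal sketch of that constructor, assuming SolrJ 4.x, Apache HttpClient 4.3+, and an illustrative Solr URL and pool size:

    import org.apache.http.impl.client.HttpClients
    import org.apache.solr.client.solrj.impl.HttpSolrServer

    // hand HttpSolrServer a pre-configured client instead of its internal default
    val client = HttpClients.custom().setMaxConnPerRoute(32).setMaxConnTotal(128).build()
    val solr = new HttpSolrServer("http://solr-host:8983/solr/collection1", client)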

Re: How to increase parallelism of a Spark cluster?

2015-08-02 Thread Abhishek R. Singh
I don't know if (your assertion/expectation that) workers will process things (multiple partitions) in parallel is really valid. Or if having more partitions than workers will necessarily help (unless you are memory bound - so partitions are essentially helping your work size rather than executio

Re: About memory leak in spark 1.4.1

2015-08-02 Thread Sea
"spark uses a lot more than heap memory, it is the expected behavior." It didn't exist in spark 1.3.x What does "a lot more than" means? It means that I lose control of it! I try to apply 31g, but it still grows to 55g and continues to grow!!! That is the point! I have tried set memoryFraction

Re: spark cluster setup

2015-08-02 Thread Sonal Goyal
What do the master logs show? Best Regards, Sonal Founder, Nube Technologies Check out

Cannot Import Package (spark-csv)

2015-08-02 Thread billchambers
I am trying to import the spark-csv package while using the Scala spark shell, with Spark 1.4.1 and Scala 2.11. I am starting the shell with: bin/spark-shell --packages com.databricks:spark-csv_2.11:1.1.0 --jars ../sjars/spark-csv_2.11-1.1.0.jar --master local I then try to run it and get the following

Re: Cannot Import Package (spark-csv)

2015-08-02 Thread Ted Yu
The command you ran and the error you got were not visible. Mind sending them again? Cheers On Sun, Aug 2, 2015 at 8:33 PM, billchambers wrote: > I am trying to import the spark csv package while using the scala spark > shell. Spark 1.4.1, Scala 2.11 > > I am starting the shell with: > > bin/

Re: Cannot Import Package (spark-csv)

2015-08-02 Thread billchambers
Sure, the commands are: scala> val df = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").load("cars.csv") and I get the following error: java.lang.RuntimeException: Failed to load class for data source: com.databricks.spark.csv at scala.sys.package

Re: Cannot Import Package (spark-csv)

2015-08-02 Thread Ted Yu
I tried the following command on the master branch: bin/spark-shell --packages com.databricks:spark-csv_2.10:1.0.3 --jars ../spark-csv_2.10-1.0.3.jar --master local I didn't reproduce the error with your command. FYI On Sun, Aug 2, 2015 at 8:57 PM, Bill Chambers < wchamb...@ischool.berkeley.edu> wro

Checkpoint file not found

2015-08-02 Thread Anand Nalya
Hi, I'm writing a Streaming application in Spark 1.3. After running for some time, I'm getting the following exception. I'm sure that no other process is modifying the HDFS file. Any idea what might be the cause of this? 15/08/02 21:24:13 ERROR scheduler.DAGSchedulerEventProcessLoop: DAGSchedulerEv

Extremely poor predictive performance with RF in mllib

2015-08-02 Thread pkphlam
Hi, This might be a long shot, but has anybody run into very poor predictive performance using RandomForest with MLlib? Here is what I'm doing:
- Spark 1.4.1 with PySpark
- Python 3.4.2
- ~30,000 tweets of text
- 12289 1s and 15956 0s
- Whitespace tokenization and then the hashing trick for feature s
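A rough Scala sketch of that pipeline, in case it helps compare setups (the poster is on PySpark; the sample data and every parameter value here are illustrative only):

    import org.apache.spark.mllib.feature.HashingTF
    import org.apache.spark.mllib.regression.LabeledPoint
    import org.apache.spark.mllib.tree.RandomForest

    // hypothetical input: (label, text) pairs already loaded as an RDD
    val data = sc.parallelize(Seq((1.0, "spark is fast"), (0.0, "slow day today")))
    val hashingTF = new HashingTF(numFeatures = 1 << 18)
    val points = data.map { case (label, text) =>
      LabeledPoint(label, hashingTF.transform(text.split("\\s+").toSeq))
    }

    val model = RandomForest.trainClassifier(
      points, numClasses = 2, categoricalFeaturesInfo = Map[Int, Int](),
      numTrees = 100, featureSubsetStrategy = "auto",
      impurity = "gini", maxDepth = 8, maxBins = 32)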

Re: spark cluster setup

2015-08-02 Thread Sonal Goyal
Your master log files will be in the Spark home folder/logs on the master machine. Do they show an error? Best Regards, Sonal Founder, Nube Technologies Check out Reifier at Spark Summit 2015