Re: Spark DataFrame UNPIVOT feature

2018-08-22 Thread Mike Hynes
so would other concepts from the pandas API, such as named indexing & multilevel indexing). Cheers, Mike On Tue, Aug 21, 2018, 5:07 PM Reynold Xin wrote: > Probably just because it is not used that often and nobody has submitted a > patch for it. I've used pivot probably on ave

Re: Performance regression for partitioned parquet data

2017-06-14 Thread Mike Wheeler
same issue. thanks, Mike On Tue, Jun 13, 2017 at 10:05 AM, Michael Allman wrote: > Hi Bertrand, > > I encourage you to create a ticket for this and submit a PR if you have > time. Please add me as a listener, and I'll try to contribute/review. > > Michael > > On

Re: Running Spark master/slave instances in non Daemon mode

2016-09-29 Thread Mike Ihbe
zing is causing us minor hardship and seems like an easy thing to make optional. We'd be happy to make the PR as well. --Mike On Thu, Sep 29, 2016 at 5:25 PM, Jakob Odersky wrote: > I'm curious, what kind of container solutions require foreground > processes? Most init

Re: RDD.broadcast

2016-04-28 Thread Mike Hynes
I second knowing the use case for interest. I can imagine a case where knowledge of the RDD key distribution would help local computations, for relatively few keys, but would be interested to hear your motive. Essentially, are you trying to achieve what would be an all-reduce type operation in MPI

Re: executor delay in Spark

2016-04-24 Thread Mike Hynes
, but if not and your executors all receive at least *some* partitions, then I still wouldn't rule out effects of scheduling delay. It's a simple test, but it could give some insight. Mike his could still be a scheduling If only one has *all* partitions, and email me the log file?

Re: executor delay in Spark

2016-04-22 Thread Mike Hynes
caused by unusual initial task scheduling. I don't know of ways to avoid this other than creating a dummy task to synchronize the executors, but hopefully someone from there can suggest other possibilities. Mike On Apr 23, 2016 5:53 AM, "Raghava Mutharaju" wrote: > Mike,

Re: RDD Partitions not distributed evenly to executors

2016-04-06 Thread Mike Hynes
to half the number of partitions with the shuffle flag set to true. Would that be reasonable? Thank you very much for your time, and I very much hope that someone from the dev community who is familiar with the scheduler may be able to clarify the above observations and questions. Thanks, Mike P.

Re: RDD Partitions not distributed evenly to executors

2016-04-04 Thread Mike Hynes
f anyone else has any other ideas or experience, please let me know. Mike On 4/4/16, Koert Kuipers wrote: > we ran into similar issues and it seems related to the new memory > management. can you try: > spark.memory.useLegacyMode = true > > On Mon, Apr 4, 2016 at 9:12 AM,

RDD Partitions not distributed evenly to executors

2016-04-04 Thread Mike Hynes
blem? Please let me know if others in the community have observed this, and thank you for your time, Mike - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org

Re: sbt publish-local fails with 2.0.0-SNAPSHOT

2016-02-01 Thread Mike Hynes
Thank you Saisai for the JIRA/PR; I'm glad to see it is a one-line fix, and will try this locally in the interim. Mike On 2/1/16, Saisai Shao wrote: > I think it is due to our recent changes to override the external resolvers > in sbt building profile, I just created a JIR

sbt publish-local fails with 2.0.0-SNAPSHOT

2016-01-31 Thread Mike Hynes
iled [error] (streaming-mqtt/*:publishLocal) Undefined resolver 'local' [error] (mllib/*:publishLocal) Undefined resolver 'local' [error] (examples/*:publishLocal) Undefined resolver 'local' [error] (streaming-flume-assembly/*:publishLocal) Undefined resolver 'local

Re: Gradient Descent with large model size

2015-10-19 Thread Mike Hynes
e job across the cluster it's very noticeable. If there are to be any modifications of treeAggregate, I would recommend a heuristic that uses numLevels = log_2(numNodes) or something similar, or having numLevels be specifiable in the MLlib APIs instead of defaulting to 2. Mike On 1
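The heuristic suggested in this message can be sketched in plain Python. This is only an illustration, not MLlib code: `suggested_depth` is a hypothetical helper, and rounding up with a floor of 1 is an assumption not stated in the thread.

```python
import math


def suggested_depth(num_nodes: int) -> int:
    """Hypothetical helper: aggregation-tree depth of roughly log_2(num_nodes).

    Rounds up and never returns less than 1 (both choices are assumptions).
    """
    if num_nodes < 1:
        raise ValueError("num_nodes must be positive")
    return max(1, math.ceil(math.log2(num_nodes)))


# Depth suggested for a few example cluster sizes.
for nodes in (1, 4, 16, 100):
    print(nodes, suggested_depth(nodes))
```

In the RDD API the corresponding parameter is called `depth` (default 2), so a value computed this way could be passed as `rdd.treeAggregate(zero, seq_op, comb_op, depth=suggested_depth(n))`; whether that actually helps would need to be measured on the cluster in question.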

Re: No speedup in MultiLayerPerceptronClassifier with increase in number of cores

2015-10-11 Thread Mike Hynes
"label") > val evaluator = new > MulticlassClassificationEvaluator().setMetricName("precision") > println("Precision:" + evaluator.evaluate(predictionAndLabels)) > > Can you please suggest how I can ensure that the data/task is divided > equally to all

Re: treeAggregate timing / SGD performance with miniBatchFraction < 1

2015-09-26 Thread Mike Hynes
the > last portion this could really make a difference. > > On Sat, Sep 26, 2015 at 10:20 AM, Mike Hynes <91m...@gmail.com> wrote: > >> Hi Evan, >> >> (I just realized my initial email was a reply to the wrong thread; I'm >> very sorry about this).

treeAggregate timing / SGD performance with miniBatchFraction < 1

2015-09-26 Thread Mike Hynes
very level. Furthermore, the driver is receiving the result of only 4 tasks, which is relatively small. Mike On 9/26/15, Evan R. Sparks wrote: > Mike, > > I believe the reason you're seeing near identical performance on the > gradient computations is twofold > 1) Gradient c

Re: RDD API patterns

2015-09-26 Thread Mike Hynes
nce working with the sampling in minibatch SGD or has tested the scalability of the treeAggregation operation for vectors, I'd really appreciate your thoughts. Thanks, Mike gradient_f1.pdf Description: Adobe PDF document gradient_f-3.pdf Description: Adobe PDF

Re: OOM in spark driver

2015-09-02 Thread Mike Hynes
Just a thought; this has worked for me before on standalone client with a similar OOM error in a driver thread. Try setting: export SPARK_DAEMON_MEMORY=4G #or whatever size you can afford on your machine in your environment/spark-env.sh before running spark-submit. Mike On 9/2/15, ankit tyagi

Re: Have Friedman's glmnet algo running in Spark

2015-08-04 Thread mike
way or another, that's always required to get a final solution. It's just a question of whether the points on the path are generated by hunting and pecking or done all in one shot systematically. mike -Original Message- From: Patrick [mailto:petz2...@gmail.com] Sent: Tuesday, A

Re: Broadcast variable of size 1 GB fails with negative memory exception

2015-07-29 Thread Mike Hynes
Hi Imran, Thanks to you and Shivaram for looking into this, and opening the JIRA/PR. I will update you once the PR is merged if there are any other problems that arise from the broadcast. Mike On 7/29/15, Imran Rashid wrote: > Hi Mike, > > I dug into this a little more, and it turns ou

Re: Broadcast variable of size 1 GB fails with negative memory exception

2015-07-28 Thread Mike Hynes
^31 physical bytes being transferred, I am guessing that there is still a physical limitation on how many bytes may be sent via broadcasting, at least for a primitive Array[Double]? Thanks, Mike 19176&INFO&IndexedRowMatrix&Broadcasting vecArray with size 268435456& 19177&INFO&
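As a rough check of the arithmetic in this message, the sketch below assumes that the logged size 268435456 is the element count of the Array[Double] and that the byte count ends up in a signed 32-bit integer somewhere along the way. Both are assumptions; the thread excerpt does not confirm where any overflow happens. `as_int32` is a hypothetical helper that mimics JVM `Int` wraparound.

```python
def as_int32(n: int) -> int:
    """Interpret n modulo 2**32 as a signed 32-bit integer, like a JVM Int."""
    n &= 0xFFFFFFFF
    return n - (1 << 32) if n >= (1 << 31) else n


num_doubles = 268435456        # size from the log line (2**28), assumed to be an element count
num_bytes = num_doubles * 8    # 8 bytes per Double

print(num_bytes == 2**31)      # True: one more than the largest signed 32-bit value
print(as_int32(num_bytes))     # -2147483648: the byte count wraps to a negative number
```

Under those assumptions, an array of this size is the smallest power-of-two case whose byte count no longer fits in a signed 32-bit integer, which would be consistent with a negative size being reported.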

Broadcast variable of size 1 GB fails with negative memory exception

2015-07-28 Thread Mike Hynes
the broadcast. The problem stems from the size of the result block to be sent in BlockInfo.scala; the size is reportedly negative. An example error log is shown below. If anyone has more experience or knowledge of why this broadcast is failing, I'd appreciate the input. -- T

Re: Questions about Fault tolerance of Spark

2015-07-10 Thread MIKE HYNES
Gentle bump on this topic; how to test the fault tolerance and previous benchmark results are both things we are interested in as well.  Mike Original message From: 牛兆捷 Date:07-09-2015 04:19 (GMT-05:00) To: dev@spark.apache.org, u...@spark.apache.org Subject: Questions

Re: Stages with non-arithmetic numbering & Timing metrics in event logs

2015-06-10 Thread Mike Hynes
under-utilization and poor weak scaling efficiency. I will cc this thread over to the dev list. I did not cc them in case my previous question was trivial---I didn't want to spam the list unnecessarily, since I do not see these kinds of questions posed there frequently. Thanks a bunch, Mike

Re: Stages with non-arithmetic numbering & Timing metrics in event logs

2015-06-09 Thread Mike Hynes
Ahhh---forgive my typo: what I mean is, (t2 - t1) >= (t_ser + t_deser + t_exec) is satisfied, empirically. On 6/10/15, Mike Hynes <91m...@gmail.com> wrote: > Hi Imran, > > Thank you for your email. > > In examining the condition (t2 - t1) < (t_ser + t_deser + t_exec), I
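The corrected condition can be written as a small check. This is a sketch only: the excerpt does not define the symbols, so t1 and t2 are assumed here to be a task's start and finish timestamps, and `timing_consistent` and the sample numbers are hypothetical, not values from an actual event log.

```python
def timing_consistent(t1, t2, t_ser, t_deser, t_exec):
    """Return True when (t2 - t1) >= (t_ser + t_deser + t_exec).

    All arguments must use the same time unit (e.g. milliseconds).
    """
    return (t2 - t1) >= (t_ser + t_deser + t_exec)


# Made-up numbers for illustration only.
print(timing_consistent(t1=0, t2=100, t_ser=10, t_deser=5, t_exec=80))  # True
print(timing_consistent(t1=0, t2=50, t_ser=10, t_deser=5, t_exec=80))   # False
```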

Re: Stages with non-arithmetic numbering & Timing metrics in event logs

2015-06-09 Thread Mike Hynes
nks, Mike On 6/8/15, Imran Rashid wrote: > Hi Mike, > > all good questions, let me take a stab at answering them: > > 1. Event Logs + Stages: > > It's normal for stages to get skipped if they are shuffle map stages, which > get read multiple times. E.g., here's a little exam

Stages with non-arithmetic numbering & Timing metrics in event logs

2015-06-07 Thread Mike Hynes
e occasionally reported measurements for Shuffle Write time, but not shuffle read time. Is there a method to determine the time required to shuffle data? Could this be done by looking at delays between the first task in a new stage and the last task in the previous stage? Thank you very much for your tim

Scheduler question: stages with non-arithmetic numbering

2015-06-05 Thread Mike Hynes
ny stage's parent List(Stage x, Stage y, ...) Thanks, Mike On 6/1/15, Reynold Xin wrote: > Thanks, René. I actually added a warning to the new JDBC reader/writer > interface for 1.4.0. > > Even with that, I think we should support throttling JDBC; otherwise it's > too co

Re: [VOTE] Release Apache Spark 1.4.0 (RC3)

2015-05-29 Thread Mike Ringenburg
The Configuration link on the docs appears to be broken. Mike On May 29, 2015, at 4:41 PM, Patrick Wendell <pwend...@gmail.com> wrote: Please vote on releasing the following candidate as Apache Spark version 1.4.0! The tag to be voted on is v1.4.0-rc3 (commit dd109a8): https

Re: Spark config option 'expression language' feedback request

2015-03-31 Thread Mike Hynes
Hi, This is just a thought from my experience setting up Spark to run on a Linux cluster. I found it a bit unusual that some parameters could be specified as command line args to spark-submit, others as env variables, and some in a configuration file. What I ended up doing was writing my own bash s

Re: Have Friedman's glmnet algo running in Spark

2015-02-25 Thread mike
ing we can hand out. We've delayed putting together a release version in favor of generating some scaling results, as Joseph suggested. Discussions like this may have some impact on what the release code looks like. Mike -Original Message--- From: Debasish Das [mailto:debasish.da...@g

Re: [ERROR] bin/compute-classpath.sh: fails with false positive test for java 1.7 vs 1.6

2015-02-24 Thread Mike Hynes
ar command show? are you > sure you don't have JRE 7 but JDK 6 installed? > > On Tue, Feb 24, 2015 at 11:02 PM, Mike Hynes <91m...@gmail.com> wrote: >> ./bin/compute-classpath.sh fails with error: >> >> $> jar -tf >> assembly/target/scala-2.10/spar

[ERROR] bin/compute-classpath.sh: fails with false positive test for java 1.7 vs 1.6

2015-02-24 Thread Mike Hynes
mpute-classpath.sh, the scripts start-{master,slaves,...}.sh all run fine, and I have no problem launching applications. Could someone please offer some insight into this issue? Thanks, Mike

Re: Have Friedman's glmnet algo running in Spark

2015-02-24 Thread mike
r of columns. Thanks for your help. Mike -Original Message- From: Joseph Bradley [mailto:jos...@databricks.com] Sent: Sunday, February 22, 2015 06:48 PM To: m...@mbowles.com Cc: dev@spark.apache.org Subject: Re: Have Friedman's glmnet algo running in Spark Hi Mike, glmnet has definitel

Have Friedman's glmnet algo running in Spark

2015-02-19 Thread mike
We're eager to make the code available as open source and would like to get some feedback about how best to do that. Any thoughts? Mike Bowles.