Test accuracy is not the same thing as the total loss. Any decision
boundary in (-1, 1) separates the points -1 and +1 and gives you 1.0
accuracy, but the corresponding losses are different. -Xiangrui
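To see it concretely, here is a tiny standalone example (plain Scala,
nothing Spark-specific): both weights below classify x = -1 and x = +1
perfectly, yet their total logistic losses differ.

  object AccuracyVsLoss {
    // logistic loss for one example with label y in {-1, +1}
    def logLoss(w: Double, x: Double, y: Double): Double =
      math.log1p(math.exp(-y * w * x))

    def main(args: Array[String]): Unit = {
      val data = Seq((-1.0, -1.0), (1.0, 1.0))   // (x, label) pairs
      for (w <- Seq(0.5, 5.0)) {                 // both satisfy sign(w * x) == y
        val loss = data.map { case (x, y) => logLoss(w, x, y) }.sum
        println(s"w = $w  accuracy = 1.0  total loss = $loss")
      }
    }
  }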
On Sun, Sep 28, 2014 at 2:48 AM, Yanbo Liang wrote:
> Hi
>
> We have used LogisticRegression with two different
Can you check the loss of both the LBFGS and SGD implementations? One
reason may be that SGD doesn't converge well, and you can see that by
comparing both log-likelihoods. Another potential reason may be that the
labels of your training data are totally separable, so you can always
increase the log-likelihood by multiplying the weights by a larger
constant.
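For example, something along these lines would let you compare the two
(an untested sketch against the 1.x MLlib API; `training` stands for your
RDD[LabeledPoint] with 0/1 labels, and the intercept is ignored for
simplicity):

  import org.apache.spark.SparkContext._
  import org.apache.spark.mllib.classification.{LogisticRegressionWithLBFGS, LogisticRegressionWithSGD}
  import org.apache.spark.mllib.linalg.Vector
  import org.apache.spark.mllib.regression.LabeledPoint
  import org.apache.spark.rdd.RDD

  // total log-loss of a weight vector over the data (labels in {0, 1})
  def totalLogLoss(weights: Vector, data: RDD[LabeledPoint]): Double =
    data.map { p =>
      val margin = weights.toArray.zip(p.features.toArray)
                          .map { case (w, x) => w * x }.sum
      math.log1p(math.exp(margin)) - p.label * margin
    }.sum()

  val lbfgsModel = new LogisticRegressionWithLBFGS().run(training)
  val sgdModel   = LogisticRegressionWithSGD.train(training, 100)

  println("LBFGS log-loss: " + totalLogLoss(lbfgsModel.weights, training))
  println("SGD log-loss:   " + totalLogLoss(sgdModel.weights, training))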
Thank you for all your patient responses.
I conclude that if the data is totally separable or over-fitting occurs,
the weights may differ.
That is also consistent with my experiments.
I have evaluated two different datasets; the results are as follows:
Loss function: LogisticGradient
Regularizer: L2
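(A sketch of how the loss history can be read off the optimizer directly,
following the MLlib L-BFGS example; `data` is assumed to be an
RDD[LabeledPoint], and regParam = 0.1 is just an illustrative value:)

  import org.apache.spark.mllib.linalg.Vectors
  import org.apache.spark.mllib.optimization.{LBFGS, LogisticGradient, SquaredL2Updater}
  import org.apache.spark.mllib.util.MLUtils

  val numFeatures = data.first().features.size
  val training = data.map(p => (p.label, MLUtils.appendBias(p.features)))

  val (weights, lossHistory) = LBFGS.runLBFGS(
    training,
    new LogisticGradient(),
    new SquaredL2Updater(),   // the L2 regularizer above
    10,                       // numCorrections
    1e-6,                     // convergenceTol
    100,                      // maxNumIterations
    0.1,                      // regParam (illustrative)
    Vectors.dense(new Array[Double](numFeatures + 1)))

  // lossHistory(i) is the regularized loss at iteration i; the last
  // entry is the number to compare against the SGD run.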
Our cluster is a standalone cluster with 16 computing nodes; each node has 16
cores. I set SPARK_WORKER_INSTANCES to 1 and SPARK_WORKER_CORES to 32, and
we give 512 tasks all together; this setup helps increase the
concurrency. But if I set SPARK_WORKER_INSTANCES to 2 and SPARK_WORKER_CORES
t
Hi, myasuka
Have you checked the JVM GC time of each executor?
I think you should increase SPARK_EXECUTOR_CORES or
SPARK_EXECUTOR_INSTANCES until you get enough concurrency.
Here is my recommended config:
SPARK_EXECUTOR_CORES=8
SPARK_EXECUTOR_INSTANCES=4
SPARK_WORKER_MEMORY=8G
note: ma
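If it helps, those settings would live in conf/spark-env.sh on each node
(illustrative sketch; note that the SPARK_EXECUTOR_* variables are honored
in YARN mode, while a standalone cluster is tuned through the
SPARK_WORKER_* variables discussed earlier):

  # conf/spark-env.sh (illustrative values from this thread)
  export SPARK_EXECUTOR_CORES=8       # cores per executor
  export SPARK_EXECUTOR_INSTANCES=4   # executors per node => 4 x 8 = 32 concurrent tasks
  export SPARK_WORKER_MEMORY=8G       # memory a worker may hand out to executors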
Hi,
Running the test suite on trunk, I got:
BasicOperationsSuite:
- map
- flatMap
- filter
- glom
- mapPartitions
- repartition (more partitions)
- repartition (fewer partitions)
- groupByKey
- red
happy monday, everyone!
remember a few weeks back when i upgraded jenkins, and unwittingly began
DOSing our system due to massive log spam?
well, that bug has been fixed w/the current release and i'd like to get our
logging levels back to something more verbose than what we have now.
downtime will be
we were running at 8 executors per node, and BARELY even stressing the
machines (32 cores, ~230G RAM).
in the interest of actually using system resources, and giving ourselves
some headroom, i upped the executors to 16 per node. i'll be keeping an
eye on ganglia for the rest of the week to make s
Thanks. We might see more failures due to contention on resources. Fingers
crossed ... At some point it might make sense to run the tests in a VM or
container.
On Mon, Sep 29, 2014 at 2:20 PM, shane knapp wrote:
> we were running at 8 executors per node, and BARELY even stressing the
> machine
yeah, this is why i'm gonna keep a close eye on things this week...
as for VMs vs containers, please do the latter more than the former. one
of our longer-term plans here at the lab is to move most of our jenkins
infra to VMs, and running tests w/nested VMs is Bad[tm].
On Mon, Sep 29, 2014 at 2:
Hi,
Is there anyone working on hyperparameter optimization algorithms? If
not, is there any interest in the subject? We are thinking about
implementing some of these algorithms and contributing them to Spark. Thoughts?
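For instance, even a plain grid search is easy to sketch against the
current MLlib API (untested; `training` and `validation` stand for
hypothetical RDD[LabeledPoint] splits of the data):

  import org.apache.spark.SparkContext._
  import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS
  import org.apache.spark.mllib.regression.LabeledPoint
  import org.apache.spark.rdd.RDD

  // train once per candidate regParam, score on a held-out set, keep the best
  def gridSearch(training: RDD[LabeledPoint],
                 validation: RDD[LabeledPoint],
                 regParams: Seq[Double]): (Double, Double) =
    regParams.map { reg =>
      val lr = new LogisticRegressionWithLBFGS()
      lr.optimizer.setRegParam(reg)
      val model = lr.run(training)
      val accuracy = validation.map { p =>
        if (model.predict(p.features) == p.label) 1.0 else 0.0
      }.mean()
      (reg, accuracy)
    }.maxBy(_._2)   // returns (bestRegParam, bestAccuracy)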
Lochana
---
Just noticed these lines in the jenkins log:

=========================
Running Apache RAT checks
=========================
Attempting to fetch rat
Launching rat from /home/jenkins/workspace/SparkPul
You should look into Evan Sparks's talk from Spark Summit 2014:
http://spark-summit.org/2014/talk/model-search-at-scale
I am not sure if some of it is already open-sourced through MLBase...
On Mon, Sep 29, 2014 at 7:45 PM, Lochana Menikarachchi
wrote:
> Hi,
>
> Is there anyone who works on hyper
I took a look at HashOuterJoin, and it builds a hash table for both
sides. This consumes quite a lot of memory when the partition is big, and
it doesn't reduce the iteration over the streamed relation, right?
Thanks!
Hi Haopu,
My understanding is that the hash tables on both the left and right sides are
used to include null values in the result in an efficient manner. If the hash
table is only built on one side, let's say the left side, and we perform a left
outer join, then for each row on the left side, a scan over the right side is
needed
Hi, Liquan, thanks for the response.
In your example, I think the hash table should be built on the "right" side, so
Spark can iterate through the left side and find matches in the right side from
the hash table efficiently. Please comment and suggest, thanks again!
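To illustrate the point with a toy single-machine sketch (plain Scala
collections, not Spark's actual HashOuterJoin): build the hash table on the
right side only and stream the left side, padding misses with None.

  // build the hash table on the right (build) side only
  def leftOuterHashJoin[K, L, R](
      left: Iterator[(K, L)],
      right: Iterable[(K, R)]): Iterator[(K, (L, Option[R]))] = {
    val hashed: Map[K, Seq[R]] =
      right.groupBy(_._1).mapValues(_.map(_._2).toSeq).toMap
    // one streamed pass over the left side; each probe is O(1)
    left.flatMap { case (k, l) =>
      hashed.get(k) match {
        case Some(rs) => rs.iterator.map(r => (k, (l, Some(r))))
        case None     => Iterator((k, (l, None)))   // unmatched left row
      }
    }
  }

A full outer join additionally has to emit the right rows that never found a
match, which is one reason to keep per-side state; whether that requires a
full hash table on both sides is exactly the question in this thread.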