onable
> change, as the previous behavior, which always returns the first row,
> doesn't make sense. For safety, we can add a legacy config as a fallback and
> mention it in the migration guide.
>
> On Wed, May 14, 2025 at 9:21 AM David Kunzmann
> wrote:
>
>> Hi James,
>
Willis wrote:
> This seems like the correct behavior to me. Every value of the null set of
> columns will match between any pair of Rows.
>
>
>
> On Thu, May 8, 2025 at 11:37 AM David Kunzmann
> wrote:
>
>> Hello everyone,
>>
>> Following the creation of t
cates and remove them.
>
This behavior is the same on the Scala side where
df.dropDuplicates(Seq.empty) returns the first row.
Would it make sense to change the behavior of df.dropDuplicates(Seq.empty)
to be the same as df.dropDuplicates() ?
Cheers,
David
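The semantics under discussion can be sketched in plain Python (illustrative only, not Spark's implementation): deduplicating by an empty column subset gives every row the same empty key, so only the first row survives, which is exactly the `df.dropDuplicates(Seq.empty)` behavior being questioned.

```python
def drop_duplicates(rows, subset=None):
    """Keep the first row for each distinct key, where the key is the tuple
    of values in `subset` (or all columns when subset is None)."""
    seen = set()
    out = []
    for row in rows:
        cols = subset if subset is not None else sorted(row)
        key = tuple(row[c] for c in cols)
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out

rows = [{"a": 1, "b": 2}, {"a": 1, "b": 3}, {"a": 1, "b": 2}]

# All columns (like dropDuplicates()): only the exact duplicate is dropped.
print(drop_duplicates(rows))        # [{'a': 1, 'b': 2}, {'a': 1, 'b': 3}]

# Empty subset: every row has the key (), so only the first row survives.
print(drop_duplicates(rows, []))    # [{'a': 1, 'b': 2}]
```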
CASE, WHILE, REPEAT, LOOP, FOR, LEAVE, ITERATE, etc.
SQL Scripting still doesn't work with Spark Connect.
Thanks,
David
On Wed, Jan 22, 2025 at 12:25 PM Stefan Kandic
wrote:
> Hi,
>
> I am working on adding collation support (
> https://issues.apache.org/jira/projects/SPARK/is
Hi,
It seems this library is several years old. Have you considered using the
Google provided connector? You can find it in
https://github.com/GoogleCloudDataproc/spark-bigquery-connector
Regards,
David Rabinowitz
On Sun, May 5, 2024 at 6:07 PM Jeff Zhang wrote:
> Are you s
d in open JDK until 2026, I'm not sure if we're
>>>> going to see enough folks moving to JRE 17 by the Spark 4 release. Unless
>>>> we have a strong benefit from dropping 11 support, I'd be inclined to keep
>>>> it.
>>>>
>>>> On Tue, Jun 6, 2023
Hello Spark developers,
I'm from the Apache Arrow project. We've discussed Java version support [1],
and crucially, whether to continue supporting Java 8 or not. As Spark is a big
user of Arrow in Java, I was curious what Spark's policy here was.
If Spark intends to stay on Java 8, for instance
Hello, my name is David McWhorter and I created a new pull request to address
the SPARK-22256 ticket at https://github.com/apache/spark/pull/30739. This
change adds a memory overhead setting for the spark driver running on mesos.
This is a reopening of a prior pull request that was never merged
filers/memory_profiler
[3]: https://github.com/pandas-dev/pandas/issues/35530
[*] See my comment in https://issues.apache.org/jira/browse/ARROW-9878.
Thanks,
David
dependence on Hadoop 2.9 is not
required). Again, these may be non-issues, but I wanted to kindle discussion
around whether this can make the cut for 3.0, since I imagine it’s a major
upgrade many users will focus on migrating to once released.
Kind regards,
David Christle
fault for
spark.kubernetes.executor.cores would be. Seeing that I wanted more than 1 and
Yinan wants less, leaving it at 1 might be best.
Thanks,
David
From: Kimoon Kim
Date: Friday, March 30, 2018 at 4:28 PM
To: Yinan Li
Cc: David Vogelbacher , "dev@spark.apache.org"
Subject: Re: [K
,
David
I am not aware of any problem with that.
Anyway, if you run a Spark application you would have multiple jobs, so it
makes sense that this is not a problem.
Thanks David.
From: Naveen [mailto:hadoopst...@gmail.com]
Sent: Wednesday, December 21, 2016 9:18 AM
To: dev@spark.apache.org; u
> As far as group / artifact name compatibility, at least in the case of
> Kafka we need different artifact names anyway, and people are going to
> have to make changes to their build files for spark 2.0 anyway. As
> far as keeping the actual classes in org.apache.spark to not break
> code despi
the vision is to get rid of all cluster
> management when using Spark.
You might find one of the hosted Spark platform solutions such as
Databricks or Amazon EMR that handle cluster management for you a good
place to start. At least in my experience, they got me
>
ROSE Spark Package: https://github.com/onetapbeyond/opencpu-spark-executor
<https://github.com/onetapbeyond/opencpu-spark-executor>
Questions, suggestions, feedback welcome.
David
--
"All that is gold does not glitter, Not all those who wander are lost."
PIs in Java, JavaScript
and .NET that can easily support your use case. The outputs of your DeployR
integration could then become inputs to your data processing system.
David
"All that is gold does not glitter, Not all those who wander are lost."
Original Message
Subject: R
weight as ROSE and it is not
designed to work in a clustered environment. ROSE, on the other hand, is designed
for scale.
David
"All that is gold does not glitter, Not all those who wander are lost."
Original Message
Subject: Re: ROSE: Spark + R on the JVM.
Local Time:
ps://github.com/thomasnat1/cdcNewsRanker/blob/71b0ff3989d5191dc6a78c40c4a7a9967cbb0e49/venv/lib/python2.7/site-packages/scipy/sparse/linalg/eigen/arpack/arpack.py#L1049
)
I'm happy to help more if you decide to go this route, here, or on the
scala-breeze google group, or on github.
-- David
On Tue, Jan 12, 2016 at 10:28 AM, L
Hi Corey,
> Would you mind providing a link to the github?
Sure, here is the github link you're looking for:
https://github.com/onetapbeyond/opencpu-spark-executor
David
"All that is gold does not glitter, Not all those who wander are lost."
Original Message --
ou to [take a
look](https://github.com/onetapbeyond/opencpu-spark-executor). Any feedback,
questions etc very welcome.
David
"All that is gold does not glitter, Not all those who wander are lost."
on 2.7. Some libraries that Spark depend on
>>> stopped supporting 2.6. We can still convince the library maintainers to
>>> support 2.6, but it will be extra work. I'm curious if anybody still uses
>>> Python 2.6 to run Spark.
>>>
>>> Thanks.
>>>
>>>
>>>
>>
--
David Chin, Ph.D.
david.c...@drexel.edu
Sr. Systems Administrator, URCF, Drexel U.
http://www.drexel.edu/research/urcf/
https://linuxfollies.blogspot.com/
+1.215.221.4747 (mobile)
https://github.com/prehensilecode
ut the following is
equally slow:
df.join(laggard, (df("series") === laggard("p_series")) && (df("eday") -
laggard("p_eday")).between(1,7)).count
Any advice about the general principle at work here would be welcome.
T
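The general principle behind the slow join above can be sketched in plain Python (a conceptual model, not Spark's execution plan): the equality part of a join condition can be satisfied with a hash lookup, while a range condition like `between(1,7)` must be evaluated on every candidate pair that the equality key admits.

```python
# Hypothetical data shaped like the thread's example: (series, eday) rows
# joined to (p_series, p_eday) rows on equality of series plus a date range.
from collections import defaultdict

left = [("s1", 10), ("s1", 12), ("s2", 5)]      # (series, eday)
right = [("s1", 9), ("s1", 3), ("s2", 4)]       # (p_series, p_eday)

# Build a hash index on the equality key.
buckets = defaultdict(list)
for series, p_eday in right:
    buckets[series].append(p_eday)

matches = [
    (series, eday, p_eday)
    for series, eday in left
    for p_eday in buckets[series]                # cheap: hash lookup on equality
    if 1 <= eday - p_eday <= 7                   # range filter checked per pair
]
print(matches)
```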
sure.
On Wed, Mar 18, 2015 at 12:19 AM, Debasish Das
wrote:
> Hi David,
>
> We are stress testing breeze.optimize.proximal and nnls...if you are
> cutting a release now, we will need another release soon once we get the
> runtime optimizations in place and merged to breeze.
&g
ping?
On Sun, Mar 15, 2015 at 9:38 PM, David Hall wrote:
> snapshot is pushed. If you verify I'll publish the new artifacts.
>
> On Sun, Mar 15, 2015 at 1:14 AM, Yu Ishikawa wrote:
>
>> David Hall who is a breeze creator told me that it's a bug. So, I made
snapshot is pushed. If you verify I'll publish the new artifacts.
On Sun, Mar 15, 2015 at 1:14 AM, Yu Ishikawa
wrote:
> David Hall who is a breeze creator told me that it's a bug. So, I made a
> jira
> ticket about this issue. We need to upgrade breeze from 0.11.1 to 0.11.2
n the goals for
Spark with GSoC (it is my understanding that Manoj Kumar is the mentor),
though I may be incorrect. I have been reading the Spark codebase on GitHub
and think I may be able to help develop Spark's Python API.
To get involved, what next steps should I take?
Thanks!
David J. Mang
I am new to Spark and GraphX, however, I use Tinkerpop backed graphs and
think the idea of using Tinkerpop as the API for GraphX is a great idea and
hope you are still headed in that direction. I noticed that Tinkerpop 3 is
moving into the Apache family:
http://wiki.apache.org/incubator/TinkerPopP
Thank you, Sean, using spark-network-yarn seems to do the trick.
On 12/19/2014 12:13 PM, Sean Owen wrote:
I believe spark-yarn does not exist from 1.2 onwards. Have a look at
spark-network-yarn for where some of that went, I believe.
On Fri, Dec 19, 2014 at 5:09 PM, David McWhorter wrote:
Hi
on
Any help or insights into how to use spark-yarn_2.10 1.2.0 in a maven
build would be appreciated.
David
--
David McWhorter
Software Engineer
Commonwealth Computer Research, Inc.
1422 Sachem Place, Unit #1
Charlottesville, VA 22901
mcwhor...@ccri.com | 4
n existing issue -- SPARK-3314
> > > > <https://issues.apache.org/jira/browse/SPARK-3314> -- about
> scripting
> > > the
> > > > creation of Spark AMIs.
> > > >
> > > > With Packer, it looks like we may be able to script the creat
yeah, breeze.storage.Zero was introduced in either 0.8 or 0.9.
On Fri, Oct 3, 2014 at 9:45 AM, Xiangrui Meng wrote:
> Did you add a different version of breeze to the classpath? In Spark
> 1.0, we use breeze 0.7, and in Spark 1.1 we use 0.9. If the breeze
> version you used is different from the
I think this is exactly what packer is for. See e.g.
http://www.packer.io/intro/getting-started/build-image.html
On a related note, the current AMI for hvm systems (e.g. m3.*, r3.*) has a
bad package for httpd, which causes ganglia not to start. For some reason I
can't get access to the raw AMI to
I've run into this with large shuffles - I assumed that there was
contention between the shuffle output files and the JVM for memory.
Whenever we start getting these fetch failures, it corresponds with high
load on the machines the blocks are being fetched from, and in some cases
complete unrespons
Hi all,
I watched an impressive Spark demo video by Reynold Xin and Aaron Davidson
on YouTube ( https://www.youtube.com/watch?v=FjhRkfAuU7I ). Can someone
let me know where I can find the source code for the demo? I can't see
the source code in the video clearly.
Thanks in advance
mutating operations are not thread safe. Operations that don't mutate
should be thread safe. I can't speak to what Evan said, but I would guess
that the way they're using += should be safe.
On Wed, Sep 3, 2014 at 11:58 AM, RJ Nowling wrote:
> David,
>
> Can you confi
In general, in Breeze we allocate separate work arrays for each call to
lapack, so it should be fine. In general concurrent modification isn't
thread safe of course, but things that "ought" to be thread safe really
should be.
On Wed, Sep 3, 2014 at 10:41 AM, RJ Nowling wrote:
> No, it's not in
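The design point above, allocating a fresh work array per call instead of sharing one, can be sketched in Python (illustrative only, not Breeze's LAPACK wrappers):

```python
# A routine that allocates its own scratch buffer on every call is safe to
# run from many threads at once; a shared module-level buffer would not be.
import threading

def scaled_sum(data):
    work = [0.0] * len(data)          # fresh work array per call: thread safe
    for i, x in enumerate(data):
        work[i] = x * 2.0
    return sum(work)

results = []
lock = threading.Lock()

def worker():
    r = scaled_sum([1.0, 2.0, 3.0])
    with lock:
        results.append(r)

threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)      # eight identical results, no cross-thread corruption
```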
I have no ideas on benchmarks, but breeze has a CG solver:
https://github.com/scalanlp/breeze/tree/master/math/src/main/scala/breeze/optimize/linear/ConjugateGradient.scala
https://github.com/scalanlp/breeze/blob/e2adad3b885736baf890b306806a56abc77a3ed3/math/src/test/scala/breeze/optimize/linear/C
On Mon, May 5, 2014 at 3:40 PM, DB Tsai wrote:
> David,
>
> Could we use Int, Long, Float as the data feature spaces, and Double for
> optimizer?
>
Yes. Breeze doesn't allow operations on mixed types, so you'd need to
convert the double vectors to Floats if you wanted,
memory available...
>
>
> On Mon, May 5, 2014 at 3:06 PM, David Hall wrote:
>
> > Lbfgs and other optimizers would not work immediately, as they require
> > vector spaces over double. Otherwise it should work.
> > On May 5, 2014 3:03 PM, "DB Tsai" wrote:
&g
Lbfgs and other optimizers would not work immediately, as they require
vector spaces over double. Otherwise it should work.
On May 5, 2014 3:03 PM, "DB Tsai" wrote:
> Breeze could take any type (Int, Long, Double, and Float) in the matrix
> template.
>
>
> Sincerely,
>
> DB Tsai
> ---
in this online
bfgs:
http://jmlr.org/proceedings/papers/v2/schraudolph07a/schraudolph07a.pdf
-- David
On Tue, Apr 29, 2014 at 3:30 PM, DB Tsai wrote:
> Have a quick hack to understand the behavior of SLBFGS
> (Stochastic-LBFGS) by overwriting the breeze iterations method to get the
>
That's right.
FWIW, caching should be automatic now, but it might be the version of
Breeze you're using doesn't do that yet.
Also, In breeze.util._ there's an implicit that adds a tee method to
iterator, and also a last method. Both are useful for things like this.
-- D
asing objective value.
If you're regularizing, are you including the regularizer in the objective
value computation?
GD is almost never worth your time.
-- David
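A toy illustration of David's question (assumed setup, not code from the thread): when monitoring convergence of a regularized fit, the reported objective must include the regularization term, or the trace can look non-monotone even though optimization is working.

```python
def objective(w, residuals, lam):
    """Regularized least-squares objective: data loss plus L2 penalty."""
    data_loss = 0.5 * sum(r * r for r in residuals)
    reg = 0.5 * lam * sum(x * x for x in w)   # include this term when reporting
    return data_loss + reg

# Example: w = [1, 2], one residual of 3, lambda = 0.1
print(objective([1.0, 2.0], [3.0], 0.1))      # 4.5 + 0.25 = 4.75
```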
On Fri, Apr 25, 2014 at 2:57 PM, DB Tsai wrote:
> Another interesting benchmark.
>
> *News20 dataset - 0.14M row, 1,355
ture is sparse.
> >
> > Sincerely,
> >
> > DB Tsai
> > ---
> > My Blog: https://www.dbtsai.com
> > LinkedIn: https://www.linkedin.com/in/dbtsai
> >
> >
> > On Wed, Apr 23, 2014 at
Was the weight vector sparse? The gradients? Or just the feature vectors?
On Wed, Apr 23, 2014 at 10:08 PM, DB Tsai wrote:
> The figure showing the Log-Likelihood vs Time can be found here.
>
>
> https://github.com/dbtsai/spark-lbfgs-benchmark/raw/fd703303fb1c16ef5714901739154728550becf4/result
ance you remember what the problems were? I'm sure it could be
better, but it's good to know where improvements need to happen.
-- David
>
> > On Apr 23, 2014, at 9:21 PM, DB Tsai wrote:
> >
> > Hi all,
> >
> > I'm benchmarking Logistic Regression
Another usage that's nice is:
logDebug {
val timeS = timeMillis/1000.0
s"Time: $timeS"
}
which can be useful for more complicated expressions.
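The same lazy-evaluation idea behind `logDebug { ... }` can be sketched in Python (illustrative, not Spark's API): accept a zero-argument callable so a message that is expensive to build is only constructed when the debug level is actually enabled.

```python
DEBUG_ENABLED = False
built = []          # records whether the message was ever constructed

def log_debug(make_message):
    """Only invoke the message builder when debug logging is on."""
    if DEBUG_ENABLED:
        print(make_message())

def expensive_message():
    built.append(1)
    return "Time: 1.234"

log_debug(expensive_message)      # debug disabled: builder is never called
print(built)                      # []
```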
On Thu, Apr 10, 2014 at 5:55 PM, Michael Armbrust wrote:
> BTW...
>
> You can do calculations in string interpolation:
> s"Time: ${timeMillis / 1
tic constraints...
>
I wonder how our general projected gradient solver would do? Clearly having
dedicated QP support is better, but in terms of just getting it working, it
might be enough.
-- David
>
>
>
> On Sun, Mar 30, 2014 at 4:40 PM, David Hall wrote:
>
> > On Sun,
On Sun, Mar 30, 2014 at 2:01 PM, Debasish Das wrote:
> Hi David,
>
> I have started to experiment with BFGS solvers for Spark GLM over large
> scale data...
>
> I am also looking to add a good QP solver in breeze that can be used in
> Spark ALS for constraint solves...More
//issues.scala-lang.org/browse/SI-2509
-- David
Is there a time frame for adding a Java API?
-- david
> On 22 Mar 2014, at 05:11, "Reynold Xin" wrote:
>
> There is no Java API yet.
>
>
>> On Fri, Mar 21, 2014 at 3:18 AM, David Soroko wrote:
>>
>> Hi
>>
>> Where ca
, "student")), (7L, ("jgonzal", "postdoc")),
(5L, ("franklin", "prof")), (2L, ("istoica", "prof"
thanks
--david
Is there any documentation available that explains the code architecture
that can help a new Spark framework developer?
On Thu, Mar 6, 2014 at 4:21 PM, DB Tsai wrote:
> Hi David,
>
> I can converge to the same result with your breeze LBFGS and Fortran
> implementations now. Probably, I made some mistakes when I tried
> breeze before. I apologize that I claimed it's not stable.
>
I did not. They would be nice to have.
On Wed, Mar 5, 2014 at 5:21 PM, Debasish Das wrote:
> David,
>
> There used to be standard BFGS testcases in Professor Nocedal's
> package...did you stress test the solver with them ?
>
> If not I will shoot him an email for
On Wed, Mar 5, 2014 at 1:57 PM, DB Tsai wrote:
> Hi David,
>
> On Tue, Mar 4, 2014 at 8:13 PM, dlwh wrote:
> > I'm happy to help fix any problems. I've verified at points that the
> > implementation gives the exact same sequence of iterates for a few
> differen
On Wed, Mar 5, 2014 at 8:50 AM, Debasish Das wrote:
> Hi David,
>
> Few questions on breeze solvers:
>
> 1. I feel the right place to add useful things from RISO LBFGS (based on
> Professor Nocedal's Fortran code) is breeze. It will involve stress
> testing br