out-of-the-box TCP.
We're open to integrating UCX into other big data components (Apache Arrow
/ Flight, HDFS, etc.) that could be reused in Spark to make whole
Spark workloads more efficient.
We would be glad to hear about your use cases for optimizing Spark shuffle.
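For anyone who wants to experiment, here is a minimal sketch of pointing Spark at a UCX-based shuffle plugin. The UcxShuffleManager class name and jar path are assumptions based on the SparkUCX plugin, not verified values; check that project's docs for the exact settings:

import org.apache.spark.SparkConf

// Sketch: Spark's shuffle implementation is pluggable via spark.shuffle.manager.
// The class name and jar path below are assumptions, not verified values.
val conf = new SparkConf()
  .setAppName("ucx-shuffle-demo")
  .set("spark.shuffle.manager", "org.apache.spark.shuffle.UcxShuffleManager")
  .set("spark.driver.extraClassPath", "/path/to/spark-ucx.jar")
  .set("spark.executor.extraClassPath", "/path/to/spark-ucx.jar")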
Regards,
Peter Rudenko
Thu, 21 Nov
We can consider adding
it to this new API.
Let me know if you need help with review, testing, or benchmarking.
I'll look more at the documents and the PR.
Thanks,
Peter Rudenko
Software engineer at Mellanox Technologies.
Wed, 19 Dec 2018 at 20:54, John Zhuge wrote:
> Matt, appreciate the update!
>
>
-project-dist/hadoop-hdfs/ShortCircuitLocalReads.html)
to use a Unix socket for local communication, or just directly read a part
of another JVM's shuffle file. But yes, it's not available in Spark out of
the box.
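As a minimal sketch of what enabling those short-circuit reads looks like on the client side (the socket path is just a conventional example; both keys are standard HDFS client settings):

import org.apache.hadoop.conf.Configuration

// Short-circuit local reads hand the client a file descriptor over a Unix
// domain socket instead of streaming the block through the DataNode.
val conf = new Configuration()
conf.setBoolean("dfs.client.read.shortcircuit", true)
conf.set("dfs.domain.socket.path", "/var/lib/hadoop-hdfs/dn_socket")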
Thanks,
Peter Rudenko
Fri, 19 Oct 2018 at 16:54, Peter Liu wrote:
> Hi Peter,
>
> thank you f
due to either non-present pages or mapping
changes. So if you have an RDMA-capable NIC (or you can try one on the Azure cloud:
https://azure.microsoft.com/en-us/blog/introducing-the-new-hb-and-hc-azure-vm-sizes-for-hpc/
), give it a try. For network-intensive apps you should get better
performance.
Thanks,
I'm using Spark 1.4.0. Sure, I'll try to put together steps to reproduce and file
a JIRA ticket.
Thanks,
Peter Rudenko
On 2015-06-26 11:14, Josh Rosen wrote:
Which Spark version are you using? Can you file a JIRA for this issue?
On Thu, Jun 25, 2015 at 6:35 AM, Peter Rudenko wrote:
.
Thanks,
Peter Rudenko
)
Any suggestions?
Thanks,
Peter Rudenko
could easily
customize and combine for my needs.
Thanks,
Peter Rudenko
e,
Thanks,
Peter Rudenko
On 2015-06-01 21:10, Yin Huai wrote:
Hi Peter,
Based on your error message, it seems you were not using RC3. For the
error thrown at HiveContext's line 206, we have changed the message to
this one
<https://github.com/apache/spark/blob/v1.4.0-rc3/sql/hive/
he.spark.sql.DataFrame.<init>(DataFrame.scala:134)
  at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:51)
  at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:474)
  at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:456)
  at org.apache.spark.sql.SQLContext$implicits$.intRddToDataFrameHo
quet(DataFrameReader.scala:264)
Thanks,
Peter Rudenko
On 2015-05-29 07:08, Yin Huai wrote:
Justin,
If you are creating multiple HiveContexts in tests, you need to assign
a temporary metastore location for every HiveContext (like what we do
here
<https://github.com/apache/spark/blob/master/s
for details.
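A minimal sketch of that idea, assuming a local Derby metastore whose location is controlled by the javax.jdo.option.ConnectionURL property (sc is an existing SparkContext):

import java.nio.file.Files
import org.apache.spark.sql.hive.HiveContext

// Give each test its own metastore directory so HiveContexts don't collide.
val dir = Files.createTempDirectory("hive-test").toFile.getAbsolutePath
val hc = new HiveContext(sc)
hc.setConf("javax.jdo.option.ConnectionURL",
  s"jdbc:derby:;databaseName=$dir/metastore_db;create=true")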
Also, is there a build for Hadoop 2.6? I don't see it here:
http://people.apache.org/~pwendell/spark-releases/spark-1.4.0-rc2-bin/
Thanks,
Peter Rudenko
On 2015-05-22 22:56, Justin Uang wrote:
I'm
that jar to the classpath on hadoop-2.6.
Thanks,
Peter Rudenko
On 2015-05-07 19:41, Nicholas Chammas wrote:
I can try that, but the issue is that, as I understand it, this is supposed
to work out of the box (like it does with all the other Spark/Hadoop
pre-built packages).
On Thu, May 7, 2015 at 12:35 PM
Try downloading this jar:
http://search.maven.org/remotecontent?filepath=org/apache/hadoop/hadoop-aws/2.6.0/hadoop-aws-2.6.0.jar
and adding it to the classpath:
export CLASSPATH=$CLASSPATH:hadoop-aws-2.6.0.jar
Then relaunch.
Thanks,
Peter Rudenko
On 2015-05-07 19:30, Nicholas Chammas wrote:
Hmm, I just
Hi Nick, I had the same issue.
By default it should work with the s3a protocol:
sc.textFile('s3a://bucket/file_*').count()
If you want to use the s3n protocol you need to add hadoop-aws.jar to
Spark's classpath. Which Hadoop vendor (Hortonworks, Cloudera, MapR) do
you use?
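For example, here is a sketch of an s3a read with credentials supplied through the Hadoop configuration; the bucket and key names are made up, and this assumes hadoop-aws (plus its AWS SDK dependency) is on the classpath:

// Credentials come from environment variables here; fs.s3a.access.key and
// fs.s3a.secret.key are the standard S3A configuration keys.
sc.hadoopConfiguration.set("fs.s3a.access.key", sys.env("AWS_ACCESS_KEY_ID"))
sc.hadoopConfiguration.set("fs.s3a.secret.key", sys.env("AWS_SECRET_ACCESS_KEY"))
val n = sc.textFile("s3a://my-bucket/logs/part_*").count()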
Thanks,
Peter Rudenko
Sounds very good. Is there a JIRA for this? It would be cool to have in
1.4, because currently you cannot use the dataframe.describe function with NaN
values; you need to manually filter all the columns.
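A sketch of that manual workaround, with a hypothetical "value" column; df.na.drop (available on DataFrames from Spark 1.3.1) removes rows with null/NaN before describe() runs:

// Drop rows whose "value" column is null or NaN, then summarize the rest.
val cleaned = df.na.drop(Seq("value"))
cleaned.describe("value").show()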
Thanks,
Peter Rudenko
On 2015-04-02 21:18, Reynold Xin wrote:
Incidentally, we were discussing this
instead of null I can compare in a UDF, but aggregation
doesn't work properly. Maybe it's related to:
https://issues.apache.org/jira/browse/SPARK-6573
Thanks,
Peter Rudenko
was
able to call SchemaRDD.id.
Thanks,
Peter Rudenko
It was fixed today: https://github.com/apache/spark/pull/4593
Thanks,
Peter Rudenko
On 2015-02-17 18:25, Evan R. Sparks wrote:
Josh, thanks for the detailed write-up. This seems a little funny to me.
I agree that with the current code path more work is being done than
needs to be
rs.
But I have the data being read and cached 9 times.
Thanks,
Peter Rudenko