Re: Enabling fully disaggregated shuffle on Spark

2019-11-21 Thread Peter Rudenko
…out-of-box TCP. We're open to integrating UCX into other big data components (Apache Arrow / Flight, HDFS, etc.) that could be reused in Spark to make whole Spark workloads more efficient. Would be glad to see your use cases for optimizing Spark shuffle. Regards, Peter Rudenko Thu, 21 Nov…

Re: SPARK-25299: Updates As Of December 19, 2018

2019-01-03 Thread Peter Rudenko
…we can consider adding it to this new API. Let me know if you need help with review / testing / benchmarking. I'll look more at the documents and PR. Thanks, Peter Rudenko, Software engineer at Mellanox Technologies. Wed, 19 Dec 2018 at 20:54 John Zhuge wrote: > Matt, appreciate the update! >…

Re: Spark In Memory Shuffle / 5403

2018-10-19 Thread Peter Rudenko
…-project-dist/hadoop-hdfs/ShortCircuitLocalReads.html) to use a unix socket for local communication, or just directly read a part of another JVM's shuffle file. But yes, it's not available in Spark out of the box. Thanks, Peter Rudenko Fri, 19 Oct 2018 at 16:54 Peter Liu wrote: > Hi Peter, > > thank you f…
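The same-host fast path mentioned above can be sketched in plain Python: a unix-domain socket pair stands in for two processes on one machine exchanging shuffle bytes without going through TCP. This is an illustration of the IPC mechanism only, not Spark's or HDFS's actual implementation, and the payload is invented.

```python
import socket

# Minimal sketch: same-host IPC over a unix-domain socket pair, the kind
# of mechanism short-circuit-style local reads build on to skip the TCP
# stack. Roles and payload here are purely illustrative.
reader, writer = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

writer.sendall(b"shuffle block bytes")  # the "serving" process side
data = reader.recv(1024)                # the "reading" process side

print(data.decode())

writer.close()
reader.close()
```

On Linux this avoids the loopback TCP path entirely; the trade-off the thread discusses is that Spark's built-in shuffle does not expose such a local transport.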

Re: Spark In Memory Shuffle / 5403

2018-10-19 Thread Peter Rudenko
…due to either non-present pages or mapping changes. So if you have an RDMA-capable NIC (or you can try the Azure cloud: https://azure.microsoft.com/en-us/blog/introducing-the-new-hb-and-hc-azure-vm-sizes-for-hpc/ ), give it a try. For network-intensive apps you should get better performance. Thanks,

Re: [SQL] codegen on wide dataset throws StackOverflow

2015-06-26 Thread Peter Rudenko
I'm using spark-1.4.0. Sure, I'll put together steps to reproduce and file a JIRA ticket. Thanks, Peter Rudenko On 2015-06-26 11:14, Josh Rosen wrote: Which Spark version are you using? Can you file a JIRA for this issue? On Thu, Jun 25, 2015 at 6:35 AM, Peter Rudenko…

[SQL] codegen on wide dataset throws StackOverflow

2015-06-25 Thread Peter Rudenko
. Thanks, Peter Rudenko
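The failure mode in the subject line, generated code whose nesting grows with schema width until the stack overflows, can be illustrated with a Spark-free Python sketch. The recursive "codegen" below is invented for illustration; it is not how Spark's actual code generation works.

```python
import sys

def gen_expr(cols):
    # Toy "codegen": nests one parenthesized level per column, so the
    # recursion depth grows linearly with the number of columns. An
    # illustration only, not Spark's real code generator.
    if not cols:
        return "0"
    return "(" + cols[0] + " + " + gen_expr(cols[1:]) + ")"

sys.setrecursionlimit(100)  # artificially small stack budget
try:
    gen_expr(["c%d" % i for i in range(1000)])  # a "wide" 1000-column schema
    overflowed = False
except RecursionError:
    overflowed = True

print(overflowed)  # True
```

The point: depth that scales with column count is fine for narrow schemas and blows the stack for wide ones, which is the shape of the reported StackOverflow.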

[Tungsten] NPE in UnsafeShuffleWriter.java

2015-06-19 Thread Peter Rudenko
) Any suggestions? Thanks, Peter Rudenko

[ml] Why all model classes are final?

2015-06-08 Thread Peter Rudenko
…uld easily customize and combine for my needs. Thanks, Peter Rudenko

Re: [VOTE] Release Apache Spark 1.4.0 (RC3)

2015-06-01 Thread Peter Rudenko
…e, Thanks, Peter Rudenko On 2015-06-01 21:10, Yin Huai wrote: Hi Peter, based on your error message, it seems you were not using RC3. For the error thrown at HiveContext's line 206, we have changed the message to this one <https://github.com/apache/spark/blob/v1.4.0-rc3/sql/hive/…

Re: [VOTE] Release Apache Spark 1.4.0 (RC3)

2015-06-01 Thread Peter Rudenko
…he.spark.sql.DataFrame.<init>(DataFrame.scala:134) at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:51) at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:474) at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:456) at org.apache.spark.sql.SQLContext$implicits$.intRddToDataFrameHo…

Re: [VOTE] Release Apache Spark 1.4.0 (RC1)

2015-05-29 Thread Peter Rudenko
…quet(DataFrameReader.scala:264) Thanks, Peter Rudenko On 2015-05-29 07:08, Yin Huai wrote: Justin, if you are creating multiple HiveContexts in tests, you need to assign a temporary metastore location for every HiveContext (like what we do here <https://github.com/apache/spark/blob/master/s…

Re: [VOTE] Release Apache Spark 1.4.0 (RC1)

2015-05-28 Thread Peter Rudenko
…or details. Also, is there a build for hadoop2.6? I don't see it here: http://people.apache.org/~pwendell/spark-releases/spark-1.4.0-rc2-bin/ Thanks, Peter Rudenko On 2015-05-22 22:56, Justin Uang wrote: I'm…

Re: Spark 1.3.1 / Hadoop 2.6 package has broken S3 access

2015-05-07 Thread Peter Rudenko
…that jar to the classpath on hadoop-2.6. Thanks, Peter Rudenko On 2015-05-07 19:41, Nicholas Chammas wrote: I can try that, but the issue is that, as I understand it, this is supposed to work out of the box (like it does with all the other Spark/Hadoop pre-built packages). On Thu, May 7, 2015 at 12:35 PM…

Re: Spark 1.3.1 / Hadoop 2.6 package has broken S3 access

2015-05-07 Thread Peter Rudenko
Try downloading this jar: http://search.maven.org/remotecontent?filepath=org/apache/hadoop/hadoop-aws/2.6.0/hadoop-aws-2.6.0.jar and adding: export CLASSPATH=$CLASSPATH:hadoop-aws-2.6.0.jar then relaunch. Thanks, Peter Rudenko On 2015-05-07 19:30, Nicholas Chammas wrote: Hmm, I just…

Re: Spark 1.3.1 / Hadoop 2.6 package has broken S3 access

2015-05-07 Thread Peter Rudenko
Hi Nick, I had the same issue. By default it should work with the s3a protocol: sc.textFile('s3a://bucket/file_*').count() If you want to use the s3n protocol you need to add hadoop-aws.jar to Spark's classpath. Which hadoop vendor (Hortonworks, Cloudera, MapR) do you use? Thanks, P…

Re: [sql] Dataframe how to check null values

2015-04-20 Thread Peter Rudenko
Sounds very good. Is there a JIRA for this? It would be great to have in 1.4, because currently the dataframe.describe function can't be used with NaN values; you have to filter all the columns manually. Thanks, Peter Rudenko On 2015-04-02 21:18, Reynold Xin wrote: Incidentally, we were discussing this…

[sql] Dataframe how to check null values

2015-04-02 Thread Peter Rudenko
…ad of null I can compare in a UDF, but aggregation doesn't work properly. Maybe it's related to https://issues.apache.org/jira/browse/SPARK-6573 Thanks, Peter Rudenko
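The underlying gotcha in these NaN threads, that NaN compares unequal to everything including itself, can be shown with a small plain-Python sketch; the sample column is made up for illustration.

```python
import math

nan = float("nan")

# NaN never compares equal to anything, itself included, so an
# equality-based null/NaN check silently matches nothing.
print(nan == nan)       # False
print(math.isnan(nan))  # True

# Manually filtering NaNs out of a column before computing a summary
# statistic, the kind of per-column cleanup the messages describe.
column = [1.0, float("nan"), 3.0]
clean = [v for v in column if not math.isnan(v)]
print(sum(clean) / len(clean))  # 2.0
```

The same semantics hold inside a UDF, which is why comparing against NaN in a filter or aggregation does not behave like a null check.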

[sql] How to uniquely identify Dataframe?

2015-03-30 Thread Peter Rudenko
…was able to call SchemaRDD.id. Thanks, Peter Rudenko

Re: [MLlib] Performance problem in GeneralizedLinearAlgorithm

2015-02-17 Thread Peter Rudenko
It was fixed today: https://github.com/apache/spark/pull/4593 Thanks, Peter Rudenko On 2015-02-17 18:25, Evan R. Sparks wrote: Josh, thanks for the detailed write-up. This seems a little funny to me. I agree that with the current code path there is more work being done than needs to be…

[ml] Lost persistence for fold in crossvalidation.

2015-02-11 Thread Peter Rudenko
…rs. But the data is being read and cached 9 times. Thanks, Peter Rudenko
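The "9 times" pattern (a 3-fold cross-validation over a 3-point parameter grid re-reading the data for every combination) can be sketched in plain Python. The loader and grid sizes below are invented for illustration; this is not Spark's CrossValidator code.

```python
reads = 0

def load_fold_data():
    # Stand-in for an expensive read of one fold's training data.
    global reads
    reads += 1
    return list(range(100))

# Naive loop: every (fold, param) pair triggers a fresh read: 3 x 3 = 9.
for fold in range(3):
    for param in range(3):
        data = load_fold_data()
naive_reads = reads

# Persisted variant: read once per fold, reuse across the grid: 3 reads.
reads = 0
for fold in range(3):
    data = load_fold_data()
    for param in range(3):
        _ = (data, param)  # train/evaluate this combination
cached_reads = reads

print(naive_reads, cached_reads)  # 9 3
```

Persisting the fold before the inner parameter loop is exactly the caching the message says was lost.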