Re: [External] Re: [GraphFrames Spark Package]: Why is there not a distribution for Spark 3.3?

2024-03-17 Thread Ofir Manor
Just to add - the latest version is 0.8.3, it seems to support 3.3: "Support Spark 3.3 / Scala 2.12, Spark 3.4 / Scala 2.12 and Scala 2.13, Spark 3.5 / Scala 2.12 and Scala 2.13" Releases · graphframes/graphframes (github.com) Ofir

Re: [External] Re: Redundant(?) shuffle after join

2024-08-19 Thread Ofir Manor
Shay - if I understand your question, you want to know if Spark has an optimization to eliminate shuffle from window functions in those conditions (when the window function partition key is equal to the bucket key, after a join), and if so, why it does not apply... Have you tried simpler varia
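To make the question concrete, here is a toy, single-process Python model of the distribution rule involved. It is a simplification, not Spark's planner code. Data that is already hash-partitioned on keys K (for example by bucketing on the join key) can feed a window function partitioned by keys C without another shuffle when every key in K also appears in C. Rows that agree on all of C then also agree on K, so they already share a partition. The function names and the `user_id` / `day` columns are hypothetical. Spark's real check also weighs other factors, such as the number of partitions, so the reliable test is still to look for an `Exchange` node in `df.explain()` output.

```python
# Toy model (assumption): not Spark code, only the distribution rule it applies.

def needs_shuffle(existing_hash_keys, window_partition_keys):
    """True if a window partitioned by `window_partition_keys` would need a
    new shuffle, given rows already hash-partitioned on `existing_hash_keys`."""
    existing = set(existing_hash_keys)
    required = set(window_partition_keys)
    # Hash partitioning on K clusters rows by C only if K is a non-empty
    # subset of C.
    return not (existing and existing <= required)


def partition_of(row, keys, num_partitions):
    """Hash-partition a row (a dict) on the given key columns."""
    return hash(tuple(row[k] for k in keys)) % num_partitions


def window_group_is_split(rows, partition_keys, window_keys, num_partitions):
    """True if some window group (rows sharing `window_keys`) is spread over
    more than one partition, which is the situation a shuffle exists to fix."""
    partitions_per_group = {}
    for row in rows:
        group = tuple(row[k] for k in window_keys)
        partitions_per_group.setdefault(group, set()).add(
            partition_of(row, partition_keys, num_partitions))
    return any(len(parts) > 1 for parts in partitions_per_group.values())


# Hypothetical rows of a table bucketed on user_id.
rows = [{"user_id": u, "day": d} for u in range(10) for d in range(5)]
print(needs_shuffle(["user_id"], ["user_id"]))   # False
print(needs_shuffle(["user_id"], ["day"]))       # True
print(window_group_is_split(rows, ["user_id"], ["user_id"], 4))  # False
```

Note the asymmetry: bucketing on `user_id` also covers a window over `(user_id, day)`, but bucketing on `(user_id, day)` does not cover a window over `user_id` alone.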

Re: Structured Streaming in Spark 2.0 and DStreams

2016-05-15 Thread Ofir Manor
ample, the new event-time window processing SPARK-8360). The gap I see is mostly the limited set of streaming sources / sinks that have been migrated to the new (richer) API and semantics. Anyway, I'm pretty sure that once 2.0 gets to RC, the documentation and examples will align with the current offering... Ofir Manor
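Event-time windowing (the SPARK-8360 work) assigns each record to a window based on the timestamp it carries, not on when it arrives. In Structured Streaming this is written as `groupBy(window(col, duration))`. The stdlib sketch below only models the tumbling-window arithmetic. It assumes windows aligned to the epoch and timestamps in seconds, and its function names are hypothetical.

```python
from collections import Counter


def tumbling_window(event_time, size):
    """(start, end) of the epoch-aligned tumbling window holding event_time."""
    start = event_time - (event_time % size)
    return (start, start + size)


def count_per_window(events, size):
    """Count (event_time, payload) events per window by event time, so late
    or out-of-order arrivals still land in the window they belong to."""
    counts = Counter(tumbling_window(ts, size) for ts, _payload in events)
    return dict(counts)


# Events listed in arrival order; their timestamps are out of order.
events = [(125, "a"), (61, "b"), (3, "c"), (119, "d")]
print(count_per_window(events, 60))
# {(120, 180): 1, (60, 120): 2, (0, 60): 1}
```

Event "d" arrives last but is still counted in the 60-120 window together with "b", which is the key difference from DStream windows based on processing time.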

Re: Structured Streaming in Spark 2.0 and DStreams

2016-05-15 Thread Ofir Manor
ing process - I don't know if that will land in 2.0 or only later. Hope that helps, Ofir Manor Co-Founder & CTO | Equalum Mobile: +972-54-7801286 | Email: ofir.ma...@equalum.io On Sun, May 15, 2016 at 11:58 PM, Benjamin Kim wrote: > Hi Ofir, > > I just recently saw the webin

Re: Structured Streaming in Spark 2.0 and DStreams

2016-05-16 Thread Ofir Manor
://issues.apache.org/jira/browse/SPARK-13809 Eventually the pull request links to the design doc, which discusses the limits of updateStateByKey and mapWithState and how they will be handled... At a quick glance at the code, it seems to be used already in streaming aggregations. Just my two cents, Ofir Manor
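The cost difference between the two DStream operators can be seen in a toy, single-process model. The function names are hypothetical and no Spark is involved. `updateStateByKey` runs the update function for every key held in state on every batch, even keys with no new data. `mapWithState` runs its function only for records that arrived in the batch. The real `mapWithState` also supports timeouts and emits a mapped stream, and both features are left out here.

```python
def update_state_by_key(state, batch, update):
    """One batch of updateStateByKey: `update(new_values, old_state)` is
    called for every key in state or in the batch; returning None drops
    the key."""
    grouped = {}
    for key, value in batch:
        grouped.setdefault(key, []).append(value)
    new_state = {}
    for key in set(state) | set(grouped):
        result = update(grouped.get(key, []), state.get(key))
        if result is not None:
            new_state[key] = result
    return new_state


def map_with_state(state, batch, func):
    """One batch of mapWithState: `func(key, value, old_state)` is called
    once per incoming record; keys without new data are carried over."""
    new_state = dict(state)
    for key, value in batch:
        new_state[key] = func(key, value, new_state.get(key))
    return new_state


calls = {"update_state_by_key": 0, "map_with_state": 0}


def running_sum_update(values, old):
    calls["update_state_by_key"] += 1
    return (old or 0) + sum(values)


def running_sum_map(key, value, old):
    calls["map_with_state"] += 1
    return (old or 0) + value


state = {k: 1 for k in range(1000)}   # 1,000 idle keys
batch = [(7, 5)]                      # a single new record
update_state_by_key(state, batch, running_sum_update)
map_with_state(state, batch, running_sum_map)
print(calls)  # {'update_state_by_key': 1000, 'map_with_state': 1}
```

Both produce the same running sum for key 7. The cost of `updateStateByKey` grows with the total state size, while the cost of `mapWithState` grows with the batch size.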

Does decimal(6,-2) exist on purpose?

2016-05-26 Thread Ofir Manor
nn.nn and will become just . Ofir Manor
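A negative scale moves the rounding point to the left of the units digit. In the usual SQL / Java `BigDecimal` sense, a value is unscaled × 10^(-scale). This means `decimal(6,-2)` holds up to 6 significant digits counted in hundreds: 12345.67 is stored as 12300, and the largest value is 99999900. The minimal stdlib sketch below makes two assumptions. It uses HALF_UP rounding, and it returns None on overflow. Both are hypothetical conventions, since engines differ on whether overflow yields null or an error.

```python
from decimal import Decimal, ROUND_HALF_UP


def to_decimal_type(value, precision, scale):
    """Round `value` (a string or Decimal) to DECIMAL(precision, scale).

    The stored value is unscaled * 10**(-scale), so a negative scale rounds
    to tens, hundreds, ... Returns None (assumed overflow convention) if the
    unscaled integer needs more than `precision` digits.
    """
    quantum = Decimal(1).scaleb(-scale)        # scale=-2 -> 1E+2 (hundreds)
    rounded = Decimal(value).quantize(quantum, rounding=ROUND_HALF_UP)
    unscaled = rounded.scaleb(scale)           # number of quanta, an integer
    if abs(unscaled) >= Decimal(10) ** precision:
        return None
    return rounded


print(int(to_decimal_type("12345.67", 6, -2)))   # 12300
print(int(to_decimal_type("99999949", 6, -2)))   # 99999900
print(to_decimal_type("99999950", 6, -2))        # None (overflow)
print(to_decimal_type("1234.567", 6, 2))         # 1234.57
```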

Re: Timeline for supporting basic operations like groupBy, joins etc on Streaming DataFrames

2016-06-07 Thread Ofir Manor
for my use case. Ofir Manor On Tue, Jun 7, 2016 at 12:36 PM, Tathagata Das wrote: > 1. Not all types of joins are supported. Here is the list. > - Right outer joins - stream-batch not allowed, ba

Re: Spark, Scala, and DNA sequencing

2016-07-23 Thread Ofir Manor
Hi James, BTW - if you are into analyzing DNA with Spark, you may also be interested in ADAM: https://github.com/bigdatagenomics/adam http://bdgenomics.org/ Ofir Manor On Fri, Jul 22, 2016 at 10:3

Re: ORC v/s Parquet for Spark 2.0

2016-07-26 Thread Ofir Manor
One additional point specific to Spark 2.0 - for the alpha Structured Streaming API (only), the file sink only supports Parquet format (I'm sure that limitation will be lifted in a future release before Structured Streaming is GA): "File sink - Stores the output to a directory. As of Spark 2.

Re: The Future Of DStream

2016-07-27 Thread Ofir Manor
someone will suggest starting a deprecation process that will eventually lead to its removal... As a user, I guess we will need to apply judgement about when to switch to Structured Streaming - each of us has a different risk/value tradeoff, based on our specific situation... Ofir Manor

Re: [ANNOUNCE] Announcing Apache Spark 2.0.0

2016-07-27 Thread Ofir Manor
"old" Spark Streaming Programming Guide, as I think many users will look for them. I had a "deep link" to that page, so I hadn't noticed until now that it is very hard to find. I'm referring to this page: http://spark.apache.org/docs/latest/structured-streaming-programmin

Re: The Future Of DStream

2016-07-27 Thread Ofir Manor
For the 2.0 release, look for "Unsupported Operations" here: http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html Also, there are bigger gaps - like no Kafka support, no way to plug in user-defined sources or sinks, etc. Ofir Manor

Re: ORC v/s Parquet for Spark 2.0

2016-07-28 Thread Ofir Manor
close to the details can explain. Ofir Manor On Thu, Jul 28, 2016 at 6:49 PM, Mich Talebzadeh wrote: > Like anything else your mileage varies. > > ORC with Vectorised query execution

Re: What do I loose if I run spark without using HDFS or Zookeeper?

2016-08-25 Thread Ofir Manor
). Ofir Manor On Thu, Aug 25, 2016 at 11:13 PM, Mich Talebzadeh wrote: > Hi Kant, > > I trust the following would be of use. > > Big Data depends on Hadoop Ecosystem from whichever angle one look

Re: Structured Streaming - Can I start using it?

2017-03-14 Thread Ofir Manor
changes to monitoring, troubleshooting etc.), so I think you should know what you want to achieve here and ask / prototype to see whether the current release fits it. Ofir Manor On Mon, Mar 13, 2017 at 9:45 PM, Michael Armbrust wro

Re: Does spark 2.1.0 structured streaming support jdbc sink?

2017-04-10 Thread Ofir Manor
Also check SPARK-19478 <https://issues.apache.org/jira/browse/SPARK-19478> - JDBC sink (seems to be waiting for a review) Ofir Manor On Mon, Apr 10, 2017 at 10:10 AM, Hemanth Gudela wrote: > Many

tuning - Spark data serialization for cache() ?

2017-08-07 Thread Ofir Manor
with some other variations, like enabling Kryo following the tuning guide's instructions, but didn't see any impact on the cached DataFrame size (same tens of GBs in the UI). So, any tips around that? Thanks. Ofir Manor

Re: tuning - Spark data serialization for cache() ?

2017-08-07 Thread Ofir Manor
Thanks a lot for the quick pointer! So, is the advice I linked to in the official Spark 2.2 documentation misleading? You are saying that Spark 2.2 does not use Java serialization here? And the tip to switch to Kryo is also outdated? Ofir Manor