Re: How to pass variables across functions in spark structured streaming (PySpark)

2023-03-05 Thread Mich Talebzadeh
OK, I found a workaround. Basically, state is not kept across streams, and I have two streams: one is a business topic and the other one is created to shut down Spark Structured Streaming gracefully. I was interested in printing the value of the most recent batchId for the business topic called "md" here…
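
In outline, the two-stream workaround described in this thread looks something like the sketch below. The function and topic names (sendToSink, sendToControl, "md", "newtopic") follow the thread; the Kafka wiring, the status flag, and the broker address are assumptions.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("md_streams").getOrCreate()

    def sendToSink(df, batchId):
        # Business micro-batch for the "md" topic; batchId is the most recent micro-batch id.
        print("sendToSink, md, batchId is {}".format(batchId))
        # ... custom write logic for df goes here ...

    def sendToControl(dfnewtopic, batchId):
        # Control micro-batch: a row whose status is 'false' requests a graceful stop.
        if dfnewtopic.filter(dfnewtopic["status"] == "false").count() > 0:
            print("sendToControl, newtopic, batchId is {}: stopping".format(batchId))
            for q in spark.streams.active:
                q.stop()

    def kafka_stream(topic):
        # Hypothetical helper; needs the spark-sql-kafka package on the classpath.
        return (spark.readStream.format("kafka")
                .option("kafka.bootstrap.servers", "broker:9092")  # placeholder
                .option("subscribe", topic)
                .load())

    kafka_stream("md").writeStream.foreachBatch(sendToSink).start()
    (kafka_stream("newtopic")
        .selectExpr("CAST(value AS STRING) AS status")  # assume the message is just a flag
        .writeStream.foreachBatch(sendToControl).start())
    spark.streams.awaitAnyTermination()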

Re: How to pass variables across functions in spark structured streaming (PySpark)

2023-03-04 Thread Mich Talebzadeh
This might help https://docs.databricks.com/structured-streaming/foreach.html streamingDF.writeStream.foreachBatch(...) allows you to specify a function that is executed on the output data of every micro-batch of the streaming query. It takes two parameters: a DataFrame or Dataset that has the ou
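
As a sketch of that signature (the streaming DataFrame is the thread's streamingDF; the JDBC sink and its options are placeholders, not from the thread):

    def write_batch(batch_df, epoch_id):
        # batch_df holds the output of one micro-batch; epoch_id uniquely identifies it.
        (batch_df.write.format("jdbc")
            .option("url", "jdbc:postgresql://host/db")  # placeholder
            .option("dbtable", "target_table")           # placeholder
            .mode("append")
            .save())

    streamingDF.writeStream.foreachBatch(write_batch).start()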

Re: How to pass variables across functions in spark structured streaming (PySpark)

2023-03-04 Thread Mich Talebzadeh
I am aware of your point that globals don't work in a distributed environment. With regard to your other point, these are two different topics, each with its own stream. The point of the second stream is to set the status to false so that it can gracefully shut down the main stream (the one called "md") here…

Re: How to pass variables across functions in spark structured streaming (PySpark)

2023-03-04 Thread Sean Owen
I don't quite get it - aren't you applying these to the same stream and batches? Worst case, why not apply them as one function? Otherwise, how do you mean to associate one call with another? Globals don't help here: they aren't global beyond the driver, and which one would belong to which batch? On Sat, Mar 4…

Re: How to pass variables across functions in spark structured streaming (PySpark)

2023-03-04 Thread Mich Talebzadeh
Thanks. They are different batchIds. From sendToControl (newtopic), batchId is 76; from sendToSink (md), batchId is 563. As a matter of interest, why does a global variable not work?

Re: How to pass variables across functions in spark structured streaming (PySpark)

2023-03-04 Thread Sean Owen
It's the same batch ID already, no? Or why not simply put the logic of both in one function, or write one function that calls both? On Sat, Mar 4, 2023 at 2:07 PM Mich Talebzadeh wrote: > This is probably pretty straightforward, but somehow it does not look that way. > On Spark Structu…

How to pass variables across functions in spark structured streaming (PySpark)

2023-03-04 Thread Mich Talebzadeh
This is probably pretty straightforward, but somehow it does not look that way. On Spark Structured Streaming, "foreachBatch" performs custom write logic on each micro-batch through a user-supplied function. For example, foreachBatch(sendToSink) expects a function taking 2 parameters, first: the micro-batch as a DataFrame or Data…

Re: Using UDF based on Numpy functions in Spark SQL

2020-12-26 Thread Mich Talebzadeh
Well, I gave up on using anything except the standard one offered by PySpark itself. The problem is that anything that is homemade (a UDF) is never going to be as performant as the functions offered by Spark itself. What I don't understand is why a numpy-provided STDDEV should be more performant than…

Re: Using UDF based on Numpy functions in Spark SQL

2020-12-24 Thread Sean Owen
Why not just use STDDEV_SAMP? it's probably more accurate than the differences-of-squares calculation. You can write an aggregate UDF that calls numpy and register it for SQL, but, it is already a built-in. On Thu, Dec 24, 2020 at 8:12 AM Mich Talebzadeh wrote: > Thanks for the feedback. > > I h
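
A sketch of what such an aggregate UDF could look like with the pandas UDF API (Spark 2.4+). The DataFrame df and the column/table names are made up, and SQL registration of grouped aggregates varies by Spark version:

    import numpy as np
    import pandas as pd
    from pyspark.sql.functions import pandas_udf, PandasUDFType

    @pandas_udf("double", PandasUDFType.GROUPED_AGG)
    def np_stddev(v: pd.Series) -> float:
        # Bessel-corrected sample standard deviation, i.e. numpy's ddof=1.
        return float(np.std(v, ddof=1))

    df.groupBy("Customer_ID").agg(np_stddev(df["amount"]).alias("Standard_deviation"))
    spark.udf.register("np_stddev", np_stddev)  # then callable as np_stddev(...) in SQL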

Re: Using UDF based on Numpy functions in Spark SQL

2020-12-24 Thread Mich Talebzadeh
Thanks for the feedback. I have a question here. I want to use numpy STD as well, but just using SQL in PySpark, like below: sqltext = f"""SELECT rs.Customer_ID, rs.Number_of_orders, rs.Total_customer_amount, rs.Average_order, rs.Standard_deviation…

Re: Using UDF based on Numpy functions in Spark SQL

2020-12-24 Thread Sean Owen
I don't know which one is 'correct' (it's not standard SQL?) or whether it's the sample stdev for a good reason or just historical now. But you can always call STDDEV_SAMP (in any DB) if needed. It's equivalent to numpy.std with ddof=1, the Bessel-corrected standard deviation. On Thu, Dec 24, 2020
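
A quick numpy check of that equivalence, on made-up numbers:

    import numpy as np

    x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
    print(np.std(x))           # 2.0    : population stddev (ddof=0), i.e. STDDEV_POP
    print(np.std(x, ddof=1))   # ~2.138 : sample stddev (ddof=1, Bessel-corrected), i.e. STDDEV_SAMP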

Re: Using UDF based on Numpy functions in Spark SQL

2020-12-24 Thread Mich Talebzadeh
Well, the truth is that we had this discussion in 2016 :(. What Hive calls the Standard Deviation Function, STDDEV, is a pointer to STDDEV_POP. This is incorrect and has not been rectified yet! Spark SQL, Oracle and Sybase point STDDEV to STDDEV_SAMP and not STDDEV_POP. Run a test on Hive: SELECT S…
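
One way to see which variant STDDEV aliases on a given engine is to run all three side by side; a sketch, with a placeholder table and column:

    spark.sql("""
        SELECT STDDEV(amount)      AS stddev_default,
               STDDEV_SAMP(amount) AS stddev_samp,
               STDDEV_POP(amount)  AS stddev_pop
        FROM   payments
    """).show()
    # On Spark SQL, stddev_default should equal stddev_samp;
    # on Hive it equals stddev_pop instead.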

Re: Using UDF based on Numpy functions in Spark SQL

2020-12-23 Thread Sean Owen
Why do you want to use this function instead of the built-in stddev function? On Wed, Dec 23, 2020 at 2:52 PM Mich Talebzadeh wrote: > Hi, > > > This is a shot in the dark so to speak. > > > I would like to use the standard deviation std offered by numpy in > PySpark. I am using SQL for now > >

Re: Using UDF based on Numpy functions in Spark SQL

2020-12-23 Thread Mich Talebzadeh
OK, thanks for the tip. I found the Databricks documentation page "User-defined functions - Python" useful for this.

Re: Using UDF based on Numpy functions in Spark SQL

2020-12-23 Thread Peyman Mohajerian
https://stackoverflow.com/questions/43484269/how-to-register-udf-to-use-in-sql-and-dataframe On Wed, Dec 23, 2020 at 12:52 PM Mich Talebzadeh wrote: > Hi, > > > This is a shot in the dark so to speak. > > > I would like to use the standard deviation std offered by numpy in > PySpark. I am using
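
The pattern behind that link, roughly (the function, return type and table are illustrative):

    from pyspark.sql.functions import udf
    from pyspark.sql.types import DoubleType

    def squared(x):
        return float(x) ** 2 if x is not None else None

    squared_udf = udf(squared, DoubleType())              # DataFrame use: df.select(squared_udf("amount"))
    spark.udf.register("squared", squared, DoubleType())  # SQL use
    spark.sql("SELECT squared(amount) FROM sales").show()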

Using UDF based on Numpy functions in Spark SQL

2020-12-23 Thread Mich Talebzadeh
Hi, this is a shot in the dark, so to speak. I would like to use the standard deviation std offered by numpy in PySpark. I am using SQL for now. The code is as below: sqltext = f"""SELECT rs.Customer_ID, rs.Number_of_orders, rs.Total_customer_amount, …

Re: Kerberos and YARN - functions in spark-shell and spark submit local but not cluster mode

2016-12-08 Thread Marcelo Vanzin
You could have posted just the error, which is at the end of my response. Why are you trying to use WebHDFS? I'm not really sure how authentication works with that. But generally applications use HDFS (which uses a different URI scheme), and Spark should work fine with that. Error: Authenticatio

Re: Kerberos and YARN - functions in spark-shell and spark submit local but not cluster mode

2016-12-08 Thread Gerard Casey
Sure - I wanted to check with admin before sharing. I’ve attached it now; does this help? Many thanks again, G. Container: container_e34_1479877553404_0174_01_03 on hdp-node12.xcat.cluster_45454_1481228528201…

Re: Kerberos and YARN - functions in spark-shell and spark submit local but not cluster mode

2016-12-08 Thread Marcelo Vanzin
Then you probably have a configuration error somewhere. Since you haven't actually posted the error you're seeing, it's kinda hard to help any further. On Thu, Dec 8, 2016 at 11:17 AM, Gerard Casey wrote: > Right. I’m confident that is setup correctly. > > I can run the SparkPi test script. The m

Re: Kerberos and YARN - functions in spark-shell and spark submit local but not cluster mode

2016-12-08 Thread Gerard Casey
Right. I’m confident that is set up correctly. I can run the SparkPi test script. The main difference between it and my application is that it doesn’t access HDFS. > On 8 Dec 2016, at 18:43, Marcelo Vanzin wrote: > On Wed, Dec 7, 2016 at 11:54 PM, Gerard Casey wrote: >> To be specific, w…

Re: Kerberos and YARN - functions in spark-shell and spark submit local but not cluster mode

2016-12-08 Thread Marcelo Vanzin
On Wed, Dec 7, 2016 at 11:54 PM, Gerard Casey wrote: > To be specific, where exactly should spark.authenticate be set to true? spark.authenticate has nothing to do with Kerberos. It's for authentication between different Spark processes belonging to the same app. -- Marcelo

Re: Kerberos and YARN - functions in spark-shell and spark submit local but not cluster mode

2016-12-07 Thread Gerard Casey
Thanks Marcin, That seems to be the case. It explains why there is no documentation on this part too! To be specific, where exactly should spark.authenticate be set to true? Many thanks, Gerry > On 8 Dec 2016, at 08:46, Marcin Pastecki wrote: > > My understanding is that the token generatio

Re: Kerberos and YARN - functions in spark-shell and spark submit local but not cluster mode

2016-12-07 Thread Marcin Pastecki
My understanding is that the token generation is handled by Spark itself as long as you were authenticated in Kerberos when submitting the job and spark.authenticate is set to true. --keytab and --principal options should be used for "long" running job, when you may need to do ticket renewal. Spar
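
Put into commands, that advice looks roughly like this (principal, realm and paths are placeholders):

    # Short job: a valid TGT from kinit is enough; Spark picks up the HDFS delegation token.
    kinit user@EXAMPLE.COM
    spark-submit --master yarn --deploy-mode cluster --class graphx_sp graphx_sp.jar

    # Long-running job: hand Spark the keytab so it can renew tickets itself.
    spark-submit --master yarn --deploy-mode cluster \
      --principal user@EXAMPLE.COM --keytab /path/to/user.keytab \
      --class graphx_sp graphx_sp.jar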

Re: Kerberos and YARN - functions in spark-shell and spark submit local but not cluster mode

2016-12-07 Thread Gerard Casey
I just read an interesting comment on Cloudera: what does it mean by "when the job is submitted, and you have a kinit, you will have a TOKEN to access HDFS; you would need to pass that on, or the KERBEROS ticket"? Reference…

Re: Kerberos and YARN - functions in spark-shell and spark submit local but not cluster mode

2016-12-07 Thread Gerard Casey
Thanks Marcelo. I’ve completely removed it. OK - even if I read/write from HDFS? Trying the SparkPi example now. G > On 7 Dec 2016, at 22:10, Marcelo Vanzin wrote: > Have you removed all the code dealing with Kerberos that you posted? > You should not be setting those principal / keytab c…

Re: Kerberos and YARN - functions in spark-shell and spark submit local but not cluster mode

2016-12-07 Thread Marcelo Vanzin
Have you removed all the code dealing with Kerberos that you posted? You should not be setting those principal / keytab configs. Literally all you have to do is login with kinit then run spark-submit. Try with the SparkPi example for instance, instead of your own code. If that doesn't work, you h
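
For reference, that SparkPi smoke test looks something like this (the examples jar location varies by Spark version):

    kinit user@EXAMPLE.COM
    spark-submit --class org.apache.spark.examples.SparkPi \
      --master yarn --deploy-mode cluster \
      $SPARK_HOME/examples/jars/spark-examples*.jar 10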

Re: Kerberos and YARN - functions in spark-shell and spark submit local but not cluster mode

2016-12-07 Thread Gerard Casey
Thanks. I’ve checked the TGT, principal and keytab. Where to next?! > On 7 Dec 2016, at 22:03, Marcelo Vanzin wrote: > On Wed, Dec 7, 2016 at 12:15 PM, Gerard Casey wrote: >> Can anyone point me to a tutorial or a run-through of how to use Spark with >> Kerberos? This is proving to be q…

Re: Kerberos and YARN - functions in spark-shell and spark submit local but not cluster mode

2016-12-07 Thread Marcelo Vanzin
On Wed, Dec 7, 2016 at 12:15 PM, Gerard Casey wrote: > Can anyone point me to a tutorial or a run-through of how to use Spark with > Kerberos? This is proving to be quite confusing. Most search results on the > topic point to what needs to be input at the point of `spark-submit` and not > the change…

Re: Kerberos and YARN - functions in spark-shell and spark submit local but not cluster mode

2016-12-07 Thread Gerard Casey
Thanks Marcelo. Turns out I had missed setup steps in the actual file itself. Thanks to Richard for the help here; he pointed me to some Java implementations. I’m using the org.apache.hadoop.security API. I now have: /* graphx_sp.scala */ import scala.util.Try import scala.io.Source imp…

Re: Kerberos and YARN - functions in spark-shell and spark submit local but not cluster mode

2016-12-05 Thread Marcelo Vanzin
That's not the error, that's just telling you the application failed. You have to look at the YARN logs for application_1479877553404_0041 to see why it failed. On Mon, Dec 5, 2016 at 10:44 AM, Gerard Casey wrote: > Thanks Marcelo, > > My understanding from a few pointers is that this may be due

Re: Kerberos and YARN - functions in spark-shell and spark submit local but not cluster mode

2016-12-05 Thread Gerard Casey
Thanks Marcelo. My understanding from a few pointers is that this may be due to insufficient read permissions on the keytab, or a corrupt keytab. I have checked the read permissions and they are OK. I can see that it is initially configuring correctly: INFO security.UserGroupInformatio…

Re: Kerberos and YARN - functions in spark-shell and spark submit local but not cluster mode

2016-12-05 Thread Marcelo Vanzin
There's generally an exception in these cases, and you haven't posted it, so it's hard to tell you what's wrong. The most probable cause, without the extra information the exception provides, is that you're using the wrong Hadoop configuration when submitting the job to YARN. On Mon, Dec 5, 2016 a

Re: Kerberos and YARN - functions in spark-shell and spark submit local but not cluster mode

2016-12-05 Thread Jorge Sánchez
Hi Gerard, have you tried running in yarn-client mode? If so, do you still get that same error? Regards. 2016-12-05 12:49 GMT+00:00 Gerard Casey: > Edit. From here I read that you can pass a `keytab` op…

Re: Kerberos and YARN - functions in spark-shell and spark submit local but not cluster mode

2016-12-05 Thread Gerard Casey
Edit. From here I read that you can pass a `keytab` option to spark-submit. I thus tried spark-submit --class "graphx_sp" --master yarn --keytab /path/to/keytab --deploy-mode cluster --executor-memory 13G --t…

Kerberos and YARN - functions in spark-shell and spark submit local but not cluster mode

2016-12-05 Thread Gerard Casey
Hello all, I am using Spark with Kerberos authentication. I can run my code using `spark-shell` fine and I can also use `spark-submit` in local mode (e.g. --master local[16]). Both function as expected. Local mode: spark-submit --class "graphx_sp" --master local[16] --driver-memory 20G…

Running window functions in spark dataframe

2016-01-13 Thread rakesh sharma
Hi all, I am getting a HiveContext error when trying to run window functions like OVER with an ORDER BY clause. Any help on how to go about this? I am running Spark locally. -- Forwarded message -- From: "King sami"…

Re: Re: How can I know currently supported functions in Spark SQL

2015-08-06 Thread Pedro Rodriguez
Worth noting that Spark 1.5 extends that list of Spark SQL functions quite a bit. Not sure where in the docs they would be yet, but the JIRA is here: https://issues.apache.org/jira/browse/SPARK-8159 On Thu, Aug 6, 2015 at 7:27 PM, Netwaver wrote: > Thanks for your kind help

Re: Re: How can I know currently supported functions in Spark SQL

2015-08-06 Thread Netwaver
Thanks for your kind help. At 2015-08-06 19:28:10, "Todd Nist" wrote: They are covered here in the docs: http://spark.apache.org/docs/1.4.1/api/scala/index.html#org.apache.spark.sql.functions$ On Thu, Aug 6, 2015 at 5:52 AM, Netwaver wrote: Hi All, I am using Spark 1.4.1,…

Re: How can I know currently supported functions in Spark SQL

2015-08-06 Thread Todd Nist
They are covered here in the docs: http://spark.apache.org/docs/1.4.1/api/scala/index.html#org.apache.spark.sql.functions$ On Thu, Aug 6, 2015 at 5:52 AM, Netwaver wrote: > Hi All, > I am using Spark 1.4.1, and I want to know how can I find the > complete function list supported in Sp
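
Besides the docs, the SQL interface itself can list what is registered; a sketch for 1.4-era Spark (this assumes a HiveContext, and support varies by version):

    sqlContext.sql("SHOW FUNCTIONS").show(500)       # list registered function names
    sqlContext.sql("DESCRIBE FUNCTION max").show()   # short description of one function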

Re: How can I know currently supported functions in Spark SQL

2015-08-06 Thread Ted Yu
Have you looked at this? http://spark.apache.org/docs/1.4.0/api/scala/index.html#org.apache.spark.sql.functions$ > On Aug 6, 2015, at 2:52 AM, Netwaver wrote: > > Hi All, > I am using Spark 1.4.1, and I want to know how can I find the > complete function list supported in Spark SQL,

How can I know currently supported functions in Spark SQL

2015-08-06 Thread Netwaver
Hi All, I am using Spark 1.4.1, and I want to know how I can find the complete list of functions supported in Spark SQL; currently I only know 'sum', 'count', 'min', 'max'. Thanks a lot.

Re: Functions in Spark SQL

2015-07-27 Thread vinod kumar
…to paste your test code here? And which version of Spark are you using? Best, Sun. From: vinod kumar; Date: 2015-07-27 15:04; To: User; Subject: Functions in Spark SQ…

Re: Functions in Spark SQL

2015-07-27 Thread fightf...@163.com
Subject: Functions in Spark SQL. Hi, may I know how to use the functions mentioned in http://spark.apache.org/docs/1.4.0/api/scala/index.html#org.apache.spark.sql.functions$ in Spark SQL? When I use something like "Select last(column) from tablename" I am getting an error like 15/07/27 03:…

Functions in Spark SQL

2015-07-27 Thread vinod kumar
Hi, may I know how to use the functions mentioned in http://spark.apache.org/docs/1.4.0/api/scala/index.html#org.apache.spark.sql.functions$ in Spark SQL? When I use something like "Select last(column) from tablename", I am getting an error like: 15/07/27 03:00:00 INFO exec.FunctionRegistry: Unable to lookup…

RE: Support for Windowing and Analytics functions in Spark SQL

2015-06-22 Thread Cheng, Hao
Yes, it should be with HiveContext, not SQLContext. From: ayan guha [mailto:guha.a...@gmail.com] Sent: Tuesday, June 23, 2015 2:51 AM To: smazumder Cc: user Subject: Re: Support for Windowing and Analytics functions in Spark SQL > 1.4 supports it On 23 Jun 2015 02:59, "Sourav Maz…
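
A minimal sketch of a window query through HiveContext against 1.4-era APIs, with made-up data:

    from pyspark import SparkContext
    from pyspark.sql import HiveContext

    sc = SparkContext(appName="window_demo")
    sqlContext = HiveContext(sc)  # window functions need HiveContext here, not SQLContext

    df = sqlContext.createDataFrame(
        [("o1", "c1", 10.0), ("o2", "c1", 20.0), ("o3", "c2", 5.0)],
        ["order_id", "customer", "amount"])
    df.registerTempTable("orders")

    sqlContext.sql("""
        SELECT order_id, customer, amount,
               LEAD(amount, 1) OVER (PARTITION BY customer ORDER BY order_id) AS next_amount
        FROM orders
    """).show()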

Re: Support for Windowing and Analytics functions in Spark SQL

2015-06-22 Thread ayan guha
1.4 supports it. On 23 Jun 2015 02:59, "Sourav Mazumder" wrote: > Hi, > Though the documentation does not explicitly mention support for Windowing > and Analytics functions in Spark SQL, it looks like they are not supported. > I tried running a query like Select Lead(, 1) over (Partition By order…

Support for Windowing and Analytics functions in Spark SQL

2015-06-22 Thread Sourav Mazumder
Hi, though the documentation does not explicitly mention support for Windowing and Analytics functions in Spark SQL, it looks like they are not supported. I tried running a query like Select Lead(, 1) over (Partition By order by ) from and I got an error saying that this feature is unsupported. I tried…

Re: Windowing and Analytics Functions in Spark SQL

2015-03-26 Thread Arush Kharbanda
You can look at the Spark SQL programming guide. http://spark.apache.org/docs/1.3.0/sql-programming-guide.html and the Spark API. http://spark.apache.org/docs/1.3.0/api/scala/index.html#org.apache.spark.package On Thu, Mar 26, 2015 at 5:21 PM, Masf wrote: > Ok, > > Thanks. Some web resource whe

Re: Windowing and Analytics Functions in Spark SQL

2015-03-26 Thread Masf
OK, thanks. Is there some web resource where I could check the functionality supported by Spark SQL? Thanks!!! Regards, Miguel Ángel. On Thu, Mar 26, 2015 at 12:31 PM, Cheng Lian wrote: > We're working together with AsiaInfo on this. Possibly will deliver an > initial version of window function suppo…

Re: Windowing and Analytics Functions in Spark SQL

2015-03-26 Thread Cheng Lian
We're working together with AsiaInfo on this. Possibly will deliver an initial version of window function support in 1.4.0. But it's not a promise yet. Cheng On 3/26/15 7:27 PM, Arush Kharbanda wrote: Its not yet implemented. https://issues.apache.org/jira/browse/SPARK-1442 On Thu, Mar 26,

Re: Windowing and Analytics Functions in Spark SQL

2015-03-26 Thread Arush Kharbanda
Its not yet implemented. https://issues.apache.org/jira/browse/SPARK-1442 On Thu, Mar 26, 2015 at 4:39 PM, Masf wrote: > Hi. > > Are the Windowing and Analytics functions supported in Spark SQL (with > HiveContext or not)? For example in Hive is supported > https://cwiki.apache.org/confluence/d

Windowing and Analytics Functions in Spark SQL

2015-03-26 Thread Masf
Hi. Are the Windowing and Analytics functions supported in Spark SQL (with HiveContext or not)? For example in Hive is supported https://cwiki.apache.org/confluence/display/Hive/LanguageManual+WindowingAndAnalytics Some tutorial or documentation where I can see all features supported by Spark SQ

Re: Mathematical functions in spark sql

2015-01-27 Thread Cheng Lian
…of functions supported by spark sql. Thanks!

Re: Mathematical functions in spark sql

2015-01-27 Thread Ted Yu
…everyone! I try to execute "select 2/3" and I get "0.". Is there any way to cast double to int or something similar? Also it will be cool to get a list of functions supported by Spark SQL…

Re: Mathematical functions in spark sql

2015-01-26 Thread Alexey Romanchuk
…be cool to get a list of functions supported by Spark SQL. Thanks!

Re: Mathematical functions in spark sql

2015-01-26 Thread Ted Yu
…Is there any way to cast double to int or something similar? Also it will be cool to get a list of functions supported by Spark SQL. Thanks!

Mathematical functions in spark sql

2015-01-26 Thread 1esha
Hello everyone! I try to execute "select 2/3" and I get "0.". Is there any way to cast double to int or something similar? Also it will be cool to get a list of functions supported by Spark SQL. Thanks!
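
For the division part of the question, a sketch (sqlContext is assumed; in Hive-style SQL, / on integer operands returns a double):

    sqlContext.sql("SELECT 2/3").show()               # 0.6666... : division yields a double
    sqlContext.sql("SELECT CAST(2/3 AS INT)").show()  # 0         : cast back to an integer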

Re: Functions in Spark

2014-11-17 Thread Gerard Maas
One 'rule of thumb' is to use rdd.toDebugString and check the lineage for ShuffledRDD. As long as there's no need for restructuring the RDD, operations can be pipelined on each partition. "rdd.toDebugString" is your friend :-) -kr, Gerard. On Mon, Nov 17, 2014 at 7:37 AM, Mukesh Jha wrote: >
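
For instance (the input path and functions are made up; note that PySpark returns the debug string as bytes):

    rdd = sc.textFile("data.txt")
    pairs = rdd.map(lambda line: (line.split(",")[0], 1))  # narrow: pipelined within a partition
    counts = pairs.reduceByKey(lambda a, b: a + b)         # wide: introduces a ShuffledRDD
    print(counts.toDebugString().decode("utf-8"))
    # Indentation changes in the output mark stage boundaries; a ShuffledRDD line
    # means the data had to be restructured across partitions.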

Re: Functions in Spark

2014-11-16 Thread Mukesh Jha
Thanks, I did go through the video and it was very informative, but I think I was looking for the Transformations section at https://spark.apache.org/docs/0.9.1/scala-programming-guide.html. On Mon, Nov 17, 2014 at 10:31 AM, Samarth Mailinglist wrote: > Check this vi…

Re: Functions in Spark

2014-11-16 Thread Samarth Mailinglist
Check this video out: https://www.youtube.com/watch?v=dmL0N3qfSc8&list=UURzsq7k4-kT-h3TDUBQ82-w On Mon, Nov 17, 2014 at 9:43 AM, Deep Pradhan wrote: > Hi, > Is there any way to know which of my functions perform better in Spark? In > other words, say I have achieved same thing using two differen

Functions in Spark

2014-11-16 Thread Deep Pradhan
Hi, is there any way to know which of my functions performs better in Spark? In other words, say I have achieved the same thing using two different implementations. How do I judge which implementation is better than the other? Is processing time the only metric we can use to claim the goodnes…
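
One rough way to compare is sketched below; data, impl_one and impl_two are placeholders for the RDD and the two implementations. Since RDDs are lazy, an action is needed to force the work, and the Spark UI then gives per-stage timings:

    import time

    def measure(rdd, impl):
        start = time.time()
        rdd.map(impl).count()  # count() is an action: it forces evaluation
        return time.time() - start

    print("impl_one: %.2fs" % measure(data, impl_one))
    print("impl_two: %.2fs" % measure(data, impl_two))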

Re: Support for Percentile and Variance Aggregation functions in Spark with HiveContext

2014-07-25 Thread Michael Armbrust
Hmm, in general we try to support all the UDAFs, but this one must be using a different base class that we don't have a wrapper for. JIRA here: https://issues.apache.org/jira/browse/SPARK-2693 On Fri, Jul 25, 2014 at 8:06 AM, wrote: > > Hi all, > > I am using Spark 1.0.0 with CDH 5.1.0. > > I

Support for Percentile and Variance Aggregation functions in Spark with HiveContext

2014-07-25 Thread vinay . kashyap
Hi all, I am using Spark 1.0.0 with CDH 5.1.0. I want to aggregate the data in a raw table using a simple query like below: SELECT MIN(field1), MAX(field2), AVG(field3), PERCENTILE(field4), year, month, day FROM raw_data_table GROUP BY year, month, day. MIN, MAX and AVG functions work fine for m…

Re: How to get the help or explanation for the functions in Spark shell?

2014-06-08 Thread Nicholas Chammas

Re: How to get the help or explanation for the functions in Spark shell?

2014-06-08 Thread Carter
Thank you very much Gerard.

Re: How to get the help or explanation for the functions in Spark shell?

2014-06-08 Thread Gerard Maas
…of functions that I can use for this RDD will be displayed, but I don't know how to use these functions. Your help is greatly appreciated.

How to get the help or explanation for the functions in Spark shell?

2014-06-08 Thread Carter
…will be displayed, but I don't know how to use these functions. Your help is greatly appreciated.

Re: Using Java functions in Spark

2014-06-07 Thread Oleg Proudnikov
Increasing the number of partitions on the data file solved the problem. On 6 June 2014 18:46, Oleg Proudnikov wrote: > Additional observation - the map and mapValues are pipelined and executed > - as expected - in pairs. This means that there is a simple sequence of > steps - first read from Cassandra…
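
The fix, roughly (the path and partition count are placeholders):

    # More input partitions means more tasks running the expensive map stage in parallel.
    rdd = sc.textFile("exported_data.txt", minPartitions=64)
    # Or repartition an existing RDD before the heavy step:
    rdd = rdd.repartition(64)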

Re: Using Java functions in Spark

2014-06-06 Thread Oleg Proudnikov
Additional observation - the map and mapValues are pipelined and executed - as expected - in pairs. This means that there is a simple sequence of steps - first read from Cassandra and then processing for each value of K. This is the exact behaviour of a normal Java loop with these two steps inside.

Using Java functions in Spark

2014-06-06 Thread Oleg Proudnikov
Hi All, I am passing Java static methods into RDD transformations map and mapValues. The first map is from a simple string K into a (K,V) where V is a Java ArrayList of large text strings, 50K each, read from Cassandra. MapValues does processing of these text blocks into very small ArrayLists. Th