Re: convert java dataframe to pyspark dataframe

2021-03-31 Thread Aditya Singh
Thanks a lot, this was really helpful. On Wed, 31 Mar 2021 at 4:13 PM, Khalid Mammadov wrote: …

Re: convert java dataframe to pyspark dataframe

2021-03-31 Thread Khalid Mammadov
I think what you want to achieve is what PySpark is actually doing in its API under the hood. So, specifically, you need to look at PySpark's implementation of the DataFrame, SparkSession and SparkContext API. Under the hood that is what is happening: it starts a py4j gateway and delegates all Spark …
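
A minimal sketch of the mechanism described here, assuming a plain local PySpark session; `_jsparkSession` and `_wrapped` are PySpark internals, and on recent releases the wrapper also accepts the session itself as its second argument:

    from pyspark.sql import SparkSession, DataFrame

    spark = SparkSession.builder.appName("wrap-demo").getOrCreate()

    # Every PySpark DataFrame is a thin wrapper around a JVM DataFrame reached
    # through the py4j gateway. Build one purely on the JVM side ...
    jdf = spark._jsparkSession.sql("SELECT 1 AS id, 'a' AS name")

    # ... and wrap it the same way PySpark's own API does internally.
    df = DataFrame(jdf, spark._wrapped)   # newer PySpark: DataFrame(jdf, spark)
    df.show()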

Re: convert java dataframe to pyspark dataframe

2021-03-31 Thread Aditya Singh
Thanks a lot Khalid for replying. I have one question though. The approach you showed needs the Python side to know the data types of the dataframe's columns beforehand. Can we implement a generic approach where this info is not required and we just have the Java dataframe as input on the Python side …
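
As a hedged illustration of the "generic" direction asked about here (not taken from the thread): the Python wrapper does not need the column types up front, because the schema stays with the JVM DataFrame and is read back on demand. `java_df` stands for any py4j reference to a JVM DataFrame, for example one returned by the Java entry point:

    from pyspark.sql import DataFrame

    def wrap_jvm_dataframe(java_df, spark):
        """Wrap a JVM DataFrame without declaring its schema in Python."""
        df = DataFrame(java_df, spark._wrapped)   # newer PySpark: DataFrame(java_df, spark)
        print(df.schema)   # the schema is fetched from the JVM lazily, on first access
        return df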

Re: convert java dataframe to pyspark dataframe

2021-03-30 Thread Khalid Mammadov
Hi Aditya, I think your original question was how to convert a DataFrame from a Spark session created in Java/Scala to a DataFrame on a Spark session created from Python (PySpark). So, as I have answered on your SO question: there is a missing call to *entry_point* before calling getDf() …
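
A minimal sketch of the fix described here, assuming the Java process runs a py4j GatewayServer whose entry-point object exposes getDf() (the method name comes from the linked Stack Overflow question; the rest is illustrative):

    from py4j.java_gateway import JavaGateway

    gateway = JavaGateway()              # connect to the GatewayServer started on the Java side
    jdf = gateway.entry_point.getDf()    # the missing step: call getDf() on entry_point
    # jdf is now a py4j handle to the JVM DataFrame and can be wrapped as sketched above.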

Re: convert java dataframe to pyspark dataframe

2021-03-30 Thread Aditya Singh
Hi Sean, Thanks a lot for replying, and apologies for the late reply (I somehow missed this mail before), but I am under the impression that passing the py4j.java_gateway.JavaGateway object lets PySpark access the Spark context created on the Java side. My use case is exactly what you mentioned …
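
A heavily hedged sketch of that use case: hand PySpark an existing py4j gateway so it attaches to the JVM-side SparkContext instead of launching its own. getJavaSparkContext(), getSparkSession() and getDf() are hypothetical accessors the Java entry point would have to expose, `_jconf` and `_wrapped` are PySpark internals, and recent PySpark releases reject a gateway whose auth token does not match the Java GatewayServer's:

    from py4j.java_gateway import JavaGateway, GatewayParameters
    from pyspark import SparkConf, SparkContext
    from pyspark.sql import DataFrame, SparkSession

    # Connect to the GatewayServer the Java application started; the auth token
    # is a placeholder and must match the one used on the Java side.
    gateway = JavaGateway(gateway_parameters=GatewayParameters(auth_token="...", auto_convert=True))

    jsc = gateway.entry_point.getJavaSparkContext()           # hypothetical accessor
    conf = SparkConf(_jconf=jsc.getConf())                    # reuse the JVM context's configuration
    sc = SparkContext(gateway=gateway, jsc=jsc, conf=conf)    # attach, don't create a new context

    spark = SparkSession(sc, gateway.entry_point.getSparkSession())   # hypothetical accessor
    df = DataFrame(gateway.entry_point.getDf(), spark._wrapped)       # newer PySpark: pass spark
    df.show()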

Re: convert java dataframe to pyspark dataframe

2021-03-26 Thread Sean Owen
The problem is that both of these are not sharing a SparkContext as far as I can see, so there is no way to share the object across them, let alone across languages. You can of course write the data from Java and read it from Python. In some hosted Spark products, you can access the same session from two languages …
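
A simple sketch of that portable route (paths and names are placeholders): the Java job persists the DataFrame to shared storage, and an independent PySpark session reads it back:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("read-from-java").getOrCreate()

    # Java side, run separately:  df.write().parquet("/shared/path/my_table");
    df = spark.read.parquet("/shared/path/my_table")
    df.show()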

convert java dataframe to pyspark dataframe

2021-03-26 Thread Aditya Singh
Hi All, I am a newbie to Spark and am trying to pass a Java dataframe to PySpark. The following link has details about what I am trying to do: https://stackoverflow.com/questions/66797382/creating-pysparks-spark-context-py4j-java-gateway-object Can someone please help me with this? Thanks,