Re: Complexity with the data

2022-05-26 Thread Sid
Hello Everyone, I have finally posted a question with the dataset and the column names. PFB link: https://stackoverflow.com/questions/72389385/how-to-load-complex-data-using-pyspark Thanks, Sid On Thu, May 26, 2022 at 2:40 AM Bjørn Jørgensen wrote: > Sid, dump one of your files. > > https:/

Re: Complexity with the data

2022-05-26 Thread Bjørn Jørgensen
Yes, it looks like a bug that we also have in the pandas API on Spark, so I have opened a JIRA for it. On Thu, May 26, 2022 at 11:09 AM Sid wrote: > Hello Everyone, > > I have finally posted a question with the dataset and the column names. > > PFB l

Re: Complexity with the data

2022-05-26 Thread Sid
Thanks for opening the issue, Bjørn. However, could you help me address the problem for now with some kind of alternative? I have been stuck on this since yesterday. Thanks, Sid On Thu, 26 May 2022, 18:48 Bjørn Jørgensen, wrote: > Yes, it looks like a bug that we also have in the pandas API o

Re: Complexity with the data

2022-05-26 Thread Apostolos N. Papadopoulos
Since you cannot create the DF directly, you may try to first create an RDD of tuples from the file and then convert the RDD to a DF using the toDF() transformation. Perhaps you can bypass the issue this way. Another thing I have seen in the example is that you are using "" as an esc
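
A minimal sketch of the RDD-to-DataFrame workaround described above, assuming a comma-delimited file; the path and column names are hypothetical stand-ins, since the thread does not show the actual schema:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("rdd-workaround").getOrCreate()

    # Read the raw file as plain text lines, split each line into a tuple
    # of fields, then convert the RDD to a DataFrame with explicit names.
    rdd = (spark.sparkContext
           .textFile("input.csv")                     # hypothetical path
           .map(lambda line: tuple(line.split(","))))

    df = rdd.toDF(["col_a", "col_b", "col_c"])        # hypothetical column names
    df.show()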

Re: Complexity with the data

2022-05-26 Thread Sid
I was facing the issue because I was passing the wrong escape characters. I have updated the user's answer on my post, and now I am able to load the dataset. Thank you everyone for your time and help! Much appreciated. I have more datasets like this. I hope those will be resolved using this app

Re: Complexity with the data

2022-05-26 Thread Bjørn Jørgensen
OK, but how do you read it now? https://github.com/apache/spark/blob/8f610d1b4ce532705c528f3c085b0289b2b17a94/python/pyspark/pandas/namespace.py#L216 probably has to be updated with the default options. This is so that the pandas API on Spark behaves like pandas. On Thu, May 26, 2022 at 5:38 PM Sid
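
For context, the reader at issue is pyspark.pandas.read_csv, defined in the namespace.py file linked above. A minimal sketch of the pandas-style call, assuming a hypothetical path; quotechar and escapechar are the kind of options whose defaults would need to match plain pandas:

    import pyspark.pandas as ps

    # pandas-style CSV reader backed by Spark; when quotechar/escapechar
    # are omitted, the defaults applied here are what the JIRA is about.
    psdf = ps.read_csv("input.csv", quotechar='"', escapechar="\\")
    print(psdf.head())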

Fwd: java.lang.NoSuchMethodError: org.apache.hadoop.hive.common.FileUtils.mkdir --> Spark to Hive

2022-05-26 Thread Prasanth M Sasidharan
Hi Team, I am trying to persist data into a Hive table through PySpark. Following is the line of code where it is throwing the error: sparkSession = SparkSession.builder.appName('example-pyspark-read-and-write-from-hive').master("local").enableHiveSupport().config('spark.sql.catalogImplementation','hive'
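
The builder chain above is cut off by the archive; a hedged reconstruction from the visible fragment (the closing .getOrCreate() and the commented-out write are assumptions):

    from pyspark.sql import SparkSession

    sparkSession = (SparkSession.builder
                    .appName('example-pyspark-read-and-write-from-hive')
                    .master("local")
                    .enableHiveSupport()
                    .config('spark.sql.catalogImplementation', 'hive')
                    .getOrCreate())

    # Hypothetical write that exercises the Hive metastore path where
    # FileUtils.mkdir gets called; a NoSuchMethodError there typically
    # means mismatched Hive/Hadoop jars on the classpath, not a code bug.
    # df.write.mode("overwrite").saveAsTable("some_table")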

Re: Complexity with the data

2022-05-26 Thread Sid
I am not reading it through pandas. I am using Spark, because when I tried to use the pandas that comes with import pyspark.pandas, it gave me an error. On Thu, May 26, 2022 at 9:52 PM Bjørn Jørgensen wrote: > OK, but how do you read it now? > > > https://github.com/8f610d1b4ce53

Re: Complexity with the data

2022-05-26 Thread Bjørn Jørgensen
Yes, but how do you read it with Spark? On Thu, May 26, 2022 at 6:30 PM Sid wrote: > I am not reading it through pandas. I am using Spark, because when I tried > to use the pandas that comes with import pyspark.pandas, it gave me an > error. > > On Thu, May 26, 2022 at 9:52 PM Bjørn Jørgensen > wrote: >

Re: Complexity with the data

2022-05-26 Thread Gourav Sengupta
Hi, can you please give us a simple map of what the input is and what the output should look like? From your description it is a bit difficult to figure out exactly how you want the records parsed. Regards, Gourav Sengupta On Wed, May 25, 2022 at 9:08 PM Sid wrote: >

Re: Complexity with the data

2022-05-26 Thread Sid
Hi Gourav, Please find the link below for a detailed understanding: https://stackoverflow.com/questions/72389385/how-to-load-complex-data-using-pyspark/72391090#72391090 @Bjørn Jørgensen : I was able to read this kind of data using the code below: spark.read.option("header",True).option("mult
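
Sid's snippet is truncated by the archive; a hedged reconstruction, assuming the cut-off option is multiLine and that the quote/escape values below merely stand in for whatever characters the dataset actually required:

    # The quote/escape characters are illustrative; the thread does not
    # show the exact values that fixed Sid's file.
    df = (spark.read
          .option("header", True)
          .option("multiLine", True)
          .option("quote", '"')
          .option("escape", '"')
          .csv("input.csv"))                          # hypothetical path
    df.show()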

Issues getting Apache Spark

2022-05-26 Thread Martin, Michael
Hello, I'm writing to request assistance in getting Apache Spark running on my laptop. I've followed instructions telling me to get Java, Python, Hadoop, Winutils, and Spark itself, and instructions illustrating how to set my environment variables. For some reason, I still cannot get Spark to
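
Since Winutils is mentioned, this is presumably a Windows setup. A minimal sketch of a sanity check, assuming hypothetical install paths and the third-party findspark helper (pip install findspark):

    import os

    # Point the session at the unpacked installs; winutils.exe must sit
    # under %HADOOP_HOME%\bin for Spark to run on Windows.
    os.environ["JAVA_HOME"] = r"C:\Java\jdk8"                    # hypothetical
    os.environ["SPARK_HOME"] = r"C:\spark-3.2.1-bin-hadoop3.2"   # hypothetical
    os.environ["HADOOP_HOME"] = r"C:\hadoop"                     # hypothetical

    import findspark
    findspark.init()

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[*]").getOrCreate()
    print(spark.version)   # if this prints, the environment is wired up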

Re: Issues getting Apache Spark

2022-05-26 Thread Apostolos N. Papadopoulos
How can we help if we do not know what the problem is? What error are you getting, and at which step? Please give us more info so that we can help you. Spark installation on Linux/Windows is easy if you follow the guidelines exactly. Regards, Apostolos On 26/5/22 22:19, Martin, Michael