Hello,
I have a dataframe; applying from_unixtime to it seems to expose an anomaly:
scala> val bhDF4 = bhDF.withColumn("ts1", $"ts" + 28800).withColumn("ts2", from_unixtime($"ts" + 28800, "MMddhhmmss"))
bhDF4: org.apache.spark.sql.DataFrame = [user_id: int, item_id: int ... 5 more fields]
scala>
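For reference, a minimal sketch of the same transformation in spark-shell, assuming "ts" holds Unix-epoch seconds; note that in the format pattern "hh" is the 12-hour clock field while "HH" gives 24-hour output, a common source of surprising timestamps:

import org.apache.spark.sql.functions.from_unixtime

// Assumes bhDF has a numeric "ts" column of Unix-epoch seconds (an assumption here).
val bhDF4 = bhDF
  .withColumn("ts1", $"ts" + 28800)
  .withColumn("ts2", from_unixtime($"ts" + 28800, "MMddHHmmss"))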
> The csv parser is based on univocity, and you might use the
> "spark.read.csv" syntax instead of using the RDD API.
>
> From my experience, this will be better than any other csv parser.
>
> 2018-06-19 16:43 GMT+02:00 Raymond Xie :
>
>> Thank you Matteo, Askash and Georg:
>>
> wrote:
>>
>>> use pandas or dask
>>>
>>> If you do want to use Spark, store the dataset as Parquet / ORC, and then
>>> continue to perform analytical queries on that dataset.
>>>
Raymond Xie wrote on Tue., 19 June 2018 at
I have a 3.6GB csv dataset (4 columns, 100,150,807 rows); my environment has a
20GB SSD hard disk and 2GB of RAM.
The dataset comes with:
User ID: 987,994
Item ID: 4,162,024
Category ID: 9,439
Behavior type ('pv', 'buy', 'cart', 'fav')
Unix Timestamp: spanning November 25 to December 3, 2017
I would
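A minimal sketch of the approach suggested in the replies above (spark.read.csv with an explicit schema, followed by a one-off Parquet write); the column names, types, and paths are assumptions for illustration:

import org.apache.spark.sql.types._

// Assumed schema for the columns described above.
val schema = StructType(Seq(
  StructField("user_id", IntegerType),
  StructField("item_id", IntegerType),
  StructField("category_id", IntegerType),
  StructField("behavior_type", StringType),
  StructField("ts", LongType)
))

val behaviorDF = spark.read.schema(schema).csv("/path/to/user_behavior.csv")  // placeholder path

// Write once as Parquet, then run the analytical queries against the Parquet copy.
behaviorDF.write.mode("overwrite").parquet("/path/to/user_behavior.parquet")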
>
> On Jun 17, 2018, at 2:32 PM, Raymond Xie wrote:
>
> Hello,
>
> I am wondering how I can run a Spark job in my environment, which is a single
> Ubuntu host with no Hadoop installed. If I run my job like below, I will
> end up with an infinite loop at the end. Thank you very
Hello,
I am wondering how I can run a Spark job in my environment, which is a single
Ubuntu host with no Hadoop installed. If I run my job like below, I will
end up with an infinite loop at the end. Thank you very much.
rxie@ubuntu:~/data$ spark-submit --class retail_db.GetRevenuePerOrder
--conf spark.
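For reference, a Spark job can run on a single host without Hadoop by using the local master; a minimal sketch of the full command, where the jar path and application arguments are placeholders:

spark-submit --class retail_db.GetRevenuePerOrder --master local[*] /path/to/your-app.jar [app arguments]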
>
> Best Regards,
>
> Vamshi T
>
>
> --
> *From:* Raymond Xie
> *Sent:* Sunday, June 17, 2018 6:27 AM
> *To:* user; Hui Xie
> *Subject:* Error: Could not find or load main class
> org.apache.spark.launcher.Main
>
> Hello,
>
> I
Hello,
It would be really appreciated if anyone could help me sort out the following
path issue. I highly doubt this is related to a missing path setting, but I
don't know how I can fix it.
rxie@ubuntu:~/Downloads/spark$ echo $PATH
/usr/bin/java:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin
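For comparison, a conventional setup puts the Java and Spark bin directories (rather than the java binary itself) on PATH; a minimal sketch, where both install locations are assumptions:

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64   # assumed JDK location
export SPARK_HOME=$HOME/Downloads/spark              # assumed pre-built Spark location
export PATH=$JAVA_HOME/bin:$SPARK_HOME/bin:$PATH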
Hello, I am doing the practice in Ubuntu now; here is the error I am
encountering:
rxie@ubuntu:~/Downloads/spark/bin$ spark-shell
Error: Could not find or load main class org.apache.spark.launcher.Main
What am I missing?
Thank you very much.
Java is installed.
Hello, I am doing the practice in Windows now.
I have the jar file generated under:
C:\RXIE\Learning\Scala\spark2practice\target\scala-2.11\spark2practice_2.11-0.1.jar
The package name is Retail_db and the object is GetRevenuePerOrder.
The spark-submit command is:
spark-submit retail_db.GetReve
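For reference, spark-submit takes the main class via --class followed by the application jar; a minimal sketch using the jar built above (the local master setting is an assumption):

spark-submit --class retail_db.GetRevenuePerOrder --master local[*] C:\RXIE\Learning\Scala\spark2practice\target\scala-2.11\spark2practice_2.11-0.1.jar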
ome path .
> Maybe a special char or space in your path.
>
> Regards,
> Vaquar khan
>
> On Sat, Jun 16, 2018, 1:36 PM Raymond Xie wrote:
>
>> I am trying to run spark-shell in Windows but receive error of:
>>
>> \Java\jre1.8.0_151\bin\java was unexpected at
I am trying to run spark-shell in Windows but receive error of:
\Java\jre1.8.0_151\bin\java was unexpected at this time.
Environment:
System variables:
SPARK_HOME:
c:\spark
Path:
C:\Program Files (x86)\Common Files\Oracle\Java\javapath;C:\ProgramData\Anaconda2;C:\ProgramData\Anaconda2\Librar
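One common cause of a "was unexpected at this time." error from the Spark .cmd scripts is an unquoted space or parenthesis (as in "Program Files (x86)") in JAVA_HOME or Path; a minimal sketch of a workaround, assuming Java is installed or copied to a parenthesis-free location:

set JAVA_HOME=C:\Java\jre1.8.0_151
set PATH=%JAVA_HOME%\bin;%SPARK_HOME%\bin;%PATH%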
wrote:
> Try to use --packages to include the jars. From the error it seems it's
> looking for the main class in the jars but you are running a python script...
>
> On 25 Feb 2017 10:36 pm, "Raymond Xie" wrote:
>
> That's right Anahita, however, the class name is
PM, Anahita Talebi
wrote:
> You're welcome.
> You need to specify the class. I meant like that:
>
> spark-submit /usr/hdp/2.5.0.0-1245/spark/lib/spark-assembly-1.6.2.2.5.0.0-1245-hadoop2.7.3.2.5.0.0-1245.jar --class "give the name of the class"
>
>
>
>
ed it by removing --jars.
>
> Cheers,
> Anahita
>
> On Saturday, February 25, 2017, Raymond Xie wrote:
>
>> I am doing a spark streaming on a hortonworks sandbox and am stuck here
>> now, can anyone tell me what's wrong with the following code and the
>> except
I am doing Spark streaming on a Hortonworks sandbox and am stuck here
now. Can anyone tell me what's wrong with the following code and the
exception it causes, and how do I fix it? Thank you very much in advance.
spark-submit --jars
/usr/hdp/2.5.0.0-1245/spark/lib/spark-assembly-1.6.2.2.5.0.0-124
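A minimal sketch of the general form suggested in the replies above, where the Python script is the last argument and extra dependencies come in through --packages rather than by submitting the Spark assembly jar; the master, coordinates, and script name are placeholders:

spark-submit --master local[2] --packages <group>:<artifact>:<version> your_streaming_script.py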
nday, January 9, 2017 at 2:59 PM
> *To: *Raymond Xie , user
> *Subject: *Re: How to connect Tableau to databricks spark?
>
>
>
> Hi Raymond,
>
>
>
> Are you using a Spark 2.0 or 1.6 cluster? With Spark 2.0 it’s just a
> matter of entering the hostname of your
I want to do some data analytics work by leveraging the Databricks Spark
platform and connect my Tableau desktop to it for data visualization.
Has anyone ever made it work? I have been trying to follow the instructions
below but have not been successful:
https://docs.cloud.databricks.com/docs/latest/databricks_guide/01%20
Sincerely yours,
Raymond
e", StringType)
> val jsonContentWithSchema = sqlContext.jsonRDD(jsonRdd, schema)
>
> But somehow I seem to remember that there was a way, in Spark 2.0, for
> Spark to infer the schema for you.
>
> hth
> marco
>
>
>
>
>
> On Sun, Jan 1, 2017 a
Sincerely yours,
Raymond
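A minimal sketch of the schema inference mentioned above: in Spark 2.x (and via sqlContext.read.json in 1.6) the JSON reader infers the schema from the data; the input path is a placeholder:

val jsonDF = spark.read.json("/path/to/input.json")
jsonDF.printSchema()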
On Sat, Dec 31, 2016 at 11:52 PM, Miguel Morales
wrote:
> Looks like it's trying to treat that path as a folder; try omitting
> the file name and just use the folder path.
>
> On Sat, Dec 31, 2016 at 7:58 PM, Raymond Xie wrote:
> > Happy new year!!!
n(GatewayConnection.java:209)
at java.lang.Thread.run(Thread.java:745)
>>>
Sincerely yours,
Raymond
On Sat, Dec 31, 2016 at 11:52 PM, Miguel Morales
wrote:
> Looks like it's trying to treat that path as a folder, try omitting
> the file n
Hello,
It is indicated in
https://spark.apache.org/docs/1.6.1/sql-programming-guide.html#dataframes
that, under Running SQL Queries Programmatically, you can do:
from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)
df = sqlContext.sql("SELECT * FROM table")
However, it did not indicate what sh
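For context on the snippet above, the name passed to sqlContext.sql must refer to a table that has been registered; a minimal sketch in Scala under that assumption, with the source path and table name purely illustrative:

val df = sqlContext.read.json("/path/to/input.json")
df.registerTempTable("table")
val result = sqlContext.sql("SELECT * FROM table")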
ote:
> Have you tried the spark-csv package?
>
> https://spark-packages.org/package/databricks/spark-csv
>
>
> ------
> *From:* Raymond Xie
> *Sent:* Friday, December 30, 2016 6:46:11 PM
> *To:* user@spark.apache.org
> *Subject:* How to load a
Thanks Felix, I will try it tomorrow
~~~sent from my cell phone, sorry if there is any typo
On December 30, 2016 at 10:08 PM, "Felix Cheung" wrote:
> Have you tried the spark-csv package?
>
> https://spark-packages.org/package/databricks/spark-csv
>
>
> --------
---- Original message ----
From: Raymond Xie
Date: 31/12/2016 10:46 (GMT+08:00)
To: user@spark.apache.org
Subject: How to load a big csv to dataframe in Spark 1.6
Hello,
I see there is usually this way to load a csv to dataframe:
sqlContext = SQLContext(sc)
Employee_rdd = sc.textFile("\
Hello,
I see there is usually this way to load a csv into a dataframe:
sqlContext = SQLContext(sc)
Employee_rdd = sc.textFile("\..\Employee.csv").map(lambda line: line.split(","))
Employee_df = Employee_rdd.toDF(['Employee_ID','Employee_name'])
Employee_df.show()
Employee_df.show()
However in my cas
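A minimal sketch, in Scala, of the spark-csv approach suggested earlier in the thread for Spark 1.6; the package version, options, and file path are assumptions (the pyspark call is analogous):

// Launch with: spark-shell --packages com.databricks:spark-csv_2.10:1.5.0   (assumed version)
val employeeDF = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .load("Employee.csv")
employeeDF.show()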
Hello,
I am new to Spark. As a SQL developer, I have only taken some courses online
and spent some time on my own, but never had a chance to work on a real project.
I wonder what would be the best practice (tools, procedures...) for loading data
(csv, excel) into the Spark platform?
Thank you.
Raymond