Hello,
I have a dataframe; applying from_unixtime to it seems to expose an anomaly:
scala> val bhDF4 = bhDF.withColumn("ts1", $"ts" + 28800).withColumn("ts2", from_unixtime($"ts" + 28800, "MMddhhmmss"))
bhDF4: org.apache.spark.sql.DataFrame = [user_id: int, item_id: int ... 5 more fields]
scala>
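For reference, a minimal sketch of the same transformation in spark-shell, assuming "ts" holds Unix-epoch seconds; note that in the format pattern "hh" is the 12-hour clock field while "HH" gives 24-hour output, a common source of surprising timestamps:

import org.apache.spark.sql.functions.from_unixtime

// Assumes bhDF has a numeric "ts" column of Unix-epoch seconds (an assumption here).
val bhDF4 = bhDF
  .withColumn("ts1", $"ts" + 28800)
  .withColumn("ts2", from_unixtime($"ts" + 28800, "MMddHHmmss"))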
> The csv parser is based on univocity, and you might use the
> "spark.read.csv" syntax instead of using the RDD API.
>
> From my experience, this will be better than any other csv parser.
>
> 2018-06-19 16:43 GMT+02:00 Raymond Xie :
>
>> Thank you Matteo, Askash and Georg:
>>
> wrote:
>>
>>> use pandas or dask
>>>
>>> If you do want to use Spark, store the dataset as Parquet / ORC, and then
>>> continue to perform analytical queries on that dataset.
>>>
Raymond Xie wrote on Tue., 19 June 2018 at
I have a 3.6GB csv dataset (4 columns, 100,150,807 rows); my environment has a
20GB SSD hard disk and 2GB of RAM.
The dataset comes with:
User ID: 987,994
Item ID: 4,162,024
Category ID: 9,439
Behavior type ('pv', 'buy', 'cart', 'fav')
Unix Timestamp: spanning November 25 to December 3, 2017
I would
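A minimal sketch of the approach suggested in the replies above (spark.read.csv with an explicit schema, followed by a one-off Parquet write); the column names, types, and paths are assumptions for illustration:

import org.apache.spark.sql.types._

// Assumed schema for the columns described above.
val schema = StructType(Seq(
  StructField("user_id", IntegerType),
  StructField("item_id", IntegerType),
  StructField("category_id", IntegerType),
  StructField("behavior_type", StringType),
  StructField("ts", LongType)
))

val behaviorDF = spark.read.schema(schema).csv("/path/to/user_behavior.csv")  // placeholder path

// Write once as Parquet, then run the analytical queries against the Parquet copy.
behaviorDF.write.mode("overwrite").parquet("/path/to/user_behavior.parquet")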
>
> On Jun 17, 2018, at 2:32 PM, Raymond Xie wrote:
>
> Hello,
>
> I am wondering how I can run a Spark job in my environment, which is a single
> Ubuntu host with no Hadoop installed. If I run my job like below, I will
> end up with an infinite loop at the end. Thank you very
Hello,
I am wondering how I can run a Spark job in my environment, which is a single
Ubuntu host with no Hadoop installed. If I run my job like below, I will
end up with an infinite loop at the end. Thank you very much.
rxie@ubuntu:~/data$ spark-submit --class retail_db.GetRevenuePerOrder
--conf spark.
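For reference, a Spark job can run on a single host without Hadoop by using the local master; a minimal sketch of the full command, where the jar path and application arguments are placeholders:

spark-submit --class retail_db.GetRevenuePerOrder --master local[*] /path/to/your-app.jar [app arguments]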
>
> Best Regards,
>
> Vamshi T
>
>
> --
> *From:* Raymond Xie
> *Sent:* Sunday, June 17, 2018 6:27 AM
> *To:* user; Hui Xie
> *Subject:* Error: Could not find or load main class
> org.apache.spark.launcher.Main
>
> Hello,
>
> I
Hello,
It would be really appreciated if anyone could help me sort out the following
path issue. I highly doubt this is related to a missing path setting, but I
don't know how I can fix it.
rxie@ubuntu:~/Downloads/spark$ echo $PATH
/usr/bin/java:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin
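For comparison, a conventional setup puts the Java and Spark bin directories (rather than the java binary itself) on PATH; a minimal sketch, where both install locations are assumptions:

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64   # assumed JDK location
export SPARK_HOME=$HOME/Downloads/spark              # assumed pre-built Spark location
export PATH=$JAVA_HOME/bin:$SPARK_HOME/bin:$PATH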
Hello, I am doing the practice in Ubuntu now; here is the error I am
encountering:
rxie@ubuntu:~/Downloads/spark/bin$ spark-shell
Error: Could not find or load main class org.apache.spark.launcher.Main
What am I missing?
Thank you very much.
Java is installed.
Hello, I am doing the practice in Windows now.
I have the jar file generated under:
C:\RXIE\Learning\Scala\spark2practice\target\scala-2.11\spark2practice_2.11-0.1.jar
The package name is Retail_db and the object is GetRevenuePerOrder.
The spark-submit command is:
spark-submit retail_db.GetReve
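For reference, spark-submit takes the main class via --class followed by the application jar; a minimal sketch using the jar built above (the local master setting is an assumption):

spark-submit --class retail_db.GetRevenuePerOrder --master local[*] C:\RXIE\Learning\Scala\spark2practice\target\scala-2.11\spark2practice_2.11-0.1.jar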
ome path .
> Maybe a special char or space in your path.
>
> Regards,
> Vaquar khan
>
> On Sat, Jun 16, 2018, 1:36 PM Raymond Xie wrote:
>
>> I am trying to run spark-shell in Windows but receive error of:
>>
>> \Java\jre1.8.0_151\bin\java was unexpected at
I am trying to run spark-shell in Windows but receive error of:
\Java\jre1.8.0_151\bin\java was unexpected at this time.
Environment:
System variables:
SPARK_HOME:
c:\spark
Path:
C:\Program Files (x86)\Common Files\Oracle\Java\javapath;C:\ProgramData\Anaconda2;C:\ProgramData\Anaconda2\Librar
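One common cause of a "was unexpected at this time." error from the Spark .cmd scripts is an unquoted space or parenthesis (as in "Program Files (x86)") in JAVA_HOME or Path; a minimal sketch of a workaround, assuming Java is installed or copied to a parenthesis-free location:

set JAVA_HOME=C:\Java\jre1.8.0_151
set PATH=%JAVA_HOME%\bin;%SPARK_HOME%\bin;%PATH%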
wrote:
> Try to use --packages to include the jars. From the error it seems it's
> looking for the main class in the jars but you are running a python script...
>
> On 25 Feb 2017 10:36 pm, "Raymond Xie" wrote:
>
> That's right Anahita, however, the class name is
PM, Anahita Talebi
wrote:
> You're welcome.
> You need to specify the class. I meant like that:
>
> spark-submit /usr/hdp/2.5.0.0-1245/spark/lib/spark-assembly-1.6.2.2.5.0.0-1245-hadoop2.7.3.2.5.0.0-1245.jar --class "give the name of the class"
>
>
>
>
ed it by removing --jars.
>
> Cheers,
> Anahita
>
> On Saturday, February 25, 2017, Raymond Xie wrote:
>
>> I am doing a spark streaming on a hortonworks sandbox and am stuck here
>> now, can anyone tell me what's wrong with the following code and the
>> except
I am doing Spark streaming on a Hortonworks sandbox and am stuck here
now. Can anyone tell me what's wrong with the following code and the
exception it causes, and how do I fix it? Thank you very much in advance.
spark-submit --jars
/usr/hdp/2.5.0.0-1245/spark/lib/spark-assembly-1.6.2.2.5.0.0-124
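A minimal sketch of the general form suggested in the replies above, where the Python script is the last argument and extra dependencies come in through --packages rather than by submitting the Spark assembly jar; the master, coordinates, and script name are placeholders:

spark-submit --master local[2] --packages <group>:<artifact>:<version> your_streaming_script.py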
nday, January 9, 2017 at 2:59 PM
> *To: *Raymond Xie , user
> *Subject: *Re: How to connect Tableau to databricks spark?
>
>
>
> Hi Raymond,
>
>
>
> Are you using a Spark 2.0 or 1.6 cluster? With Spark 2.0 it’s just a
> matter of entering the hostname of your
I want to do some data analytics work by leveraging the Databricks Spark
platform and connect my Tableau desktop to it for data visualization.
Has anyone ever made it work? I have been trying to follow the instructions
below but have not been successful:
https://docs.cloud.databricks.com/docs/latest/databricks_guide/01%20
Sincerely yours,
Raymond
e", StringType)
> val jsonContentWithSchema = sqlContext.jsonRDD(jsonRdd, schema)
>
> But somehow I seem to remember that there was a way, in Spark 2.0, for
> Spark to infer the schema for you.
>
> hth
> marco
>
>
>
>
>
> On Sun, Jan 1, 2017 a
Sincerely yours,
Raymond
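A minimal sketch of the schema inference mentioned above: in Spark 2.x (and via sqlContext.read.json in 1.6) the JSON reader infers the schema from the data; the input path is a placeholder:

val jsonDF = spark.read.json("/path/to/input.json")
jsonDF.printSchema()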
On Sat, Dec 31, 2016 at 11:52 PM, Miguel Morales
wrote:
> Looks like it's trying to treat that path as a folder; try omitting
> the file name and just use the folder path.
>
> On Sat, Dec 31, 2016 at 7:58 PM, Raymond Xie wrote:
> > Happy new year!!!
n(GatewayConnection.java:209)
at java.lang.Thread.run(Thread.java:745)
>>>
Sincerely yours,
Raymond
On Sat, Dec 31, 2016 at 11:52 PM, Miguel Morales
wrote:
> Looks like it's trying to treat that path as a folder, try omitting
> the file n
Hello,
It is indicated in
https://spark.apache.org/docs/1.6.1/sql-programming-guide.html#dataframes
that, under Running SQL Queries Programmatically, you can do:
from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)
df = sqlContext.sql("SELECT * FROM table")
However, it did not indicate what sh
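For context on the snippet above, the name passed to sqlContext.sql must refer to a table that has been registered; a minimal sketch in Scala under that assumption, with the source path and table name purely illustrative:

val df = sqlContext.read.json("/path/to/input.json")
df.registerTempTable("table")
val result = sqlContext.sql("SELECT * FROM table")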
ote:
> Have you tried the spark-csv package?
>
> https://spark-packages.org/package/databricks/spark-csv
>
>
> ------
> *From:* Raymond Xie
> *Sent:* Friday, December 30, 2016 6:46:11 PM
> *To:* user@spark.apache.org
> *Subject:* How to load a
Thanks Felix, I will try it tomorrow
~~~sent from my cell phone, sorry if there is any typo
On December 30, 2016 at 10:08 PM, "Felix Cheung" wrote:
> Have you tried the spark-csv package?
>
> https://spark-packages.org/package/databricks/spark-csv
>
>
> --------
---- Original message ----
From: Raymond Xie
Date: 31/12/2016 10:46 (GMT+08:00)
To: user@spark.apache.org
Subject: How to load a big csv to dataframe in Spark 1.6
Hello,
I see there is usually this way to load a csv to dataframe:
sqlContext = SQLContext(sc)
Employee_rdd = sc.textFile("\
Hello,
I see there is usually this way to load a csv into a dataframe:
sqlContext = SQLContext(sc)
Employee_rdd = sc.textFile("\..\Employee.csv").map(lambda line: line.split(","))
Employee_df = Employee_rdd.toDF(['Employee_ID','Employee_name'])
Employee_df.show()
Employee_df.show()
However in my cas
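A minimal sketch, in Scala, of the spark-csv approach suggested earlier in the thread for Spark 1.6; the package version, options, and file path are assumptions (the pyspark call is analogous):

// Launch with: spark-shell --packages com.databricks:spark-csv_2.10:1.5.0   (assumed version)
val employeeDF = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .load("Employee.csv")
employeeDF.show()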
Hello,
I am new to Spark. As a SQL developer, I have only taken some courses online
and spent some time on my own, but never had a chance to work on a real project.
I wonder what would be the best practice (tools, procedures...) for loading data
(csv, excel) into the Spark platform?
Thank you.
Raymond