Hi Marco,
Thanks! Please find my responses inline:

> so you have a PySpark application running on Spark 2.0
Palash>> Yes

> You have Python scripts dropping files on HDFS
Palash>> Yes (it is not part of the Spark process, just an independent Python script)

> then you have two Spark jobs
Palash>> Yes
> - 1 loads the expected hour
Hi
We just tested a switch from Spark 2.0.2 to Spark 2.1.0 on our codebase. It
compiles fine, but introduces the following runtime exception upon
initialization of our Cassandra database. I can't find any clues in the
release notes. Has anyone experienced this?
Morten
sbt.ForkMain$ForkError: jav
Hi,
you have a DataFrame... there should be either a way to
- convert a DF to a Vector without doing a cast
- use an ML library which relies on DataFrames only
I can see that your code is still importing libraries from two different
'machine learning' packages:
import org.apache.spark.ml.featur
This may also help:
http://spark.apache.org/docs/latest/ml-migration-guides.html
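If it helps, here is a minimal sketch of the DataFrame-only route, assuming a DataFrame df with numeric columns "x1" and "x2" (the column names are hypothetical):

import org.apache.spark.ml.feature.VectorAssembler

// Assemble the numeric columns into a single Vector column named "features".
val assembler = new VectorAssembler()
  .setInputCols(Array("x1", "x2"))
  .setOutputCol("features")

val withFeatures = assembler.transform(df)
// withFeatures can now go straight into any org.apache.spark.ml estimator,
// with no cast to the old org.apache.spark.mllib Vector type.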
On Sat, Dec 31, 2016 at 6:51 AM, Marco Mistroni wrote:
> Hi,
> you have a DataFrame... there should be either a way to
> - convert a DF to a Vector without doing a cast
> - use an ML library which relies on DataFrames only
Hello Felix,
I followed the instructions and ran the command:
> $SPARK_HOME/bin/spark-shell --packages com.databricks:spark-csv_2.11:1.5.0
and I received the following error message:
java.lang.RuntimeException: java.net.ConnectException: Call From xie1/
192.168.112.150 to localhost:9000 failed on
In PySpark 2, loading a file with any delimiter into a DataFrame is pretty
straightforward:
spark.read.csv(file, schema=schema, sep='|')
Is there something similar in Spark 2 in Scala, i.e. spark.read.csv(path,
sep='|')?
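For what it's worth, a minimal Scala sketch, assuming a SparkSession spark and a StructType schema are in scope; the Scala csv() method takes no sep parameter, so the delimiter goes through option():

// Read a pipe-delimited file; in Scala the delimiter is set via option().
val df = spark.read
  .schema(schema)
  .option("sep", "|")
  .csv(path)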
Hmm this would seem unrelated? Does it work on the same box without the
package? Do you have more of the error stack you can share?
From: Raymond Xie <xie3208...@gmail.com>
Sent: Saturday, December 31, 2016 8:09 AM
Subject: Re: How to load a big csv to dataframe
Hello,
It is indicated in
https://spark.apache.org/docs/1.6.1/sql-programming-guide.html#dataframes
under "Running SQL Queries Programmatically", that you can do:

from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)
df = sqlContext.sql("SELECT * FROM table")
However, it did not indicate what sh
flight201601 is the name of a database (schema); it is not a TABLE!
In Hive you can do:
show databases;
to see the list of databases. By default, Hive has a database called
"default" out of the box.
For example, to see the list of tables in database flight201601, do the following:
use flight201601;
show tables;
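If you are checking from the Spark shell instead of the Hive CLI, the same information is available through spark.sql (a sketch):

// List the databases, then the tables inside flight201601, from Spark.
spark.sql("show databases").show()
spark.sql("show tables in flight201601").show()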
See the documentation for the options given to the csv function:
http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.DataFrameReader@csv(paths:String*):org.apache.spark.sql.DataFrame
The options can be passed with the option/options methods of the
DataFrameReader class.
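For instance, several options can be set in one go with a Map (the values here are illustrative):

val df = spark.read
  .options(Map("sep" -> "|", "header" -> "true"))
  .csv("/data/input")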
Looks like it's trying to treat that path as a folder; try omitting
the file name and just using the folder path.
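A quick sketch of what that looks like (the folder name here is made up):

// Point spark.read.json at the containing folder; Spark will pick up the
// JSON file(s) inside it.
val df = spark.read.json("/data/json_dir")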
On Sat, Dec 31, 2016 at 7:58 PM, Raymond Xie wrote:
> Happy new year!!!
>
> I am trying to load a json file into spark, the json file is attached here.
>
> I received the following erro