I have worked it out; I just let Java call the Scala class's function. Thanks a
lot, Xiaomeng~~
On Friday, November 25, 2016 1:50 AM, Xiaomeng Wan wrote:
Here is the Scala code I use to get the best model; I have never used Java:
val cv = new CrossValidator().setEstimator(pipeline).setEvaluator(ne
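A hedged sketch of a full CrossValidator setup (the evaluator, the empty
parameter grid, and the names pipeline/trainingData are assumptions, not
necessarily the exact code used here):

import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
import org.apache.spark.ml.tuning.{CrossValidator, ParamGridBuilder}

val cv = new CrossValidator()
  .setEstimator(pipeline)                                // the Pipeline built earlier
  .setEvaluator(new BinaryClassificationEvaluator())     // assumes a binary label
  .setEstimatorParamMaps(new ParamGridBuilder().build()) // empty grid: default params
  .setNumFolds(3)

val cvModel = cv.fit(trainingData) // trainingData assumed
val best = cvModel.bestModel       // "the best model" referred to above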
Hi All,
I need to print AUC and PRC for a GBTClassifier model. It seems to work for
RandomForestClassifier but not for GBTClassifier, even though the rawPrediction
column is not in the original data in either case.
The code is:
.. // Set up Pipeline
val stages = new mutable.Arra
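One workaround sketch (the DataFrame predictions, produced by
model.transform(testData), is an assumption): since GBTClassifier in Spark 2.0
emits no rawPrediction column, compute AUC and PRC from (score, label) pairs
with the RDD-based BinaryClassificationMetrics. Ideally the score is
continuous; with hard 0/1 predictions the curves are very coarse.

import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics

val scoreAndLabels = predictions
  .select("prediction", "label")
  .rdd
  .map(row => (row.getDouble(0), row.getDouble(1))) // (score, label) pairs

val metrics = new BinaryClassificationMetrics(scoreAndLabels)
println(s"AUC = ${metrics.areaUnderROC()}")
println(s"PRC = ${metrics.areaUnderPR()}")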
Hi,
Can anyone tell me what is causing this error?
Spark 2.0.0
Python 2.7.5
df = sqlContext.createDataFrame(foo, schema)
https://gist.github.com/mooperd/368e3453c29694c8b2c038d6b7b4413a
Traceback (most recent call last):
File "/home/centos/fun-functions/spark-parrallel-read-from-s3/tick.py",
li
I get a slightly different error when not specifying a schema:
Traceback (most recent call last):
File "/home/centos/fun-functions/spark-parrallel-read-from-s3/tick.py",
line 61, in
df = sqlContext.createDataFrame(foo)
File
"/usr/hdp/2.5.0.0-1245/spark2/python/lib/pyspark.zip/pyspark/sql/co
Hi
Pickle errors normally point to a serialisation issue. I suspect something is
wrong with your S3 data, but that is just a wild guess...
Is your S3 object publicly available?
A few suggestions to nail down the problem:
1 - try to see if you can read your object from S3 using the boto3 library
'offline',
On 27 Nov 2016, at 02:55, kant kodali <kanth...@gmail.com> wrote:
I would say that instead of LD_LIBRARY_PATH you might want to use
java.library.path, in the following way:
java -Djava.library.path=/path/to/my/library
or pass java.library.path along with spark-submit.
This is only going to s
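For reference, a sketch of passing it with spark-submit (the library path and
app jar are placeholders; the native library must be present on every node):

spark-submit \
  --conf "spark.driver.extraJavaOptions=-Djava.library.path=/path/to/my/library" \
  --conf "spark.executor.extraJavaOptions=-Djava.library.path=/path/to/my/library" \
  <your-app.jar>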
Hi Takeshi,
Thank you for your comment. I changed it to RDD and it's a lot better.
Zhuo
On Fri, Nov 25, 2016 at 7:04 PM, Takeshi Yamamuro wrote:
> Hi,
>
> I think this is just the overhead of representing nested elements as internal
> rows at runtime
> (e.g., it consumes null bits for each nested
I've been toying around with Spark SQL lately, trying to move some workloads
over from Hive. In the Hive world, the partitions below are recovered by an
ALTER TABLE RECOVER PARTITIONS:
Path:
s3://bucket-company/path/2016/03/11
s3://bucket-company/path/2016/03/12
s3://bucket-company/path/2016/03/13
Hi team,
I am using Apache Spark version 1.6.1, and I am writing Spark SQL queries in it.
I have found two ways of writing SQL queries: one is the plain SQL syntax, and
the other is the Spark DataFrame functions.
I need to execute if conditions using DataFrame functions. Please explain how
I can do th
Use the when() and otherwise() functions. For example:

import org.apache.spark.sql.functions._

val rows = Seq(("bob", 1), ("lucy", 2), ("pat", 3)).toDF("name", "genderCode")
rows.show

+----+----------+
|name|genderCode|
+----+----------+
| bob|         1|
|lucy|         2|
| pat|         3|
+----+----------+
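A hedged sketch of the likely continuation, mapping the numeric code to a
label with when()/otherwise() (the code-to-label mapping is an assumption):

val withGender = rows.withColumn("gender",
  when(col("genderCode") === 1, "male")      // assumed mapping
    .when(col("genderCode") === 2, "female") // assumed mapping
    .otherwise("unknown"))                   // fallback for unmapped codes
withGender.show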
Hi Everyone,
Does anyone know the best practice for writing Parquet files from Spark?
When my Spark app writes data to Parquet, the target directory contains heaps
of very small Parquet files (such as
e73f47ef-4421-4bcc-a4db-a56b110c3089.parquet). Each Parquet file is only
15KB
Generally, yes - you should try to have larger file sizes due to the
overhead of opening up files. Typical guidance is between 64MB-1GB;
personally I usually stick with 128MB-512MB with the default snappy
codec compression with Parquet. A good reference is Vida Ha's
presentation Data Storage T
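As an illustration of that guidance (a sketch with an assumed DataFrame df and
a placeholder output path, not code from this thread), repartitioning before
the write is the usual way to get fewer, larger files:

// Each resulting partition becomes roughly one Parquet file; pick a count
// so that files land in the 128MB-512MB range mentioned above.
df.repartition(16)
  .write
  .option("compression", "snappy") // snappy is Spark's default Parquet codec
  .parquet("s3://bucket/path/out/")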
I tried this, but it is throwing an error that the method "when" is not
applicable.
I am doing this in Java instead of Scala.
Note: I am using Spark version 1.6.1.
-----Original Message-----
From: Stuart White [mailto:stuart.whi...@gmail.com]
Sent: Monday, November 28, 2016 10:26 AM
To: Hitesh
Prasanna,
AFAIK Spark does not handle folders without partition column names in them,
and there is no way to get Spark to discover them automatically.
I think the reason for this is that Parquet file hierarchies carried this info
and historically Spark deals more with those.
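One workaround sketch (the table name company_data is hypothetical, and it
assumes a Hive-backed table with year/month/day partition columns): register
each such folder explicitly, declaring its partition values and location.

spark.sql("""
  ALTER TABLE company_data ADD IF NOT EXISTS
  PARTITION (year=2016, month=3, day=11)
  LOCATION 's3://bucket-company/path/2016/03/11'
""")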
On Mon, Nov 28, 2016 at 9:48 AM, Prasanna Santhan
Hi,
Component: SparkR
Level: Beginner
Scenario: Does SparkR support nonlinear optimization with nonlinear
constraints?
Our business application supports two types of functions, convex and S-shaped
curves, and linear & nonlinear constraints. These constraints can be combined
with any one type