Hi,
I am getting the following warning when I run the PySpark job:
My code is:
mat = RowMatrix(tf_rdd_vec.cache()) # RDD is cached
svd = mat.computeSVD(num_topics, computeU=False)
I am using an Ubuntu 16.04 EC2 instance, and I have installed the following
libraries on my system:
sudo apt insta
Hi Users,
Is there any way to avoid creation of .crc files when writing an RDD with
saveAsTextFile method?
My use case: I have mounted S3 on the local file system using S3FS and am
saving an RDD to the mount point. Looking at S3, I found one .crc file for
each part file and even for the _SUCCESS file.
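A possible workaround (a sketch, not verified against this setup): the .crc files are written by Hadoop's checksumming local filesystem wrapper; configuring the `file://` scheme to use `RawLocalFileSystem`, which skips checksum files, should avoid them. In core-site.xml (or the equivalent `hadoopConfiguration()` call):

```xml
<!-- Sketch: use the non-checksumming local filesystem for file:// URIs -->
<property>
  <name>fs.file.impl</name>
  <value>org.apache.hadoop.fs.RawLocalFileSystem</value>
</property>
```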
Do you get warning messages such as:
`Failed to load implementation from:
com.github.fommil.netlib.NativeSystemBLAS`
`Failed to load implementation from:
com.github.fommil.netlib.NativeRefBLAS` ?
These two errors are thrown in `com.github.fommil.netlib.BLAS`, but it
catches the original exception and falls back to the pure-Java (F2J)
implementation, so they are warnings rather than failures.
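The warnings above usually mean no native BLAS library is installed. A sketch of a possible fix (assuming Ubuntu, as in the original question; the package names and build profile are assumptions about this setup, not verified):

```shell
# Assumption: Ubuntu 16.04. Install native BLAS/LAPACK implementations
# so com.github.fommil.netlib can find a system library to load.
sudo apt-get install libopenblas-dev liblapack-dev

# Note: Spark must also include the netlib-lgpl classes (e.g. a build
# with -Pnetlib-lgpl) for native loading to even be attempted.
```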
Hi All,
I am wondering if Spark supports Dataset<List<Map<String, Object>>>?
When I do the following, it says no map function is available:
Dataset<List<Map<String, Object>>> resultDs = ds.map(lambda,
Encoders.bean(List.class));
Thanks!
it supports Dataset<List<X>> where X must be a supported type
also. Object is not a supported type.
On Mon, Oct 9, 2017 at 7:36 AM, kant kodali wrote:
> Hi All,
>
> I am wondering if Spark supports Dataset<List<Map<String, Object>>>?
>
> when I do the following it says no map function available?
>
> Dataset<List<Map<String, Object>>> resultDs = ds.ma
### Issue description
We have an issue with data consistency when storing data in Elasticsearch
using Spark and the elasticsearch-spark connector. The job finishes
successfully, but when we compare the original data (stored in S3) with the
data stored in ES, some documents are not present in Elasticsearch.
Have you raised it as an issue on the ES connector GitHub? In my past
experience (with the Hadoop connector and Pig), they respond pretty quickly.
On Tue, Oct 10, 2017 at 12:36 AM, sixers wrote:
> ### Issue description
>
> We have an issue with data consistency when storing data in Elasticsearch
> using
Hi all!
I would love to use Spark with a somewhat more modern logging framework
than Log4j 1.2. I have Logback in mind, mostly because it integrates well
with central logging solutions such as the ELK stack. I've read up a bit on
getting Spark 2.0 (that's what I'm using currently) to work with any
Hi Koert,
Thanks! If I have this Dataset<List<Map<String, Object>>>, what would be the
encoder? Is it Encoders.kryo(Seq.class)?
Also shouldn't List be supported? Should I create a ticket for this?
On Mon, Oct 9, 2017 at 6:10 AM, Koert Kuipers wrote:
> it supports Dataset<List<X>> where X must be a supported type
> also. O
if you are willing to use the kryo encoder you can do your original
Dataset<List<Map<String, Object>>> I think
For example, in Scala I create here an intermediate Dataset[Any]:
scala> Seq(1,2,3).toDS.map(x => if (x % 2 == 0) x else
x.toString)(org.apache.spark.sql.Encoders.kryo[Any]).map{ (x: Any) => x
match { case i:
I tried the following:
dataset.map(new MapFunction<String, List<Map<String, Object>>>() {
    @Override
    public List<Map<String, Object>> call(String input) throws Exception {
        List<Map<String, Object>> temp = new ArrayList<>();
        temp.add(new HashMap<String, Object>());
        return temp;
    }
}, Encoders.kryo(List.class));
This doesn't even compile.
error: no s
https://issues.apache.org/jira/browse/SPARK-8
On Sun, Oct 8, 2017 at 11:58 AM, kant kodali wrote:
> I have the following so far
>
> private StructType getSchema() {
> return new StructType()
> .add("name", StringType)
> .add("address", StringType)
> .a
Any changes in the Java code (to be specific, the generated bytecode) of
the functions you pass to Spark (i.e., map functions, reduce functions, as
well as their closure dependencies) count as an "application code change",
and will break recovery from checkpoints.
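The same failure mode can be sketched in plain Python with pickle (an analogy only; Spark checkpoints use Java serialization, but the underlying principle that a serialized function is a reference to code, not a copy of it, is the same):

```python
import pickle

# A simple "closure-carrying" class, standing in for a map function
# passed to Spark. (Plain-Python analogy, not Spark's Java serialization.)
class MapFn:
    def __init__(self, n):
        self.n = n

    def __call__(self, x):
        return x + self.n

# "Checkpoint" an instance: the pickle stores a reference to MapFn,
# not its bytecode.
blob = pickle.dumps(MapFn(3))

# Simulate an "application code change": the class the checkpoint refers
# to no longer exists under that name.
del MapFn

try:
    pickle.loads(blob)
    recovered = True
except AttributeError:
    recovered = False  # recovery fails once the referenced code is gone
```

Java serialization is stricter still: even a compatible-looking class can be rejected at deserialization time if its serial version no longer matches.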
On Sat, Oct 7, 2017 at 11:53 AM, Joh
Hi,
I am trying to deploy a Spark app in a Kubernetes Cluster. The cluster consists
of 2 machines - 1 master and 1 slave, each of them with the following config:
RHEL 7.2
Docker 17.03.1
K8S 1.7.
I am following the steps provided in
https://apache-spark-on-k8s.github.io/userdocs/running-on-kuber
Hi,
I'm new to Spark and big data. We are doing a POC and building our
warehouse application using Spark. Can anyone share guidance on naming
conventions for HDFS paths, table names, UDFs, and database names? Any
sample architecture diagram would also be helpful.
-Mahens