R: ML PipelineModel to be scored locally

2016-07-20 Thread Simone
Thanks for your reply. I cannot rely on jpmml due to licensing issues. I can evaluate writing my own prediction code, but I am looking for a more general-purpose approach. Any other thoughts? Best Simone - Original Message - From: "Peyman Mohajerian" Sent: 20/07/20

Pyspark ML - Unable to finish cross validation

2016-09-26 Thread Simone
ng on yarn using 3 executors, with 4gb and 4 cores each. I am using cache to store dataframes. Unfortunately, my process does not finish and hangs in doing cross validation. Any clues? Thanks guys Simone

Pyspark - 1.5.0 pickle ML PipelineModel

2016-09-29 Thread Simone
. Has anyone solved this issue? Thanks guys Simone

Fwd: Spark standalone workers, executors and JVMs

2016-05-02 Thread Simone Franzini
issues (GC and such). As of Spark 1.4 it is possible to either deploy multiple workers (SPARK_WORKER_INSTANCES + SPARK_WORKER_CORES) or multiple executors per worker (--executor-cores). Which option is preferable and why? Thanks, Simone Franzini, PhD http://www.linkedin.com/in/simonefranzini
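The two deployment options raised in this thread can be sketched as follows (a hedged sketch; the core counts and memory sizes are illustrative placeholders, not recommendations):

```shell
# Option A: several smaller worker JVMs per node
# (set in conf/spark-env.sh on each worker node)
SPARK_WORKER_INSTANCES=4   # four worker daemons per node
SPARK_WORKER_CORES=4       # cores handed to each worker
SPARK_WORKER_MEMORY=8g     # memory handed to each worker

# Option B (Spark 1.4+ standalone): one worker per node, multiple
# executors per worker, sized via spark-submit flags:
#   spark-submit --executor-cores 4 --executor-memory 8g \
#                --total-executor-cores 16 ...
```

Either way the goal is the same: several smaller JVMs per node instead of one huge heap, which is what mitigates the GC issues mentioned in the snippet.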

Re: Spark standalone workers, executors and JVMs

2016-05-04 Thread Simone Franzini
er process actually using and how do I set those? As far as I understand the worker does not need many resources, as it is only spawning up executors. Is that correct? Thanks, Simone Franzini, PhD http://www.linkedin.com/in/simonefranzini On Mon, May 2, 2016 at 7:47 PM, Mohammed Guller wrote: > Th

Spark on DSE Cassandra with multiple data centers

2016-05-11 Thread Simone Franzini
, it appears that the in the hadoop command is being ignored and it is trying to connect to cfs: rather than additional_cfs. Anybody else ran into this? Simone Franzini, PhD http://www.linkedin.com/in/simonefranzini

ML PipelineModel to be scored locally

2016-07-20 Thread Simone Miraglia
Hi all, I am working on the following use case involving ML Pipelines. 1. I created a Pipeline composed from a set of stages 2. I called "fit" method on my training set 3. I validated my model by calling "transform" on my test set 4. I stored my fitted Pipeline to a shared folder Then I have a v
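The four numbered steps can be sketched with the ML Pipeline API (a minimal sketch assuming Spark 1.6+, an existing SparkSession/SQLContext, and hypothetical `trainingDF`/`testDF` DataFrames and stage choices):

```scala
import org.apache.spark.ml.{Pipeline, PipelineModel}
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.VectorAssembler

// Hypothetical stages -- substitute your own.
val assembler = new VectorAssembler()
  .setInputCols(Array("f1", "f2"))
  .setOutputCol("features")
val lr = new LogisticRegression()

val pipeline = new Pipeline().setStages(Array(assembler, lr)) // step 1
val model = pipeline.fit(trainingDF)                          // step 2
val scored = model.transform(testDF)                          // step 3
model.save("/shared/models/myPipeline")                       // step 4

// From another application:
val reloaded = PipelineModel.load("/shared/models/myPipeline")
```

Note that `PipelineModel.load` still needs a running Spark context, which is precisely why scoring the fitted pipeline locally, outside Spark, is the hard part of this thread.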

Number of sortBy output partitions

2016-07-21 Thread Simone Franzini
as this is probably due to the way that sortBy is implemented, but I thought I would ask anyway. Should it matter, I am running Spark 1.4.2 (DataStax Enterprise). Thanks, Simone Franzini, PhD http://www.linkedin.com/in/simonefranzini

Collecting matrix's entries raises an error only when run inside a test

2017-07-05 Thread Simone Robutti
Hello, I have this problem and Google is not helping. Instead, it looks like an unreported bug and there are no hints to possible workarounds. The error is the following: Traceback (most recent call last): File "/home/simone/motionlogic/trip-labeler/test/trip_labeler_test/model_test.py"

different behaviour linux/Unix vs windows when load spark context in scala method called from R function using rscala package

2017-08-27 Thread Simone Pallotta
In my R code, I am using the rscala package to bridge to a Scala method. In the Scala method I have initialized a Spark context to be used later. R code: s <- scala(classpath = "", heap.maximum = "4g") assign("WrappeR", s$.it.wrapper.r.Wrapper) WrappeR$init() where init is a Scala function and Wrapp

AVRO specific records

2014-11-05 Thread Simone Franzini
How can I read/write AVRO specific records? I found several snippets using generic records, but nothing with specific records so far. Thanks, Simone Franzini, PhD http://www.linkedin.com/in/simonefranzini

Re: AVRO specific records

2014-11-06 Thread Simone Franzini
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Simone Franzini, PhD http://www.linkedin.com/in/simonefranzini On Wed, Nov 5, 2014 at 4:24 PM, Laird, Benjamin < benjamin.la...@capitalone.com> wrote: > Something like this works and

Re: AVRO specific records

2014-11-07 Thread Simone Franzini
is that this writes to a plain text file. I need to write to binary AVRO. What am I missing? Simone Franzini, PhD http://www.linkedin.com/in/simonefranzini On Thu, Nov 6, 2014 at 3:15 PM, Simone Franzini wrote: > Benjamin, > > Thanks for the snippet. I have tried using it, but u

Accessing RDD within another RDD map

2014-11-13 Thread Simone Franzini
orm inside the map statement. I am failing to understand what I am doing wrong. Can anyone help with this? Thanks, Simone Franzini, PhD http://www.linkedin.com/in/simonefranzini

Declaring multiple RDDs and efficiency concerns

2014-11-14 Thread Simone Franzini
also an efficiency issue or just a stylistic one? Simone Franzini, PhD http://www.linkedin.com/in/simonefranzini

Reading nested JSON data with Spark SQL

2014-11-19 Thread Simone Franzini
ions.GenericRow cannot be cast to scala.collection.immutable.Map How can I read such a field? Am I just missing something small or should I be looking for a completely different alternative to reading JSON? Simone Franzini, PhD http://www.linkedin.com/in/simonefranzini

Re: Reading nested JSON data with Spark SQL

2014-11-19 Thread Simone Franzini
This works great, thank you! Simone Franzini, PhD http://www.linkedin.com/in/simonefranzini On Wed, Nov 19, 2014 at 3:40 PM, Michael Armbrust wrote: > You can extract the nested fields in sql: SELECT field.nestedField ... > > If you don't do that then nested fields are repre
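For reference, the nested-field extraction suggested in the quoted reply can be sketched like this (assuming the Spark 1.x SQLContext API of the time and a made-up people.json):

```scala
// people.json lines look like:
// {"name":"a","address":{"city":"Turin","zip":"10100"}}
val people = sqlContext.jsonFile("people.json")
people.registerTempTable("people")

// Dot notation flattens a nested field into an ordinary column,
// avoiding the GenericRow-to-Map cast from the original error:
val cities = sqlContext.sql("SELECT name, address.city FROM people")
```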

Re: How can I read this avro file using spark & scala?

2014-11-21 Thread Simone Franzini
you use specific Avro // If you use generic Avro, chill also has a function for that: GenericRecordSerializer kryo.register(classOf[MyAvroClass], AvroSerializer.SpecificRecordBinarySerializer[MyAvroClass]) } } Simone Franzini, PhD http://www.linkedin.com/in/simonefranzini On Fri, Nov
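Filling out the truncated snippet, a registrator along these lines is presumably what the reply describes (a sketch assuming the chill-avro artifact is on the classpath; `MyAvroClass` is a placeholder for an Avro-generated SpecificRecord class):

```scala
import com.esotericsoftware.kryo.Kryo
import com.twitter.chill.avro.AvroSerializer
import org.apache.spark.serializer.KryoRegistrator

class MyKryoRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo) {
    // Specific Avro records:
    kryo.register(classOf[MyAvroClass],
      AvroSerializer.SpecificRecordBinarySerializer[MyAvroClass])
    // For generic records, chill-avro offers GenericRecordSerializer instead.
  }
}
```

It would be wired in through the Spark conf: `spark.serializer=org.apache.spark.serializer.KryoSerializer` and `spark.kryo.registrator=MyKryoRegistrator`.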

Kryo NPE with Array

2014-11-25 Thread Simone Franzini
(kryo: Kryo) { kryo.register(...) } } Simone Franzini, PhD http://www.linkedin.com/in/simonefranzini

Re: Kryo NPE with Array

2014-11-26 Thread Simone Franzini
airly new to Scala and I can't see how I would do this. In the worst case, could I override the newKryo method and put my configuration there? It appears to me that method is the one where the kryo instance is created. Simone Franzini, PhD http://www.linkedin.com/in/simonefranzini On Tue, No

Re: Spark SQL 1.0.0 - RDD from snappy compress avro file

2014-11-29 Thread Simone Franzini
Did you have a look at my reply in this thread? http://apache-spark-user-list.1001560.n3.nabble.com/How-can-I-read-this-avro-file-using-spark-amp-scala-td19400.html I am using 1.1.0 though, so not sure if that code would work entirely with 1.0.0, but you can try. Simone Franzini, PhD http

Re: Kryo NPE with Array

2014-12-02 Thread Simone Franzini
class is registered through the Chill AllScalaRegistrar which is called by the Spark Kryo serializer. I thought I'd document this in case somebody else is running into a similar issue. Simone Franzini, PhD http://www.linkedin.com/in/simonefranzini On Wed, Nov 26, 2014 at 7:40 PM, Simon

Where can you get nightly builds?

2014-12-06 Thread Simone Franzini
I recently read on the mailing list that there are now nightly builds available. However, I can't find them anywhere. Is this actually being done? If so, where can I get them? Thanks, Simone Franzini, PhD http://www.linkedin.com/in/simonefranzini

Re: NullPointerException When Reading Avro Sequence Files

2014-12-09 Thread Simone Franzini
here: http://apache-spark-user-list.1001560.n3.nabble.com/How-can-I-read-this-avro-file-using-spark-amp-scala-td19400.html#a19491 Maybe there is a simpler solution to your problem but I am not that much of an expert yet. I hope this helps. Simone Franzini, PhD http://www.linkedin.com/in

Re: NullPointerException When Reading Avro Sequence Files

2014-12-09 Thread Simone Franzini
You can use this Maven dependency: com.twitter:chill-avro:0.4.0 Simone Franzini, PhD http://www.linkedin.com/in/simonefranzini On Tue, Dec 9, 2014 at 9:53 AM, Cristovao Jose Domingues Cordeiro < cristovao.corde...@cern.ch> wrote: > Thanks for the reply! > > I'
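Expanded into pom.xml form, the coordinates given in the reply would read:

```xml
<dependency>
  <groupId>com.twitter</groupId>
  <artifactId>chill-avro</artifactId>
  <version>0.4.0</version>
</dependency>
```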

Re: NullPointerException When Reading Avro Sequence Files

2014-12-15 Thread Simone Franzini
To me this looks like an internal error to the REPL. I am not sure what is causing that. Personally I never use the REPL, can you try typing up your program and running it from an IDE or spark-submit and see if you still get the same error? Simone Franzini, PhD http://www.linkedin.com/in

updateStateByKey: cleaning up state for keys not in current window

2015-01-09 Thread Simone Franzini
this case? Or, in other words, how can I clear the state for a key when Seq[V] is empty? Simone Franzini, PhD http://www.linkedin.com/in/simonefranzini
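For context, the standard way to drop a key's state with updateStateByKey is to return None from the update function; a minimal sketch (assuming an Int-valued running count over a hypothetical `pairStream` DStream):

```scala
// updateStateByKey invokes the function for every known key on each
// batch, with an empty Seq when no new values arrived for that key.
val update = (newValues: Seq[Int], state: Option[Int]) =>
  if (newValues.isEmpty)
    None // empty Seq[V]: returning None removes the key's state
  else
    Some(state.getOrElse(0) + newValues.sum)

val counts = pairStream.updateStateByKey[Int](update)
```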