Re: specifying schema on dataframe

2017-02-04 Thread Dirceu Semighini Filho
Hi Sam, remove the " from the number and it will work. On Feb 4, 2017 at 11:46 AM, "Sam Elamin" wrote: > Hi All > > I would like to specify a schema when reading from a JSON file, but when trying > to map a number to a Double it fails; I tried FloatType and IntType with no > joy! > > > When inferring…
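A minimal sketch of the fix being suggested (field names and file path are hypothetical, not from the thread): a JSON number written as a quoted string ("10.5") is a string, so a DoubleType field in an explicit schema comes back null until the quotes are removed or the column is cast after reading:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.col
    import org.apache.spark.sql.types._

    val spark = SparkSession.builder().appName("json-schema").getOrCreate()

    // Explicit schema: "amount" must arrive as an unquoted JSON number
    // for DoubleType to parse it.
    val schema = StructType(Seq(
      StructField("name", StringType, nullable = true),
      StructField("amount", DoubleType, nullable = true)))

    val df = spark.read.schema(schema).json("/path/to/data.json")

    // Alternative when the source keeps the quotes: infer, then cast.
    val casted = spark.read.json("/path/to/data.json")
      .withColumn("amount", col("amount").cast(DoubleType))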

Re: [SparkStreaming] 1 SQL tab for each SparkStreaming batch in SparkUI

2016-11-22 Thread Dirceu Semighini Filho
…each microbatch. I didn't find a way to use the same HiveContext for all batches. Does anybody know where I can find how to do this? 2016-11-22 14:17 GMT-02:00 Koert Kuipers : > you are creating a new hive context per microbatch? is that a good idea? > > On Tue, Nov 22, 2016 at 8:…

[SparkStreaming] 1 SQL tab for each SparkStreaming batch in SparkUI

2016-11-22 Thread Dirceu Semighini Filho
Has anybody seen this behavior (see the attached picture) in Spark Streaming? It started to happen here after I changed the HiveContext creation to stream.foreachRDD { rdd => val hiveContext = new HiveContext(rdd.sparkContext) } Is this expected? Kind Regards, Dirceu
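A sketch of the usual workaround, adapting the singleton pattern from the Spark Streaming docs to HiveContext (not necessarily what this thread settled on): create the context once per JVM and look it up inside foreachRDD, instead of constructing a new one per micro-batch:

    import org.apache.spark.SparkContext
    import org.apache.spark.sql.hive.HiveContext

    object HiveContextSingleton {
      @transient private var instance: HiveContext = _
      def getInstance(sc: SparkContext): HiveContext = synchronized {
        if (instance == null) instance = new HiveContext(sc)
        instance
      }
    }

    stream.foreachRDD { rdd =>
      val hiveContext = HiveContextSingleton.getInstance(rdd.sparkContext)
      // run SQL here; only one context (and one SQL tab) is ever created
    }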

Re: Duplicated fit into TrainValidationSplit

2016-04-27 Thread Dirceu Semighini Filho
…practice - > usually the dataset passed to the train validation split is itself further > split into a training and test set, where the final best model is evaluated > against the test set. > > On Wed, 27 Apr 2016 at 14:30, Dirceu Semighini Filho < > dirceu.semigh...@gmail.com> wrote:…

Duplicated fit into TrainValidationSplit

2016-04-27 Thread Dirceu Semighini Filho
Hi guys, I was testing a pipeline here and found a possible duplicated call to the fit method in org.apache.spark.ml.tuning.TrainValidationSplit.
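For context, a minimal sketch of how TrainValidationSplit is driven (the estimator and dataset here are illustrative): it calls fit once per parameter combination on the training split and, after picking the winner, fits the best settings again on the full dataset, so a second fit call per winning parameter set is part of the expected flow. Whether that accounts for what the thread found is a separate question.

    import org.apache.spark.ml.classification.LogisticRegression
    import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
    import org.apache.spark.ml.tuning.{ParamGridBuilder, TrainValidationSplit}

    val lr = new LogisticRegression()
    val grid = new ParamGridBuilder()
      .addGrid(lr.regParam, Array(0.01, 0.1))
      .build()

    val tvs = new TrainValidationSplit()
      .setEstimator(lr)
      .setEvaluator(new BinaryClassificationEvaluator())
      .setEstimatorParamMaps(grid)
      .setTrainRatio(0.75)

    // `dataset` is assumed: a DataFrame with "label" and "features" columns.
    val model = tvs.fit(dataset)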

Fwd: Null Value in DecimalType column of DataFrame

2015-09-15 Thread Dirceu Semighini Filho
…DecimalType(10, 10) will return null, which is >> expected. >> >> On Mon, Sep 14, 2015 at 1:42 PM, Dirceu Semighini Filho < >> dirceu.semigh...@gmail.com> wrote: >>> Hi all, >>> I'm moving from Spark 1.4 to 1.5, and one of my tests…

Null Value in DecimalType column of DataFrame

2015-09-14 Thread Dirceu Semighini Filho
Hi all, I'm moving from Spark 1.4 to 1.5, and one of my tests is failing. It seems that there were some changes in org.apache.spark.sql.types.DecimalType. This ugly code is a small sample to reproduce the error; don't use it in your project. test("spark test") { val file = context.sparkContext…
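A short sketch of the precision/scale arithmetic the reply above points at: DecimalType(precision, scale) allows precision digits in total, with scale of them after the decimal point, so DecimalType(10, 10) leaves no room before the point and anything >= 1 degrades to null in 1.5:

    import org.apache.spark.sql.types.DecimalType

    val allFraction  = DecimalType(10, 10) // 0.1234567890 fits; 1.5 does not -> null
    val roomForInts  = DecimalType(10, 2)  // up to 99999999.99 fits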

Re: - Spark 1.4.1 - run-example SparkPi - Failure ...

2015-08-13 Thread Dirceu Semighini Filho
…"localhost 55518" succeeded. > > -- > > I don't know how to overcome this. Any ideas, as applicable to standalone on Mac? > > -- > > Regards > > Naga > > On Thu, Aug 13, 2015 at 11:46 AM, Dirceu Semighini Filho < > dirceu.semigh...@gmail.com> wrote:…

Re: - Spark 1.4.1 - run-example SparkPi - Failure ...

2015-08-13 Thread Dirceu Semighini Filho
Hi Naga, this happened here sometimes when the memory of the Spark cluster wasn't enough and the Java GC entered an infinite loop trying to free some memory. To fix this I just added more memory to the workers of my cluster; you can also increase the number of partitions of your RDD, using the repartition…
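A minimal sketch of that second suggestion (names are illustrative): more partitions mean smaller per-task working sets, which can keep the GC out of trouble:

    // Spread the same data over more, smaller partitions.
    val repartitioned = rdd.repartition(200) // tune the count to your cluster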

Re: How to create a Row from a List or Array in Spark using Scala

2015-03-02 Thread Dirceu Semighini Filho
You can use the parallelize method:

    val data = List(
      Row(1, 5, "vlr1", 10.5),
      Row(2, 1, "vl3", 0.1),
      Row(3, 8, "vl3", 10.0),
      Row(4, 1, "vl4", 1.0))
    val rdd = sc.parallelize(data)

Here I'm using a list of Rows, but you could use it with a list of some other kind of object, like this: val x = …
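A sketch of where this usually goes next, since the snippet is cut off (assuming Spark 1.3+, with hypothetical column names): an RDD[Row] needs an explicit schema to become a DataFrame, because Row carries no type information.

    import org.apache.spark.sql.Row
    import org.apache.spark.sql.types._

    // Column names are invented for illustration; the thread only shows values.
    val schema = StructType(Seq(
      StructField("id", IntegerType, nullable = false),
      StructField("qty", IntegerType, nullable = false),
      StructField("code", StringType, nullable = false),
      StructField("value", DoubleType, nullable = false)))

    val df = sqlContext.createDataFrame(rdd, schema)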

Re: Spark performance on 32 Cpus Server Cluster

2015-02-20 Thread Dirceu Semighini Filho
…don't wait on each other unless one depends on the other. You'd > have to clarify what you mean by running stages in parallel, like what > are the interdependencies. > > On Fri, Feb 20, 2015 at 10:01 AM, Dirceu Semighini Filho > wrote: > > Hi all, > > I'm…
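One common way to get independent stages running concurrently, sketched here as an assumption about what was meant (rddA and rddB are placeholders): submit independent actions from separate threads and let the scheduler interleave them:

    import scala.concurrent.{Await, Future}
    import scala.concurrent.ExecutionContext.Implicits.global
    import scala.concurrent.duration.Duration

    // Two actions with no dependency on each other can run at the same time.
    val f1 = Future { rddA.count() }
    val f2 = Future { rddB.count() }
    Await.result(Future.sequence(Seq(f1, f2)), Duration.Inf)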

Spark performance on 32 Cpus Server Cluster

2015-02-20 Thread Dirceu Semighini Filho
Hi all, I'm running Spark 1.2.0 in standalone mode, on different cluster and server sizes. All of my data is cached in memory. Basically I have a mass of data, about 8 GB, with about 37k columns, and I'm running different configurations of a BinaryLogisticRegressionBFGS. When I set Spark to run on 9…

Re: PSA: Maven supports parallel builds

2015-02-05 Thread Dirceu Semighini Filho
Thanks Nicholas, I didn't know this. 2015-02-05 22:16 GMT-02:00 Nicholas Chammas : > Y'all may already know this, but I haven't seen it mentioned anywhere in > our docs or on here, and it's a pretty easy win. > > Maven supports parallel builds > <https://cwiki.apache.org/confluence/display/MAVEN/P…
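The flag in question, for reference (documented Maven 3 behavior):

    # one thread per CPU core
    mvn -T 1C clean package
    # or a fixed thread count
    mvn -T 4 clean package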

Re: [VOTE] Release Apache Spark 1.2.1 (RC3)

2015-02-03 Thread Dirceu Semighini Filho
Hi Patrick, I work at a startup and we want to make one of our projects open source. This project is based on Spark, and it will help users instantiate Spark clusters in a cloud environment. But for that project we need to use the repl, hive, and thrift-server. Can the decision of not publishing…

TimeoutException on tests

2015-01-29 Thread Dirceu Semighini Filho
Hi All, I'm trying to use a locally built Spark, adding PR 1290 to the 1.2.0 build, and after I do the build my tests start to fail: should create labeledpoint *** FAILED *** (10 seconds, 50 milliseconds) [info] java.util.concurrent.TimeoutException: Futures timed out after [1 millisecond…

Re: Use mvn to build Spark 1.2.0 failed

2015-01-28 Thread Dirceu Semighini Filho
…007, have you figured out how to complete the build? 2015-01-28 13:32 GMT-02:00 Sean Owen : > I don't see how this would relate to the problem in the OP? The > assemblies build fine already, as far as I can tell. > > Your new error may be introduced by your change. > > On…

Re: Use mvn to build Spark 1.2.0 failed

2015-01-28 Thread Dirceu Semighini Filho
I was facing the same problem, and I fixed it by adding the maven-assembly-plugin (version 2.4.1, with the descriptor assembly/src/main/assembly/assembly.xml) to the root pom.xml, following the Maven assembly plugin docs.
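Reconstructing the described change as a sketch (the plain-text archive lost the XML tags; the version and descriptor path are as given in the thread, the surrounding placement is an assumption):

    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-assembly-plugin</artifactId>
      <version>2.4.1</version>
      <configuration>
        <descriptors>
          <descriptor>assembly/src/main/assembly/assembly.xml</descriptor>
        </descriptors>
      </configuration>
    </plugin>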

Re: renaming SchemaRDD -> DataFrame

2015-01-27 Thread Dirceu Semighini Filho
…called SchemaRDD to not > break source compatibility for Scala. > > > On Tue, Jan 27, 2015 at 6:28 AM, Dirceu Semighini Filho < > dirceu.semigh...@gmail.com> wrote: >> Can't the SchemaRDD remain the same, but deprecated, and be removed in the >> release 1.5 (…

Re: renaming SchemaRDD -> DataFrame

2015-01-27 Thread Dirceu Semighini Filho
Can't the SchemaRDD remain the same, but deprecated, and be removed in release 1.5 (+/- 1) for example, and the new code be added to DataFrame? With this, we wouldn't impact existing code for the next few releases. 2015-01-27 0:02 GMT-02:00 Kushal Datta : > I want to address the issue that…
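What that proposal looks like in code, as a hypothetical sketch (a deprecated type alias, kept in the org.apache.spark.sql package object, which is close to what was eventually done):

    // Old name kept as a deprecated alias so existing sources still compile.
    @deprecated("use DataFrame", "1.3.0")
    type SchemaRDD = DataFrame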

Re: Issue with repartition and cache

2015-01-21 Thread Dirceu Semighini Filho
…a String, not an Int. If you're looking to parse and convert it, "toInt" > should be used instead of "asInstanceOf". > > -Sandy > > On Wed, Jan 21, 2015 at 8:43 AM, Dirceu Semighini Filho < > dirceu.semigh...@gmail.com> wrote: >> Hi guys,…

Issue with repartition and cache

2015-01-21 Thread Dirceu Semighini Filho
Hi guys, has anyone seen something like this? I have a training set, and when I repartition it, if I call cache it throws a ClassCastException when I try to execute anything that accesses it:

    val rep120 = train.repartition(120)
    val cached120 = rep120.cache
    cached120.map(f => f(1).asInstanceOf[Int]).s…
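Sandy's suggested fix, sketched against the snippet's own names (assuming the second field holds a string at runtime): parse the value instead of casting it.

    val rep120 = train.repartition(120)
    val cached120 = rep120.cache()
    // f(1) is a String at runtime, so asInstanceOf[Int] throws
    // ClassCastException; toInt parses it instead.
    cached120.map(f => f(1).toString.toInt).sum()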

Spark 1.2.0 Repl

2014-12-26 Thread Dirceu Semighini Filho
Hello, is there any reason for not publishing the Spark repl in version 1.2.0? In repl/pom.xml the deploy and publish steps are being skipped. Regards, Dirceu