Thanks
--
Cesar Flores
I created a Spark application in Eclipse by including the
spark-assembly-1.6.0-hadoop2.6.0.jar file in the build path.
However, this method does not let me browse the Spark source code. Is there an
easy way to include the Spark source code for reference in an application
developed in Eclipse?
Thanks !
--
Cesar
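A sketch for the source-browsing question above, assuming the project is (or can be) built with sbt and the Eclipse files are generated with the sbteclipse plugin; the module list and versions are illustrative. Depending on the published spark-core/spark-sql artifacts instead of the assembly jar lets sbt download the matching source jars, which Eclipse can then attach (they can also be attached by hand via the library's "Source attachment" setting):

// build.sbt
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.6.0" % "provided" withSources(),
  "org.apache.spark" %% "spark-sql"  % "1.6.0" % "provided" withSources()
)
// With sbteclipse, the following setting makes the generated .classpath
// files point at the downloaded source jars:
// EclipseKeys.withSource := true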
Please send it to me too!
Thanks!!!
Cesar Flores
On Tue, May 17, 2016 at 4:55 PM, Femi Anthony wrote:
> Please send it to me as well.
>
> Thanks
>
> Sent from my iPhone
>
> On May 17, 2016, at 12:09 PM, Raghavendra Pandey <
> raghavendra.pan...@gmail.com>
functionality may be useful?
Thanks
--
Cesar Flores
>>> >>
>>> >> From: kpe...@gmail.com
>>> >> Date: Mon, 2 May 2016 12:11:18 -0700
>>> >> Subject: Re: Weird results with Spark SQL Outer joins
>>> >> To: gourav.sengu...@gmail.com
>>> >> CC: user@spark.apache.org
>>> >>
>>> >>
>>> >> Gourav,
>>> >>
>>> >> I wish that was the case, but I have done a select count on each of the
>>> >> two tables individually and they return different numbers of rows:
>>> >>
>>> >>
>>> >> dps.registerTempTable("dps_pin_promo_lt")
>>> >> swig.registerTempTable("swig_pin_promo_lt")
>>> >>
>>> >>
>>> >> dps.count()
>>> >> RESULT: 42632
>>> >>
>>> >>
>>> >> swig.count()
>>> >> RESULT: 42034
>>> >>
>>> >> On Mon, May 2, 2016 at 11:55 AM, Gourav Sengupta
>>> >> wrote:
>>> >>
>>> >> This shows that both tables have matching records and no mismatches.
>>> >> Therefore you obviously get the same results irrespective of whether
>>> >> you use a right or left join.
>>> >>
>>> >> I think that there is no problem here, unless I am missing something.
>>> >>
>>> >> Regards,
>>> >> Gourav
>>> >>
>>> >> On Mon, May 2, 2016 at 7:48 PM, kpeng1 wrote:
>>> >>
>>> >> Also, the results of the inner query produced the same results:
>>> >> sqlContext.sql("SELECT s.date AS edate, s.account AS s_acc,
>>> >>   d.account AS d_acc, s.ad AS s_ad, d.ad AS d_ad, s.spend AS s_spend,
>>> >>   d.spend_in_dollar AS d_spend
>>> >>   FROM swig_pin_promo_lt s INNER JOIN dps_pin_promo_lt d
>>> >>   ON (s.date = d.date AND s.account = d.account AND s.ad = d.ad)
>>> >>   WHERE s.date >= '2016-01-03' AND d.date >= '2016-01-03'").count()
>>> >> RESULT: 23747
>>> >>
>>> >>
>>> >>
--
Cesar Flores
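For the outer-join thread quoted above, a minimal sketch (Spark 1.6-era API; table and column names taken from the query in the thread) of one way to see where the two tables disagree: run the FULL OUTER JOIN and count the rows that are unmatched on each side.

import org.apache.spark.sql.SQLContext

def diagnoseOuterJoin(sqlContext: SQLContext): Unit = {
  val joined = sqlContext.sql(
    """SELECT s.date AS s_date, d.date AS d_date
      |FROM swig_pin_promo_lt s
      |FULL OUTER JOIN dps_pin_promo_lt d
      |  ON (s.date = d.date AND s.account = d.account AND s.ad = d.ad)
      |WHERE (s.date >= '2016-01-03' OR s.date IS NULL)
      |  AND (d.date >= '2016-01-03' OR d.date IS NULL)""".stripMargin)
  joined.cache()
  // rows from swig with no dps match, and rows from dps with no swig match
  val onlyInSwig = joined.filter("d_date IS NULL").count()
  val onlyInDps  = joined.filter("s_date IS NULL").count()
  val matched    = joined.filter("s_date IS NOT NULL AND d_date IS NOT NULL").count()
  println(s"matched=$matched onlyInSwig=$onlyInSwig onlyInDps=$onlyInDps")
}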
Thanks Ted:
That is the kind of answer I was looking for.
Best,
Cesar Flores
On Wed, Apr 6, 2016 at 3:01 PM, Ted Yu wrote:
> Have you looked at SparkListener ?
>
> /**
>  * Called when the driver registers a new executor.
>  */
> def onExecutorAdded(executorAdded: SparkListenerExecutorAdded): Unit
Hello:
I wonder if there is a way to query the number of running executors (not
the number of requested executors) inside a Spark job?
Thanks
--
Cesar Flores
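Following Ted's SparkListener pointer, a minimal sketch of a listener that keeps a running count of executors; class and variable names are illustrative, not from the thread.

import java.util.concurrent.atomic.AtomicInteger
import org.apache.spark.scheduler.{SparkListener, SparkListenerExecutorAdded, SparkListenerExecutorRemoved}

class ExecutorCountListener extends SparkListener {
  val running = new AtomicInteger(0)
  override def onExecutorAdded(added: SparkListenerExecutorAdded): Unit =
    running.incrementAndGet()
  override def onExecutorRemoved(removed: SparkListenerExecutorRemoved): Unit =
    running.decrementAndGet()
}

// Register it early in the job (sc is the SparkContext), then query it from
// the driver at any point:
val listener = new ExecutorCountListener
sc.addSparkListener(listener)
println(s"running executors: ${listener.running.get()}")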
the config parameter spark.sql.shuffle.partitions, which I need to modify on
the fly for group-by clauses, depending on the size of my input.
Thanks
--
Cesar Flores
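For the spark.sql.shuffle.partitions question above, a minimal sketch (1.x SQLContext API) of changing the setting on the fly before a group-by; the sizing heuristic and the column name are only placeholders.

import org.apache.spark.sql.{DataFrame, SQLContext}

def groupWithTunedShuffle(sqlContext: SQLContext, df: DataFrame): DataFrame = {
  // crude heuristic: scale shuffle partitions with the number of input partitions
  val shufflePartitions = math.max(200, df.rdd.partitions.length * 2)
  sqlContext.setConf("spark.sql.shuffle.partitions", shufflePartitions.toString)
  df.groupBy("key").count()   // "key" stands in for the real grouping columns
}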
of time (i.e. less than 12 hours).
Best
--
Cesar Flores
I found my problem. I was calling setParameterValue(defaultValue) more than
once in my class hierarchy.
Thanks!
On Mon, Feb 15, 2016 at 6:34 PM, Cesar Flores wrote:
>
> I have a set of transformers (each with specific parameters) in spark
> 1.3.1. I have two versions,
.
Does anyone have any idea of what I may be doing wrong? My guess is that I am
doing something weird in my class hierarchy, but I cannot figure out what.
Thanks!
--
Cesar Flores
are better off not
>> running the orderBy clause.
>>
>> Maybe someone from the Spark SQL team could answer how the partitioning
>> of the output DF should be handled when doing an orderBy?
>>
>> Hemant
>> www.snappydata.io
>> https://github.com/Snappy
with a single
partition and around 14 million records
val newDF = hc.createDataFrame(rdd, df.schema)
This process is really slow. Is there any other way of achieving this task,
or of optimizing it (perhaps by tweaking a Spark configuration parameter)?
Thanks a lot
--
Cesar Flores
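If the slowness comes from all ~14 million rows sitting in a single partition, one option is to spread the rows out when rebuilding the DataFrame. A minimal sketch; the partition count is illustrative, and repartition() destroys any global ordering, so this only helps if downstream steps do not rely on the sorted order.

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row}
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.types.StructType

def rebuildSpread(hc: HiveContext, rdd: RDD[Row], schema: StructType): DataFrame =
  hc.createDataFrame(rdd.repartition(200), schema)  // redistribute before rebuilding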
very useful for
performing joins later). Is that true?
And a second question: if I save *df* into a Hive table just after the query,
will Spark remember the partitioning when I reload this table from Hive?
I am using Spark version 1.3.1 at the moment.
Thanks
--
Cesar Flores
ing since it's
> mostly a black box.
>
> 1) could be fixed by adding caching. 2) is on our roadmap (though you'd
> have to use logical DataFrame expressions to do the partitioning instead of
> a class based partitioner).
>
> On Wed, Oct 14, 2015 at 8:45 AM, Cesar Flores wro
x._2)
val partitioned_df = hc.createDataFrame(partitioned_rdd, unpartitioned_df.schema)
Thanks a lot
--
Cesar Flores
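For reference, a minimal sketch of the round trip being discussed: key the rows, apply a class-based partitioner, drop the keys, and rebuild the DataFrame. As Michael notes, the resulting DataFrame does not remember this partitioner; it only changes the physical layout of the underlying RDD. The key index and partition count are illustrative.

import org.apache.spark.HashPartitioner
import org.apache.spark.sql.{DataFrame, Row}
import org.apache.spark.sql.hive.HiveContext

def partitionByColumn(hc: HiveContext, df: DataFrame, keyIndex: Int): DataFrame = {
  val keyed = df.rdd.map(row => (row.get(keyIndex), row))          // (key, full row)
  val partitioned = keyed.partitionBy(new HashPartitioner(100)).map(_._2)
  hc.createDataFrame(partitioned, df.schema)
}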
to merge is random?
Thanks
--
Cesar Flores
linux path /home/my_user_name, which fails.
On Thu, Aug 6, 2015 at 3:12 PM, Cesar Flores wrote:
> Well, I tried this approach, and I still have issues. Apparently TestHive
> cannot delete the Hive metastore directory. The complete error I get is:
>
> 15/08/06 15:01:29 ERROR Dr
On Mon, Aug 3, 2015 at 5:56 PM, Michael Armbrust
wrote:
> TestHive takes care of creating a temporary directory for each invocation
> so that multiple test runs won't conflict.
>
> On Mon, Aug 3, 2015 at 3:09 PM, Cesar Flores wrote:
>
>>
>> We are using a local h
looks like:
libraryDependencies += "org.scalatest" % "scalatest_2.10" % "2.0" % "test",
parallelExecution in Test := false,
fork := true,
javaOptions ++= Seq("-Xms512M", "-Xmx2048M", "-XX:MaxPermSize=2048M",
"-XX:+CMSClassUnloadingEnabled")
We are working under Spark 1.3.0
Thanks
--
Cesar Flores
Thanks!!!
--
Cesar Flores
I also tried:
hc.createDataFrame(df.rdd.repartition(100), df.schema)
which appears to produce a random permutation. Can someone confirm that this
last line is in fact a random permutation, or point me to a better approach?
Thanks
--
Cesar Flores
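repartition() only redistributes rows across partitions and is not guaranteed to be a uniform random permutation of the row order. A minimal sketch of an explicitly random shuffle; functions.rand is available from Spark 1.4 on, and on 1.3 the same idea can be done by keying the RDD with scala.util.Random.nextDouble and sorting by that key.

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.rand

// tag every row with a uniform random number and sort by it
def shuffleRows(df: DataFrame): DataFrame = df.orderBy(rand())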
cumsum column as the next one:
flag | price            | cumsum_price
-----|------------------|-----------------
   1 | 47.808764653746  | 47.808764653746
   1 | 47.808764653746  | 95.6175293075
   1 | 31.9869279512204 | 127.604457259
Thanks
--
Cesar Flores
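A minimal sketch of the running sum using window functions (available from Spark 1.4 with a HiveContext). It assumes some column that defines the row order, here called "ord", which is not part of the example above.

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.sum

def withCumsum(df: DataFrame): DataFrame = {
  // all rows from the start of the flag group up to and including the current row
  val w = Window.partitionBy("flag").orderBy("ord").rowsBetween(Long.MinValue, 0)
  df.withColumn("cumsum_price", sum("price").over(w))
}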
as the next one:
flag | price            | index
-----|------------------|------
   1 | 47.808764653746  | 0
   1 | 47.808764653746  | 1
   1 | 31.9869279512204 | 2
   1 | 47.7907893713564 | 3
   1 | 16.7599200038239 | 4
   1 | 16.7599200038239 | 5
   1 | 20.3916014172137 | 6
--
Cesar Flores
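A minimal sketch of adding such an index by zipping the underlying RDD with its index and rebuilding the DataFrame; the index follows whatever row order the DataFrame currently has.

import org.apache.spark.sql.{DataFrame, Row, SQLContext}
import org.apache.spark.sql.types.{LongType, StructField, StructType}

def withIndex(sqlContext: SQLContext, df: DataFrame): DataFrame = {
  val indexed = df.rdd.zipWithIndex.map { case (row, idx) => Row.fromSeq(row.toSeq :+ idx) }
  val schema  = StructType(df.schema.fields :+ StructField("index", LongType, nullable = false))
  sqlContext.createDataFrame(indexed, schema)
}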
I have a table in a Hive database partitioned by date. I notice that when
I query this table using HiveContext, the created data frame has a specific
number of partitions.
Does this partitioning correspond to my original table partitioning in Hive?
Thanks
--
Cesar Flores
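A small sketch for inspecting this; table and column names are illustrative. In general the RDD partition count reflects the input splits of the files that are actually read, not the Hive partition columns, so the two numbers need not match, although partition pruning does decide which files get read at all.

import org.apache.spark.sql.hive.HiveContext

def inspectPartitions(hc: HiveContext): Unit = {
  val df = hc.sql("SELECT * FROM my_db.my_table WHERE date_col = '2015-06-01'")
  println(s"DataFrame/RDD partitions: ${df.rdd.partitions.length}")
  hc.sql("SHOW PARTITIONS my_db.my_table").show()   // Hive-level partitions
}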
column to something else on the
fly, and not after performing the aggregation?
thanks
--
Cesar Flores
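If the question is about naming an aggregated column while the aggregation is performed, a minimal sketch of giving the alias directly inside agg(); column names are illustrative.

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.sum

def totals(df: DataFrame): DataFrame =
  df.groupBy("flag").agg(sum("price").as("total_price"))   // alias set in the same call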
Never mind. I found the solution:
val newDataFrame = hc.createDataFrame(hiveLoadedDataFrame.rdd,
hiveLoadedDataFrame.schema)
which amounts to converting the data frame to an RDD and back again to a data
frame. Not the prettiest solution, but at least it solves my problem.
Thanks,
Cesar Flores
On
at all my fields are missing.
Can someone tell me if I need to do some post-processing after loading from
Hive in order to avoid this kind of error?
Thanks
--
Cesar Flores
a lot
--
Cesar Flores
transformer classes for feature extraction, and if I need to save the
input and maybe the output SchemaRDD of the transform function in every
transformer, this may not be very efficient.
Thanks
On Tue, Mar 10, 2015 at 8:20 PM, Tobias Pfeiffer wrote:
> Hi,
>
> On Tue, Mar 10, 2015 at 2:13 PM, Ces
different syntax? Are they interchangeable? Which one has
better performance?
Thanks a lot
--
Cesar Flores
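If the two syntaxes in question are the SQL string interface and the DataFrame operations (the message above is truncated, so this is an assumption), a sketch of the same query in both forms; table and column names are illustrative. Both are parsed into the same logical plan and optimized by Catalyst, so they normally perform the same.

import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions.sum

def bothForms(sqlContext: SQLContext): Unit = {
  val viaSql = sqlContext.sql(
    "SELECT account, SUM(spend) AS total FROM my_table GROUP BY account")
  val viaDsl = sqlContext.table("my_table")
    .groupBy("account")
    .agg(sum("spend").as("total"))
  viaSql.explain()   // compare the physical plans of the two forms
  viaDsl.explain()
}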
be able to handle user-defined classes too? Will user classes need to extend
something, or will they need to define the same approach?
--
Cesar Flores
uired fields, but would like to hear the opinion
of an expert about it.
Thanks
On Thu, Feb 19, 2015 at 12:01 PM, Cesar Flores wrote:
>
> I am trying to pass a variable number of arguments to the select function
> of a SchemaRDD I created, as I want to select the fields in run time:
ct
function? If not, what would be a better approach for selecting the required
fields at run time?
Thanks in advance for your help
--
Cesar Flores
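A minimal sketch of selecting a run-time list of fields by expanding a Seq into the varargs select(...) call (DataFrame API shown; the same varargs expansion idea applies to the older SchemaRDD select).

import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions.col

def selectFields(df: DataFrame, fields: Seq[String]): DataFrame = {
  val columns: Seq[Column] = fields.map(col)   // build Column objects at run time
  df.select(columns: _*)
}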
private to the ml package:
private[ml] def transformSchema(schema: StructType, paramMap: ParamMap):
StructType
Can users create their own transformers? If not, will this functionality be
added in the future?
Thanks
--
Cesar Flores
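For reference, transformSchema later became public and user-defined transformers are supported in current releases. A minimal sketch (Spark 2.x-style ML API; the class name is illustrative) of a custom transformer that maps a string column to its length:

import org.apache.spark.ml.UnaryTransformer
import org.apache.spark.ml.util.Identifiable
import org.apache.spark.sql.types.{DataType, IntegerType}

class StringLengthTransformer(override val uid: String)
    extends UnaryTransformer[String, Int, StringLengthTransformer] {

  def this() = this(Identifiable.randomUID("strLen"))

  // the per-row function applied to the input column
  override protected def createTransformFunc: String => Int = _.length

  // type of the generated output column
  override protected def outputDataType: DataType = IntegerType
}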