Basically, you can build the snapshot yourself:
just clone the source code, and then run 'mvn package', 'mvn install', or 'mvn deploy'.
Azuryy Yu
> On Sep 22, 2015, at 13:36, Bin Wang wrote:
>
> However, I found some scripts in dev/audit-release; can I use them?
>
> Bin Wang <wbi...@gmail.com> wrote on Sep 22, 2015
I just added snapshot builds for 1.5. They will take a few hours to
build, but once we get them working they should publish every few hours.
https://amplab.cs.berkeley.edu/jenkins/view/Spark-Packaging
- Patrick
On Mon, Sep 21, 2015 at 10:36 PM, Bin Wang wrote:
> However I find some scripts in dev/au
However, I found some scripts in dev/audit-release; can I use them?
Bin Wang wrote on Tue, Sep 22, 2015 at 1:34 PM:
> No, I mean push Spark to my private repository. Spark doesn't have a
> build.sbt as far as I can see.
>
> Fengdong Yu wrote on Tue, Sep 22, 2015 at 1:29 PM:
>
>> Do you mean you want to publish the artifact to your pr
No, I mean push Spark to my private repository. Spark doesn't have a
build.sbt as far as I can see.
Fengdong Yu wrote on Tue, Sep 22, 2015 at 1:29 PM:
> Do you mean you want to publish the artifact to your private repository?
>
> if so, please use 'sbt publish'
>
> add the following to your build.sbt:
>
> publishTo
Do you mean you want to publish the artifact to your private repository?
if so, please use 'sbt publish'
add the following to your build.sbt:
publishTo := {
  val nexus = "https://YOUR_PRIVATE_REPO_HOSTS/"
  if (version.value.endsWith("SNAPSHOT"))
    // standard Nexus repository paths assumed below
    Some("snapshots" at nexus + "content/repositories/snapshots")
  else
    Some("releases" at nexus + "content/repositories/releases")
}
My project is using sbt (or Maven), which needs to download dependencies from
a Maven repo. I have my own private Maven repo with Nexus, but I don't know
how to push my own build to it. Can you give me a hint?
Mark Hamstra wrote on Tue, Sep 22, 2015 at 1:25 PM:
> Yeah, whoever is maintaining the scripts and snapsho
Yeah, whoever is maintaining the scripts and snapshot builds has fallen
down on the job -- but there is nothing preventing you from checking out
branch-1.5 and creating your own build, which is arguably a smarter thing
to do anyway. If I'm going to use a non-release build, then I want the
full git
But I cannot find 1.5.1-SNAPSHOT either at
https://repository.apache.org/content/groups/snapshots/org/apache/spark/spark-core_2.10/
Mark Hamstra wrote on Tue, Sep 22, 2015 at 12:55 PM:
> There is no 1.5.0-SNAPSHOT because 1.5.0 has already been released. The
> current head of branch-1.5 is 1.5.1-SNAPSHOT -- so
There is no 1.5.0-SNAPSHOT because 1.5.0 has already been released. The
current head of branch-1.5 is 1.5.1-SNAPSHOT -- soon to be 1.5.1 release
candidates and then the 1.5.1 release.
On Mon, Sep 21, 2015 at 9:51 PM, Bin Wang wrote:
> I'd like to use some important bug fixes in 1.5 branch and I
I'd like to use some important bug fixes in the 1.5 branch, so I looked for the
Apache Maven host, but I can't find any snapshot for the 1.5 branch.
https://repository.apache.org/content/groups/snapshots/org/apache/spark/spark-core_2.10/1.5.0-SNAPSHOT/
I can find 1.4.X and 1.6.0 versions, so why is there no snap
Hossein,
Is there any strong reason to download and install the SparkR source package separately
from the Spark distribution?
An R user can simply download the Spark distribution, which contains the SparkR
source and binary packages, and can directly use SparkR. There is no need to install
the SparkR package at all.
From: Hosse
I was executing on Spark 1.4 so I didn't notice the Tungsten option would
make spilling happen in 1.5. I'll upgrade to 1.5 and see how that turns out.
Thanks!
From: Reynold Xin
Date: Monday, September 21, 2015 at 5:36 PM
To: Matt Cheah
Cc: "dev@spark.apache.org" , Mingyu Kim
, Peter Faiman
Hi dev list,
SparkR backend assumes SparkR source files are located under
"SPARK_HOME/R/lib/." This directory is created by running R/install-dev.sh.
This setting makes sense for Spark developers, but if an R user downloads
and installs SparkR source package, the source files are going to be in
pl
What's the plan if you run explain?
In 1.5 the default should be TungstenAggregate, which does spill (switching
from hash-based aggregation to sort-based aggregation).
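For reference, a minimal sketch of checking the physical plan in the Scala shell (the parquet path and column names here are made up for illustration):

import org.apache.spark.sql.functions.{avg, sum}
val df = sqlContext.read.parquet("/path/to/data")            // hypothetical input
df.groupBy("key").agg(sum("value"), avg("value")).explain()
// in 1.5 the printed physical plan should show TungstenAggregate rather than the old Aggregate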
On Mon, Sep 21, 2015 at 5:34 PM, Matt Cheah wrote:
> Hi everyone,
>
> I’m debugging some slowness and apparent memory pressure
Hi everyone,
I'm debugging some slowness and apparent memory pressure + GC issues after I
ported some workflows from raw RDDs to DataFrames. In particular, I'm
looking into an aggregation workflow that computes many aggregations per key
at once.
My workflow before was doing a fairly straightforw
Oh, I want to modify an existing Hadoop InputFormat.
On Mon, Sep 21, 2015 at 4:23 PM, Ted Yu wrote:
> Can you clarify what you want to do:
> If you modify an existing Hadoop InputFormat, etc., it would be a matter of
> rebuilding Hadoop and building Spark using the custom-built Hadoop as a
> dependency.
>
>
Thanks Josh, I should have added that we've tried with -DwildcardSuites
and Maven, and we use this helpful feature regularly (although this does
result in building plenty of tests and running other tests in other
modules too), so I'm wondering whether there's a more "streamlined" way - e.g. with
junit an
Can you clarify what you want to do:
If you modify an existing Hadoop InputFormat, etc., it would be a matter of
rebuilding Hadoop and building Spark using the custom-built Hadoop as a
dependency.
Are you introducing a new InputFormat?
Cheers
On Mon, Sep 21, 2015 at 1:20 PM, Dogtail Ray wrote:
> Hi all,
>
Hi all,
I find that Spark uses some Hadoop APIs such as InputFormat, InputSplit,
etc., and I want to modify these Hadoop APIs. Do you know how I can
integrate my modified Hadoop code into Spark? Many thanks!
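Once the modified Hadoop is built and Spark is compiled against it (or the custom jars are otherwise on Spark's classpath), a changed or new InputFormat is used from Spark like any stock one. A rough sketch, using the standard TextInputFormat as a stand-in for the modified class:

import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat   // swap in your modified/custom InputFormat here

val rdd = sc.newAPIHadoopFile(
  "hdfs:///path/to/input",              // hypothetical input path
  classOf[TextInputFormat],
  classOf[LongWritable],
  classOf[Text])

rdd.map { case (_, text) => text.toString }.take(5).foreach(println)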
+dev list
Hi Dirceu,
The answer to whether throwing an exception is better or null is better
depends on your use case. If you are debugging and want to find bugs with
your program, you might prefer throwing an exception. However, if you are
running on a large real-world dataset (i.e. data is dirt
For quickly running individual suites:
https://cwiki.apache.org/confluence/display/SPARK/Useful+Developer+Tools#UsefulDeveloperTools-RunningIndividualTests
On Mon, Sep 21, 2015 at 8:21 AM, Adam Roberts wrote:
> Hi, is there an existing way to blacklist any test suite?
>
> Ideally we'd have a tex
Looks like the problem is that df.rdd does not work very well with limit. In
Scala, df.limit(1).rdd will also trigger the issue you observed. I will add
this to the JIRA.
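For anyone reproducing this, a small sketch of the two code paths in Scala (the parquet path comes from the report below):

val df = sqlContext.read.parquet("someparquetfiles")
df.take(1)                  // fast: collects a single row through the optimized path
df.limit(1).rdd.collect()   // converting to an RDD first is what triggers the slow behaviour described here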
On Mon, Sep 21, 2015 at 10:44 AM, Jerry Lam wrote:
> I just noticed you found 1.4 has the same issue. I added that as well in
> th
I just noticed you found 1.4 has the same issue. I added that as well in
the ticket.
On Mon, Sep 21, 2015 at 1:43 PM, Jerry Lam wrote:
> Hi Yin,
>
> You are right! I just tried the scala version with the above lines, it
> works as expected.
> I'm not sure if it happens also in 1.4 for pyspark bu
Hi Yin,
You are right! I just tried the scala version with the above lines, it
works as expected.
I'm not sure if it happens also in 1.4 for PySpark, but I thought the
PySpark code just calls the Scala code via Py4J. I didn't expect that this
bug is PySpark-specific. That surprises me actually a bi
Seems 1.4 has the same issue.
On Mon, Sep 21, 2015 at 10:01 AM, Yin Huai wrote:
> btw, does 1.4 have the same problem?
>
> On Mon, Sep 21, 2015 at 10:01 AM, Yin Huai wrote:
>
>> Hi Jerry,
>>
>> Looks like it is a Python-specific issue. Can you create a JIRA?
>>
>> Thanks,
>>
>> Yin
>>
>> On Mon,
To unsubscribe from the dev list, please send a message to
dev-unsubscr...@spark.apache.org as described here:
http://spark.apache.org/community.html#mailing-lists.
Thanks,
-Rick
Dulaj Viduranga wrote on 09/21/2015 10:15:58 AM:
> From: Dulaj Viduranga
> To: dev@spark.apache.org
> Date: 09/21/
Unsubscribe
quick update: we actually did some of the maintenance on our systems
after the berkeley-wide outage caused by one of our (non-jenkins)
servers halting and catching fire.
we'll still have some downtime early wednesday, but tomorrow's will be
cancelled. i'll send out another update real soon now w
btw, does 1.4 have the same problem?
On Mon, Sep 21, 2015 at 10:01 AM, Yin Huai wrote:
> Hi Jerry,
>
> Looks like it is a Python-specific issue. Can you create a JIRA?
>
> Thanks,
>
> Yin
>
> On Mon, Sep 21, 2015 at 8:56 AM, Jerry Lam wrote:
>
>> Hi Spark Developers,
>>
>> I just ran some very s
Hi Jerry,
Looks like it is a Python-specific issue. Can you create a JIRA?
Thanks,
Yin
On Mon, Sep 21, 2015 at 8:56 AM, Jerry Lam wrote:
> Hi Spark Developers,
>
> I just ran some very simple operations on a dataset. I was surprised by the
> execution plan of take(1), head() or first().
>
> Fo
Hi Spark Developers,
I just ran some very simple operations on a dataset. I was surprised by the
execution plan of take(1), head() or first().
For your reference, this is what I did in pyspark 1.5:
df=sqlContext.read.parquet("someparquetfiles")
df.head()
The above lines take over 15 minutes. I wa
Hi, is there an existing way to blacklist any test suite?
Ideally we'd have a text file with a series of names (let's say comma
separated), and if a name matches the fully qualified class name of a
suite, that suite will be skipped.
Perhaps we can achieve this via ScalaTest or Maven?
Curr
Thanks Corey for the suggestion, I will check it.
On Mon, Sep 21, 2015 at 2:43 PM, Corey Nolet wrote:
> Mohamed,
>
> Have you checked out the Spark Timeseries [1] project? Non-seasonal ARIMA
> was added to this recently and seasonal ARIMA should be following shortly.
>
> [1] https://github.com/clo
Mohamed,
Have you checked out the Spark Timeseries [1] project? Non-seasonal ARIMA
was added to this recently and seasonal ARIMA should be following shortly.
[1] https://github.com/cloudera/spark-timeseries
On Mon, Sep 21, 2015 at 7:47 AM, Mohamed Baddar
wrote:
> Hello everybody , this my firs
Hello everybody, this is my first mail on the list, and I would like to
introduce myself first :)
My name is Mohamed Baddar. I work as a Big Data and Analytics Software
Engineer at BADRIT (http://badrit.com/), a software startup with a focus on
Big Data. I have also been working for 6+ years at IBM R
You can use a broadcast variable for passing connection information.
Cheers
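A minimal sketch of that approach, with made-up connection details and stub helpers standing in for the real Cassandra client calls:

case class ConnectionInfo(hosts: Seq[String], keyspace: String)

// hypothetical stand-ins for a real driver; replace with your Cassandra client code
def connectTo(info: ConnectionInfo): String = s"session:${info.hosts.mkString(",")}"
def lookup(session: String, key: String): String = s"$session/$key"

// broadcast only the small, serializable connection details, never the SparkContext
val connInfo = sc.broadcast(ConnectionInfo(Seq("cassandra-host"), "my_keyspace"))
val keys = sc.parallelize(Seq("k1", "k2"))

val enriched = keys.mapPartitions { records =>
  val session = connectTo(connInfo.value)   // one connection per partition, opened on the executor
  records.map(k => (k, lookup(session, k)))
}
enriched.collect().foreach(println)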
> On Sep 21, 2015, at 4:27 AM, Priya Ch wrote:
>
> can I use this SparkContext on executors?
> In my application, I have a scenario of reading from a DB for certain records in
> an RDD. Hence I need the SparkContext to read from
SparkContext is available on the driver, not on executors.
To read from Cassandra, you can use something like this:
https://github.com/datastax/spark-cassandra-connector/blob/master/doc/2_loading.md
*Romi Kuntsman*, *Big Data Engineer*
http://www.totango.com
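A minimal sketch of reading with that connector (keyspace, table and column names are hypothetical, and the connector jar must already be on the classpath):

import com.datastax.spark.connector._

// cassandraTable builds an RDD whose partitions read from Cassandra on the executors,
// so no SparkContext is needed inside the tasks
val rows = sc.cassandraTable("my_keyspace", "my_table")
rows.select("id", "value").take(10).foreach(println)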
On Mon, Sep 21, 2015 at 2:27 PM, Priya
Can I use this SparkContext on executors?
In my application, I have a scenario of reading from a DB for certain records
in an RDD. Hence I need the SparkContext to read from the DB (Cassandra in our case).
If the SparkContext can't be sent to executors, what is the workaround for
this?
On Mon, Sep 21, 2
What new information do you know after creating the RDD that you didn't
know at the time of its creation?
I think the whole point is that RDD is immutable, you can't change it once
it was created.
Perhaps you need to refactor your logic to know the parameters earlier, or
create a whole new RDD ag
Thanks for the prompt reply.
May I ask why keyBy(f) is not supported on DStreams? Any particular
reason? Or is it possible to add it in a future release, since "stream.map(record
=> (keyFunction(record), record))" looks tedious?
I checked the Python source code; keyBy looks like a "shortcu
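For what it's worth, a small sketch of the workaround in the meantime (the socket source and keyFunction are hypothetical); DStream.transform exposes the underlying RDDs, so RDD.keyBy can be reused directly:

import org.apache.spark.streaming.{Seconds, StreamingContext}

val ssc = new StreamingContext(sc, Seconds(10))
val stream = ssc.socketTextStream("localhost", 9999)         // hypothetical input
def keyFunction(record: String): String = record.take(1)     // hypothetical key extractor

val keyed1 = stream.map(record => (keyFunction(record), record))    // the one-liner from this thread
val keyed2 = stream.transform(rdd => rdd.keyBy(keyFunction))        // or reuse RDD.keyBy via transform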
Hi,
I'm seeing a strange error while inserting data from Spark Streaming into
HBase.
I am able to write the data from Spark (without streaming) to HBase
successfully, but when I use the same code to write a DStream I'm seeing the
below error.
I tried setting the below parameters, still didn't hel
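The error itself isn't shown here, but a common pattern for writing a DStream to HBase is to build the Hadoop/HBase configuration inside foreachRDD, so that nothing non-serializable is captured by the streaming closure. A rough sketch only, with a hypothetical table, column family, and input source (HBase client API of the 0.98/1.x era assumed):

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Put
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.mapreduce.Job
import org.apache.spark.streaming.{Seconds, StreamingContext}

val ssc = new StreamingContext(sc, Seconds(5))
val lines = ssc.socketTextStream("localhost", 9999)   // hypothetical stand-in for the real stream

lines.foreachRDD { rdd =>
  // create the configuration per batch on the driver instead of capturing it in the closure
  val conf = HBaseConfiguration.create()
  conf.set(TableOutputFormat.OUTPUT_TABLE, "my_table")          // hypothetical table
  val job = Job.getInstance(conf)
  job.setOutputFormatClass(classOf[TableOutputFormat[ImmutableBytesWritable]])

  rdd.map { line =>
    val put = new Put(Bytes.toBytes(line))
    put.add(Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes(line))   // hypothetical cf/qualifier
    (new ImmutableBytesWritable, put)
  }.saveAsNewAPIHadoopDataset(job.getConfiguration)
}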