BTW, after I reverted SPARK-784, I can see all the jars under
lib_managed/jars.
On Tue, Nov 17, 2015 at 2:46 PM, Jeff Zhang wrote:
> Hi Josh,
>
> I noticed the comments in https://github.com/apache/spark/pull/9575 saying
> that the Datanucleus-related jars will still be copied to lib_managed/jars.
> But
Hi Josh,
I noticed the comments in https://github.com/apache/spark/pull/9575 saying
that the Datanucleus-related jars will still be copied to lib_managed/jars. But
I don't see any jars under lib_managed/jars. The weird thing is that I see
the jars on another machine, but cannot see the jars on my laptop e
Kafka now has built-in support for managing offsets itself, besides ZK; it is
easy to use and to migrate to from the current ZK implementation. I think the
problem here is whether we should manage offsets at the Spark Streaming level
or leave this question to the user.
If you want to manage offsets at the user level, letting Spark t
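(For context, a rough sketch of that built-in offset management via the
Kafka 0.9 consumer API; the broker address, group id, and topic are made-up
placeholders:)

import java.util.{Collections, Properties}
import org.apache.kafka.clients.consumer.KafkaConsumer

val props = new Properties()
props.put("bootstrap.servers", "localhost:9092")  // placeholder
props.put("group.id", "my-group")                 // placeholder
props.put("enable.auto.commit", "false")
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")

val consumer = new KafkaConsumer[String, String](props)
consumer.subscribe(Collections.singletonList("my-topic"))
val records = consumer.poll(1000)
// ... process records ...
consumer.commitSync() // offsets are stored by Kafka itself, no ZK needed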
Hi Renyi,
This is the intended behavior of the streaming HdfsWordCount example. It
makes use of 'textFileStream', which monitors an HDFS directory for any
newly created files and pushes them into a DStream. It is meant to be run
indefinitely, unless interrupted by ctrl-c, for example.
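For reference, a minimal sketch of that pattern (the path and batch interval
here are illustrative, not taken from the example itself):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setAppName("HdfsWordCount")
val ssc = new StreamingContext(conf, Seconds(2))

// textFileStream watches the directory and turns each newly created
// file into a batch of lines in the resulting DStream
val lines = ssc.textFileStream("hdfs:///tmp/streaming-input")
val counts = lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)
counts.print()

ssc.start()
ssc.awaitTermination() // runs until interrupted, e.g. by ctrl-c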
-bryan
Hi all,
I'm running the Mesos cluster dispatcher; however, when I submit jobs,
things like JVM args, classpath order, and the UI port aren't added to the
command line executed by the Mesos scheduler. In fact it only cares about
the class, jar, and num cores/mem.
https://github.com/jayv/spark/blob/mes
The only dependency on ZooKeeper I see is here:
https://github.com/apache/spark/blob/1c5475f1401d2233f4c61f213d1e2c2ee9673067/external/kafka/src/main/scala/org/apache/spark/streaming/kafka/ReliableKafkaReceiver.scala#L244-L247
If that's the only line that depends on ZooKeeper, we could probably tr
+1
On Tue, Nov 17, 2015 at 7:43 AM, Joseph Bradley
wrote:
> That sounds useful; would you mind submitting a JIRA (and a PR if you're
> willing)?
> Thanks,
> Joseph
>
> On Fri, Oct 23, 2015 at 12:43 PM, Robert Dodier
> wrote:
>
>> Hi,
>>
>> MLUtils.loadLibSVMFile verifies that indices are 1-based
I did, and it passes all of our test cases, so I'm wondering what I missed. I
know there is the memory leak spill JIRA SPARK-11293, but I'm not sure whether
that will go into 1.4.2 or 1.4.3, etc.
From: Reynold Xin
Sent: Friday, November 13, 2015 1:31 PM
To: Andrew Lee
C
Hi Sergio,
Apart from apologies about limited review bandwidth (from me too!), I
wanted to add: It would be interesting to hear what feedback you've gotten
from users of your package. Perhaps you could collect feedback by (a)
emailing the user list and (b) adding a note in the Spark Packages poin
One comment about
"""
1) I agree the sorting method you suggested is a very efficient way to
handle the unordered categorical variables in binary classification
and regression. I propose we have a Spark ML Transformer to do the
sorting and encoding, bringing the benefits to many tree based
methods.
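(A minimal sketch of that sorting idea for binary labels, assuming an RDD
`data` of (category, label) pairs; all names here are illustrative:)

// order categories by their mean label, then encode each category by
// its rank so tree methods can treat the feature as ordered
val categoryOrder = data            // RDD[(String, Double)]
  .mapValues(l => (l, 1L))
  .reduceByKey { case ((s1, n1), (s2, n2)) => (s1 + s2, n1 + n2) }
  .mapValues { case (sum, n) => sum / n }   // mean label per category
  .sortBy(_._2)
  .keys
  .collect()
val code = categoryOrder.zipWithIndex.toMap // category -> ordinal code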
That sounds useful; would you mind submitting a JIRA (and a PR if you're
willing)?
Thanks,
Joseph
On Fri, Oct 23, 2015 at 12:43 PM, Robert Dodier
wrote:
> Hi,
>
> MLUtils.loadLibSVMFile verifies that indices are 1-based and
> increasing, and otherwise triggers an error. I'd like to suggest that
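(For anyone following along, the API under discussion; the sample file below
ships with the Spark source tree, and `sc` is the shell's SparkContext:)

import org.apache.spark.mllib.util.MLUtils

// each libsvm line is "<label> <index1>:<value1> <index2>:<value2> ...";
// loadLibSVMFile expects the indices to be 1-based and increasing
val examples = MLUtils.loadLibSVMFile(sc, "data/mllib/sample_libsvm_data.txt")
examples.take(1).foreach(println) // RDD[LabeledPoint]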
Hi Fernando,
the "persistence" of a DStream is defined depending of the StorageLevel.
Window is not related to persistence: it's the processing of multiple
DStream in one, a kind of "gather of DStreams". The transformation is
applied on a "slide window". For instance, you define a window of 3
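(To make that concrete, a sketch assuming an existing DStream `dstream` and
a 10-second batch interval:)

import org.apache.spark.streaming.Seconds

// a window of 3 batches sliding by 1 batch: every 10s the transformation
// sees the last 30s of data gathered into a single windowed DStream
val windowed = dstream.window(Seconds(30), Seconds(10))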
Hi all,
I was wondering if someone could give me a brief explanation, or point me
in the right direction in the code, for where DStream persistence is done.
I'm looking at DStream.java, but all it does is set the StorageLevel,
and neither WindowedDStream nor ReducedWindowedDStream seem to chang
Done, thanks.
On Mon, Nov 9, 2015 at 7:23 PM, Cheng, Hao wrote:
> Yes, we definitely need to think about how to handle this case; it's
> probably even more common than the both-sorted/partitioned-tables case. Can
> you jump to the JIRA and leave a comment there?
>
>
>
> From: Alex Nastetsky [mailto:alex.nastet
There are already private methods in the code for interacting with Kafka's
offset management API.
There's a JIRA for making those methods public, but TD has been reluctant
to merge it:
https://issues.apache.org/jira/browse/SPARK-10963
I think adding any ZK-specific behavior to Spark is a bad idea.
I really like the Streaming receiverless API for Kafka streaming jobs, but
I'm finding the manual offset management adds a fair bit of complexity. I'm
sure that others feel the same way, so I'm proposing that we add the
ability to have consumer offsets managed via an easy-to-use API. This would
be
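(For reference, the kind of bookkeeping the direct API requires today;
`stream` is assumed to come from KafkaUtils.createDirectStream:)

import org.apache.spark.streaming.kafka.HasOffsetRanges

stream.foreachRDD { rdd =>
  val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  // process the batch, then save offsetRanges to your own store
  // (ZK, a database, ...) so a restart can resume where it left off
  offsetRanges.foreach { o =>
    println(s"${o.topic} ${o.partition} ${o.fromOffset} -> ${o.untilOffset}")
  }
}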
See this thread:
http://search-hadoop.com/m/q3RTtLKc2ctNPcq&subj=Re+Spark+1+4+2+release+and+votes+conversation+
> On Nov 15, 2015, at 10:53 PM, Niranda Perera wrote:
>
> Hi,
>
> I am wondering when spark 1.4.2 will be released?
>
> is it in the voting stage at the moment?
>
> rgds
>
> --
This is the exception I got
15/11/16 16:50:48 WARN metastore.HiveMetaStore: Retrying creating default
database after error: Class
org.datanucleus.api.jdo.JDOPersistenceManagerFactory was not found.
javax.jdo.JDOFatalUserException: Class
org.datanucleus.api.jdo.JDOPersistenceManagerFactory was not
It's about the Datanucleus-related jars, which are needed by Spark SQL.
Without these jars, I cannot call the DataFrame-related API (I have
HiveContext enabled).
On Mon, Nov 16, 2015 at 4:10 PM, Josh Rosen
wrote:
> As of https://github.com/apache/spark/pull/9575, Spark's build will no
> longer p
FiloDB is also closely related: https://github.com/tuplejump/FiloDB
On Mon, Nov 16, 2015 at 12:24 AM, Nick Pentreath
wrote:
> Cloudera's Kudu also looks interesting here (getkudu.io) - Hadoop
> input/output format support:
> https://github.com/cloudera/kudu/blob/master/java/kudu-mapreduce/src/ma
Cloudera's Kudu also looks interesting here (getkudu.io) - Hadoop
input/output format support:
https://github.com/cloudera/kudu/blob/master/java/kudu-mapreduce/src/main/java/org/kududb/mapreduce/KuduTableInputFormat.java
On Mon, Nov 16, 2015 at 7:52 AM, Reynold Xin wrote:
> This (updates) is som
As of https://github.com/apache/spark/pull/9575, Spark's build will no
longer place every dependency JAR into lib_managed. Can you say more about
how this affected spark-shell for you (maybe share a stacktrace)?
On Mon, Nov 16, 2015 at 12:03 AM, Jeff Zhang wrote:
>
> Sometimes, the jars under li
Sometimes the jars under lib_managed are missing, and after I rebuild
Spark, the jars under lib_managed are still not downloaded. This causes
spark-shell to fail due to the missing jars. Has anyone hit this weird issue?
--
Best Regards
Jeff Zhang