Thanks for the reply.
Asher, have you experienced this problem when checkpoints are not enabled as
well? If we have a large number of iterations (over 150) and checkpoints are
not enabled, the process just hangs (with no error) at around iteration
120-140 (on Spark 2.0.0). I could not reproduce this
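A hang after 100+ iterations with checkpointing disabled is commonly a symptom of the RDD lineage growing without bound, though I can't confirm that is what happens here. A minimal sketch of periodic checkpointing, with a made-up per-iteration update and an arbitrary checkpoint interval:

sc.setCheckpointDir("hdfs:///tmp/checkpoints")   // any reliable storage location

var state = sc.parallelize(1 to 1000000).map(_.toDouble)
for (i <- 1 to 200) {
  state = state.map(_ * 1.0001)     // stand-in for the real per-iteration update
  if (i % 20 == 0) {                // illustrative interval, not from the thread
    state.cache()
    state.checkpoint()              // truncates the lineage
    state.count()                   // forces the checkpoint to materialize
  }
}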
> this SO thread:
> http://stackoverflow.com/questions/13624893/metastore-db-created-wherever-i-run-hive
>
>
> On Sat, May 16, 2015 at 9:07 AM, Tamas Jambor wrote:
>
>> Gave it another try - it seems that it picks up the variable and prints
>> out the correct value, but still puts the metastore_db folder in the
>> current directory, regardless.
Gave it another try - it seems that it picks up the variable and prints out
the correct value, but still puts the metastore_db folder in the current
directory, regardless.
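If the shell is using the default embedded Derby metastore, the location of the metastore_db directory is governed by javax.jdo.option.ConnectionURL rather than by hive.metastore.warehouse.dir, and the default value is a relative path, which would explain the folder showing up in whatever directory the shell is started from. A sketch of the relevant properties as they might appear in conf/hive-site.xml (the paths are placeholders):

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:derby:;databaseName=/some/absolute/path/metastore_db;create=true</value>
</property>
<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/some/absolute/path/warehouse</value>
</property>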
On Sat, May 16, 2015 at 1:13 PM, Tamas Jambor wrote:
> Thank you for the reply.
>
> I have tried your experiment,
> class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as
> "embedded-only" so does not have its own datastore table.
> res0: Array[org.apache.spark.sql.Row] = Array()
>
> scala> hc.getConf("hive.metastore.warehouse.dir")
> res1: String = /home/ykadiysk/Gi
/hostname.com:9083
>
> scala> hc.getConf("hive.metastore.warehouse.dir")
> res14: String = /user/hive/warehouse
>
>
>
> The first line tells you which metastore it's trying to connect to -- this
> should be the string specified under the hive.metastore.uris property.
I have tried to put the hive-site.xml file in the conf/ directory, but it
seems it is not being picked up from there.
On Thu, May 14, 2015 at 6:50 PM, Michael Armbrust
wrote:
> You can configure Spark SQL's Hive interaction by placing a hive-site.xml
> file in the conf/ directory.
>
> On Thu, May 14, 2
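As a quick sanity check after dropping the file into conf/, the same getConf calls quoted earlier in the thread can confirm whether the settings were actually picked up; the expected values are whatever is configured, the comments below are placeholders:

scala> hc.getConf("hive.metastore.uris")           // should show the configured metastore URI
scala> hc.getConf("hive.metastore.warehouse.dir")  // should show the configured warehouse path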
Not sure what would slow it down, as the repartition completes equally fast
on all nodes, implying that the data is available on all of them; after that
there are a few computation steps, none of them local on the master.
On Mon, Apr 20, 2015 at 12:57 PM, Sean Owen wrote:
> What machines are HDFS data nodes --
between a
string or an array.
On Fri, Mar 27, 2015 at 3:20 PM, Tamas Jambor wrote:
> It is just a comma-separated file, about 10 columns wide, which we append
> with a unique id and a few additional values.
>
> On Fri, Mar 27, 2015 at 2:43 PM, Ted Yu wrote:
>
>> jamborta :
It is just a comma-separated file, about 10 columns wide, which we append
with a unique id and a few additional values.
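For context, a minimal sketch of the kind of pipeline described here, reading a comma-separated file and appending a unique id with RDD.zipWithUniqueId; the paths and the extra value are made up:

val raw = sc.textFile("hdfs:///data/input.csv")               // ~10 comma-separated columns
val withId = raw.map(_.split(",", -1)).zipWithUniqueId().map {
  case (cols, id) => (cols :+ id.toString) :+ "extra-value"   // append the id and extra fields
}
withId.map(_.mkString(",")).saveAsTextFile("hdfs:///data/output")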
On Fri, Mar 27, 2015 at 2:43 PM, Ted Yu wrote:
> jamborta :
> Please also describe the format of your csv files.
>
> Cheers
>
> On Fri, Mar 27, 2015 at 6:42 AM, DW @ Gmail wrote:
(I tried using spark-jobserver for that).
On Mon, Mar 2, 2015 at 3:07 PM, Sean Owen wrote:
> You can make a new StreamingContext on an existing SparkContext, I believe?
>
> On Mon, Mar 2, 2015 at 3:01 PM, Tamas Jambor wrote:
> > thanks for the reply.
> >
> > Actually,
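The constructor Sean refers to does exist; a minimal sketch, with the input source and batch interval chosen arbitrarily:

import org.apache.spark.streaming.{Seconds, StreamingContext}

val ssc = new StreamingContext(sc, Seconds(10))       // reuses the existing SparkContext
val lines = ssc.socketTextStream("localhost", 9999)   // placeholder input source
lines.print()
ssc.start()
ssc.awaitTermination()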
create and manage
multiple streams - the same way that is possible with batch jobs.
On Mon, Mar 2, 2015 at 2:52 PM, Sean Owen wrote:
> I think everything there is to know about it is on JIRA; I don't think
> that's being worked on.
>
> On Mon, Mar 2, 2015 at 2:50 PM, Tamas Jam
I have seen there is a card (SPARK-2243) to enable that. Is that still
going ahead?
On Mon, Mar 2, 2015 at 2:46 PM, Sean Owen wrote:
> It is still not something you're supposed to do; in fact there is a
> setting (disabled by default) that throws an exception if you try to
> make multiple contexts.
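The setting Sean alludes to is presumably spark.driver.allowMultipleContexts; a sketch of how it is flipped, shown only to illustrate the flag (multiple contexts per JVM remain unsupported):

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setMaster("local[2]")
  .setAppName("second-context")
  .set("spark.driver.allowMultipleContexts", "true")  // default is false, which throws on a second context
val sc2 = new SparkContext(conf)                      // discouraged; shown only for the flag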
Thanks for the reply. I am trying to set up a streaming-as-a-service
approach, using the framework that is used for spark-jobserver. For that I
would need to handle asynchronous operations that are initiated from
outside of the stream. Do you think it is not possible?
On Fri Feb 13 2015 at 10:14:1
Thanks for the reply. It seems it is all set to zero in the latest code; I
was checking 1.2 last night.
On Fri Feb 06 2015 at 07:21:35 Sean Owen wrote:
> It looks like the initial intercept term is 1 only in the addIntercept
> && numOfLinearPredictor == 1 case. It does seem inconsistent; since
> i
Thanks for the reply. I have tried to add SPARK_CLASSPATH, but I got a
warning that it is deprecated (and it didn't solve the problem); I also
tried to run with --driver-class-path, which did not work either. I am
trying this locally.
On Mon Jan 26 2015 at 15:04:03 Akhil Das wrote:
> You can also try a
interface that even gives you the possibility of passing a
> configuration.
>
> -kr, Gerard.
>
> On Wed, Jan 21, 2015 at 11:54 AM, Tamas Jambor wrote:
>
>> we were thinking along the same lines, that is to fix the number of
>> streams and change the input and output
> interface to manage job
> lifecycle. You will still need to solve the dynamic configuration through
> some alternative channel.
>
> On Wed, Jan 21, 2015 at 11:30 AM, Tamas Jambor wrote:
>
>> thanks for the replies.
>>
>> is this something we can get around? I tried to hack into the code
>> without much success.
Thanks for the replies.
Is this something we can get around? I tried to hack into the code without
much success.
On Wed, Jan 21, 2015 at 3:15 AM, Shao, Saisai wrote:
> Hi,
>
> I don't think current Spark Streaming supports this feature; all the
> DStream lineage is fixed after the context is started.
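A small sketch of the constraint Saisai describes: every input stream and output operation has to be declared before start(); the host and port below are placeholders:

import org.apache.spark.streaming.{Seconds, StreamingContext}

val ssc = new StreamingContext(sc, Seconds(5))
val input = ssc.socketTextStream("localhost", 9999)   // declare all inputs up front
input.foreachRDD(rdd => println(rdd.count()))
ssc.start()
// adding new DStreams or output operations here is rejected: the graph is already fixed
ssc.awaitTermination()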
Thanks. The problem is that we'd like it to be picked up by Hive.
On Tue Jan 13 2015 at 18:15:15 Davies Liu wrote:
> On Tue, Jan 13, 2015 at 10:04 AM, jamborta wrote:
> > Hi all,
> >
> > Is there a way to save DStream RDDs to a single file so that another
> > process can pick it up as a single file
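Davies's suggestion is not visible in this excerpt; one common pattern for what is being asked, not necessarily what he goes on to recommend, is to write each batch as a single part file via foreachRDD and point an external Hive table at the output directory. A sketch with made-up paths:

import org.apache.spark.streaming.dstream.DStream

def saveBatches(lines: DStream[String]): Unit = {
  lines.foreachRDD { (rdd, time) =>
    if (rdd.count() > 0)   // skip empty batches
      rdd.coalesce(1).saveAsTextFile(s"/data/stream-out/batch-${time.milliseconds}")
  }
}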
Thanks. Will it work with sbt at some point?
On Thu, 13 Nov 2014 01:03 Xiangrui Meng wrote:
> You need to use Maven to include Python files. See
> https://github.com/apache/spark/pull/1223 . -Xiangrui
>
> On Wed, Nov 12, 2014 at 4:48 PM, jamborta wrote:
> > I have figured out that building the
Thanks for the reply, Sean.
I can see that splitting on all the categories would probably overfit
the tree; on the other hand, it might give more insight into the
subcategories (it would probably only work if the data is uniformly
distributed between the categories). I haven't really found any comparison
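For reference, in MLlib a categorical feature and its number of categories are declared through categoricalFeaturesInfo; the feature index, category count and other parameters below are invented for illustration (maxBins has to be at least the largest category count):

import org.apache.spark.mllib.tree.DecisionTree
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

def trainTree(data: RDD[LabeledPoint]) = {
  val categoricalFeaturesInfo = Map(0 -> 32)   // feature 0 is categorical with 32 categories
  // args: input, numClasses, categoricalFeaturesInfo, impurity, maxDepth, maxBins
  DecisionTree.trainClassifier(data, 2, categoricalFeaturesInfo, "gini", 5, 64)
}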
Hi Xiangrui,
Thanks for the reply. Is this still due to be released in 1.2
(SPARK-3530 is still open)?
Thanks,
On Wed, Nov 5, 2014 at 3:21 AM, Xiangrui Meng wrote:
> The proposed new set of APIs (SPARK-3573, SPARK-3530) will address
> this issue. We "carry over" extra columns with training and
That would work. I normally use Hive queries through Spark SQL; I have not
seen something like that there.
On Thu, Oct 2, 2014 at 3:13 PM, Ashish Jain wrote:
> If you are using textFile() to read data in, it also takes a parameter for
> the minimum number of partitions to create. Would that not
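A one-line sketch of what Ashish is describing; the path and the partition count are placeholders:

val data = sc.textFile("hdfs:///data/input.csv", 64)   // ask for at least 64 partitions
println(data.partitions.length)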
node that submits
> the application, (2) You can use the --driver-memory command line option if
> you are using Spark submit (bin/pyspark goes through this path, as you have
> discovered on your own).
>
> Does that make sense?
>
>
> 2014-10-01 10:17 GMT-07:00 Tamas Jambor :
calling directly the respective
> backend code to launch it.
>
> (That being said, it would be nice to have a programmatic way of
> launching apps that handled all this - this has been brought up in a
> few different contexts, but I don't think there's an "official"
Thanks, Marcelo.
What's the reason it is not possible in cluster mode either?
On Wed, Oct 1, 2014 at 5:42 PM, Marcelo Vanzin wrote:
> You can't set up the driver memory programmatically in client mode. In
> that mode, the same JVM is running the driver, so you can't modify
> command line options
Thanks for the reply.
As I mentioned above, everything works in yarn-client mode; the problem
starts when I try to run it in yarn-cluster mode.
(It seems that spark-shell does not work in yarn-cluster mode, so I cannot
debug that way.)
On Mon, Sep 29, 2014 at 7:30 AM, Akhil Das wrote:
> Can you try runnin
Thank you.
Where is the number of containers set?
On Thu, Sep 25, 2014 at 7:17 PM, Marcelo Vanzin wrote:
> On Thu, Sep 25, 2014 at 8:55 AM, jamborta wrote:
>> I am running Spark with the default settings in yarn-client mode. For some
>> reason YARN always allocates three containers to the application.
Hi Davies,
Thanks for the reply. I saw that you guys do it that way in the code. Is
there no other way?
I have implemented all the predict functions in Scala, so I would prefer not
to reimplement the whole thing in Python.
thanks,
On Tue, Sep 23, 2014 at 5:40 PM, Davies Liu wrote:
> You should create