Re: example LDA code ClassCastException

2016-11-04 Thread Tamas Jambor
Thanks for the reply. Asher, have you experienced problems when checkpoints are not enabled as well? If we have a large number of iterations (over 150) and checkpoints are not enabled, the process just hangs (without any error) at around iteration 120-140 (on Spark 2.0.0). I could not reproduce this o
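
For reference, a minimal sketch of what enabling checkpointing for MLlib LDA looks like, so long runs can truncate their RDD lineage (the checkpoint directory and parameter values below are illustrative, not from this thread; sc and corpus are assumed to exist):

    import org.apache.spark.mllib.clustering.LDA

    // corpus: RDD[(Long, Vector)] of (document id, term counts), assumed to exist
    sc.setCheckpointDir("hdfs:///tmp/lda-checkpoints")  // placeholder path
    val lda = new LDA()
      .setK(20)
      .setMaxIterations(150)
      .setCheckpointInterval(10)  // checkpoint every 10 iterations
    val model = lda.run(corpus)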

Re: store hive metastore on persistent store

2015-05-16 Thread Tamas Jambor
t this SO > thread: > http://stackoverflow.com/questions/13624893/metastore-db-created-wherever-i-run-hive > > > On Sat, May 16, 2015 at 9:07 AM, Tamas Jambor wrote: > >> Gave it another try - it seems that it picks up the variable and prints >> out the correct value, but s

Re: store hive metastore on persistent store

2015-05-16 Thread Tamas Jambor
Gave it another try - it seems that it picks up the variable and prints out the correct value, but still puts the metastore_db folder in the current directory, regardless. On Sat, May 16, 2015 at 1:13 PM, Tamas Jambor wrote: > Thank you for the reply. > > I have tried your experiment,

Re: store hive metastore on persistent store

2015-05-16 Thread Tamas Jambor
class > "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" > so does not have its own datastore table. > res0: Array[org.apache.spark.sql.Row] = Array() > > scala> hc.getConf("hive.metastore.warehouse.dir") > res1: String = /home/ykadiysk/Gi

Re: store hive metastore on persistent store

2015-05-15 Thread Tamas Jambor
/hostname.com:9083 > > scala> hc.getConf("hive.metastore.warehouse.dir") > res14: String = /user/hive/warehouse > > ​ > > The first line tells you which metastore it's trying to connect to -- this > should be the string specified under hive.metastore.uris pro

Re: store hive metastore on persistent store

2015-05-14 Thread Tamas Jambor
I have tried to put the hive-site.xml file in the conf/ directory, but it seems it is not being picked up from there. On Thu, May 14, 2015 at 6:50 PM, Michael Armbrust wrote: > You can configure Spark SQLs hive interaction by placing a hive-site.xml > file in the conf/ directory. > > On Thu, May 14, 2
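
For anyone hitting the same issue: a sketch of the kind of conf/hive-site.xml that points the embedded Derby metastore at a fixed, absolute path instead of the working directory (the paths below are only examples):

    <configuration>
      <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <!-- illustrative absolute path for the Derby metastore -->
        <value>jdbc:derby:;databaseName=/data/hive/metastore_db;create=true</value>
      </property>
      <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>/user/hive/warehouse</value>
      </property>
    </configuration>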

Re: writing to hdfs on master node much faster

2015-04-20 Thread Tamas Jambor
Not sure what would slow it down, as the repartition completes equally fast on all nodes, implying that the data is available on all of them; after that there are a few computation steps, none of which are local to the master. On Mon, Apr 20, 2015 at 12:57 PM, Sean Owen wrote: > What machines are HDFS data nodes --

Re: Spark streaming

2015-03-27 Thread Tamas Jambor
between a string or an array. On Fri, Mar 27, 2015 at 3:20 PM, Tamas Jambor wrote: > It is just a comma separated file, about 10 columns wide which we append > with a unique id and a few additional values. > > On Fri, Mar 27, 2015 at 2:43 PM, Ted Yu wrote: > >> jamborta : >&g

Re: Spark streaming

2015-03-27 Thread Tamas Jambor
It is just a comma-separated file, about 10 columns wide, which we append with a unique ID and a few additional values. On Fri, Mar 27, 2015 at 2:43 PM, Ted Yu wrote: > jamborta : > Please also describe the format of your csv files. > > Cheers > > On Fri, Mar 27, 2015 at 6:42 AM, DW @ Gmail wrot
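
A rough sketch of that kind of job, assuming the CSV files land in an HDFS directory (the paths and batch interval here are made up; sc is an existing SparkContext):

    import java.util.UUID
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val ssc = new StreamingContext(sc, Seconds(30))
    val lines = ssc.textFileStream("hdfs:///incoming/csv")  // placeholder input dir
    // prepend a generated unique id to each ~10-column CSV line
    val withId = lines.map(line => UUID.randomUUID().toString + "," + line)
    withId.saveAsTextFiles("hdfs:///output/enriched")        // placeholder output prefix
    ssc.start()
    ssc.awaitTermination()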

Re: multiple sparkcontexts and streamingcontexts

2015-03-02 Thread Tamas Jambor
ried using spark-jobserver for that). On Mon, Mar 2, 2015 at 3:07 PM, Sean Owen wrote: > You can make a new StreamingContext on an existing SparkContext, I believe? > > On Mon, Mar 2, 2015 at 3:01 PM, Tamas Jambor wrote: > > thanks for the reply. > > > > Actually,
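
For what it's worth, creating a StreamingContext on top of an existing SparkContext, as Sean suggests, is a one-liner (the batch interval here is arbitrary):

    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val ssc = new StreamingContext(sc, Seconds(10))  // sc: the existing SparkContext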

Re: multiple sparkcontexts and streamingcontexts

2015-03-02 Thread Tamas Jambor
create and manage multiple streams - the same way that is possible with batch jobs. On Mon, Mar 2, 2015 at 2:52 PM, Sean Owen wrote: > I think everything there is to know about it is on JIRA; I don't think > that's being worked on. > > On Mon, Mar 2, 2015 at 2:50 PM, Tamas Jam

Re: multiple sparkcontexts and streamingcontexts

2015-03-02 Thread Tamas Jambor
I have seen there is a card (SPARK-2243) to enable that. Is that still going ahead? On Mon, Mar 2, 2015 at 2:46 PM, Sean Owen wrote: > It is still not something you're supposed to do; in fact there is a > setting (disabled by default) that throws an exception if you try to > make multiple contex

Re: Interact with streams in a non-blocking way

2015-02-13 Thread Tamas Jambor
Thanks for the reply. I am trying to set up a streaming-as-a-service approach, using the framework that is used for spark-jobserver. For that I would need to handle asynchronous operations that are initiated from outside of the stream. Do you think it is not possible? On Fri Feb 13 2015 at 10:14:1

Re: one is the default value for intercepts in GeneralizedLinearAlgorithm

2015-02-06 Thread Tamas Jambor
Thanks for the reply. Seems it is all set to zero in the latest code - I was checking 1.2 last night. On Fri Feb 06 2015 at 07:21:35 Sean Owen wrote: > It looks like the initial intercept term is 1 only in the addIntercept > && numOfLinearPredictor == 1 case. It does seem inconsistent; since > i

Re: spark context not picking up default hadoop filesystem

2015-01-26 Thread Tamas Jambor
Thanks for the reply. I have tried adding SPARK_CLASSPATH, but got a warning that it is deprecated (it didn't solve the problem); I also tried running with --driver-class-path, which did not work either. I am trying this locally. On Mon Jan 26 2015 at 15:04:03 Akhil Das wrote: > You can also trying a
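
One alternative to the classpath tricks, if nothing else works, is setting the default filesystem on the context's Hadoop configuration directly; this is not what was suggested in the thread, just a programmatic workaround, and the namenode address below is a placeholder:

    // sc: existing SparkContext; alternative to putting core-site.xml on the classpath
    sc.hadoopConfiguration.set("fs.defaultFS", "hdfs://namenode:8020")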

Re: dynamically change receiver for a spark stream

2015-01-21 Thread Tamas Jambor
interface that even gives you the possibility of passing a > configuration. > > -kr, Gerard. > > On Wed, Jan 21, 2015 at 11:54 AM, Tamas Jambor wrote: > >> we were thinking along the same line, that is to fix the number of >> streams and change the input and outpu

Re: dynamically change receiver for a spark stream

2015-01-21 Thread Tamas Jambor
nterface to manage job > lifecycle. You will still need to solve the dynamic configuration through > some alternative channel. > > On Wed, Jan 21, 2015 at 11:30 AM, Tamas Jambor wrote: > >> thanks for the replies. >> >> is this something we can get around? Tried to hack into

Re: dynamically change receiver for a spark stream

2015-01-21 Thread Tamas Jambor
Thanks for the replies. Is this something we can get around? I tried to hack into the code without much success. On Wed, Jan 21, 2015 at 3:15 AM, Shao, Saisai wrote: > Hi, > > I don't think current Spark Streaming support this feature, all the > DStream lineage is fixed after the context is start

Re: save spark streaming output to single file on hdfs

2015-01-13 Thread Tamas Jambor
Thanks. The problem is that we'd like it to be picked up by Hive. On Tue Jan 13 2015 at 18:15:15 Davies Liu wrote: > On Tue, Jan 13, 2015 at 10:04 AM, jamborta wrote: > > Hi all, > > > > Is there a way to save dstream RDDs to a single file so that another > process > > can pick it up as a singl
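
One workaround along the lines of the discussion: coalesce each batch to a single partition and write it under a directory that an external Hive table points at. The paths and partition layout below are assumptions, not from the thread; dstream is an existing DStream[String]:

    dstream.foreachRDD { (rdd, time) =>
      rdd.coalesce(1)  // one part file per batch
         .saveAsTextFile(s"hdfs:///warehouse/events/batch=${time.milliseconds}")
    }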

Re: No module named pyspark - latest built

2014-11-12 Thread Tamas Jambor
Thanks. Will it work with sbt at some point? On Thu, 13 Nov 2014 01:03 Xiangrui Meng wrote: > You need to use maven to include python files. See > https://github.com/apache/spark/pull/1223 . -Xiangrui > > On Wed, Nov 12, 2014 at 4:48 PM, jamborta wrote: > > I have figured out that building the

Re: why decision trees do binary split?

2014-11-06 Thread Tamas Jambor
Thanks for the reply, Sean. I can see that splitting on all the categories would probably overfit the tree; on the other hand, it might give more insight into the subcategories (it would probably only work if the data is uniformly distributed between the categories). I haven't really found any compari
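
For context, this is how MLlib is told that a feature is categorical; it still builds binary splits over subsets of the categories rather than one branch per category. The data and parameter values below are placeholders (trainingData: RDD[LabeledPoint] is assumed):

    import org.apache.spark.mllib.tree.DecisionTree

    // feature 0 is categorical with 4 distinct values
    val model = DecisionTree.trainClassifier(
      trainingData,
      numClasses = 2,
      categoricalFeaturesInfo = Map(0 -> 4),
      impurity = "gini",
      maxDepth = 5,
      maxBins = 32)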

Re: pass unique ID to mllib algorithms pyspark

2014-11-05 Thread Tamas Jambor
Hi Xiangrui, thanks for the reply. Is this still due to be released in 1.2 (SPARK-3530 is still open)? Thanks, On Wed, Nov 5, 2014 at 3:21 AM, Xiangrui Meng wrote: > The proposed new set of APIs (SPARK-3573, SPARK-3530) will address > this issue. We "carry over" extra columns with training and
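
For reference, the DataFrame-based API the reply refers to carries extra columns through transform(), so an id column stays attached to the predictions. The column names and the estimator here are only an example (train/test are DataFrames with id, label and features columns):

    import org.apache.spark.ml.classification.LogisticRegression

    val lr = new LogisticRegression().setLabelCol("label").setFeaturesCol("features")
    val model = lr.fit(train)
    val predictions = model.transform(test)  // keeps id alongside prediction
    predictions.select("id", "prediction").show()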

Re: partition size for initial read

2014-10-02 Thread Tamas Jambor
That would work - I normally use Hive queries through Spark SQL, and I have not seen something like that there. On Thu, Oct 2, 2014 at 3:13 PM, Ashish Jain wrote: > If you are using textFiles() to read data in, it also takes in a parameter > the number of minimum partitions to create. Would that not
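
The textFile() suggestion Ashish refers to looks like the first line below; for the result of a Hive query through Spark SQL, repartitioning after the read is the closest equivalent. Paths, table name and partition counts are placeholders; hiveContext is an existing HiveContext:

    val rdd = sc.textFile("hdfs:///data/input", minPartitions = 64)

    // for results of a Hive query through Spark SQL, repartition after the read
    val result = hiveContext.sql("SELECT * FROM some_table").repartition(64)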

Re: spark.driver.memory is not set (pyspark, 1.1.0)

2014-10-01 Thread Tamas Jambor
node that submits > the application, (2) You can use the --driver-memory command line option if > you are using Spark submit (bin/pyspark goes through this path, as you have > discovered on your own). > > Does that make sense? > > > 2014-10-01 10:17 GMT-07:00 Tamas Jambor : &g
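
In other words, for the pyspark shell and spark-submit the driver memory has to be set on the command line (or in spark-defaults.conf) before the driver JVM starts; for example (the application name is just a placeholder):

    bin/pyspark --driver-memory 4g
    bin/spark-submit --driver-memory 4g my_app.py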

Re: spark.driver.memory is not set (pyspark, 1.1.0)

2014-10-01 Thread Tamas Jambor
calling directly the respective > backend code to launch it. > > (That being said, it would be nice to have a programmatic way of > launching apps that handled all this - this has been brought up in a > few different contexts, but I don't think there's an "official" &g

Re: spark.driver.memory is not set (pyspark, 1.1.0)

2014-10-01 Thread Tamas Jambor
Thanks, Marcelo. What's the reason it is not possible in cluster mode either? On Wed, Oct 1, 2014 at 5:42 PM, Marcelo Vanzin wrote: > You can't set up the driver memory programatically in client mode. In > that mode, the same JVM is running the driver, so you can't modify > command line options

Re: yarn does not accept job in cluster mode

2014-09-29 Thread Tamas Jambor
Thanks for the reply. As I mentioned above, everything works in yarn-client mode; the problem starts when I try to run it in yarn-cluster mode. (It seems that spark-shell does not work in yarn-cluster mode, so I cannot debug that way.) On Mon, Sep 29, 2014 at 7:30 AM, Akhil Das wrote: > Can you try runnin

Re: Yarn number of containers

2014-09-25 Thread Tamas Jambor
Thank you. Where is the number of containers set? On Thu, Sep 25, 2014 at 7:17 PM, Marcelo Vanzin wrote: > On Thu, Sep 25, 2014 at 8:55 AM, jamborta wrote: >> I am running spark with the default settings in yarn client mode. For some >> reason yarn always allocates three containers to the appli
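
For what it's worth, on YARN the container count follows from the executor count (executors plus one container for the ApplicationMaster, which would explain three containers with the default of two executors), and the executor count can be set explicitly; the application name below is only a placeholder:

    bin/spark-submit --master yarn-client --num-executors 4 my_app.py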

Re: access javaobject in rdd map

2014-09-23 Thread Tamas Jambor
Hi Davies, thanks for the reply. I saw that you do it that way in the code. Is there no other way? I have implemented all the predict functions in Scala, so I would prefer not to reimplement the whole thing in Python. Thanks, On Tue, Sep 23, 2014 at 5:40 PM, Davies Liu wrote: > You should create