Is your HDP implementation based on distributed Gibbs sampling? Thanks.
Sincerely,
DB Tsai
---
Blog: https://www.dbtsai.com
On Wed, Jun 3, 2015 at 8:13 PM, Yang, Yuhao wrote:
> Hi Lorenz,
>
>
>
> I’m trying to build a prototype of HDP for a
Hi Lorenz,
I’m trying to build a prototype of HDP for a customer based on the current
LDA implementations. An initial version will probably be ready within the next
one or two weeks. I’ll share it and hopefully we can join forces.
One concern is that I’m not sure how widely it will be used
Hey all,
I've been bitten by something really weird lately and I'm starting to think
it's related to the ivy support we have in Spark, and running unit tests
that use that code.
The first thing that happens is that after running unit tests, sometimes my
sbt builds start failing with an error saying som
Hi all
We recently merged support for launching YARN clusters using Spark EC2
scripts as a part of
https://issues.apache.org/jira/browse/SPARK-3674. To use this you can pass
in hadoop-major-version as "yarn" to the spark-ec2 script and this will
setup Hadoop 2.4 HDFS, YARN and Spark built for YARN
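For reference, a launch command would look something like the following (the
key pair, identity file and cluster name are placeholders, so treat this as a
sketch rather than the exact invocation):

  ./ec2/spark-ec2 -k my-key-pair -i my-key.pem -s 2 \
    --hadoop-major-version=yarn launch my-yarn-cluster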
Hi Lorenz,
I'm not aware of people working on hierarchical topic models for MLlib, but
that would be cool to see. Hopefully other devs know more!
Glad that the current LDA is helpful!
Joseph
On Wed, Jun 3, 2015 at 6:43 AM, Lorenz Fischer
wrote:
> Hi All
>
> I'm working on a project in which
Hi everyone,
every time our data comes in and new updates occur in our cluster, an
undesirable file is created in the workers' directories. In order to clean
it up automatically, I changed the value of the "Spark (Standalone) Client
Advanced Configuration Snippet (Safety Valve) for spark-conf/spark-env.sh" setting
I created https://issues.apache.org/jira/browse/SPARK-8085 for this.
On Wed, Jun 3, 2015 at 12:12 PM, Shivaram Venkataraman <
shiva...@eecs.berkeley.edu> wrote:
> Hmm - the schema=myschema doesn't seem to work in SparkR from my simple
> local test. I'm filing a JIRA for this now
>
> On Wed, Jun 3
Hmm - the schema=myschema doesn't seem to work in SparkR from my simple
local test. I'm filing a JIRA for this now
On Wed, Jun 3, 2015 at 11:04 AM, Eskilson,Aleksander <
alek.eskil...@cerner.com> wrote:
> Neat, thanks for the info Hossein. My use case was just to reset the
> schema for a CSV dat
Neat, thanks for the info Hossein. My use case was just to reset the schema for
a CSV dataset, but if either a. I can specify it at load, or b. it will be
inferred in the future, I’ll likely not need to cast columns, much less reset
the whole schema. I’ll still file a JIRA for the capability, bu
Yes, spark-csv does not infer types yet, but type inference is planned for
the near future.
To work around the current limitations (of spark-csv and SparkR), you can
specify the schema in read.df() to get your desired types from spark-csv. For
example:
myschema <- structType(structField("id", "integer"
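(The example is cut off in the digest; a fuller sketch of the idea, with
made-up field names and assuming the SparkR 1.4-style API, would look roughly
like this:

  myschema <- structType(structField("id", "integer"),
                         structField("name", "string"),
                         structField("score", "double"))
  df <- read.df(sqlContext, "cars.csv",
                source = "com.databricks.spark.csv",
                schema = myschema)

As noted earlier in this digest, the schema argument did not work as expected
in Shivaram's local test, which is what SPARK-8085 tracks.)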
I think Hossein does want to implement schema inference for CSV -- then
it'd be easy.
Another way you can do this is to use an R data frame / data.table to read the CSV
files in, and then convert them into a Spark DataFrame. Not going to be
scalable, but it could work.
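A rough sketch of that workaround, assuming a SparkR 1.4-style sqlContext and
a hypothetical cars.csv file:

  # R infers the column types when reading the file locally
  localDF <- read.csv("cars.csv", stringsAsFactors = FALSE)
  # ship the local data frame to Spark; everything passes through the driver,
  # which is why this won't scale to large files
  df <- createDataFrame(sqlContext, localDF)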
On Wed, Jun 3, 2015 at 10:49 AM, Eskilson,
Hi Shivaram,
As far as Databricks’ spark-csv API shows, it seems there’s currently only
support for explicit definition of column types. In JSON we have nice typed
fields, but in CSVs, all bets are off. In the SQL version of the API, it
appears you specify the column types when you create the t
cc Hossein who knows more about the spark-csv options
You are right that the default CSV reader options end up creating all
columns as string. I know that the JSON reader infers the schema [1] but I
don't know if the CSV reader has any options to do that. Regarding the
SparkR syntax to cast colum
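(The snippet is truncated here, but a minimal sketch of casting a column in
SparkR, assuming the 1.4 API and a hypothetical string column "year", would be
along these lines:

  # add an integer-typed copy of the string column
  df2 <- withColumn(df, "yearInt", cast(df$year, "integer"))

Treat the column and variable names as illustrative only.)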
It appears that casting columns remains a bit tricky in Spark’s DataFrames.
This is an issue because tools like spark-csv will set column types to String
by default and will not attempt to infer types. Although spark-csv supports
specifying types for columns in its options, it’s not clear h
Hi All
I'm working on a project in which I use the current LDA implementation that
has been contributed by Databricks' Joseph Bradley et al. for the recent
1.3.0 release (thanks guys!). While this is great, my project requires
several levels of topics, as I would like to offer users the ability to drill down
Hey All,
start-slaves.sh and stop-slaves.sh make use of SSH to connect to remote
clusters. Are there alternative methods to do this without SSH?
For example using:
./bin/spark-class org.apache.spark.deploy.worker.Worker spark://IP:PORT
is fine but there is no way to kill the Worker without usin
I'd think id is the unique identifier by default.
On Wed, Jun 3, 2015 at 12:13 AM, Tarek Auel wrote:
> Hi,
>
> The graph is already there (GraphX) and has the two RDDs you described. My
> question is meant to gauge whether the community thinks it would be a benefit
> or not. If
Hi,
The graph is already there (GraphX) and has the two RDDs you described. My
question is meant to gauge whether the community thinks it would be a benefit
or not. If yes, I would like to contribute it to GraphX (either as part of
GraphOps or as an external library).
An interestin