Re: HiveContext: cache table not supported for partitioned table?

2014-10-02 Thread Cheng Lian
Cache table works with partitioned tables. I guess you’re experimenting with a default local metastore and the metastore_db directory doesn’t exist in the first place. In this case, all metastore tables/views don’t exist at first and will throw the error message you saw when the PARTITIONS me

Re: What is the best way to build my developing Spark for testing on EC2?

2014-10-02 Thread Evan Sparks
I recommend using the data generators provided with MLlib to generate synthetic data for your scalability tests - provided they're well suited for your algorithms. They let you control things like number of examples and dimensionality of your dataset, as well as number of partitions. As far as
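
As a rough illustration of the knobs mentioned above (number of examples, dimensionality, number of partitions), here is a minimal plain-Python sketch of a synthetic linear-regression data generator. It is not MLlib's generator API: the function name `generate_linear_data`, its parameters, and the round-robin partitioning are all assumptions made for this example.

```python
import random


def generate_linear_data(num_examples, num_features, num_partitions,
                         noise=0.1, seed=42):
    """Return a list of partitions; each partition is a list of
    (label, features) pairs drawn from a noisy linear model.

    Hypothetical helper for illustration only, not part of MLlib.
    """
    rng = random.Random(seed)
    # Hidden "true" weights used to compute the labels.
    weights = [rng.uniform(-1.0, 1.0) for _ in range(num_features)]
    partitions = [[] for _ in range(num_partitions)]
    for i in range(num_examples):
        features = [rng.gauss(0.0, 1.0) for _ in range(num_features)]
        label = sum(w * x for w, x in zip(weights, features))
        label += rng.gauss(0.0, noise)
        # Round-robin assignment keeps partition sizes within one of each other.
        partitions[i % num_partitions].append((label, features))
    return partitions


if __name__ == "__main__":
    parts = generate_linear_data(num_examples=1000, num_features=20,
                                 num_partitions=8)
    print([len(p) for p in parts])
```

Fixing the seed keeps runs repeatable, which helps when comparing timings across different cluster sizes.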

What is the best way to build my developing Spark for testing on EC2?

2014-10-02 Thread Yu Ishikawa
Hi all, I am trying to contribute some machine learning algorithms to MLlib. I must evaluate their performance on a cluster, varying the input data size, the number of CPU cores, and any of their parameters. I would like to build my development version of Spark on EC2 automatically. Is there already a building

Re: EC2 clusters ready in launch time + 30 seconds

2014-10-02 Thread Nicholas Chammas
Thanks for the update, Nate. I'm looking forward to seeing how these projects turn out. David, Packer looks very, very interesting. I'm gonna look into it more next week. Nick On Thu, Oct 2, 2014 at 8:00 PM, Nate D'Amico wrote: > Bit of progress on our end, bit of lagging as well. Our guy le

RE: EC2 clusters ready in launch time + 30 seconds

2014-10-02 Thread Nate D'Amico
Bit of progress on our end, bit of lagging as well. Our guy leading effort got little bogged down on client project to update hive/sql testbed to latest spark/sparkSQL, also launching public service so we have been bit scattered recently. Will have some more updates probably after next week.

Re: EC2 clusters ready in launch time + 30 seconds

2014-10-02 Thread David Rowe
I think this is exactly what Packer is for. See e.g. http://www.packer.io/intro/getting-started/build-image.html On a related note, the current AMI for hvm systems (e.g. m3.*, r3.*) has a bad package for httpd, which causes ganglia not to start. For some reason I can't get access to the raw AMI to

Re: EC2 clusters ready in launch time + 30 seconds

2014-10-02 Thread Nicholas Chammas
Is there perhaps a way to define an AMI programmatically? Like, a collection of base AMI id + list of required stuff to be installed + list of required configuration changes. I’m guessing that’s what people use things like Puppet, Ansible, or maybe also AWS CloudFormation for, right? If we could d
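
The idea in the question above (base AMI id + packages to install + configuration changes) can be sketched as a small declarative spec that is rendered into a provisioning script. This is only an illustration of the concept, not how Packer, Puppet, Ansible, or CloudFormation work internally. `ImageSpec`, `render_provision_script`, the placeholder AMI id, the package names, and the config path are all hypothetical, and the sketch assumes a yum-based base image.

```python
import shlex
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ImageSpec:
    """Declarative description of a machine image (hypothetical type)."""
    base_ami: str
    packages: list = field(default_factory=list)
    # (path, line) pairs: each line is appended to the file at path.
    config_lines: list = field(default_factory=list)


def render_provision_script(spec):
    """Render the spec as a bash script to run on an instance of base_ami."""
    lines = ["#!/bin/bash", "set -e", "# base image: " + spec.base_ami]
    if spec.packages:
        # Assumes a yum-based distribution on the base image.
        quoted = " ".join(shlex.quote(p) for p in spec.packages)
        lines.append("yum install -y " + quoted)
    for path, line in spec.config_lines:
        lines.append("echo {} >> {}".format(shlex.quote(line),
                                            shlex.quote(path)))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    spec = ImageSpec(
        base_ami="ami-00000000",  # placeholder, not a real AMI id
        packages=["java-1.7.0-openjdk", "ganglia"],
        config_lines=[("/etc/example.conf", "key = value")],
    )
    print(render_provision_script(spec))
```

Keeping the spec as plain data means it can live in version control and be diffed, which is the main appeal of defining images this way.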

HiveContext: cache table not supported for partitioned table?

2014-10-02 Thread Du Li
Hi, In Spark 1.1 HiveContext, I ran a create partitioned table command followed by a cache table command and got a java.sql.SQLSyntaxErrorException: Table/View 'PARTITIONS' does not exist. But cache table worked fine if the table is not a partitioned table. Can anybody confirm that cache of pa