Hi George,

Thanks for the details. It looks like I have a long way to go. For the big data benchmark, I would like to use those test cases, test data, and test methodology to test different big data technologies. BTW, I agree with you that no one system will necessarily be optimal for all cases for all workloads. I hope I can find a good one for our enterprise application. I will let you know if I can move forward with this.

Good night.
Best regards,
Hawin

On Wed, Jul 15, 2015 at 9:30 AM, George Porter <gmpor...@cs.ucsd.edu> wrote:
> Hi Hawin,
>
> We used varying numbers of the i2.8xlarge servers, depending on the sort record category. http://sortbenchmark.org/ is really your best source for what we did; all the details should be in our write-ups. Note that we pro-rated the cost, meaning that if we ran for 15 minutes, we took the hourly rate and divided by 4.
>
> In terms of sponsorship, we used a combination of credits donated by Amazon, as well as funding from the National Science Foundation. You can submit a grant proposal to Amazon and ask them for credits if you're an academic or researcher. Not sure if being part of an open-source project counts, but you might as well try.
>
> In terms of the sort record, the webpage I provided above has all the details on the challenge. Not sure about a "big data benchmark"; that term is pretty vague. Often when people say big data, they mean different things. Our system is designed for lots of bytes, but not really lots of compute over those bytes. Others pick different design points. I think you'll find that the needs of different users vary quite a bit, and no one system will necessarily be optimal for all cases for all workloads.
>
> Good luck with your attempts.
> -George
>
> ----
> George Porter
> Assistant Professor, Dept. of Computer Science and Engineering
> Associate Director, UCSD Center for Networked Systems
> UC San Diego, La Jolla CA
> http://www.cs.ucsd.edu/~gmporter/
>
> On Wed, Jul 15, 2015 at 1:44 AM, Hawin Jiang <hawin.ji...@gmail.com> wrote:
>> Hi George and Mike,
>>
>> Thanks for your information. Did you use 186 i2.8xlarge servers for testing?
>> Total cost for one hour = 186 * $6.82/hour = $1,268.52.
>> Do you know any person or company that could sponsor this?
>>
>> For our test approach, I have checked an industry-standard benchmark from BigDataBench (http://prof.ict.ac.cn/BigDataBench/industry-standard-benchmarks/). Maybe we can run TeraSort to see whether our performance beats your record.
>>
>> Please let me know if you have any comments.
>> Thanks for the support.
>>
>> Best regards,
>> Hawin
>>
>> On Tue, Jul 14, 2015 at 9:42 AM, Mike Conley <mcon...@cs.ucsd.edu> wrote:
>>> George is correct. We used i2.8xlarge with placement groups on Amazon EC2. We ran Amazon Linux, which, if I recall correctly, is based on Red Hat but optimized for EC2. The OS was essentially unmodified, with some packages installed for our dependencies.
>>>
>>> Thanks,
>>> Mike
>>>
>>> On Tue, Jul 14, 2015 at 9:15 AM, George Porter <gmpor...@cs.ucsd.edu> wrote:
>>>> Hello Hawin,
>>>>
>>>> Thanks for reaching out. We wrote a paper on our efforts, which we'll be posting to our website in a couple of weeks.
>>>>
>>>> However, in summary, we used a cluster of i2.8xlarge instance types from Amazon, and we made use of the placement group feature to ensure that we'd get good bandwidth between them. Mike can correct me if I'm wrong, but I believe we used the stock AWS version of Linux (Ubuntu, maybe?).
>>>>
>>>> So our environment was pretty stock; we didn't get any special support or features from AWS.
>>>>
>>>> Best of luck with your profiling and benchmarking. Do let us know how you perform. Flink looks like a pretty interesting project, so let us know if we can help y'all out in some way.
>>>>
>>>> Thanks, George
>>>>
>>>> On Sun, Jul 12, 2015 at 11:12 PM, Hawin Jiang <hawin.ji...@gmail.com> wrote:
>>>>> Hi Michael and George,
>>>>>
>>>>> First of all, congratulations on winning the sort benchmark again. We are from the Flink community.
>>>>>
>>>>> I am not sure if it is possible to use your test environment to test Flink for free. We saw that Apache Spark did a good job as well.
>>>>>
>>>>> We want to challenge your records, but we don't have that many servers for testing.
>>>>>
>>>>> Please let me know whether you can help us.
>>>>>
>>>>> Thank you very much.
>>>>>
>>>>> Best regards,
>>>>> Hawin
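To sanity-check the cost figures in this thread, here is a minimal Python sketch of the pro-rating George describes (a 15-minute run is charged as the hourly rate divided by 4). It assumes Hawin's numbers, 186 i2.8xlarge instances at $6.82 per hour; George did not confirm that instance count (the team used varying numbers per record category), and prorated_cost is a hypothetical helper, not part of any AWS tool.

```python
from decimal import Decimal, ROUND_HALF_UP


def prorated_cost(instances: int, hourly_rate: str, minutes: int) -> Decimal:
    """Hypothetical helper: instances * hourly rate * (minutes / 60).

    The hourly rate is passed as a string so Decimal keeps it exact
    (e.g. "6.82" rather than the binary float 6.82).
    """
    rate = Decimal(hourly_rate)
    cost = rate * instances * Decimal(minutes) / Decimal(60)
    # Round to whole cents for reporting.
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    # Hawin's estimate (assumed figures): 186 servers for one full hour at $6.82/hour.
    print(prorated_cost(186, "6.82", 60))
    # George's rule: a 15-minute run costs a quarter of the hourly total.
    print(prorated_cost(186, "6.82", 15))
```

Decimal is used instead of float so that dollar amounts like 6.82 stay exact. The pro-rating mirrors the accounting convention described above for the benchmark write-ups, not necessarily how AWS itself billed partial hours at the time.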