Re: Sort Benchmark infrastructure

Hawin Jiang Thu, 16 Jul 2015 01:30:53 -0700

Hi  George

Thanks for the details.  It looks like I have a long way to go.
For the big data benchmark, I would like to use those test cases, test data
and test methodology to compare different big data technologies.
BTW, I agree with you that no one system will necessarily be optimal for
all cases for all workloads.
I hope I can find a good one for our enterprise application.  I will let
you know if I can move forward with this.
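A rough budget for such a run can be sketched with the pro-rating rule George describes below: the hourly rate is scaled by the fraction of the hour actually used. The 186-instance count and the $6.82 hourly price are the figures from the earlier message in this thread, not confirmed numbers for the record runs. The 15-minute duration is only an illustrative assumption, and `prorated_cost` is a hypothetical helper.

```python
from decimal import Decimal, ROUND_HALF_UP


def prorated_cost(instances: int, hourly_rate: Decimal, minutes: int) -> Decimal:
    """Hypothetical helper: cost of running `instances` servers for `minutes`,
    charging the hourly rate scaled by the fraction of an hour used."""
    raw = instances * hourly_rate * Decimal(minutes) / Decimal(60)
    # Round to whole cents for reporting.
    return raw.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# Assumed inputs: 186 i2.8xlarge instances at $6.82/hour (figures quoted in this thread).
RATE = Decimal("6.82")
print(prorated_cost(186, RATE, 60))  # one full hour: 1268.52
print(prorated_cost(186, RATE, 15))  # 15-minute run, a quarter of the hourly cost: 317.13
```

Scaling by minutes/60 reproduces the "divide the hourly rate by 4 for 15 minutes" rule. `Decimal` is used so currency values are not subject to binary floating-point rounding.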
Good Night.

Best regards
Hawin

On Wed, Jul 15, 2015 at 9:30 AM, George Porter <[email protected]> wrote:

> Hi Hawin,
>
> We used varying numbers of the i2.8xlarge servers, depending on the sort
> record category.  http://sortbenchmark.org/ is really your best source
> for what we did--all the details (should) be on our write-ups.  Note that
> we pro-rated the cost, meaning that if we ran for 15 minutes, we took the
> hourly rate and divided by 4.
>
> In terms of sponsorship, we used a combination of credits donated by
> Amazon, as well as funding from the National Science Foundation.  You can
> submit a grant proposal to Amazon and ask them for credits if you're an
> academic or researcher.  Not sure if being part of an open-source project
> counts, but you might as well try.
>
> In terms of the sort record, that webpage I provided above has all the
> details on the challenge.  Not sure about Big Data benchmark--that term is
> pretty vague.  Often when people say big data, they mean different things.
> Our system is designed for lots of bytes, but not really lots of compute
> over those bytes.  Others pick different design points.  I think you'll
> find that the needs of different users vary quite a bit, and no one
> system will necessarily be optimal for all cases for all workloads.
>
> Good luck on your attempts.
> -George
>
> ----
> George Porter
> Assistant Professor, Dept. of Computer Science and Engineering
> Associate Director, UCSD Center for Networked Systems
> UC San Diego, La Jolla CA
> http://www.cs.ucsd.edu/~gmporter/
>
>
>
> On Wed, Jul 15, 2015 at 1:44 AM, Hawin Jiang <[email protected]>
> wrote:
>
>> Hi  George and Mike
>>
>> Thanks for your information.  Did you use 186 i2.8xlarge servers for
>> testing?
>> The total cost for one hour would be 186 * $6.82 = $1,268.52.
>> Do you know of any person or company that could sponsor this?
>>
>> For our test approach, I have looked at the industry-standard benchmarks
>> listed by BigDataBench
>> (http://prof.ict.ac.cn/BigDataBench/industry-standard-benchmarks/).
>> Maybe we can run TeraSort to see whether our performance beats your
>> record.
>>
>> Please let me know if you have any comments.
>> Thanks for the support.
>>
>> Best regards
>> Hawin
>>
>> On Tue, Jul 14, 2015 at 9:42 AM, Mike Conley <[email protected]> wrote:
>>
>>> George is correct.  We used i2.8xlarge with placement groups on Amazon
>>> EC2.  We ran Amazon Linux, which, if I recall correctly, is based on Red
>>> Hat but optimized for EC2.  The OS was essentially unmodified, apart from
>>> some packages installed for our dependencies.
>>>
>>> Thanks,
>>> Mike
>>>
>>> On Tue, Jul 14, 2015 at 9:15 AM, George Porter <[email protected]>
>>> wrote:
>>>
>>>> Hello Hawin,
>>>>
>>>> Thanks for reaching out.  We wrote a paper on our efforts, which we'll
>>>> be posting to our website in a couple of weeks.
>>>>
>>>> However, in summary, we used a cluster of i2.8xlarge instance types from
>>>> Amazon, and we made use of the placement group feature to ensure that we'd
>>>> get good bandwidth between them.  Mike can correct me if I'm wrong, but I
>>>> believe we used the stock AWS version of Linux (Ubuntu maybe?)
>>>>
>>>> So our environment was pretty stock--we didn't get any special support
>>>> or features from AWS.
>>>>
>>>> Best of luck with your profiling and benchmarking.  Do let us know how
>>>> you perform.  Flink looks like a pretty interesting project, and so let us
>>>> know if we can help y'all out in some way.
>>>>
>>>> Thanks, George
>>>>
>>>>
>>>> On Sun, Jul 12, 2015 at 11:12 PM, Hawin Jiang <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi Michael and George
>>>>>
>>>>> First of all, congratulations on winning the sort benchmark again.
>>>>> We are from the Flink community.
>>>>>
>>>>> I am not sure whether it would be possible to use your test environment
>>>>> to test Flink for free.  We saw that Apache Spark did a good job as well.
>>>>>
>>>>> We want to challenge your records, but we don’t have that many servers
>>>>> for testing.
>>>>>
>>>>> Please let me know if you can help us or not.
>>>>>
>>>>> Thank you very much.
>>>>>
>>>>>
>>>>> Best regards
>>>>>
>>>>> Hawin
>>>>>
>>>>
>>>>
>>>
>>
>
