[jira] [Commented] (HIVE-19429) Investigate alternative technologies like docker containers to increase parallelism

Alan Gates (JIRA) Mon, 07 May 2018 07:39:53 -0700

    [ https://issues.apache.org/jira/browse/HIVE-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16465976#comment-16465976 ]


Alan Gates commented on HIVE-19429:
-----------------------------------

{quote}How much memory does your machine have?
{quote}
256G
{quote}I could not find a way to get the test results for the failed test.
{quote}
Yeah, I have not gotten to that part yet.  It should be easy enough to change 
the ResultsAnalyzer to grab that information as well.  It may require 
revivifying the container to obtain the logs, though it would be better to 
teach the container to print the logs of failed tests so that "docker logs" 
picks them up automatically on the first pass.
{quote}I think the existing batching logic is better than the one you have 
since we don't have to hardcode the directory names. The existing batching 
logic is much more customizable with regards to the batch sizes of individual 
CliDrivers.
{quote}
I don't like that I have the directory names etc. hard coded in the code.  At 
the very least this should be in configuration.  I have completely rewritten 
the MvnCommandFactory at least twice.  Every time I tried to make it more 
general, though, it got insanely complicated.  That leads me to conclude that 
rather than making this code much smarter, we should make the tests much 
simpler.  We should not have to read two config files to figure out which 
qfiles to run with which tests.  Ideally we could figure out a way to surface 
qfiles as individual tests rather than all buried in one test.  I have some 
thoughts on how to achieve this, but it's longer term.  Also, I haven't found 
the flexibility of different batch sizes worth the effort.  One size fits all 
isn't perfect but seems to be good enough.
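
To illustrate the "one size fits all" batching described above, here is a 
minimal sketch.  FixedSizeBatcher and its batch method are hypothetical names 
chosen for this example; they are not taken from the patch, and the real code 
may split the work differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of the actual patch.
final class FixedSizeBatcher {
    private FixedSizeBatcher() {}

    /** Splits tests into consecutive batches of at most batchSize entries. */
    static List<List<String>> batch(List<String> tests, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<String>> batches = new ArrayList<>();
        for (int start = 0; start < tests.size(); start += batchSize) {
            int end = Math.min(start + batchSize, tests.size());
            // Copy the sublist so each batch is independent of the input list.
            batches.add(new ArrayList<>(tests.subList(start, end)));
        }
        return batches;
    }
}
```

Every batch gets the same upper bound regardless of which driver the tests 
belong to, so the last batch is simply whatever is left over.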
{quote}I think it would be useful to run these containers in a cluster so that 
we can support multiple patches at a time to speed up the testing.
{quote}
Definitely.  I happen to have a beefy machine handy, but that isn't the general 
case.  I designed it to support multiple container providers so it should be 
easy to write a ContainerClient that supports Yarn or Kubernetes instead of 
simple Docker.
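
A rough sketch of what such a provider abstraction could look like follows.  
Only the name ContainerClient comes from the comment above; the runCommand 
method, its parameters, and the DockerContainerClient class are assumptions 
made for illustration and are not the actual signatures in the patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface: the method below is an assumption, not the real API.
interface ContainerClient {
    /** Returns the argv that starts a fresh container running testCommand. */
    List<String> runCommand(String image, String containerName, List<String> testCommand);
}

// Plain Docker provider: builds "docker run --name <name> <image> <command...>".
final class DockerContainerClient implements ContainerClient {
    @Override
    public List<String> runCommand(String image, String containerName, List<String> testCommand) {
        List<String> argv = new ArrayList<>();
        argv.add("docker");
        argv.add("run");
        // A fixed name lets "docker logs <name>" find the container afterwards.
        argv.add("--name");
        argv.add(containerName);
        argv.add(image);
        argv.addAll(testCommand);
        return argv;
    }
}
```

A Yarn or Kubernetes provider would implement the same interface and return 
its own launch command, leaving the calling code unchanged.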
{quote}Also, not sure if there is a way to run command on an existing docker 
container so that we can re-use deployed containers.
{quote}
I am not a Docker expert, but I think this is an anti-pattern.  Spinning up a 
new container is very fast and very low cost.  To reuse a container you either 
have to build it as a standing service that can keep taking requests (which is 
much more complex than running a simple test command), or turn the container 
into an image and then start a new container on that image (so you are starting 
a new container anyway).  Both of these are much more heavyweight than just 
starting a new container.  Occasionally we may be forced to restart the 
container to get information out of it (like in the case of getting logs from 
failed tests).

> Investigate alternative technologies like docker containers to increase 
> parallelism
> -----------------------------------------------------------------------------------
>
>                 Key: HIVE-19429
>                 URL: https://issues.apache.org/jira/browse/HIVE-19429
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Vihang Karajgaonkar
>            Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
