
Paul,

On 7/2/14, 4:28 PM, Paul Taylor wrote:
> On 02/07/2014 16:34, Christopher Schultz wrote:
>> 
>> The solution is that the web application, packaged in a WAR
>> file, needs to unpack the Lucene indexes onto the disk when it
>> starts up. You can do this with a ServletContextListener.
> 
> So I do this within the init() method of my servlet, but EB doesn't
> wait for the init() method to finish before declaring the application
> ready. Do you think it would wait for code using a
> ServletContextListener, or fail in the same way it does for init()?

Which init() are you talking about? If EB marks the cluster node as
available before the webapp has completed deploying, that sounds like
a big problem. If you are using Servlet.init() without any
load-on-startup for that servlet, then I would expect the bad behavior
you describe. I think you'll have better luck with a
ServletContextListener, because the servlet spec guarantees that those
complete before the webapp can receive any requests.
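To make the idea concrete, here is a rough sketch of such a listener.
This is illustration only: the interface is stubbed locally so the
snippet stands alone, whereas a real webapp would implement
javax.servlet.ServletContextListener and register it in web.xml or
with @WebListener (the class and directory names are made up):

```java
import java.io.File;

// Local stand-in for javax.servlet.ServletContextListener, so this
// sketch compiles on its own; a real webapp implements the real interface.
interface StartupListener {
    void contextInitialized();
}

class IndexUnpackListener implements StartupListener {
    private final File indexDir;

    IndexUnpackListener(File indexDir) {
        this.indexDir = indexDir;
    }

    // The container calls this before any request reaches the webapp,
    // so the indexes are on disk before the first search arrives.
    public void contextInitialized() {
        indexDir.mkdirs();
        // ... unpack the Lucene indexes from the WAR into indexDir here ...
    }
}
```

The point is simply that the unpacking work lives in a callback the
container is required to finish before serving traffic.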

If EB allows requests to arrive before the deployment is complete,
basically nothing in EB would work properly.

>> That would work, too, but you'll have to "pay" for download time
>> for each member of the cluster. If you pack the indexes in the
>> WAR file, they are already available when the webapp
>> initializes.
> 
> See my later posts: it doesn't work because of the problem with EB
> not waiting for init() to finish, and I can't pack the indexes into
> the WAR because that breaks Amazon's maximum WAR size of 1/2 GB.

Again, which init() are you talking about?

>> Neither tar nor gzip requires much memory: they are both
>> block-oriented. What procedure were you using to decompress the
>> tarballs? Decompressing the entire tarball and then tearing it
>> apart is a mistake: you should chain the processes together so
>> you read from the tarball and write individual, uncompressed
>> files to the disk.
> 
> With the Java solution I was using
> 
> import org.rauschig.jarchivelib.Archiver;
> import org.rauschig.jarchivelib.ArchiverFactory;
> .........
> File indexDirFile = new File(indexDirParent).getAbsoluteFile();
> indexDirFile.mkdirs();
> Archiver archiver = ArchiverFactory.createArchiver(largeFile);
> archiver.extract(largeFile, indexDirFile);
> 
> which is a wrapper library around Apache Commons Compress, and that
> did create a temporary tar file

You want to manage the streams yourself: hook a gzip reader up to a
tar reader and then read entries from the tar reader. With Apache
Commons Compress (TarArchiveInputStream / TarArchiveEntry, plus
java.util.zip.GZIPInputStream) it looks roughly like this:

// gzip -> tar, streamed: no intermediate tar file ever hits the disk
TarArchiveInputStream tin = new TarArchiveInputStream(
        new GZIPInputStream(new FileInputStream(tgzFile)));
TarArchiveEntry entry;
while ((entry = tin.getNextTarEntry()) != null) {
    File out = new File(localDir, entry.getName());
    if (entry.isDirectory()) { out.mkdirs(); continue; }
    out.getParentFile().mkdirs();
    Files.copy(tin, out.toPath()); // copies only this entry's bytes
}
tin.close();

That's still simplified (no error handling, and you should validate
entry names before writing), but it's essentially what you want to
do. There is no need to completely expand the gzip archive before
reading files out of it.

> But maybe if I use the Linux commands directly I won't hit the
> problem. I think using .ebextensions is now my best chance of
> getting something working.

Okay. I have no idea what those are, so I can't comment.

If you have to download the indexes from elsewhere, you might want to
download a compressed file since expanding the compressed file will
probably be faster than downloading an uncompressed version of it.
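As a side note on why the decompression cost is small: gzip works on a
stream, so you can expand the data as you read it, with no temporary
file. A self-contained, JDK-only round-trip sketch (the class and
method names are illustrative, not your app's real code):

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipStreamDemo {
    // Compress a byte[] to gzip entirely in memory
    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);
        }
        return bos.toByteArray();
    }

    // Expand a gzip stream as it is read -- no temp file is ever created
    static byte[] gunzip(InputStream in) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPInputStream gz = new GZIPInputStream(in)) {
            byte[] buf = new byte[8192];
            for (int n; (n = gz.read(buf)) != -1; ) {
                bos.write(buf, 0, n);
            }
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] original = "lucene index bytes".getBytes(StandardCharsets.UTF_8);
        byte[] back = gunzip(new ByteArrayInputStream(gzip(original)));
        System.out.println(new String(back, StandardCharsets.UTF_8));
    }
}
```

The same input stream could just as well come straight off an HTTP
download, which is why decompress-while-reading tends to beat
downloading the uncompressed version.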

>> There is another option: stick the master index on an EBS store
>> and mount the EBS store on the target machine. IIRC, EBS volumes
>> can't be shared (which is a big pain IMO) so you can't mount that
>> disk on all of your Lucene servers... you might have to mount the
>> EBS store, copy the indexes, and then unmount the store. You'd
>> only have to do this once each time you wanted to launch an
>> additional instance or update the index.
> 
> But the whole point of autoscaled EB deployments is that Amazon
> automatically starts additional servers if load gets heavy and
> terminates them if they're underused. I don't have to consciously
> make those decisions or be around, which is very useful if (as I
> suspect) I'm going to have busy and quiet times during each 24-hour
> period. Maybe I could have 4 EBS stores loaded (the default max
> number of servers is 4) ready, and then when a server starts, have
> some code in my init() method mount the next available (not mounted)
> EBS volume and use it. But I think this does mean paying for four
> EBS stores all the time, and I don't know how to code for this
> because usually, AFAIK, the volumes have to be assigned to an EC2
> instance before the instance can mount them.

Correct: you'd have to associate the EBS volume with the instance,
then mount it on the instance. If I were doing it, I'd do it with a
single EBS volume instead of 4, because you only need one-time access
to copy the files.
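Once the volume is attached and mounted, the one-time copy itself is
ordinary file I/O. A sketch using java.nio (the class name is made up,
and the source path would be wherever you mounted the volume):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.stream.Stream;

class IndexCopy {
    // Recursively copy an index tree (e.g. from a mounted EBS path
    // such as /mnt/ebs-index) to local disk, then you can unmount.
    static void copyTree(Path src, Path dst) throws IOException {
        try (Stream<Path> paths = Files.walk(src)) {
            for (Path p : (Iterable<Path>) paths::iterator) {
                Path target = dst.resolve(src.relativize(p).toString());
                if (Files.isDirectory(p)) {
                    Files.createDirectories(target);
                } else {
                    Files.copy(p, target, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
    }
}
```

After the copy completes, the instance no longer needs the volume, so
detaching it frees it up for the next instance you launch.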

>> Or, you could look into Solr which I believe understands
>> clustering. Then, you load the index onto the cluster and do
>> whatever you want with it.
> 
> I don't think Solr clustering would work with EB autoscaling;
> instead I would have to work directly with EC2 and forgo all the
> advantages of EB autoscaling. Also, I already have my code written
> and working, and I have no desire (or time) to convert to Solr (or
> ElasticSearch for that matter).

Fair enough. Let us know how this turns out.

-chris
