Rashmi Rubdi wrote:
>
> So the solution for Bryce would be to leave the session on on each JSP
> page, and omit the cookies attribute of <Context which defaults it to
> true.
> This should solve the problem of jsessionid for bots.
> From my observation search bots support cookies otherwise I would have the
> problem of jsessionid appended to URLs too.
>

I'm just not getting it. Can someone take a look at this site and maybe
give some insight?

http://www.citycarshare.org/howitworks.do

Or at 216.93.188.140 you can see a test instance which has the following
ROOT/META-INF/context.xml:

<?xml version='1.0' encoding='UTF-8'?>
<Context path='/' cookies="false">
</Context>

I can share lots of log lines showing the JSESSIONID, including crawls by
Google, Alexa and Exalead. A quick scan of the Google index shows cached
pages with JSESSIONID. 2224 of the 9273 log lines from today have
JSESSIONID. I have thousands upon thousands of crawls of the same content,
on the same day, with different JSESSIONIDs.

Here are some examples:

69.106.42.228 - - [01/Dec/2006:16:02:58 -0800] "GET /images/events/CCS_5_Icon.png;jsessionid=6DC390F0ADC7569009CB60C98378919D HTTP/1.1" 200 4960 "http://www.citycarshare.org/" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1) Gecko/20061010 Firefox/2.0"
193.47.80.51 - - [30/Nov/2006:04:37:56 -0800] "GET /press.do;jsessionid=E49722F6235A31A3627A6C62753A7CDB HTTP/1.1" 200 22020 "-" "Exabot/3.0"
193.47.80.51 - - [30/Nov/2006:04:59:56 -0800] "GET /press.do;jsessionid=1407BA083FB2123469A4E544C3F26DFC HTTP/1.1" 200 22020 "-" "Exabot/3.0"
193.47.80.51 - - [30/Nov/2006:06:16:01 -0800] "GET /press.do;jsessionid=5FAFDDABFF42C82F3C766377F5AC9F44 HTTP/1.1" 200 22020 "-" "Exabot/3.0"
193.47.80.51 - - [30/Nov/2006:06:31:36 -0800] "GET /press.do;jsessionid=EAFB1F3DB5B7D47DFF4212A66911754F HTTP/1.1" 200 22020 "-" "Exabot/3.0"
193.47.80.51 - - [30/Nov/2006:07:00:45 -0800] "GET /press.do;jsessionid=ADF4E609E38901897648ABD6C7BF4E57 HTTP/1.1" 200 22020 "-" "Exabot/3.0"
193.47.80.51 - - [30/Nov/2006:07:20:54 -0800] "GET /press.do;jsessionid=5049AD9757D8C7BAA599C2837EBFB3BE HTTP/1.1" 200 22020 "-" "Exabot/3.0"
193.47.80.51 - - [30/Nov/2006:07:37:42 -0800] "GET /press.do;jsessionid=F53FC49BCAD98F4181F05DAC7D7A65C4 HTTP/1.1" 200 22020 "-" "Exabot/3.0"
193.47.80.51 - - [30/Nov/2006:07:49:13 -0800] "GET /press.do;jsessionid=15FCF8DCE01CBD47DAB1A8D668EF9F38 HTTP/1.1" 200 22020 "-" "Exabot/3.0"
193.47.80.51 - - [30/Nov/2006:07:59:28 -0800] "GET /press.do;jsessionid=E717438CB2746895BFF9C16DE6A72F28 HTTP/1.1" 200 22020 "-" "Exabot/3.0"

I am 1000% certain that not all bots browse with cookies, at least not all
the time.

How can I stop these bots from crawling me so often? The duplicate crawls
alone account for over 25% of my bandwidth, never mind the regular bot
traffic.

--
View this message in context: http://www.nabble.com/Web-spiders---disabling-jsessionid-tf2737558.html#a7667574
Sent from the Tomcat - User mailing list archive at Nabble.com.
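
One way to put numbers on the duplicate crawling described above is to group
the access log by user agent and path, ignoring the ;jsessionid=... path
parameter, and count how many distinct session IDs each crawler was handed
for the same URL. The following is only a rough Python sketch, not something
running on the site: the regular expression assumes the combined log format
seen in the examples, and the helper name duplicate_crawls is made up for
illustration.

```python
import re
from collections import defaultdict

# Assumed format: combined access log, as in the example lines of this post.
# Captures the request path, an optional ;jsessionid=... path parameter,
# and the user agent (the last quoted field).
LOG_RE = re.compile(
    r'"(?:GET|HEAD|POST) (?P<path>[^;? "]+)'
    r'(?:;jsessionid=(?P<sid>[0-9A-Fa-f]+))?'
    r'[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def duplicate_crawls(lines):
    """Map (user agent, path) to the set of session IDs seen in the URL.

    Lines without a jsessionid in the request URL are skipped.
    """
    seen = defaultdict(set)
    for line in lines:
        match = LOG_RE.search(line)
        if match and match.group("sid"):
            key = (match.group("agent"), match.group("path"))
            seen[key].add(match.group("sid"))
    return seen


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python dupes.py < access_log
    counts = duplicate_crawls(sys.stdin)
    # Print the worst offenders first: most distinct session IDs per URL.
    for (agent, path), sids in sorted(counts.items(), key=lambda kv: -len(kv[1])):
        print(len(sids), agent, path)
```

A crawler that accepts cookies should show at most a handful of session IDs
per URL; a long list for the same (agent, path) pair points to a bot that is
being given a fresh session, and a fresh URL, on every visit.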