Re: Web spiders - disabling jsessionid

2006-12-04 Thread Christopher Schultz
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Bryce, brycenesbitt wrote: > > Caldarale, Charles R wrote: >> Try turning off cookies in your browser. >> > > Sorry for the lack of clarity. I can't force jsessionid to show up even with > cookies off in the browser. My guess is that there is a pag

Re: Web spiders - disabling jsessionid

2006-12-03 Thread Rashmi Rubdi
- Original Message From: brycenesbitt [EMAIL PROTECTED] >>A quick Google search will show this happens to many other people -- even if >>your webapps are magically immune. http://www.citycarshare.org/ is >>definitely affected. It's not magically immune. It's just built differently fro

Re: Web spiders - disabling jsessionid

2006-12-03 Thread brycenesbitt
eturn url; } public String encodeURL(String url) { return url; } }; -- View this message in context: http://www.nabble.com/Web-spiders---disabling-jsessionid-tf2737558.html#a7670111 Sent from the Tomcat - User mailing list archive at Nabble.com. ---

Re: Web spiders - disabling jsessionid

2006-12-03 Thread Rashmi Rubdi
One thing to keep in mind about search engine bots, though: fixing the jsessionid problem (removing the jsessionid from URLs) won't take effect instantly, because bots cache all URLs and revisit each cached URL on subsequent crawls. This means that even if you solve the problem of jsessionid now, you will still see

Re: Web spiders - disabling jsessionid

2006-12-03 Thread Rashmi Rubdi
--- Original Message From: Len Popp <[EMAIL PROTECTED]> To: Tomcat Users List Sent: Sunday, December 3, 2006 8:10:00 PM Subject: Re: Web spiders - disabling jsessionid On 12/3/06, Rashmi Rubdi <[EMAIL PROTECTED]> wrote: > No, I'm using Tomcat 5.5. And I've omitted the

Re: Web spiders - disabling jsessionid

2006-12-03 Thread Len Popp
On 12/3/06, Rashmi Rubdi <[EMAIL PROTECTED]> wrote: No, I'm using Tomcat 5.5. And I've omitted the cookies attribute of Context in my Tomcat settings. And Googlebot or any other bot is accessing the URLs just fine (that is, without the jsessionid). When I look in the server access logs, jses

Re: Web spiders - disabling jsessionid

2006-12-03 Thread Rashmi Rubdi
- Original Message >From: brycenesbitt [EMAIL PROTECTED] >>Rashmi Rubdi wrote: >> >>So the solution for Bryce would be to leave the session on on each JSP >> page, and omit the cookies attribute of > true. >>This should solve the problem of jsessionid for bots. >> From my observation se

Re: Web spiders - disabling jsessionid

2006-12-03 Thread brycenesbitt
800] "GET /press.do;jsessionid=E717438CB2746895BFF9C16DE6A72F28 HTTP/1.1" 200 22020 "-" "Exabot/3.0" I am 1000% certain that not all bots browse with cookies, at least not all the time. How can I stop these bots from crawling me so often? It is over 25% of my bandwidth j
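To gauge how much traffic goes to rewritten URLs like the Exabot request above, one can tally access-log lines whose request path contains ";jsessionid=". The sketch below assumes Tomcat's AccessLogValve (or Apache httpd) is writing the "combined" log format seen in these excerpts; the class name JsessionidLogStats is hypothetical, and lines that do not match the pattern are simply skipped.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: counts requests (and response bytes) whose path carries ";jsessionid=".
// Assumes the "combined" access-log layout; lines that don't match are ignored.
final class JsessionidLogStats {
    // "METHOD path protocol" status bytes   (bytes may be "-" when nothing was sent)
    private static final Pattern REQUEST =
            Pattern.compile("\"[A-Z]+ (\\S+) [^\"]*\" \\d{3} (\\d+|-)");

    long parsedLines;      // lines that matched the pattern
    long sessionIdHits;    // requests whose path contained ";jsessionid="
    long totalBytes;       // bytes sent for all parsed requests
    long sessionIdBytes;   // bytes sent for the ";jsessionid=" requests only

    void add(String line) {
        Matcher m = REQUEST.matcher(line);
        if (!m.find()) {
            return;
        }
        long bytes = "-".equals(m.group(2)) ? 0L : Long.parseLong(m.group(2));
        parsedLines++;
        totalBytes += bytes;
        if (m.group(1).contains(";jsessionid=")) {
            sessionIdHits++;
            sessionIdBytes += bytes;
        }
    }

    static JsessionidLogStats of(List<String> lines) {
        JsessionidLogStats stats = new JsessionidLogStats();
        lines.forEach(stats::add);
        return stats;
    }
}
```

Feeding it a day's log gives the share of bytes served on ";jsessionid=" URLs, which can then be compared against a bandwidth figure like the one quoted above.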

Re: Web spiders - disabling jsessionid

2006-12-03 Thread Rashmi Rubdi
Original Message From: Eric Haszlakiewicz [EMAIL PROTECTED] >> Perhaps that is the /quickest/ solution, but I would argue that the best >> solution is not to create a session if you don't actually need one. >heh. yeah, not creating the session is definitely NOT the quickest way. :) >e

Re: Web spiders - disabling jsessionid

2006-12-03 Thread Eric Haszlakiewicz
On Fri, Dec 01, 2006 at 04:50:02PM -0500, Christopher Schultz wrote: > Mikolaj Rydzewski wrote: > > Caldarale, Charles R wrote: > >> That contradicts what Len said about his site: > >> > >> "On my site (as on many others) you can browse the site without a > >> session, but if you want to log in (to

Re: Web spiders - disabling jsessionid

2006-12-03 Thread Rashmi Rubdi
Or simply leave out the cookies attribute in your Context; it defaults to cookies="true" anyway. >>No option seems to match the need: >>true -- uses URL-rewriting if the browser does not support cookies. this is >>exactly the problem, as spiders don't use cookies. No. Googlebot and other
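The behaviour being debated here can be summarised as a small decision function. This is a simplified model written for illustration from the explanations in this thread: rewriting only matters once a session exists, cookies="false" forces rewriting, and with cookies="true" the ID is appended until the client sends the session cookie back. It is not Tomcat's actual source, and the names RewriteModel and urlGetsSessionId are hypothetical.

```java
// Simplified, hypothetical model of the behaviour discussed in this thread;
// it is NOT Tomcat's implementation, only the rules the posters describe.
final class RewriteModel {

    /**
     * @param sessionExists     the application has created (or joined) an HttpSession
     * @param contextCookies    value of the Context "cookies" attribute (true if omitted)
     * @param idArrivedInCookie the client sent the session ID back in a JSESSIONID cookie
     * @return true if encodeURL() would append ";jsessionid=..." under this model
     */
    static boolean urlGetsSessionId(boolean sessionExists,
                                    boolean contextCookies,
                                    boolean idArrivedInCookie) {
        if (!sessionExists) {
            return false;          // no session: nothing to encode
        }
        if (!contextCookies) {
            return true;           // cookies="false": rewriting is the only tracking method
        }
        return !idArrivedInCookie; // first response, or a client (e.g. a bot) that drops cookies
    }
}
```

Under this model, a crawler that never returns cookies sees ";jsessionid=" on every page of a site that creates sessions eagerly. That is why several posters suggest not creating a session at all for anonymous requests.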

RE: Web spiders - disabling jsessionid

2006-12-03 Thread brycenesbitt
Caldarale, Charles R wrote: > >> From: brycenesbitt [mailto:[EMAIL PROTECTED] >> Subject: Re: Web spiders - disabling jsessionid >> Creating semicolon-based URL strings is the default in >> Tomcat/Struts. > > I don't know about Struts, but that's

RE: Web spiders - disabling jsessionid

2006-12-03 Thread brycenesbitt
ifferent cached and stale session ID's.

Re: Web spiders - disabling jsessionid

2006-12-03 Thread brycenesbitt

RE: Web spiders - disabling jsessionid

2006-12-03 Thread Caldarale, Charles R
> From: brycenesbitt [mailto:[EMAIL PROTECTED] > Subject: Re: Web spiders - disabling jsessionid > > I can't force a ";" based jsessionid to show in Firefox. Try turning off cookies in your browser. - Chuck

RE: Web spiders - disabling jsessionid

2006-12-03 Thread Caldarale, Charles R
> From: brycenesbitt [mailto:[EMAIL PROTECTED] > Subject: Re: Web spiders - disabling jsessionid > > Creating semicolon-based URL strings is the default in > Tomcat/Struts. I don't know about Struts, but that's not true for Tomcat. Look at the cookies at

Re: Web spiders - disabling jsessionid

2006-12-03 Thread Rashmi Rubdi
- Original Message From: brycenesbitt <[EMAIL PROTECTED]> >>The problem in many cases is the author does not care about sessions at all! >>Creating semicolon-based URL strings is the default in Tomcat/Struts. We >>get session ID's not because we want a session, but because we can't figur

Re: Web spiders - disabling jsessionid

2006-12-02 Thread brycenesbitt

Re: Web spiders - disabling jsessionid

2006-12-02 Thread brycenesbitt
based URL strings is the default in Tomcat/Struts. We get session ID's not because we want a session, but because we can't figure out how to turn them off!

Re: Web spiders - disabling jsessionid

2006-12-02 Thread Bryce Nesbitt
>Hi, >As you may know url rewriting feature is not a nice thing when spiders >come to index your site - >http://gabrito.com/post/javas-seo-blunder-jsessionid. I'm having exactly this trouble with JSESSIONID and the search engines Google, Accoona, Alexa and Exalead. My approach was to contact each firm, and a

Re: Web spiders - disabling jsessionid

2006-12-02 Thread brycenesbitt
's up? 71.146.168.171 - - [02/Dec/2006:23:29:20 -0800] "GET /images/events/CCS_5_Icon.png;jsessionid=D5912D8983A86FF2BCF3381DB454D54A HTTP/1.1" 200 4960 "http://www.citycarshare.org/" "Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en) AppleWebKit/418.9 (KHTML, like Gecko

Re: Web spiders - disabling jsessionid

2006-12-01 Thread Rashmi Rubdi
- Original Message From: "Caldarale, Charles R" [EMAIL PROTECTED] >> From: Rashmi Rubdi [mailto:[EMAIL PROTECTED] >> Subject: Re: Web spiders - disabling jsessionid >> >> I think then, setting cookies to "true", or simply leaving >> o

RE: Web spiders - disabling jsessionid

2006-12-01 Thread Caldarale, Charles R
> From: Rashmi Rubdi [mailto:[EMAIL PROTECTED] > Subject: Re: Web spiders - disabling jsessionid > > I think then, setting cookies to "true", or simply leaving > out the cookies attribute should solve the original poster's > problem with disabling JSESSIONID

Re: Web spiders - disabling jsessionid

2006-12-01 Thread Rashmi Rubdi
- Original Message From: "Caldarale, Charles R" [EMAIL PROTECTED] >>From: Rashmi Rubdi [mailto:[EMAIL PROTECTED] >> >> There's no jsessionid appended at the end of URLs that the >> bot requests. >Depends on what the value of the cookies attribute for the <Context> is; >if false, or the app ch

RE: [OT] Web spiders - disabling jsessionid

2006-12-01 Thread Caldarale, Charles R
> From: Leon Rosenberg [mailto:[EMAIL PROTECTED] > Subject: Re: Web spiders - disabling jsessionid > > It's completely OT, but a customer of mine once placed a > direct-login link to the publicly accessible test system for the newest > project on a crawled site, so that

RE: Web spiders - disabling jsessionid

2006-12-01 Thread Caldarale, Charles R
> From: Rashmi Rubdi [mailto:[EMAIL PROTECTED] > Subject: Re: Web spiders - disabling jsessionid > > There's no jsessionid appended at the end of URLs that the > bot requests. Depends on what the value of the cookies attribute for the <Context> is; if false, or the app chooses to

Re: Web spiders - disabling jsessionid

2006-12-01 Thread Rashmi Rubdi
> Caldarale, Charles R wrote: > Filter with wrapper ServletResponse is IMO the best solution. > You can apply it to almost every application without touching the code. >>Perhaps that is the /quickest/ solution, but I would argue that the best >>solution is not to create a session if you don't actu

Re: Web spiders - disabling jsessionid

2006-12-01 Thread Leon Rosenberg
On 12/1/06, Caldarale, Charles R <[EMAIL PROTECTED]> wrote: > From: Len Popp [mailto:[EMAIL PROTECTED] > Subject: Re: Web spiders - disabling jsessionid > > On my site (as on many others) you can browse the site without a > session, but if you want to log in (to

Re: Web spiders - disabling jsessionid

2006-12-01 Thread Christopher Schultz
Mikolaj, Mikolaj Rydzewski wrote: > Caldarale, Charles R wrote: >> That contradicts what Len said about his site: >> >> "On my site (as on many others) you can browse the site without a >> session, but if you want to log in (to add content or to use >

Re: Web spiders - disabling jsessionid

2006-12-01 Thread Len Popp
On 12/1/06, Caldarale, Charles R <[EMAIL PROTECTED]> wrote: > From: Chris Adams [mailto:[EMAIL PROTECTED] > Subject: RE: Web spiders - disabling jsessionid > > That's not true. A session id is assigned the moment you hit > the site. That contradicts what Len said ab

Re: Web spiders - disabling jsessionid

2006-12-01 Thread Mikolaj Rydzewski
Caldarale, Charles R wrote: That's not true. A session id is assigned the moment you hit the site. That contradicts what Len said about his site: "On my site (as on many others) you can browse the site without a session, but if you want to log in (to add content or to use personalized se

RE: Web spiders - disabling jsessionid

2006-12-01 Thread Caldarale, Charles R
> From: Chris Adams [mailto:[EMAIL PROTECTED] > Subject: RE: Web spiders - disabling jsessionid > > That's not true. A session id is assigned the moment you hit > the site. That contradicts what Len said about his site: "On my site (as on many others) you can

RE: Web spiders - disabling jsessionid

2006-12-01 Thread Chris Adams
still manage the "anonymous" user's session. - Chris -Original Message- From: Caldarale, Charles R [mailto:[EMAIL PROTECTED] Sent: Friday, December 01, 2006 7:14 PM To: Tomcat Users List Subject: RE: Web spiders - disabling jsessionid > From: Len Popp [mailto:[EMAIL

RE: Web spiders - disabling jsessionid

2006-12-01 Thread Caldarale, Charles R
> From: Len Popp [mailto:[EMAIL PROTECTED] > Subject: Re: Web spiders - disabling jsessionid > > On my site (as on many others) you can browse the site without a > session, but if you want to log in (to add content or to use > personalized settings) you need a session. O.k.

Re: Web spiders - disabling jsessionid

2006-12-01 Thread Len Popp
On 12/1/06, Christopher Schultz <[EMAIL PROTECTED]> wrote: Mikolaj, Back to the original question... Mikolaj Rydzewski wrote: > As you may know url rewriting feature is not a nice thing when spiders > come to index your site - > http://gabrito.com/post/javas-seo-blunder-jsessionid. So, the pro

Re: Web spiders - disabling jsessionid

2006-12-01 Thread Christopher Schultz
Mikolaj, Back to the original question... Mikolaj Rydzewski wrote: > As you may know url rewriting feature is not a nice thing when spiders > come to index your site - > http://gabrito.com/post/javas-seo-blunder-jsessionid. So, the problem is that y

Re: [OT] Web spiders - disabling jsessionid

2006-12-01 Thread Christopher Schultz
Chris, Chris Adams wrote: > Your empirical data really isn't useful, because it's based upon an > assumption that could very easily be false. Good point. Still, nobody has given any source for this information, so I'm inclined to consider it a myth f

RE: Web spiders - disabling jsessionid

2006-12-01 Thread Chris Adams
mall number (e.g. 1 or 2) of those hits would come from this "google-incognito" agent. - Chris -Original Message- From: Christopher Schultz [mailto:[EMAIL PROTECTED] Sent: Friday, December 01, 2006 3:13 PM To: Tomcat Users List Subject: Re: Web spiders - disabling jsessionid ---

Re: Web spiders - disabling jsessionid

2006-12-01 Thread Leon Rosenberg
Hello Christopher, When I check my access logs I could imagine Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1) being a Googlebot. Of course I don't know it for sure, because I don't do any SEO cloaking here, and don't care. But one could go to SEO boards, pick the posted IP addresses for cloa

Re: Web spiders - disabling jsessionid

2006-12-01 Thread Christopher Schultz
Leon, Leon Rosenberg wrote: > you believe everything you've been told ?:-) Well, I've been told by you, and I don't believe you. ;) > google has 3 (at least 3 known) user agents: google, Mozilla with > google-bot in the agent string (the one you se

Re: Web spiders - disabling jsessionid

2006-12-01 Thread Mikolaj Rydzewski
Leon Rosenberg wrote: Google uses this 3rd agent to check your site from another IP address, whether you do some ugly SEO stuff, like cloaking etc. Seems possible. So please don't do it if you rely on being found. I think that just removing ;jsessionid=XXX for the first one won't make much ha

Re: Web spiders - disabling jsessionid

2006-12-01 Thread Leon Rosenberg
You believe everything you've been told? :-) Google has 3 (at least 3 known) user agents: Google, Mozilla with Googlebot in the agent string (the one you sent) and another one, which is just Mozilla/5.0. Google uses this 3rd agent to check your site from another IP address, whether you do some

Re: Web spiders - disabling jsessionid

2006-12-01 Thread Tim Funk
Wrong. Google is very clear about not hiding its user agent, as are the other major bots. Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) Just look for Googlebot in the User-Agent header. -Tim Leon Rosenberg wrote: On 12/1/06, Tim Funk <[EMAIL PROTECTED]> wrote: T

Re: Web spiders - disabling jsessionid

2006-12-01 Thread Leon Rosenberg
On 12/1/06, Tim Funk <[EMAIL PROTECTED]> wrote: The easiest is the filter and custom HttpServletResponse which overrides encodeURL() to do nothing. It could be made one step smarter by checking if the User agent is a search engine bot to selectively execute or not. How do you want to achieve

Re: Web spiders - disabling jsessionid

2006-12-01 Thread Tim Funk
The easiest approach is a filter plus a custom HttpServletResponse (wrapper) that overrides encodeURL() to do nothing. It could be made one step smarter by checking whether the User-Agent is a search engine bot and applying the wrapper selectively. -Tim Mikolaj Rydzewski wrote: Hi, As you may know url rewriting feature
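Tim's "one step smarter" variant needs a User-Agent test. Here is a minimal sketch. The token list is a hypothetical, deliberately short example: Googlebot comes from Tim's message, and Exabot comes from the access log quoted above. The class name BotDetector is also hypothetical. As Leon Rosenberg argues in this thread, any crawler that does not identify itself will slip past a check like this.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical helper: guesses from the User-Agent header whether a request comes
// from a self-identifying search engine crawler. The token list is illustrative only.
final class BotDetector {
    private static final List<String> BOT_TOKENS = List.of("googlebot", "exabot");

    static boolean looksLikeSearchBot(String userAgent) {
        if (userAgent == null) {
            return false;                        // no header: treat as a normal client
        }
        String ua = userAgent.toLowerCase(Locale.ROOT);
        for (String token : BOT_TOKENS) {
            if (ua.contains(token)) {
                return true;
            }
        }
        return false;
    }
}
```

A filter would call this with the request's User-Agent header and wrap the response (so encodeURL() returns the URL unchanged) only when it returns true.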

Re: Web spiders - disabling jsessionid

2006-12-01 Thread Andrew Stepanenko
Hello, we use a filter in web.xml, as you said: StripSessionIdFilter. It works fine, without much overhead. Regards Andrew Stepanenko, Ternopil, Ukraine http://unf.tane.edu.ua On 12/1/06, Mikolaj Rydzewski <[EMAIL PROTECTED]> wrote: Hi, As you may know url rewriting feature is not a nice thing whe
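The StripSessionIdFilter code itself is not shown in the thread. The string handling such a filter might need can be sketched as follows, assuming the ID appears as a lowercase ";jsessionid=" path parameter, as in the access-log lines quoted in this archive. SessionIdStripper is a hypothetical name, and this is not that filter's actual code.

```java
// Hypothetical helper: removes a ";jsessionid=..." path parameter from a URL,
// keeping any query string or fragment that follows it.
final class SessionIdStripper {
    private static final String MARKER = ";jsessionid=";

    static String strip(String url) {
        if (url == null) {
            return null;
        }
        int start = url.indexOf(MARKER);
        if (start < 0) {
            return url;                          // nothing to strip
        }
        int end = start + MARKER.length();
        // the session ID runs until the next '?', '#' or ';' (or the end of the URL)
        while (end < url.length() && "?#;".indexOf(url.charAt(end)) < 0) {
            end++;
        }
        return url.substring(0, start) + url.substring(end);
    }
}
```

A filter could apply this to the links it writes out. It could also apply it to incoming request URLs, so that stale addresses already cached by crawlers can be mapped to their clean form. That second use is an assumption beyond what the posters describe.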

Web spiders - disabling jsessionid

2006-12-01 Thread Mikolaj Rydzewski
Hi, As you may know, the URL rewriting feature is not a nice thing when spiders come to index your site: http://gabrito.com/post/javas-seo-blunder-jsessionid. There are a few solutions I'm thinking of: * configuration at Tomcat / web.xml level to disable/enable URL rewriting (unfortunate