Wednesday 17 March 2010

Students' home pages are not searchable

I am not sure about other departments, but every CS student at Seneca College has a personal web page at http://matrix.senecac.on.ca/~username. None of those sites can be found through search engines, because crawling is blocked by the line "Disallow: /" in the server's robots.txt file. All reputable search engines carefully follow these instructions.
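The effect of that single directive can be checked locally with Python's standard urllib.robotparser, without touching the live server (the URL below just follows the student-page pattern mentioned above):

```python
import urllib.robotparser

# Reproduce the relevant rules from Matrix's robots.txt.
# "Disallow: /" under "User-agent: *" forbids every path to every crawler.
rp = urllib.robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /",
])

# Any compliant crawler is refused, student pages included.
print(rp.can_fetch("Googlebot", "http://matrix.senecac.on.ca/~username/"))
# prints: False
```

Incoming links or not, a crawler that honours robots.txt never gets as far as fetching the page.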

Currently most students have an automatically generated page that is not very informative. Some students spend time developing theirs only to find out that the content will never be found. This is very disappointing.

I think Seneca would encourage more students to develop their personal pages if it allowed search engines to crawl them.
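If Seneca wanted to open up only the student directories, one option would be a robots.txt along these lines (a sketch, not the actual file; the Allow directive is a non-standard extension, but the major crawlers support it):

```
User-agent: *
Allow: /~
Disallow: /
```

Googlebot applies the most specific matching rule, so /~username/ pages would become crawlable while the rest of the site stays blocked.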

2 comments:

Tiago Moreira said...

This is well known and common at a lot of post-secondary institutions. Basically, those accounts are for academic use only. They are in fact searchable, but they need several incoming links to make it into the indexes.

I'm sure the reasoning is that servers like Matrix and Zenit could easily be bogged down by too much traffic, or, if students decided to host large files (whatever you could fit in your 20 MB, haha), the bandwidth and server resources would get expensive for the school.

Remember, we just recently got over 20 MB of e-mail storage space for our student accounts. How sad is that?

Dmitry Rozhkov said...

I don't think the pages will be indexed even if they get incoming links; the bots are simply blocked. Also, I doubt bots would create a huge performance hit, since the vast majority of students don't update their pages, so there is nothing new to crawl. And disk storage... well, it doesn't cost much these days anyway...