SharePoint Search can be used to search many different locations, and one of these is internet sites. Before deciding whether to crawl an internet site, there are a few things to consider:
- How much content do you want to crawl on that website?
- Do you want to crawl the whole website?
- The larger the website you crawl, the larger the database will grow in your SharePoint/SQL farm.
- When will you crawl the site? Does crawling it at specific times cause lag for others reaching the site?
- Know how many pages deep you need to go into the site.
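To get a feel for how quickly the page count grows with depth, here is a minimal sketch that counts the pages reachable within a given number of link hops. It assumes depth is counted as link hops from the start address, and the `site` link map and the `pages_within_depth` helper are made up for illustration; they are not part of SharePoint.

```python
from collections import deque

def pages_within_depth(links, start, max_depth):
    """Return {page: depth} for every page reachable within max_depth link hops of start."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        if depths[page] == max_depth:
            continue  # do not follow links beyond the depth limit
        for target in links.get(page, []):
            if target not in depths:  # first visit is the shortest path in a breadth-first walk
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths

# Hypothetical link map (page -> pages it links to), not a real site.
site = {
    "/": ["/about", "/blog"],
    "/blog": ["/blog/post-1", "/blog/post-2"],
    "/blog/post-1": ["/blog/post-1/comments"],
}

for depth in range(4):
    print(depth, len(pages_within_depth(site, "/", depth)))
```

On this toy map the count goes 1, 3, 5, 6 for depths 0 to 3. On a real site each extra level can add far more pages, which is why it pays to choose the depth deliberately.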
For this example I decided to crawl my blog site http://www.bfcnetworks.com. I have a few web analytics plugins installed, which help me understand which pages are hit most frequently and how many people are visiting my site.
When a crawl starts, there are two main places where I can see the SharePoint 2010 crawler on the site. The first is Who’s Online. When I turn on bots I see 4.01. This is the SharePoint crawler.
The second is StatPress, which shows me the pages that were last hit, so I can also see which page the bot is crawling.
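If you do not have an analytics plugin, the same check can be made against the web server's access log. The sketch below is only an illustration: it assumes the log is in the common "combined" format and that the crawler's user agent contains the text `MS Search` (verify this against your own logs). The sample log lines and the `crawler_hits` helper are made up.

```python
import re

# Assumed marker: a substring of the crawler's user-agent string. Check your own logs.
CRAWLER_MARKER = "MS Search"

# Combined log format: ... "METHOD path PROTOCOL" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'"(?:GET|HEAD|POST) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def crawler_hits(lines, marker=CRAWLER_MARKER):
    """Return the requested paths of log lines whose user agent contains marker."""
    hits = []
    for line in lines:
        match = LOG_LINE.search(line)
        if match and marker in match.group("agent"):
            hits.append(match.group("path"))
    return hits

# Made-up sample lines for illustration only.
sample = [
    '10.0.0.5 - - [01/Jan/2011:10:00:00 +0000] "GET /about HTTP/1.1" 200 512 "-" '
    '"Mozilla/4.0 (compatible; MSIE 4.01; Windows NT; MS Search 6.0 Robot)"',
    '10.0.0.9 - - [01/Jan/2011:10:00:01 +0000] "GET /blog HTTP/1.1" 200 900 "-" '
    '"Mozilla/5.0 (Windows NT 6.1)"',
]

print(crawler_hits(sample))
```

Only the first sample line is reported, because it is the only one whose user agent contains the marker.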
Configure SharePoint 2010 Search to Crawl an Internet Site
Log into Central Administration and go to your Search Administration page.
In the left-hand navigation, select Content Sources.
Select New Content Source.
Give the content source a name and select Web Sites as the Content Source Type.
Enter the address of the website you want to crawl.
Based on your evaluation of the website and the pages you want to crawl, enter the number of pages deep you wish to crawl.
Set up any crawl schedules you want.
Select OK to save this new Content Source.
Once you have started a full crawl and it has finished, you will see the results in your search centre.