Adventures in Search

From the frontlines of Search in Southeast Asia
October 12

The difference between [-] dash and [_] underscore in asset naming

Over the weekend, I received a few questions about naming conventions that help improve SEO. Although correct naming doesn’t have much of an impact on SEO, it does help in certain situations when the crawler needs to infer the content of your page.

Some tips and tricks:

- If a name contains spaces, delimit the words with a dash (-) instead of an underscore (_). To a crawler,
  this_is_a_page means “thisisapage” while this-is-a-page means “this is a page”,
  so whenever you want a crawler to match a keyword, use zune-music-player instead of zune_music_player.

It is also advisable to give meaningful names to your pages and pictures: 12312inwf23.jpg doesn’t tell a crawler what the picture is, but zune.jpg does.

The same advice applies to all other files that you want the crawler to crawl (e.g. .mp3, .jpg, .html, etc.).
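
If you have a lot of existing assets to rename, the dash rule above can be automated. Here is a minimal Python sketch; the function name crawler_friendly_name is hypothetical and only for illustration, and it assumes you just want runs of spaces and underscores turned into single dashes while the file extension stays untouched.

```python
import re


def crawler_friendly_name(filename):
    """Replace spaces and underscores in a file name with dashes.

    The extension (everything after the last dot) is left as-is.
    """
    stem, dot, extension = filename.rpartition(".")
    if not dot:
        # No dot in the name, so the whole thing is the stem.
        stem, extension = filename, ""
    # Split on any run of whitespace or underscores, then rejoin with dashes.
    words = [word for word in re.split(r"[\s_]+", stem) if word]
    slug = "-".join(words)
    return f"{slug}.{extension}" if dot else slug


print(crawler_friendly_name("zune_music_player.jpg"))
print(crawler_friendly_name("zune music player.mp3"))
```

You would still need to update any links that point to the old names after renaming.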

October 10

Introducing your site structure to a search engine

For the crawler to know which pages to crawl, it tries to infer the site structure, either from a sitemap or simply by following the links on the main page. You can generate a sitemap XML file that follows the specification at http://www.sitemaps.org/ using the tool at http://www.xml-sitemaps.com/

A short introduction to sitemaps: (excerpted from http://www.sitemaps.org/)

What are Sitemaps?

Sitemaps are an easy way for webmasters to inform search engines about pages on their sites that are available for crawling. In its simplest form, a Sitemap is an XML file that lists URLs for a site along with additional metadata about each URL (when it was last updated, how often it usually changes, and how important it is, relative to other URLs in the site) so that search engines can more intelligently crawl the site.

Web crawlers usually discover pages from links within the site and from other sites. Sitemaps supplement this data to allow crawlers that support Sitemaps to pick up all URLs in the Sitemap and learn about those URLs using the associated metadata. Using the Sitemap protocol does not guarantee that web pages are included in search engines, but provides hints for web crawlers to do a better job of crawling your site.

http://www.xml-sitemaps.com/ is a site with a free tool to generate the sitemap.xml.

After generating the sitemap.xml document, you will need to inform the crawler that it exists: (excerpted from http://www.sitemaps.org/)

Informing search engine crawlers

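Once you have created the Sitemap file and placed it on your webserver, you need to inform the search engines that support this protocol of its location. You can do this by:

  • submitting it via the search engine's submission interface
  • specifying its location in your robots.txt file
  • sending an HTTP request

The search engines can then retrieve your Sitemap and make the URLs available to their crawlers.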

Submitting your Sitemap via the search engine's submission interface

To submit your Sitemap directly to a search engine, which will enable you to receive status information and any processing errors, refer to each search engine's documentation.

Specifying the Sitemap location in your robots.txt file

You can specify the location of the Sitemap using a robots.txt file. To do this, simply add the following line:

Sitemap: <sitemap_location>

The <sitemap_location> should be the complete URL to the Sitemap, such as: http://www.example.com/sitemap.xml

This directive is independent of the user-agent line, so it doesn't matter where you place it in your file. If you have a Sitemap index file, you can include the location of just that file. You don't need to list each individual Sitemap listed in the index file.
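
To check that a robots.txt file really exposes the Sitemap line, you can read it back the way a crawler might. Here is a minimal Python sketch using the standard library's urllib.robotparser (site_maps() needs Python 3.8 or later); the robots.txt content and the function name sitemap_locations are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the Sitemap line sits after a user-agent group,
# but it is still picked up because it is independent of that group.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Sitemap: http://www.example.com/sitemap.xml
"""


def sitemap_locations(robots_txt):
    """Return the list of Sitemap URLs declared in a robots.txt string."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap line is present.
    return parser.site_maps() or []


print(sitemap_locations(ROBOTS_TXT))
```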

You can specify more than one Sitemap file per robots.txt file.

Sitemap: <sitemap1_location>
Sitemap: <sitemap2_location>

Submitting your Sitemap via an HTTP request

To submit your Sitemap using an HTTP request (replace <searchengine_URL> with the URL provided by the search engine), issue your request to the following URL:

<searchengine_URL>/ping?sitemap=sitemap_url

For example, if your Sitemap is located at http://www.example.com/sitemap.gz, your URL will become:

<searchengine_URL>/ping?sitemap=http://www.example.com/sitemap.gz

URL encode everything after the /ping?sitemap=:

<searchengine_URL>/ping?sitemap=http%3A%2F%2Fwww.example.com%2Fsitemap.gz

You can issue the HTTP request using wget, curl, or another mechanism of your choosing. A successful request will return an HTTP 200 response code; if you receive a different response, you should resubmit your request. The HTTP 200 response code only indicates that the search engine has received your Sitemap, not that the Sitemap itself or the URLs contained in it were valid. An easy way to keep your submission up to date is to set up an automated job to generate and submit Sitemaps on a regular basis.

Note: If you are providing a Sitemap index file, you only need to issue one HTTP request that includes the location of the Sitemap index file; you do not need to issue individual requests for each Sitemap listed in the index.
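
For an automated job, the URL encoding and the request can be scripted. Below is a minimal Python sketch using only the standard library. The search engine URL http://searchengine.example is a placeholder standing in for <searchengine_URL>, and the function names build_ping_url and submit_sitemap are hypothetical; use the address your search engine's documentation actually gives you.

```python
import urllib.error
import urllib.parse
import urllib.request


def build_ping_url(search_engine_url, sitemap_url):
    """URL encode the Sitemap location and append it to the ping endpoint."""
    # safe="" makes quote() encode ":" and "/" as well.
    encoded = urllib.parse.quote(sitemap_url, safe="")
    return f"{search_engine_url}/ping?sitemap={encoded}"


def submit_sitemap(search_engine_url, sitemap_url, timeout=10):
    """Send the ping and return True only on an HTTP 200 response."""
    ping_url = build_ping_url(search_engine_url, sitemap_url)
    try:
        with urllib.request.urlopen(ping_url, timeout=timeout) as response:
            return response.status == 200
    except urllib.error.URLError:
        # Covers HTTP errors and connection problems: resubmit later.
        return False


# Placeholder search engine URL, for illustration only.
print(build_ping_url("http://searchengine.example",
                     "http://www.example.com/sitemap.gz"))
```

Remember that a True result only means the search engine received the request, not that the Sitemap was valid.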

Hello World from the SEO Specialist

Hi everyone! I’m taking care of SEO-related activities in the MSN family for SEA. I’m very passionate about technology, coming from a developer background, and Live Search is my latest love. My primary job scope is to produce SEO best practices and to help web companies make their websites friendlier to search crawlers.

October 07

Where the “Live” Magic Happens

Have you seen Photosynth from Microsoft Live Labs?  Super cool.  Honey made a quick “synth” of the Online Services Group – Southeast Asia office area.

Check out other cool photosynths (and even create YOUR OWN) here… http://photosynth.net/default.aspx

- Chewy

August 27

Don't Know Where You Are? Maybe our "WhereAmI" Project Can Help...

 

Drum roll please....

After many months of hard work, we would like to introduce the people of Southeast Asia to our "WhereAmI" project (http://locationpinpoint.com).  The goal of the project is to help facilitate location-based services through FREE location information.  Think of it as a GPS device that works off of WiFi and/or cell tower data and doesn't need to be outdoors.

I won't go into too much technical detail but essentially:

  • Input = WiFi access points you hear around you AND/OR the cell towers you hear around you
  • Output = an estimated longitude and latitude of where we think you are
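
To make the input/output idea concrete, here is a toy Python sketch. It is NOT how the WhereAmI service works internally (I'm not covering that here); it assumes a simple signal-weighted centroid over surveyed access point positions, and every name and coordinate in it is made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    identifier: str    # a WiFi access point or cell tower the device hears
    signal_dbm: float  # received signal strength in dBm


# Hypothetical survey data: identifier -> (latitude, longitude).
SURVEY = {
    "ap-1": (1.3000, 103.8000),
    "ap-2": (1.3010, 103.8020),
}


def estimate_location(observations, survey):
    """Return a (latitude, longitude) estimate, or None if nothing is known.

    Assumed technique: weighted centroid, where stronger signals
    pull the estimate closer to that transmitter's surveyed position.
    """
    total_weight = 0.0
    lat_sum = 0.0
    lon_sum = 0.0
    for obs in observations:
        if obs.identifier not in survey:
            continue  # never surveyed, so it tells us nothing
        lat, lon = survey[obs.identifier]
        weight = 10 ** (obs.signal_dbm / 10.0)  # dBm to milliwatts
        lat_sum += weight * lat
        lon_sum += weight * lon
        total_weight += weight
    if total_weight == 0.0:
        return None
    return (lat_sum / total_weight, lon_sum / total_weight)


heard = [Observation("ap-1", -60.0), Observation("ap-2", -60.0)]
print(estimate_location(heard, SURVEY))
```

With two equally strong signals, the estimate lands roughly halfway between the two surveyed points.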

It doesn't sound like much, but this is currently the key "missing link" for building much cooler (and more useful) services for scenarios such as:

  • Are my friends around me?  If so, I'll give them a shout.
  • I lost my phone but my phone will tell me where I left it.
  • I want to find all the top rated restaurants around me but I'm not exactly sure where I am (or I don't know my exact address).
  • I want to go home.  Which is the next bus that can take me there, and where is the bus stop?

There are services like this here and there, but they are all commercial (they charge fees).  We intend to provide this service to the people of Southeast Asia (SEA) FOR FREE! 

Built by the people of Southeast Asia... for use by the people of Southeast Asia.

No, this service isn't a Microsoft offering but rather something fun that was built on the weekends BY MANY OF YOU in the community (we have a few uni and poly students, a Singapore scholar and a few people at Microsoft).

The project has a few pieces:

  • WhereAmI Service (the service you would use to query your location)
  • Infrastructure (the gooey-goodness in the middle)
  • Survey (the tools we use to survey the information which gets sent back to you when you use the service)

Here's a quick 2 min video on how it works and how it all comes together.

http://video.msn.com/video.aspx?vid=85238949-6807-47a9-b66b-c78427d41eb6

 

We're in the middle of moving from our test servers to something more robust that can handle the public's load.  We'll post an announcement when the service is up (in about a week).

Coverage is currently Singapore but we're keen to grow the service.  If you're in SEA and interested in offering this service to your region for free, we'll provide all the tools, equipment and infrastructure... all we need you to do is to help with the survey.  Feel free to drop me a note. :)

More exciting stuff to share shortly.  I'll also introduce the people whose hard work is making this all possible.

Chewy (chewyc at microsoft dot com)
Head of Search - SEA