May 18 2009

A Look At robots.txt Files

A robots.txt file is a simple, static file that you can add to your site in order to stop search engines from crawling the content of certain pages or directories. You can even prevent certain user agents from crawling certain areas of your site.

Let’s take a real-world example and look at what you would do if you decided to set up a Feedburner feed in place of your normal RSS feed. I won’t go into much detail about why you would do this, other than to say that you get some nice usage statistics. Once you have recoded your blog to issue the Feedburner feed you then need to stop search engines from crawling the old feed. You would then put a robots.txt file in place with the following content.

User-agent: *
Disallow: /feed
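
If you want to check how a crawler is likely to read these two lines before putting them live, one option is Python's standard urllib.robotparser module. The following is only a minimal sketch: the helper name is_allowed, the www.example.com host and the sample paths are assumptions made for illustration, and real search engines may interpret rules slightly differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The same rules as shown above, held in a string for testing.
ROBOTS_TXT = """\
User-agent: *
Disallow: /feed
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if user_agent may fetch url under the given rules.

    is_allowed is a hypothetical helper written for this example.
    """
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # www.example.com and these paths are placeholders, not real pages.
    for path in ("/feed", "/feed/atom", "/feedback", "/about"):
        url = "http://www.example.com" + path
        print(path, is_allowed(ROBOTS_TXT, "Googlebot", url))
```

Rules are matched as path prefixes, so Disallow: /feed also covers /feed/atom and would cover an unrelated page such as /feedback. That is worth keeping in mind when choosing the path to block.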

Philip Norton
Lead Developer, Research and Development

Apr 24 2009

Adobe Acrobat exploit hijacks Google search results

I had the displeasure of dealing with a virus-infected Windows XP machine over the weekend. This virus was a browser hijacker, but it functioned for all web browsers (Firefox, IE and Chrome were used to test this theory) and essentially intercepted and re-wrote Google results to point to spammy sites which seemed to only exist to earn ad revenue. The interesting thing is the infection vector - this hijack managed to infiltrate the system through Acrobat Reader and a known security hole which has yet to be patched (as far as I am aware, anyway).

Geoff Adams
Programmer, Research and Development

Apr 22 2009

Oracle buys Sun Microsystems for over $7 Billion

This is not hot news, as it has been known since 20th April, but Oracle, one of the leaders in enterprise software, buying one of the leaders in computing systems makes sense. Sun provides the best operating system, Solaris, for Oracle’s database product, and many of Oracle’s software products rely on the Java language, developed by Sun.

“Oracle and Sun have been industry pioneers and close partners for more than 20 years,” said Sun Chairman Scott McNealy. “This combination is a natural evolution of our relationship and will be an industry-defining event.”

We will certainly see Oracle investing in Java for the benefit of the community and, of course, their clients.

The $7.4 billion deal is expected to close this summer and will certainly be the biggest acquisition of 2009. Apparently IBM and Dell were also in the running but missed the deal by undervaluing Sun’s shares.

Benoit Gilloz
Programmer, Research and Development

Apr 22 2009

Google Labs Release Similar Images Search

Google Labs is an infrequently updated part of Google that showcases new or interesting things that Google are working on. One thing that was of interest to me recently was the Similar Images search feature. This allows you to search for a term using the normal Google image search, but adds the option to click on a link next to an image and view more images that look like that one. For example, let’s say you wanted to search for images of London: you can click on an image of the London Eye and see different images of the same thing. Here is the official video from Google.

I thought I would have a play with this feature and see what I could do with it. One thing I always have trouble finding is an image of a mouse cursor. When I am writing user manuals I like to have the cursor in the image so that the user can see where they are meant to click. So after an image search for cursor I found the following results.

Google Image Search For Cursor

I then clicked on the similar link for the 6th image along on the second row and got the following page.

Google Image Search Similar Images

As you can see, the feature is nearly there. From this page of images quite a few are indeed cursors, but the majority are not at all like the original image. For some reason Google seems to like to display images of cupid for this particular search.

I had more success when I tried searching for logos, which I presume is because they look quite alike. Searching for the logos of WordPress, BBC, and Microsoft gave some good results.

So the big question is: will this become part of the main image search? I think the answer is probably yes. There are quite a few products and features developed over the years that have gone live, such as Gmail and iGoogle, so I think that we could very well see this feature appearing on normal Google image search results. You can already view images by face and drawing type, so this is probably just another extension of that. I just hope they refine the image recognition before putting it live.

Philip Norton
Lead Developer, Research and Development