How to Create a Web Service in a Matter of Minutes

Offering your content or logic as a service on the web is a great idea. For starters, it allows you to build numerous front-ends for your own information without having to access the databases all the time, which makes scaling your system much easier.

The even more practical upshot is that you allow people on the web to play with your information and build things you never even dreamed of doing. A lot of companies understand that this “crowd-sourced innovation” is a freebie that is too good to miss, which is why there are so many great APIs around.
Providing an API to the world is a totally different story, though. You need to know how to scale your servers, you need to be there to answer questions from implementers, and you need to maintain good documentation so that people can use your content. You also need to think about a good caching strategy to keep your servers from blowing up, and you need to find a way to limit access to your system to keep people from abusing it. Or do you?

Enter YQL

Yahoo offers a system for people to access their APIs called the Yahoo Query Language, or YQL. YQL is a SQL-style language that turns information on the web into virtual databases that can be queried by end users. So if you want to, for example, search the web for the term “elephant,” all you need to do is to use the following statement:
  select * from search.web where query="elephant"
You send this statement to a data endpoint and get the results back as XML, JSON, or JSON-P. You can request more results, and you can filter them by defining what you want to get back. The endpoint takes the following parameters:
  http://query.yahooapis.com/v1/public/yql
  ?q={yql query}
  &diagnostics={true|false}
  &format={json|xml}
  &callback={function name}
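To make the parameter list a bit more concrete, here is a minimal sketch of how such a request URL could be assembled in JavaScript. The helper name buildYqlUrl and the callback name showResults are my own inventions for illustration; only the endpoint and the parameter names come from the listing above.

```javascript
// Public endpoint as given in the article.
const YQL_ENDPOINT = "http://query.yahooapis.com/v1/public/yql";

// buildYqlUrl is a hypothetical helper name, not part of any YQL library.
function buildYqlUrl(query, options = {}) {
  const params = [
    ["q", query],
    ["format", options.format || "json"],
    ["diagnostics", options.diagnostics ? "true" : "false"],
  ];
  // callback is only meaningful when you want a JSON-P response.
  if (options.callback) {
    params.push(["callback", options.callback]);
  }
  const queryString = params
    .map(([key, value]) => key + "=" + encodeURIComponent(value))
    .join("&");
  return YQL_ENDPOINT + "?" + queryString;
}

const elephantUrl = buildYqlUrl(
  'select * from search.web where query="elephant"',
  { format: "json", callback: "showResults" }
);
console.log(elephantUrl);
```

The statement has to be URL-encoded because it contains spaces, quotes and an equals sign, which is why the sketch runs every value through encodeURIComponent.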

Mix and Match

All of Yahoo’s APIs are available through this interface, and you can mix and match services with sub-selects. For example, you could run a keyword analysis tool over the abstracts of a web search to find relevant key terms. Using the unique() function, you can also easily remove duplicate results.
  select * from search.termextract where context in (
    select abstract from search.web(50) where query="elephant")
  | unique(field="Result")
See the results of this more complex query here.
Keywords extracted from the abstract of search results
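If you wonder what a unique() step boils down to, the idea can be sketched locally in a few lines of JavaScript. This is only an illustration of the concept on an in-memory array, not how YQL implements it; the helper name uniqueBy and the sample terms are made up.

```javascript
// uniqueBy is a hypothetical local helper; it only mimics the idea of
// YQL's unique(field="...") on an in-memory array.
function uniqueBy(items, field) {
  const seen = new Set();
  return items.filter((item) => {
    const value = item[field];
    if (seen.has(value)) {
      return false; // a later duplicate: drop it
    }
    seen.add(value);
    return true; // first occurrence: keep it
  });
}

// Made-up term-extraction results, for illustration only.
const extractedTerms = [
  { Result: "african elephant" },
  { Result: "asian elephant" },
  { Result: "african elephant" },
];
const uniqueTerms = uniqueBy(extractedTerms, "Result");
console.log(uniqueTerms.map((term) => term.Result));
```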

The Console

The easiest way to play with YQL as a consumer is to use the console at http://developer.yahoo.com/yql/console/. There you can click on any table to see a demo query that shows how to use it, and if you click the desc link you find out which options are available to you.

YQL Limits

The use of YQL has a few limits, which are described in the documentation. In essence, you can access the open data endpoint 1,000 times an hour per IP. If you authenticate an application with OAuth, you get 10,000 hits an hour. Each application is allowed 100,000 hits a day.
These limits, together with the caching of results that YQL does automatically, mean that the data only gets requested from the source when it has changed. YQL therefore acts as a sort of firewall for requests to the data people offer through it.
Be careful when using jQuery’s “$.getJSON” with an anonymous function as its callback. jQuery then generates a new callback name for every request, so the request URL differs each time, which can bust YQL’s caching and hinder performance.
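The caching point is easier to see with two URLs side by side. The following sketch assumes that a cache keys on the full request URL; the helper jsonpUrl, the callback name showElephants and the counter-based name generator are stand-ins I made up to simulate what a library does when it invents callback names.

```javascript
// jsonpUrl is a hypothetical helper that builds a JSON-P request URL.
function jsonpUrl(query, callbackName) {
  return (
    "http://query.yahooapis.com/v1/public/yql" +
    "?q=" + encodeURIComponent(query) +
    "&format=json&callback=" + encodeURIComponent(callbackName)
  );
}

const statement = 'select * from search.web where query="elephant"';

// A fixed, named callback: every request has exactly the same URL.
const stableA = jsonpUrl(statement, "showElephants");
const stableB = jsonpUrl(statement, "showElephants");

// Simulated generated names (a counter stands in for whatever unique
// suffix a library appends): every request has a different URL.
let requestCounter = 0;
const generatedName = () => "callback_" + (++requestCounter);
const generatedA = jsonpUrl(statement, generatedName());
const generatedB = jsonpUrl(statement, generatedName());

console.log(stableA === stableB); // true
console.log(generatedA === generatedB); // false
```

With a named callback function the URL stays identical between requests, so a cached response can be reused.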

Building Web Services with Open Tables

The really cool thing for you as a provider is that YQL is open to other data providers.
If you want to offer an API to the world (or just have one for yourself internally), you can easily do that by writing an “open table”, which is an XML schema pointing to a web service.
People do this a lot, which means that if you click the “Show community tables” link in the YQL console, you will find that there are now 812 instead of 118 tables to play with (as of today; tomorrow there will probably be more).
To get your service into YQL and offer it to the world, all you need to do is point YQL to it. Let’s look at a simple example:

Real-World Application: Craigslist as an API

The free classified ad web site Craigslist has no public API – which is a shame, really. However, when you do a search on the site you will find that the search results have an RSS output – which is at least pointing towards API functionality. For example, when I search for “schwinn mountain bike” in San Francisco, the URL of the search would be:
  http://sfbay.craigslist.org/search/sss?format=rss&query=schwinn+mountain+bike
This can be changed into a URL with variables, with the variables being the location, the type of product you are looking for (which is the section of the site) and the query you searched for (in this case I wrapped the parameters in curly braces):
  http://{location}.craigslist.org/search/{type}?format=rss&query={query}
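Spotting the variables in a URL like this can also be done programmatically. Here is a small sketch that takes the search URL apart with the standard URL class; it assumes the craigslist.org host that the open table definition uses, and the variable names are my own.

```javascript
// Search URL from the article, using the craigslist.org host that the
// open table definition uses.
const searchUrl = new URL(
  "http://sfbay.craigslist.org/search/sss?format=rss&query=schwinn+mountain+bike"
);

// The first host label is the location, the last path segment is the
// section of the site, and the query parameter holds the search terms.
const urlParts = {
  location: searchUrl.hostname.split(".")[0],
  type: searchUrl.pathname.split("/").pop(),
  query: searchUrl.searchParams.get("query"),
};

console.log(urlParts);
```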
Once you have found a pattern like this, you can start writing your open table:
  <?xml version="1.0" encoding="UTF-8"?>
  <table xmlns="http://query.yahooapis.com/v1/schema/table.xsd">
    <meta>
      <author>Yahoo! Inc.</author>
      <documentationURL>http://craigslist.org/</documentationURL>
      <sampleQuery>select * from {table} where
      location="sfbay" and type="sss" and
      query="schwinn mountain bike"</sampleQuery>
      <description>Searches Craigslist.org</description>
    </meta>
    <bindings>
      <select itemPath="" produces="XML">
        <urls>
          <url>http://{location}.craigslist.org/search/{type}?format=rss</url>
        </urls>
        <inputs>
          <key id="location" type="xs:string" paramType="path" required="true" />
          <key id="type" type="xs:string" paramType="path" required="true" />
          <key id="query" type="xs:string" paramType="query" required="true" />
        </inputs>
      </select>
    </bindings>
  </table>
For a full description of what all that means, you can check the YQL documentation on open tables, but here is a quick walkthrough:
  1. You start with the XML prologue and a table element pointing to the schema for YQL open tables. This allows YQL to validate your table.
  2. You add a meta element with information about your table: the author, the URL of your documentation and a sample query. The sample query is the most important here, as this is what will show up in the query box of the YQL console when people click on your table name. It is the first step to using your API, so make it worthwhile. Show the parameters you offer and how to use them. The {table} part will be replaced with the name of the table.
  3. The bindings element shows what the table is connected to and what keys are expected in a query.
  4. You define the path and the type of the output in the select element: the produces attribute is either XML or JSON, and the itemPath attribute lets you return only a certain section of the data returned from the URL you access.
  5. In the urls section, you define the URL endpoints of your service. In our case, this is the parameterised URL from earlier. YQL replaces the elements in curly braces with the information provided by the YQL user.
  6. In the inputs section, you define all the possible keys the end users can or should provide. Each key has an id and a paramType, which is either path, if the parameter is part of the URL path, or query, if it is to be added to the URL as a parameter. You define which keys are mandatory by setting the required attribute to true.
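The way the keys end up in the final URL can be sketched in plain JavaScript. This is only my approximation of the behaviour described in the walkthrough, not YQL’s actual implementation; resolveUrl and craigslistKeys are hypothetical names.

```javascript
// Key definitions mirroring the <inputs> section of the open table.
const craigslistKeys = [
  { id: "location", paramType: "path", required: true },
  { id: "type", paramType: "path", required: true },
  { id: "query", paramType: "query", required: true },
];

// resolveUrl is a hypothetical helper, not YQL's actual implementation.
function resolveUrl(template, keys, values) {
  let url = template;
  const queryParts = [];
  for (const key of keys) {
    const value = values[key.id];
    if (value === undefined) {
      if (key.required) {
        throw new Error("Missing required key: " + key.id);
      }
      continue; // optional key that was not supplied
    }
    if (key.paramType === "path") {
      // Path keys replace their {placeholder} in the URL.
      url = url.replace("{" + key.id + "}", encodeURIComponent(value));
    } else if (key.paramType === "query") {
      // Query keys are appended as URL parameters.
      queryParts.push(key.id + "=" + encodeURIComponent(value));
    }
  }
  if (queryParts.length > 0) {
    url += (url.includes("?") ? "&" : "?") + queryParts.join("&");
  }
  return url;
}

const resolvedUrl = resolveUrl(
  "http://{location}.craigslist.org/search/{type}?format=rss",
  craigslistKeys,
  { location: "sfbay", type: "sss", query: "schwinn mountain bike" }
);
console.log(resolvedUrl);
```

Leaving out a key that is marked as required makes the sketch throw an error instead of building a half-filled URL.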
And that is it! By putting together this XML document, you have completed the first of three steps to make your web service part of the YQL infrastructure. The next step is to tell YQL where your web service definition is. Simply upload the file to a server, for example http://isithackday.com/craigslist.search.xml. You then point YQL to the service with the use command:
  use "http://isithackday.com/craigslist.search.xml" as cl;
  select * from cl where location="sfbay" and type="sss" and query="playstation"
You can try this out and you’ll see that you now find playstations for sale in the San Francisco Bay Area. Neat, isn’t it?
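If you build these statements in code rather than typing them into the console, a tiny helper keeps the use and select parts in sync. The function buildUseStatement is a hypothetical name of mine, and the sketch does no real escaping; it simply refuses values that contain double quotes.

```javascript
// buildUseStatement is a hypothetical helper for composing the two-part
// statement; it does no escaping beyond rejecting double quotes.
function buildUseStatement(tableUrl, alias, conditions) {
  const where = Object.entries(conditions)
    .map(([key, value]) => {
      if (String(value).includes('"')) {
        throw new Error("Double quotes are not supported in this sketch");
      }
      return key + '="' + value + '"';
    })
    .join(" and ");
  return (
    'use "' + tableUrl + '" as ' + alias + "; " +
    "select * from " + alias + " where " + where
  );
}

const clStatement = buildUseStatement(
  "http://isithackday.com/craigslist.search.xml",
  "cl",
  { location: "sfbay", type: "sss", query: "playstation" }
);
console.log(clStatement);
```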

Logic as a Service

Sometimes you have no web service at all, and all you want to do is offer a certain logic to the world. I found myself doing this very thing the other day. What I wanted to know is the distance between two places on Earth. For this, I needed to find the latitude and longitude of the places and then do very clever calculations. As I am a lazy person, I built on work that other people have done for me. In order to find the latitude and longitude of a certain place on Earth you can use the Yahoo Geo APIs. In YQL, you can do this with:
  select * from geo.places(1) where text="paris"
Try this out yourself.
In order to find a function that calculates the distance between two places on Earth reliably, I spent a few minutes on Google and found Chris Veness’ implementation of the “Vincenty Inverse Solution of Geodesics on the Ellipsoid”.
YQL offers an executable block inside open tables which contains server-side JavaScript. Instead of simply returning the data from the service, you can use this to convert information before returning it. You can also do REST calls to other services and to YQL itself in these JavaScript blocks. And this is what I did:
  <?xml version="1.0" encoding="UTF-8"?>
  <table xmlns="http://query.yahooapis.com/v1/schema/table.xsd">
    <meta>
      <sampleQuery>
        select * from {table} where place1="london" and place2="paris"
      </sampleQuery>
      <author>Christian Heilmann</author>
      <documentationURL>
        http://isithackday.com/hacks/geo/distance/
      </documentationURL>
      <description>
        Gives you the distance of two places on earth in miles or kilometers
      </description>
    </meta>
    <bindings>
      <select itemPath="" produces="XML">
        <inputs>
          <key id='place1' type='xs:string' paramType='variable'
               required="true" />
          <key id='place2' type='xs:string' paramType='variable'
               required="true" />
        </inputs>
        <execute><![CDATA[
          default xml namespace = "http://where.yahooapis.com/v1/schema.rng";
          var res = y.query("select * from geo.places(1) where text='" +
                            place1 + "'").results;
          var res2 = y.query("select * from geo.places(1) where text='" +
                             place2 + "'").results;
          var lat1 = res.place.centroid.latitude;
          var lon1 = res.place.centroid.longitude;
          var lat2 = res2.place.centroid.latitude;
          var lon2 = res2.place.centroid.longitude;
          var d = distVincenty(lat1,lon1,lat2,lon2);
          function distVincenty(lat1, lon1, lat2, lon2) {
            /* ... vincenty function... */
          }
          // the result is divided by 1000 to get kilometers
          d = d / 1000;
          var miles = Math.round(d/1.609344);
          var kilometers = Math.round(d);
          response.object = <distance>
                              <miles>{miles}</miles>
                              <kilometers>{kilometers}</kilometers>
                              {res.place}
                              {res2.place}
                            </distance>;
        ]]></execute>
      </select>
    </bindings>
  </table>
  1. The meta element is the same as in any other open table.
  2. In the bindings, we don’t have a URL to point to, so we can omit the urls element. Instead, we add an execute element, and the keys defined in inputs are passed to the JavaScript in this block as variables.
  3. As the Geo API of Yahoo returns namespaced XML, we need to tell the JavaScript which namespace that is.
  4. I execute two YQL queries from the script with the y.query() method, using the place1 and place2 parameters to get the locations of the two places. The .results after the method call makes sure I get the results. I store them in res and res2 respectively.
  5. I then get the latitude and longitude for each of the results and call the distVincenty() method.
  6. I divide the result by 1000 to get the kilometers and divide that by 1.609344 to get the miles.
  7. I end the script part by defining a response.object, which is what YQL will return. As this is server-side JavaScript with full E4X support, all I need to write is the XML I want to return, with the JavaScript variables I want to render wrapped in curly braces.
Using this service and adding a bit of interface to it, I can now easily show the distance between Batman and Robin.
Showing the distance between two places on earth
Using server-side JavaScript you can not only convert data but also easily offer a service that only consists of calculations – much like Google Calculator does.
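The Vincenty function itself is left out of the listing above. If you just want to experiment with the calculation part locally, a much simpler stand-in is the haversine formula, sketched below. To be clear, this is not the Vincenty method: it treats the Earth as a sphere and is less accurate. The function names and the city coordinates are my own assumptions.

```javascript
const EARTH_RADIUS_KM = 6371; // mean Earth radius, a common approximation

const toRadians = (degrees) => (degrees * Math.PI) / 180;

// Haversine great-circle distance in kilometers. This is NOT the Vincenty
// method from the article: it treats the Earth as a sphere.
function haversineKm(lat1, lon1, lat2, lon2) {
  const dLat = toRadians(lat2 - lat1);
  const dLon = toRadians(lon2 - lon1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRadians(lat1)) * Math.cos(toRadians(lat2)) *
      Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
}

// Same unit handling as the execute block: round the kilometers and
// divide by 1.609344 to get miles.
function toUnits(distanceKm) {
  return {
    kilometers: Math.round(distanceKm),
    miles: Math.round(distanceKm / 1.609344),
  };
}

// Approximate city-centre coordinates, assumed for this example.
const londonToParis = toUnits(haversineKm(51.5074, -0.1278, 48.8566, 2.3522));
console.log(londonToParis);
```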

Turning an Editable Data Set into a Web Service

What you really want to do in most cases, though, is to allow people to edit the data that drives the web service in an easy fashion. Normally, we’d build a CMS, train people on it, and spend a lot of time getting the data from the CMS onto the web to access it through YQL. There is an easier way, though.
A few months ago, I released a web site called winterolympicsmedals.com which shows you all the information about the Winter Olympics over the years.
The data that drives the web site was released for free by The Guardian in the UK on their Data Blog as an Excel spreadsheet. In order to turn this into an editable data set, all I had to do was save a copy to my own Google Docs repository. You can reach that data here. Google Docs allows you to share spreadsheets on the web. By using “CSV” as the output format, I get a URL to access in YQL:
Sharing a spreadsheet in Google docs
And using YQL you can use CSV as a data source:
  select * from csv where
  url="http://spreadsheets.google.com/pub?key=0AhphLklK1Ve4dHBXRGtJWk1abGVRYVJFZjQ5M3YxSnc&hl=en&output=csv"
See the result of that in your own browser.
As you can see, the CSV table automatically adds row and column elements to the XML output. In order to make that a more useful and filterable web service, you can provide a columns list to rename the resulting XML elements:
  select * from csv where
  url="http://spreadsheets.google.com/pub?key=0AhphLklK1Ve4dHBXRGtJWk1abGVRYVJFZjQ5M3YxSnc&hl=en&output=csv"
  and columns="year,city,sport,discipline,country,event,gender,type"
See the renamed columns in your browser.
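What the columns list does can be pictured as labelling each positional value with a name. The sketch below does that on plain arrays; nameColumns is a hypothetical helper, and the two rows are placeholders I made up rather than data from the spreadsheet.

```javascript
// nameColumns is a hypothetical helper that mimics what the columns
// parameter does: it labels each positional value with a name.
function nameColumns(rows, columns) {
  const names = columns.split(",");
  return rows.map((row) => {
    const named = {};
    names.forEach((name, index) => {
      named[name] = row[index];
    });
    return named;
  });
}

// Placeholder rows for illustration; not taken from the spreadsheet.
const rawRows = [
  ["1924", "Chamonix", "sport-a", "discipline-a", "AAA", "event-a", "M", "Gold"],
  ["1924", "Chamonix", "sport-b", "discipline-b", "BBB", "event-b", "W", "Silver"],
];
const namedRows = nameColumns(
  rawRows,
  "year,city,sport,discipline,country,event,gender,type"
);
console.log(namedRows[0].city, namedRows[0].type);
```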
This allows you to filter the information, which is exactly what I did to build winterolympicsmedals.com. For example, to get all the gold medals from 1924 you’d do the following:
  select * from csv where
  url="http://spreadsheets.google.com/pub?key=0AhphLklK1Ve4dHBXRGtJWk1abGVRYVJFZjQ5M3YxSnc&hl=en&output=csv"
  and columns="year,city,sport,discipline,country,event,gender,type"
  and year="1924" and type="Gold"
See the gold medals of 1924 in your browser.
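The where clause here is nothing more than a set of equality checks joined with “and”. As a rough local equivalent, assuming rows that already carry named fields, the same filter could look like this; filterRows and the sample medals are invented for the sketch.

```javascript
// filterRows is a hypothetical helper: keep the rows whose fields equal
// every value in the criteria object (the "and" conditions).
function filterRows(rows, criteria) {
  return rows.filter((row) =>
    Object.entries(criteria).every(([field, value]) => row[field] === value)
  );
}

// Made-up rows for illustration; not taken from the spreadsheet.
const sampleMedals = [
  { year: "1924", type: "Gold", country: "AAA" },
  { year: "1924", type: "Silver", country: "BBB" },
  { year: "1928", type: "Gold", country: "CCC" },
];
const gold1924 = filterRows(sampleMedals, { year: "1924", type: "Gold" });
console.log(gold1924);
```

Note that the values are compared as strings, just as they are quoted as strings in the YQL statement.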
So you can use the free storage of Google and the free web service infrastructure to convert free data into a web service. All you need to do is create a nice interface for it.

Adding your Service to YQL’s Community Tables

Once you’ve defined your open table, you can use it by hosting it on your own server, or you can go all in by adding it to the YQL tables repository on GitHub, which can be found at http://github.com/yql/yql-tables/. Extensive help on how to use Git and GitHub can be found in their help section.
If you send a request to the YQL team to pull from your repository, they’ll test your table, and if all is fine with it, they’ll move it over to http://datatables.org/, which is the source of the community tables in the YQL console.
This not only makes the life of other developers more interesting, but is also very good promotion for you. Instead of hoping to find developers to play with your data, you bring the data to where developers already look for it.

Advanced YQL Topics

This introduction can only scratch the surface of what you can do with YQL. If you check the documentation, you’ll find that, in addition to these “read” open tables, you can also set up services that can be written to, and that YQL also offers cloud storage of your information. Check the extensive YQL documentation for more.