Advertising 2.0

Ok, very short post because this workshop is a) packed, standing room only, so no opportunity to take notes, and b) a little outside my area of interest, being very much business focused (lots of suits in the room!)

The panel predicts the emergence of a new network not of content producers but of content distributors.
Shifting dynamics of web advertising - away from measuring success in terms of direct response to advertising and toward brand building which has been more prevalent in traditional media, but which has the potential to be much more accountable & measurable as user behaviour becomes more visible across the web and through time. To me, this needs to dovetail with the ideas explored in the previous session around making identity data available while preserving anonymity.

The Next Internet Infrastructure

Having registered and had breakfast, I’m in my first workshop of the day.

Jonathan Hare, Resilient
An open services architecture needs to be freely licenced, hostable, extendable and capable of supporting an emergent ecosystem.
The web contains plenty of open content, but islands of authentication. Authentication needs to be first order in the next generation architecture, so that it becomes possible to extend the way we do things on the web now to things that we currently can’t because of the lack of inbuilt trust mechanisms, i.e. you can’t apply web principles to healthcare, finance etc. yet.
“Openness is its own reward”; opportunity comes through interoperation. This doesn’t necessarily mean open source: “Google search is open, but not open source”.

Jeff Barr, Amazon
Marc Canter points out freely licenced services have to be paid for - resources like bandwidth aren’t free.
Jeff talks about AWS, and micropayment model.
70% of the work required to build a web based service is “undifferentiated muck”; Amazon know how to do such things well and are franchising that out.
AWS aims to provide the things you can take for granted.

Chad Dickerson - Yahoo Developer Network
Yahoo Developer Network’s aim is to have open APIs for all Yahoo products.
As a developer, a great thing about open APIs is that you can play around with services you potentially want to use before committing to them. Chad talks about business 2.0: putting up open APIs and allowing potential partners to play. Apps built on these APIs typically get built before any business development integration is done.
Jonathan mentions that APIs are really just function calls and interfaces, and anyone is free to provide alternative implementations. This is certainly something we’re thinking about with regard to the Talis Platform services.
The real value is in the data collection plus the open APIs; both are less valuable on their own.

Marc Canter
The path to developing new open standards for the web is to implement hardwired interfaces first, then try to drive standards through adoption, i.e. a leader in a space makes their interface stable and published (e.g. Flickr), then ad hoc agreement leads to standards.
Marc’s company, Broadband Mechanics, is the developer of PeopleAggregator, which interconnects the various standards for federated identity, providing single sign-on to accounts with providers like YouTube, MySpace etc. using authentication architectures from Yahoo, OpenID & CardSpace.
Import & export via open standards like XFN & FOAF.
Very important is that APIs have to be two-way: data out/data in.
Marc wants to enable sharing between data aggregators, i.e. for me to do a review in Yelp and send it to Amazon, very much thinking along the same lines as Talis.
Jonathan Hare: trust grows through time, requires identity to track behaviour through time & build reputation/trust.
Availability of anonymity is important - reveal the history, reputation stuff of identity without the actual identifiers.

Jonathan Hare: Expects personalised search + content delivery to be some of the main kinds of apps that could be built on an open identity architecture. The concept of anonymous identity allows your history, behaviour and preferences to be shared with chosen partners. “Most things that are interesting require a degree of privacy.”

Random points from the workshop

  • Marc Canter: Caching is the answer to latency. Users of PeopleAggregator can configure caching per service.
  • Jonathan Hare: REST is important; if you follow those constraints/patterns it will scale infinitely. Easy to do for publishing (i.e. the primary domain of the web). No vocabulary for policy, authentication etc. but the patterns of the web DO work for scaling. We’ve now reached the point where the web is pervasive enough that scale is the issue.
  • MC: “I don’t believe in centralised social networks”, the trick is interconnecting many disparate networks in an analog of the publishing web.

ISWC2006 Pt II

Last night Ian and I met up with Harry Halpin, chair of the GRDDL working group and hit downtown Athens. First up, some grub at The Last Resort Grill with a party of semwebbers including Guus Schreiber and Fabio Ciravegna then on to the nearby Transmetropolitan. A good night was had by all with plenty of carousing and yacking. Harry had to have his good head on in the morning, as he was presenting a paper on the dynamics and semantics of collaborative tagging, based on research using Del.icio.us data at 8.15 in the Semantic Authoring and Annotation Workshop. Lucky chap. Fortunately, the paper was well received and Harry’s natural ahem projection seemed undiminished by the late night.

So, Ian and I are now in San Francisco for the Web 2.0 Summit and I’m pretty knackered after a 5 hour flight, so it’s off to bed soon. We kick off tomorrow with workshops: The Next Internet Infrastructure, Radical Context: New Search for the New Web and Whose Data is It? sound pretty interesting.

Relevancy Ranking For RDF

ReConRank: A Scalable Ranking Method for Semantic Web Data with Context (pdf) Aidan Hogan, Andreas Harth, Stefan Decker
This paper presents a way of transforming the results of a text query over a set of indexed RDF data into a directed graph, making it suitable for ordering using PageRank-like relevancy ranking. The cool thing here is that the ranking is done at query time, not at index time, which means a) there’s no need to re-index to change ranking scores and b) it can handle arbitrary RDF data, with no upfront knowledge of any schema required. Basically, the index search retrieves a set of resources from which a topical subgraph is derived. This is combined with the named graphs in the dataset in which each resource is described, to effectively imply a set of quads. This combined quad graph is boiled down to contain only resources and the links between them, maintaining indications about which resources are content and which are context (some may be both of course). Finally, further links are inferred into the graph by propagating links upwards and downwards between the content and context layers. The links within the resulting graph are then analysed and two ranking tables are built: inbound links to content resources inform result ordering, and inbound links into context nodes relate to provenance, i.e. the more links into a named graph, the more valuable its content. I like the idea of this and would love to try out some similar stuff in Bigfoot.
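
To make the shape of that pipeline a bit more concrete, here is a rough sketch in Python. It is not the algorithm from the paper, just my reading of the general idea: quads are boiled down to links, content and context nodes are linked to each other, context-to-context links are inferred, and a plain PageRank is run over the lot at query time. The function names, the example URIs and the exact link inference rules are all assumptions of mine, and it assumes literals have already been filtered out of the result set.

```python
from collections import defaultdict


def build_link_graph(quads):
    """Boil (subject, predicate, object, context) quads down to plain links.

    Assumes every object is a resource (literals already filtered out).
    Returns (links, contexts): node -> set of targets, and the named graphs.
    """
    links = defaultdict(set)
    contexts = set()
    described_in = defaultdict(set)  # resource -> named graphs describing it
    for subject, _predicate, obj, context in quads:
        contexts.add(context)
        described_in[subject].add(context)
        links[subject].add(obj)       # content -> content
        links[subject].add(context)   # content -> context (upwards)
        links[context].add(subject)   # context -> content (downwards)
    # Infer context -> context links: a graph that mentions a resource
    # described in another graph is treated as linking to that graph.
    for _subject, _predicate, obj, context in quads:
        for other in described_in.get(obj, ()):
            if other != context:
                links[context].add(other)
    return links, contexts


def pagerank(links, damping=0.85, iterations=50):
    """Plain power-iteration PageRank over node -> set of targets."""
    nodes = set(links)
    for targets in links.values():
        nodes |= targets
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes with no outbound links is spread evenly.
        dangling = sum(rank[node] for node in nodes if not links.get(node))
        new_rank = {
            node: (1.0 - damping) / n + damping * dangling / n for node in nodes
        }
        for source, targets in links.items():
            if targets:
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


def rank_quads(quads):
    """Return (content ranking, context ranking), best first."""
    links, contexts = build_link_graph(list(quads))
    scores = pagerank(links)
    content = sorted((n for n in scores if n not in contexts),
                     key=lambda n: -scores[n])
    context = sorted(contexts, key=lambda n: -scores[n])
    return content, context


# Made-up query result: three people know ex:bob, described in three graphs.
quads = [
    ("ex:alice", "foaf:knows", "ex:bob", "g:1"),
    ("ex:carol", "foaf:knows", "ex:bob", "g:2"),
    ("ex:dave", "foaf:knows", "ex:bob", "g:3"),
    ("ex:bob", "foaf:knows", "ex:alice", "g:3"),
]
content_ranking, context_ranking = rank_quads(quads)
print(content_ranking)
print(context_ranking)
```

With this toy data ex:bob, who has the most inbound links, heads the content table, and g:3 heads the context table. Because nothing is precomputed, changing the link rules or the damping factor only affects the next query.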

On a side note, Danny was dismayed to see that the SWSE search engine that Andreas et al have built using ReConRank and labelled “the George Foreman Grill of search engines” has beaten him to the punch somewhat.

OWL as Constraint Definition

Adding Constraints to OWL, Boris Motik University Of Manchester
Compares database constraint definition with OWL, as in lots of cases we want to use OWL like database constraints, rather than solely for inferencing.
Interesting idea, as OWL provides a very rich schema definition language with hierarchies, memberships, restrictions etc.
Proposes a notion of extended OWL Knowledge bases where the TBox is compartmentalized into Schema and Constraints, constraints operating under a closed world type assumption, where the absence of a fact from the KB means an invalidated constraint.

This is something we’ve explored a little, although on a much simpler level, using RDFS and OWL constructs alongside schema annotations to describe constraints on classes and datatype properties. What we were aiming for was to enable some basic constraints to be derived from any existing RDFS schema, but to also allow schemas to be annotated with additional constraints where required. We characterised our solution as Closed World Consistency Checking, and basically it consists of a vocabulary for annotating the RDFS schema with simple constraint definitions, plus some simple code to validate RDF instance data according to the annotated constraints and some others implied by existing RDFS/OWL constructs (like cardinality etc). The implementation of this is conceptually quite similar to the divided TBox approach.
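
For a flavour of what closed world checking means in practice, here is a minimal sketch in Python. It is not our actual code or vocabulary: the constraint table, the `validate` function and the example URIs are made up for illustration, and triples are just plain tuples. The key point is that a missing value counts as a violation, rather than as something an open world reasoner would assume exists somewhere else.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"

# Hypothetical constraint annotations: class -> property -> (min, max).
# A max of None means "no upper bound".
CONSTRAINTS = {
    "ex:Book": {
        "dc:title": (1, 1),
        "dc:creator": (1, None),
    },
}


def validate(triples, constraints):
    """Check cardinality constraints, treating the data as complete.

    Returns a list of (subject, property, message) tuples, one per violation.
    """
    values = defaultdict(lambda: defaultdict(list))  # subject -> property -> values
    for subject, predicate, obj in triples:
        values[subject][predicate].append(obj)

    violations = []
    for subject, props in values.items():
        for cls in props.get(RDF_TYPE, []):
            for prop, (minimum, maximum) in constraints.get(cls, {}).items():
                count = len(props.get(prop, []))
                if count < minimum:
                    # Closed world: absence of the fact is itself an error.
                    violations.append(
                        (subject, prop, f"expected at least {minimum}, found {count}")
                    )
                elif maximum is not None and count > maximum:
                    violations.append(
                        (subject, prop, f"expected at most {maximum}, found {count}")
                    )
    return violations


triples = [
    ("ex:b1", RDF_TYPE, "ex:Book"),
    ("ex:b1", "dc:title", "Dune"),
    ("ex:b1", "dc:creator", "Frank Herbert"),
    ("ex:b2", RDF_TYPE, "ex:Book"),
    ("ex:b2", "dc:title", "First title"),
    ("ex:b2", "dc:title", "Second title"),
]
problems = validate(triples, CONSTRAINTS)
for subject, prop, message in problems:
    print(subject, prop, message)
```

Here ex:b1 passes, while ex:b2 is flagged twice: once for having two titles and once for having no creator at all.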

ISWC2006 Athens, Georgia

Day one of ISWC2006 and we’re in the Scalable Semantic Web Knowledge Base Systems (SSWS 2006) workshop.
The workshop format means that a lot of information is compressed into a relatively short time, so I’ve only managed to digest selected bits so far and jot down a few rough thoughts about some of the presentations for which I’ve managed to decipher my notes.
This is my first time at this conference, which has quite an academic slant, and my initial impression is that it seems to have the semantic bases covered, whilst being a little light on the web. It’ll be interesting to compare and contrast with Web2.0 later in the week.

Georgia on my mind

We’ve just arrived in Georgia for the second leg of our whistlestop tour of the US, 2 days at ISWC2006. We flew into Atlanta, picked up a car and drove out to Athens via I-85 & GA316. It took a few miles to get used to the automatic gearbox and the driving on the wrong side of the road, but we managed to get here undented. We’re staying at the Georgia Center and we’re just off to have a quick scout around.

In Boston, for now

Ian and I are on a two week tour of the US. We’re spending a few days in each of Boston, Athens and San Francisco, finishing off with a day in Washington DC. While we’re here we’re attending two conferences, ISWC2006 in Athens, Georgia and Web2.0 in San Francisco, and trying to meet as many interesting people as possible and talk to them about what we’re doing, what they’re doing and what’s going on in general. Now we’re in departures at Logan Airport, waiting to board our flight to Atlanta, from where we’ll drive to Athens ready for ISWC2006 tomorrow, so it’s time for a brain dump.

We arrived in Boston early Wednesday evening and headed straight out to meet up with the Harvard chapter of Wikipedians, including Aaron Swartz of Reddit fame, SJ and Nicole. We were pretty tired what with the time zone changes and travelling all day so we didn’t stay out long, just saying a few hellos really before heading back to the hotel to crash out.

Thursday.
Met up with Ben Adida at MIT, ostensibly to talk about the Talis Community Licence and if/how it meshes with what the Creative / Science Commons movements are doing. Ben was pretty enthusiastic about the TCL (’though I’m not a lawyer’) and it was great to get positive feedback about this because we think it could be a really useful tool for us as we try to open up access to the data we know is out there.
Ben’s also one of the main guys behind RDFa, and we spent a really productive hour or so chatting about eRDF and RDFa, joined for a while by Ralph Swick. It’s looking to me as though the two approaches are really beginning to converge, which has got to be a good thing. Another thing that came out of the discussion, and which I think has the potential to be huge, was a small alteration to the RDFa spec which would seem to remove the XML dependency, making it viable to embed RDF in plain old HTML. w00t!
After lunch we headed a few blocks over to IBM to meet up with Elias Torres and the Semantic Web team. Elias, Wing, Lee, Robert, Ben and Rouben showed us some really cool stuff but I’m not sure to what degree it’s public yet (apart from Queso of course), so I don’t want to say too much. Suffice to say that there’s going to be some very interesting things coming out of that group in the not too distant future. More than anything, I was really impressed with the whole ethic of the team. It’s amazing what you can do with a bunch of smart and really focused people, and it was nice to put faces to some familiar names from #swig.

Friday
This afternoon we took a cab out to Burlington (about 10 miles from Boston) to meet Susie Stephens. Susie is a Life Sciences expert at Oracle, and a big Semantic Web advocate in the company. She’s also co-chair of the newly formed (forming?) Semantic Web Education & Outreach working group at the W3C. It was a kind of impromptu meetup, so we didn’t have too long to talk, but we all agreed that SWEO is coming just at the right time, and it’s really important now to start getting some takeup of semweb technologies and ideas. Susie’s also going to be in Athens this week, so hopefully we’ll bump into her again.

I’ve enjoyed the little bits of Boston and Cambridge that we’ve managed to see. It’s a pretty cool area and certainly different from anywhere else I’ve been in the States, mainly I think due to its age. It’s true what they say about it being architecturally a lot closer to a European city than places further west.

WS APIs

ISBNdb.com
43Things
Upcoming.org
Simpy
Feedmap
Webjay
Del.icio.us
flickr
AWS
Google
Digital Podcast

Numbler
OpenStreetMap
linkaGoGo

Creating ChangeSets

One of the hard problems we need to solve now is how to generate the ChangeSets from our Domain Objects. What I would ultimately like to happen is that there would be a service out there somewhere that I can push RDF/XML to (in an Atom stylee). This service would be clever enough to pull out the CBDs from the graph, see if they exist in a TripleStore and generate ChangeSets for all of the pertinent resources in the XML. If the goal was just to roll back the store to a given point in time this wouldn’t be so tricky: each update to the store would just be modelled in a ChangeSet (which could then be Applied/Reverted). Where the problem lies for us is that we need to be able to roll back the representation of a DomainObject that happens to be represented in our store. To split the changes into meaningful chunks that can be Applied/Reverted independently is tricky. Should we try to decompose the XML received into CBDs and generate a ChangeSet for each of them? Possibly.

I think for the time being (time constraints etc) we may take a simpler, less service-y route. Our Domain Objects may be self-aware, meaning that they know if and how they’ve been changed during their life. In this scenario, a ChangeSet can be generated using the DomainObject’s inbuilt diff knowledge. It makes for a much tighter coupling between the ChangeSet stuff and our application, but I think in the short term it might be simpler to implement.
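
To illustrate the self-aware route, here is a minimal sketch in Python. None of this is our real code: the `ChangeSet` and `DomainObject` classes, their method names and the example URIs are made up, the store is just a set of triples, and each property is assumed to hold a single value. The object remembers the state it was loaded with, so the diff falls out of comparing that with its current state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeSet:
    """Triples to remove from and add to the store for one resource."""
    subject: str
    removals: frozenset
    additions: frozenset

    def apply(self, store):
        store.difference_update(self.removals)
        store.update(self.additions)

    def revert(self, store):
        # Reverting is simply applying the change in the opposite direction.
        store.difference_update(self.additions)
        store.update(self.removals)


class DomainObject:
    """Remembers the properties it was loaded with, so it can diff itself."""

    def __init__(self, uri, properties):
        self.uri = uri
        self._original = dict(properties)
        self._current = dict(properties)

    def set(self, predicate, value):
        self._current[predicate] = value

    def to_changeset(self):
        removals, additions = set(), set()
        for predicate in set(self._original) | set(self._current):
            old = self._original.get(predicate)
            new = self._current.get(predicate)
            if old == new:
                continue  # untouched property, nothing to record
            if old is not None:
                removals.add((self.uri, predicate, old))
            if new is not None:
                additions.add((self.uri, predicate, new))
        return ChangeSet(self.uri, frozenset(removals), frozenset(additions))


store = {
    ("ex:book1", "dc:title", "Old title"),
    ("ex:book1", "dc:creator", "A. Author"),
}
before = set(store)

book = DomainObject("ex:book1", {"dc:title": "Old title", "dc:creator": "A. Author"})
book.set("dc:title", "New title")

change = book.to_changeset()
change.apply(store)
after_apply = set(store)
change.revert(store)
```

Because the ChangeSet only mentions the triples that actually changed for this one resource, it can be reverted without touching anything else in the store. The price is the coupling mentioned above: the diff logic lives inside the domain object rather than in a separate service.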