The Ongoing Data Revolution

In early 2006, I wrote a post titled "The next generation web, scaling and data mining will matter."  In it, I highlighted some thoughts on the future:

I truly believe the next battleground will be based on scaling the back end and more importantly mining all of that clickstream data to offer a better service to users.  Those that can do it cheaply and effectively will win.  The tools are getting more sophisticated, the data sizes are growing exponentially, and companies don't want to break the bank nor wait for Godot to deliver results.

OK, clickstream data and data mining sound kind of geeky, but with every one of our calls, clicks, and purchases being tracked and logged, it is an important topic. My friend Scott Yara, co-founder and President of Greenplum (full disclosure: my fund is an investor), wrote an interesting and more mainstream post the other day calling this ongoing revolution the "Your Data" revolution.

The point here is that data can be a wonderful thing if used the right way and if controlled by us. For example, Scott points out:

Your Data can lead you home with turn-by-turn directions on Mapquest. It can find you love by sorting through the profiles of 20 million other lonely hearts on eHarmony. It brings you up-to-the-second stock prices, sports scores, and flight delay alerts. It helps doctors fight diseases and engineers design safer cars. It gives environmentalists the power to track the movements of endangered animals and biologists the tools to map the structure of our genes.

Your Data, in short, is transforming everything.

However, with all of this data comes great responsibility and opportunity.  As Scott points out:

We also need to make sure we can use all the information we're collecting. That means better schools that will turn out kids who are able to cope with the age of Your Data. And we need better, cheaper technologies to enable companies of all sizes, as well as organizations and individuals, to get all the information they want and do something useful with it.

Knowledge is power, and we know more than any previous generation could even conceive. We're moving into a world of infinite information. The challenge we face is turning all that information into insights, conclusions, and revelations — in other words, turning that knowledge into wisdom, without letting it be turned against us. We need to make sure Your Data doesn't oppress us, but serves us. And we need to do that fast, because the revolution is well underway.

From a VC and entrepreneurial perspective, what excites me is that we are just scratching the surface of what to do with all of this data and how to turn it into actionable, meaningful insight. In order to make data and insight more accessible to everyone, we first need the back-end technology that makes data storage and analysis better, faster, and cheaper (enter companies like Greenplum; OK, shameless plug :) ). We then need great entrepreneurs to continue to build new services that seamlessly and implicitly help end users make better decisions, discover new things, and feel empowered and motivated to do more. In addition, we also need to consider cultural factors. For example, while privacy still needs to be at the forefront of the Your Data revolution, we also need the ability and power to choose what we want to share with the world and when. Little did we know four years ago that more people than ever would be willing to share their whereabouts through services like Loopt, Foursquare, or Twitter, their every thought through Facebook, or even their credit card purchasing data through new services like Blippy. It is clear that the once sacred walls between private and public information are increasingly disintegrating based on these cultural factors. While we clearly have to be careful not to extrapolate too much from early successes like Blippy and Foursquare, we also cannot underestimate the power of these cultural factors, as once-young start-ups like Facebook and Twitter have exploded in growth. The question is who will create the next great back-end technologies and new web services that drive a whole new conversation and a new way of thinking about what we do with the data that is all around us.

Going old school – how to reach people effectively

I had lunch with a friend last week, and we talked about the days years and years ago when it was cool to have an email address on your business card. In fact, I remember picking attorneys to work on our venture deals in the mid-90s not only based on cost and experience but also based on how digital they were – no AOL email addresses please, and if you use IM, then great. Now I can honestly say that I can be overburdened at times dealing with my email, IMs, SMS messages, phone calls, and LinkedIn and Facebook messages. So I must say it was quite refreshing last week when I received a hand-delivered note from Robert Samet, who runs Madison Search Partners, a well-respected boutique search firm for senior level sales searches in the digital media and software sectors. He, of course, had sent me a few emails before that and also followed up with an email afterwards. Robert went old school with snail mail and physical communications, and with that got my attention. Yes, this is an old marketing trick, but one that sometimes gets lost in the shuffle of digital communication. I, of course, had to take his call, and when we spoke I asked him how his campaign went. His hit rate was quite high, and given his creativity, he is definitely a guy I want to use in the future for a search. So in this day of constant and immediate communication, physical mail and snail mail can sometimes leave a lasting impression. As for myself, I actually got some personal stationery last year to send note cards to friends and business contacts when I want to make sure that I deliver a more effective message.

The next generation web – scaling and data mining matters (continued)

I had some interesting meetings yesterday and as I reflected on them this morning, one common theme emerged which is that the next generation of the web will be built on data mining and extracting intelligence from the reams of data web services collect on a daily basis.  This reminds me of a post I made in March of 2006 titled "The Next Generation Web – scaling and data mining will matter" where I mention:

I truly believe the next battleground will be based on scaling the back end and more importantly mining all of that clickstream data to offer a better service to users.  Those that can do it cheaply and effectively will win.  The tools are getting more sophisticated, the data sizes are growing exponentially, and companies don’t want to break the bank nor wait for Godot to deliver results.

My first meeting was with a well known research analyst covering Internet stocks. While we discussed the usual topics, such as how the Internet was taking share from traditional advertising budgets and how the top brand advertisers have not really embraced the web yet, our most lively discussion centered on next generation advertising technology, all of which relies on increasingly complex forms of data analysis. To that end, I mentioned one of the fund’s portfolio companies, Peer39, which is using natural language processing and machine learning to create highly precise matching of commercial offers and user generated content. As you might guess, the secret sauce is the algorithms that the company has created.

Later in the day I had lunch with a friend whom we had funded years ago. What was interesting to hear was how many of the future product lines that we discussed a few years ago were finally starting to emerge as real revenue drivers for the business today. Years ago the company’s first data center cost around $20mm, and the latest one, which has orders of magnitude more customers, cost only $3mm. Clearly, any data-driven opportunities a few years ago were cost prohibitive in the first place and too early for the customer to understand in the second place. That was the case because many businesses back then were just worried about not getting Amazoned, whereas today they are all on the web thinking about how to drive better results. That is why our discussion led to a massive data warehousing project his company was working on to take all of that data across his huge customer base and help those customers better monetize their sites.

What I love about these kinds of opportunities is that algorithms scale, have high gross margins, and are proprietary and defensible.  The next generation web is not about what you click and see but what is happening behind the scenes every time you click on a page and move from site to site.

The "free" business model

Chris Anderson does a nice job of summarizing the rise of the "free" business model, from the razor and razor blade model to the world of the web, where he argues that all services eventually get priced at their marginal cost. And as Chris rightly describes, that price is quickly going to zero in a world of technology where Moore’s Law continues to hold and where storage costs are declining rapidly.

Among the many great examples in Chris’ article, the one paragraph that stood out most for me follows:

There is, presumably, a limited supply of reputation and attention in the world at any point in time. These are the new scarcities — and the world of free exists mostly to acquire these valuable assets for the sake of a business model to be identified later. Free shifts the economy from a focus on only that which can be quantified in dollars and cents to a more realistic accounting of all the things we truly value today.

In a world where everything is free, what is the most valuable asset?  I couldn’t agree more that "attention" and "time" are two scarcities that every company offering "free" services has to overcome.  There is only so much time in the day for all of us to join another social network, add a new widget, and try out a new web service. And this fight is not only for a consumer’s web time but for their overall leisure time – time to spend with their family, time for sports, and time for entertainment.  Given this competition for such a finite resource, you better have something incredible for me to try which will either provide awesome entertainment or provide an awesome utility that gives me a 10x improvement over existing ways of doing things.  Without that, I am sure you will get people to sign up and try your service, but I doubt you will have many active users 6-12 months down the line.

And my final point is that "free" is great and often what consumers expect, but at some point dollars do have to come from somewhere, whether it be venture capitalists (who will surely expect a big return on their investment), advertisers who will expect the same, or some other source of capital to sustain the business. So in concept I agree with the notion that the world is getting cheaper by the second, but on the other hand don’t forget Chris’ point that free only means that dollars do eventually have to come from somewhere to pay the bills. Oh yeah, one other point: as we move to this world of free, there will be lots of carnage and the road will be littered with many dead companies, as only a small percentage in a growing pie will be able to make this model work and viably consume your time and attention to deliver the money.

What a Microsoft Yahoo deal would mean for startups (continued)

What a great move by Microsoft! This has been floating around for a while, and the last time I wrote about it was in May of 2007. Anyway, I thought I bought at the bottom for Yahoo months ago, after which it fell another 25% from there. When I saw the news this morning I was quite happy to sell my shares and make a slight profit. As we all know, when it comes to the Internet and advertising, scale matters. This potential deal could mean two things for startups. First, when Microsoft finally integrates its 3 or more advertising platforms, including aQuantive, adCenter, and Panama, it may just be able to offer startups a decent or even better alternative to using Google AdSense to monetize their inventory. Second, that huge collective sigh you are hearing comes from the fact that there will be one less independent multi-billion dollar acquirer for the thousands of startups out there. In fact, this integration could take a while and take Microsoft out of the running in the near term as well. So if you are a startup depending on a quick flip, I would do what you were always supposed to do – focus on your fundamentals and figure out how to build a real business. In addition, given the uncertain economy, I would be very careful about ramping up your business too quickly unless you have the results to justify your growth in fixed costs. Moving on, it will truly be interesting to see how Microsoft integrates Yahoo and what parts of Yahoo it decides to sell, like Kelkoo, or kill, like possibly Zimbra. All I know is that there have been lots of senior Yahoo resumes on the street, so it will be interesting to see where they all end up.

Freelance web designer needed!

I am helping a friend who has the most popular books on baby naming bring them all online. Our development on the backend is close to complete, and we are looking for a great web designer who can create a few templates and themes for us. This is a database driven site, and the homepage will include some social elements as well. If interested, please send me a note with some samples of sites you helped design. Simple sites we like include dictionary.com, urbanbaby.com, apple.com, and deliciousdays.com.

Openness and social networking

As you know, I am a huge believer in open standards and that open standards (over time) will usually prevail over proprietary, closed networks. And my one wish from a social networking perspective was to really be able to manage all of my relationships from various networks and my interests from one meta-application. In June this year I wrote about LinkedIn and Facebook on a possible collision course. And in November I wrote about the promise and potential shortcomings of OpenSocial:

OpenSocial is like Java for social networking apps: the promise of write once, run anywhere. It goes back to the point I made in an earlier blog post – I am completely inundated now with requests from Facebook, LinkedIn, and now PlaxoPulse. I am having a hard time keeping track of all of my contacts, messages, and the like. It would be great if I could have a service that sat on top of these apps and allowed me to manage all of my relationships from one place. Sure, some contacts may only be a LinkedIn contact, some may be a Facebook and LinkedIn contact, etc. Check here if you want your music to be shared on this network and not the other one, etc. You get the idea.

Of course, I thought that I was dreaming and that it wouldn’t happen anytime soon until I read the announcement today that Facebook, Google, and Plaxo joined the DataPortability Workgroup (see Read/Write/Web).

The DataPortability Workgroup announced this morning that representatives from both Google and Facebook are joining its ranks. The group is working on a variety of projects to foster an era of Data Portability – where users can take their data from the websites they use to reuse elsewhere and where vendors can leverage safe cross-site data exchange for a whole new level of innovation. Good bye customer lock-in, hello to new privacy challenges. If things go right, today could be a very important day in the history of the internet.

The proof will be in the pudding and in the implementation, as it is with Google’s OpenSocial. That being said, this is a great move by Facebook, stemming the negative tide that was building about who owned their users’ data and also, in my mind, locking them in as a de facto leader for years to come. Facebook is the current gorilla in the space, and gorillas do what they want. However, rather than take on every other player who campaigned on the "open" platform, Facebook has thrown its hat into the ring, and I am sure it will play a major role in helping make the standards as well. Let’s just hope our data is really portable. Only when users can run their social networks and share their data from one or any platform easily will we truly be in an open market. That being said, I agree with Marshall that this could be an important day for the Internet, one where the consumer’s voice truly carries weight and one where openness will prevail.

Picks and shovels for the web

We have had quite a resurgence in the web market during the last few years. A number of great companies have come out of nowhere to become household names, and it seems that every day we are inundated with news on another slew of new web startups going after the consumer. And yes, looking for the next YouTube or Facebook or MySpace is exciting. Despite all of that, the one area that is not discussed much is the boring infrastructure market, where companies sell the picks and shovels to allow these startups to run their operations. And what could be more boring than talking about a database or data warehouse? Anyway, I am glad that Don Clark of the Wall Street Journal wrote a nice article on a new breed of startups going after the database market. Shamelessly, I would like to add that he has a nice writeup on Greenplum (full disclosure: my fund is an investor and I am a board member).

Granted, making money selling picks and shovels during this web resurgence is definitely much harder, as developers typically go for free and cheap software and hardware to launch their new companies. That being said, every click that we make is being stored somewhere, and the companies who can better analyze this data to better monetize their sites will be the winners in the next phase of the web. This is where Greenplum comes into play. The company is not only playing off of the data volume and analysis trend but also the move towards commoditization. As Don mentions in his article, the secret sauce is that our customers can deploy massive data warehouses on commodity boxes using our software, which is built on top of the open source database Postgres. The benefit is not only in terms of cost but also in significant performance increases over the competition. As per Don’s article today:

One user is iCrossing Inc., of Scottsdale, Ariz., which provides analytical services to companies that operate Web sites. Analyzing a day’s worth of some types of data once took 20 to 22 hours, said Tony Wasson, the company’s vice president of engineering. With Greenplum’s technology, and some modifications to its own software, the job now takes about an hour, he said.

Anyway, it is nice to see the mainstream press finally getting the fact that data and analytics matter. Yes, plumbing is boring, but without cost-effective platforms which can scale and perform under heavy stress, we won’t be able to reach the full peak of monetization on the web.

Thoughts on OpenSocial

Tim O’Reilly has a great post on Google’s OpenSocial. At the end of the day, I couldn’t agree more with Tim’s thoughts that OpenSocial is great for developers but a "who cares" for users.

If all OpenSocial does is allow developers to port their applications more easily from one social network to another, that’s a big win for the developer, as they get to shop their application to users of every participating social network. But it provides little incremental value to the user, the real target. We don’t want to have the same application on multiple social networks. We want applications that can use data from multiple social networks.

Would OpenSocial let developers build a personal CRM system, a console where I could manage my social network, exporting friends lists to various social networks? No. Would OpenSocial let developers build a social search application like the one that Mark Cuban was looking for?  No.

I agree, Tim. OpenSocial is like Java for social networking apps: the promise of write once, run anywhere. It goes back to the point I made in an earlier blog post – I am completely inundated now with requests from Facebook, LinkedIn, and now PlaxoPulse. I am having a hard time keeping track of all of my contacts, messages, and the like. It would be great if I could have a service that sat on top of these apps and allowed me to manage all of my relationships from one place. Sure, some contacts may only be a LinkedIn contact, some may be a Facebook and LinkedIn contact, etc. Check here if you want your music to be shared on this network and not the other one, etc. You get the idea. It is not hard to view this data in one place by sucking in RSS feeds from the various services, but viewing it in one place vs. managing all of my relationships from one place are two different value propositions. Jeff Nolan has a recent post about this as well. Of course the challenge is that the value of these services is their proprietary networks, which create lock-in for the user. Once users can export and manage that data without visiting these various platforms, the service begins to lose its lock-in. We see this problem over and over again in many web services – the constant battle between closed and open standards and networks. If you are the big guy, why bother? If you are the small guy, it makes sense to join up with many of the other smaller players. Anyway, enough digression here – I would love to hear your thoughts about how you are spending your time managing your various relationships across different networks and what you would like to see.

The constant battle between revenue and usability

I am definitely the first one to understand that there is no free lunch on the web. At the end of the day, someone has to pay for all of the great services and content out there. To boot, I am a big believer in the ad-driven model of content. To this point, there is a battle being fought at every web content company on a daily basis between product management, engineering, and ad sales. What is clear on the web is that every little change can make a huge impact in terms of usability, traffic, and revenue. How far you err on one side vs. the other can make or break a company’s numbers. It is this battle between usability (simple and clean) and revenue (balancing getting what you want against cluttering the page) that is constantly fought behind the scenes.

Take Forbes.com as an example. I have always liked the content, but over the last 6 months I have basically stopped going to their site or any link that someone sends me from Forbes. Why? I cannot stand the in-your-face advertising and the clutter. First, it starts with a big-ass splash page before getting you to the site, and once there a Forbes.com video clip starts with a pre-roll ad. Once I click on another page, I am confronted with another video ad that starts right away. Once again, I like the writers, but honestly this site has become too revenue focused and consequently too cluttered. As a user, I feel like I am spending more time turning off video and audio ads and skipping splash screens than reading content. I am sure the Forbes.com business folks have done their analysis of lost unique visitors versus more revenue per page, but in the long run striking the right balance between usability and revenue is key. And as I sit down with my portfolio companies, it is also this balance that we all seek to achieve, because we understand that what we may gain in short-term revenue increases may hurt us in the long run if our audience base declines over time.