The next-generation web: scaling and data mining will matter

We are all enjoying the benefits that come with the commoditization of existing hardware and software infrastructure. It costs far less to launch a business today than it did five years ago. We are all smarter, broadband penetration is reaching critical mass, and open source and commodity hardware have become reliable alternatives to proprietary architectures and closed systems. Yet as we all move forward with our web-based operations, it is clear that scaling the back-end infrastructure remains a formidable challenge. There have been many instances of popular services going down: remember Typepad, Salesforce.com, and del.icio.us, to name a few.

With scaling the back end also comes a need to learn more about your users and their interactions. Data mining and analysis are becoming a big deal, not only to help companies create better services but also to generate more revenue per user. In addition, for many web companies, extremely data-driven applications are the core of their services. Think about Zillow, Technorati, and services like Indeed, which are dynamically driven services built on aggregating, crawling, and filtering millions of pieces of data. However, the fast growth of many web-based operations, combined with the need to mine that data, leaves a big hole in the revolution of the cheap. Web-based operations need an open source, cheaper option to scale their databases, move to a data warehousing architecture without breaking the bank, and grow with their users on commodity infrastructure.

Enter Greenplum (full disclosure: Greenplum is a portfolio company and I am on the board), which just released its GA product, Bizgres MPP, for data warehousing, leveraging the best of the open source PostgreSQL database. We have been working on the code for the past 18 months, and I am quite proud of the team for having delivered the release. Greenplum is taking the best of the open source database PostgreSQL and rebuilding some of the core functions like query optimization, execution, and the interconnect. We are allowing anyone to build a shared-nothing architecture, a la Google, to scale their back end to multi-terabyte systems on cheap hardware. It is free to run on a single machine, but if you want to run the massively parallel option, we charge a fee per CPU.
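For readers who have not seen a shared-nothing design up close, here is a minimal Python sketch of the core idea, under my own simplifying assumptions (the node names and loader function are hypothetical illustrations, not Bizgres MPP's actual interface): each row is hashed on a distribution key and routed to exactly one commodity node, so every node stores and scans only its own slice of the data in parallel.

import hashlib

NODES = ["node0", "node1", "node2", "node3"]  # hypothetical commodity servers

def node_for(key):
    # Pick the owning node for a row by hashing its distribution key.
    digest = int(hashlib.md5(str(key).encode("utf-8")).hexdigest(), 16)
    return NODES[digest % len(NODES)]

def distribute(rows, key_column):
    # Bucket rows by owning node, as an MPP loader conceptually would.
    buckets = {node: [] for node in NODES}
    for row in rows:
        buckets[node_for(row[key_column])].append(row)
    return buckets

if __name__ == "__main__":
    clicks = [{"user_id": "u%d" % i, "url": "/page/%d" % (i % 7)} for i in range(20)]
    for node, rows in distribute(clicks, "user_id").items():
        print(node, len(rows), "rows")

Because no node needs to see another node's slice to do its share of a scan, adding cheap machines adds capacity roughly linearly, which is the whole point of running on commodity hardware.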

Dana Blankenhorn from ZDNet gets it:

This is a problem a lot of Web 2.0 start-ups like Technorati, Bloglines and Flickr are facing, and projects like Drupal will face soon. They were built with open source tools, but then find they need to "graduate" to something like a data warehouse.  And there’s old Oracle, telling them there’s nothing from an open source supplier that can deliver what they need. Share with us, they say, you don’t have any choice.

Well, now there is a choice. Greenplum CTO Luke Lonergan said that O’Reilly Media, one of Greenplum’s early customers, graduated from mySQL to PostgreSQL with Greenplum and got a 100 times improvement in database access speed across a 500 gigabyte database. Other Web 2.0 start-ups, and projects, can do the same thing.

"The price of conversion is where the pain is," said Yara, "but look at how fast some of these projects grow."  While mySQL was smart in building on a lightweight Web base, more and more users and projects will find the need to graduate, and face proprietary FUD from major vendors saying they have to pay the "monopoly tax" in order to grow.

I truly believe the next battleground will be based on scaling the back end and, more importantly, mining all of that clickstream data to offer a better service to users. Those that can do it cheaply and effectively will win. The tools are getting more sophisticated, the data sizes are growing exponentially, and companies don’t want to break the bank or wait for Godot to deliver results. Given these trends, I suggest downloading Greenplum’s Bizgres MPP; let me know what you think.

Welcome Greenplum and Bizgres

I have looked at a number of open source projects over the last year and mostly agree with Bill Burnham’s comments that many of these open source plays are "marketing gimmicks for startup companies." Many of these companies are trying to start a new project from scratch, hoping to build a community brick by brick. In addition, without the ability to create a community, it is hard to build a real, sustainable revenue model. Finally, open source does not matter if there is no customer need for the solution. That being said, I am quite excited about the relaunch of one of my portfolio companies, Greenplum, which is bringing the power of open source to enterprise business intelligence. (Stop reading if you are not interested in a pitch for a portfolio company.)

Quite simply, Greenplum is using an open source database optimized with a supercomputing architecture to bring terabyte-scale data warehousing to enterprises. Leveraging this architecture, Greenplum will be able to offer significant price/performance benefits over existing BIG IRON solutions. In addition, Greenplum is working with Josh Berkus and the PostgreSQL community to launch a new project, Bizgres, whose goal is to build a complete database system for BI exclusively from free software.

From a business perspective, what I like about our strategy is, first, that we are building off an already existing and strong community of PostgreSQL developers. Secondly, rather than pursue a broad platform play for all databases, we are focusing on a large but well-defined market in BI. We believe this is a great way for open source to enter the enterprise: the market is riddled with expensive solutions, BI is a top-3 initiative in most enterprises, data is growing like a weed in most places, and we are not asking CIOs to bet their transaction systems on open source. Finally, our revenue model is not based solely on a support/services play. The open source DeepGreen product will target small and medium-sized businesses, or anyone with data marts and reporting apps in the 10-300 gigabyte range. Greenplum will sell licenses to any company that wants to deploy the DeepGreen MPP product to scale to multi-terabyte environments. While it is yet another spin on open source, I am quite excited about what Greenplum is doing and truly hope that by leveraging the success of PostgreSQL, staying focused on a targeted market, and employing a dual-license model, the company will be able to rise above the noise. As I have mentioned in a previous post, one of the clear benefits of open source, especially if you leverage an existing community, is reduced friction in the sales and marketing process.

Linuxworld Boston

Last year at this time, I was at Demo in Arizona watching a couple of my portfolio companies launch new products and networking with other VCs and entrepreneurs. Given my travel schedule of late, I decided to go to Linuxworld in Boston for a day and follow Demo through many of the bloggers, like Jeff Nolan. It seems that the consensus view from Demo was that there were lots of interesting products but nothing that blew the audience away. I can say the same about Linuxworld. After a few meetings in the morning, I decided to walk the expo hall to see the various offerings. I saw my fair share of companies selling into the high performance computing (HPC) market with various clustered file servers, data replication, and workflow application software. I also saw a number of companies offering tools to better manage the deployment and performance of Linux boxes. Then there were a few companies selling enterprise applications like document management platforms and antivirus and antispam software on Linux, which was not terribly exciting. Finally, there were various companies going after the desktop Linux market with operating systems and applications; while I found some of them intriguing, it is still quite early.

One area I did like was the market for software compliance. As we move to a componentized world where developers increasingly build in pieces of software from a variety of sources, how does a company know what it is using, from whom, and, more importantly, what the licensing rights are for those components? Two early-stage companies going after this space are Palamida and Black Duck Software. I had a chance to speak with one of the founders of Palamida, Theresa Bui Friday, and came away quite impressed. The Palamida software works like an antivirus scanner, looking into code and checking it against a compliance database to catalog your code base, identify whose components you are using, and then provide the user with the associated license and contact information. IP compliance is increasingly becoming a big deal, especially when you talk to CIOs, and incorporating this type of automated scanner early in the development process can save customers a ton of headaches and potential dollars from lawsuits. I view this market as part and parcel of the source code scanning market. Increasingly, secure coding is being built into the QA process, and companies are coming out with automated scanners to check for vulnerabilities before products go to GA. According to Reflective and NIST (full disclosure: I am an advisory board member), it costs less than $0.10 to scan code early in the development process and up to $1,000 per line of code once a product is in GA.
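To make the scanning idea concrete, here is a minimal Python sketch of the fingerprint-and-lookup approach such tools take; the catalog, file extensions, and matching logic are my own hypothetical illustration, not Palamida's actual database or algorithm, which matches partial snippets as well as whole files.

import hashlib
from pathlib import Path

# Hypothetical catalog mapping file fingerprints to (component, license).
KNOWN_COMPONENTS = {
    "d41d8cd98f00b204e9800998ecf8427e": ("example-lib 1.0", "GPL-2.0"),
}

def fingerprint(path):
    # Return an MD5 fingerprint of a file's contents.
    return hashlib.md5(path.read_bytes()).hexdigest()

def scan(root):
    # Walk a source tree and report files that match known components.
    findings = []
    base = Path(root)
    if not base.is_dir():
        return findings
    for path in base.rglob("*.c"):
        match = KNOWN_COMPONENTS.get(fingerprint(path))
        if match:
            component, license_name = match
            findings.append((str(path), component, license_name))
    return findings

if __name__ == "__main__":
    for path, component, license_name in scan("src"):
        print("%s: contains %s (%s)" % (path, component, license_name))

Run against a real code base, a report like this tells a development team, well before GA, exactly which third-party components they ship and under what terms.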

Developers matter in enterprise sales: just reach them economically

Jonathan Schwartz from Sun has a good post about the nature of developers, and why building a relationship with them is key to creating opportunities.

One of the smartest software execs I’ve worked with had a saying, "Developers don’t buy things, they join things."  That’s been a pretty focusing statement for us over the years, and as we enter the new year, you should expect 2005 to be one in which we place an ever heightening focus on our dialog with the community, and the developer community in particular.  And not simply maintaining the dialog we have today, but finding new constituencies, and expanding our reach.  Establishing a relationship with a developer is all about starting a conversation – one that always flowers.  And often into opportunity.

I totally agree with Jonathan on this. The key, however, for any small company is to do this economically and efficiently. Let me give you an example. Let’s face it: many companies selling into enterprises end up going through some "pilot" or "beta" period where a sales prospect’s developers and technologists get to use the software and deploy it on a trial basis. When I look at a sales pipeline, I always want to know who in the organization the company is selling into and why. You see, I have more often than not seen early stage companies selling into enterprises but not selling high enough to the people with budget. In other words, the vendor ends up getting excited about the number of pilots in the market, many of which are with technologists who by nature like to try things and rarely end up buying. The vendor spends an inordinate amount of time reaching out to the developer or technologist to set up a pilot and then leaves with no defined criteria for when the pilot ends and how it converts into a sale. The developer uses the product, sucks up lots of the vendor’s resources, and moves on to the next new technology. While it is important to court developers and technologists in the sales process, since they typically have to give the technical buy-off and can just as easily squash an opportunity, it is not a great or economical use of time to have your most expensive direct sales resources and sales engineers doing this.

Enter the web and the open source movement. Sure, "try before you buy" works if your users can download the software for free on a trial basis, say 90 days, or if you open source a version of your product and build a real community. One of my portfolio companies is laying the groundwork to open source some of its software to help build a community and buzz around its product. We know that developers and technologists are key to the sales process. We want developers and techies to download the product, use it, and bang on it. However, we just want to reach them in an easier, more efficient way; why have our most expensive sales resources do this when we can leverage the web? We want to build community around the product, gather great feedback, and land and expand our relationship with the developer. We hope this open source strategy will work as we build relationships with the developers who ultimately drive decision making from the bottom up, while our expensive sales reps reach the execs with budget from the top down. Hopefully, the two ends will meet in a selling process with less friction. We shall see. I will keep you posted as this experiment evolves.

Open source and software licensing

It seems that SCO is making another attempt to hurt the open source movement by claiming that the GPL is unconstitutional and violates federal patent and copyright laws. While many are not concerned and call this a publicity stunt by SCO, the discussion of open source software licenses reminds me of a panel I saw at the Goldman Sachs Software Retreat two weeks ago.

On the panel you had representatives of Red Hat, MySQL, and JBoss combined with the perspective of a large IT buyer, the CTO of Goldman Sachs. While I will not fill you in on all of the gory details, one thread did stand out in my mind. It goes like this:

It seems that many of the bigger open source players are building out their own stacks, a la Microsoft and others, in the pursuit of growth and profits like traditional closed-source software companies. Isn’t this the antithesis of what open source stands for? Rick Sherlund, Goldman’s software analyst, says that it makes sense from a financial perspective since it allows vendors to cross-sell and lock in the customer; customer retention is a good thing after all, isn’t it? While all of the open source players did their best to dodge this question and claim that they are really open, MySQL was the only company that seemed credible here, as its goal is to be part of everyone’s stack, including the Microsoft .NET one. JBoss and Red Hat clearly seemed to be building their own middleware and open source stacks while at the same time claiming an open architecture.

The most interesting point was served up by Michael Dubno, CTO of Goldman Sachs. He specifically told the vendors that the danger of the open source stacks is that they create lock-in, and that open interoperability is what matters most to him. He will go somewhere else if the open source guys end up limiting his options; he needs great service, not extra features. Moving on, he pointed out that the biggest gating factor for him in adopting open source is making sure the legal issues will not come back to haunt him. Goldman reviews every license agreement and makes a determination of which licenses make sense and which do not. What Michael wants is integration from a legal perspective, not a feature perspective. He claims the biggest cost to Goldman is not running two products, but the cost of servicing and supporting two different contracts; he wants more standardization of contracts.

I found this to be an interesting point. I have seen a number of open source related software plays, and it seems that many are trying to create their own unique twists on licensing. While Goldman’s CTO is only one data point, I would encourage companies looking to open source some of their software not to be too cute and design their own unique open source license, but rather to leverage existing ones like the GPL. One of the biggest barriers to a large enterprise using your software will be the software license itself. The other point is to not forget why lots of companies are using your product in the first place: be open!

Innovation is not dead

Here is another example of why commoditization is not killing innovation. In fact, it can give, and has given, a number of companies a leg up in developing and deploying new products in record time and at low cost. Using so-called commodity software and hardware does not kill innovation; it speeds it up.

For example, Metapa, one of my portfolio companies, has begun shipping a software product, the Metapa Clustered Database (CDB), that enables customers to deploy terabyte-scale data warehouses on clusters of commodity computers running open source software. To that end, the company just announced a joint customer win and partnership with Sun.

In the press release, Jeff Mayzurk, VP of technology for E! Networks, says:

“Deploying a unified data warehouse has always been a strategic goal of E!, but with the total cost of ownership associated with traditional solutions, it hasn’t been practical. Metapa and Sun provided a truly unique solution allowing us to implement an enterprise class data warehouse with the price/performance level that makes our initiative possible.”

Dave Powell, CEO of Metapa, goes on to say:

“Metapa and Sun are excited to announce E! Networks as a joint customer and a flagship example of how companies can capitalize on the performance advantages and operational returns of open source and commodity computing for data warehousing,” said Dave Powell, president and CEO of Metapa. “CDB leverages commodity computing, open source database technologies and breakthrough parallel processing algorithms to deliver unprecedented price/performance when compared to traditional, proprietary database solutions.”

To reiterate, commodity computing and open source software can enable breakthrough solutions such as what Metapa is delivering on Sun x86 hardware. My hat is off to the team at Metapa for making this happen. In addition, I love having a referenceable early customer win with a partner that can help replicate this win in a big way.
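For the curious, the parallel processing idea behind clustered databases like CDB can be sketched in a few lines of Python. This scatter-gather example is my own simplified illustration, not Metapa's actual algorithms, and the partitions and column names are made up: each worker aggregates its own slice of the data, and a coordinator merges the partial results.

from multiprocessing import Pool

def partial_sum(partition):
    # Worker step: aggregate one node's local slice of the fact table.
    totals = {}
    for row in partition:
        totals[row["region"]] = totals.get(row["region"], 0) + row["sales"]
    return totals

def merge(partials):
    # Coordinator step: combine the partial aggregates from every worker.
    final = {}
    for partial in partials:
        for region, sales in partial.items():
            final[region] = final.get(region, 0) + sales
    return final

if __name__ == "__main__":
    # Hypothetical data already spread across four "nodes" (partitions).
    partitions = [
        [{"region": "east", "sales": i} for i in range(250)],
        [{"region": "west", "sales": i} for i in range(250)],
        [{"region": "east", "sales": i} for i in range(250)],
        [{"region": "west", "sales": i} for i in range(250)],
    ]
    with Pool(processes=4) as pool:
        print(merge(pool.map(partial_sum, partitions)))

Because the expensive scan-and-aggregate work runs on all of the cheap machines at once, the cluster as a whole can deliver the kind of price/performance the press release is touting.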

On technology commoditization

If you ever wondered how Sun monetizes Java, I suggest reading the post by Jonathan Schwartz (President of Sun) on commoditization, standards, and Java. The crux of his discussion is that standardization and commoditization are not terrible, as they inevitably open up new market opportunities for industry players (just look at the railroad industry as an example). On the tech side, Jonathan believes it is mainly bandwidth that has been commoditized, as opposed to a broader trend across software.

So I’d like to answer once and for all the question, “how does Sun monetize Java?” with a historical reference: the same way GE and General Motors have monetized standard rails, Vodafone monetizes GSM, banks monetize ATM networks, and oil and gas companies monetize the fact that my car can use “gas.”

The Java community, which we steward, drives a broad array of platform standards, among an even broader array of industry participants. That activity levels a playing field, that just so happens to be the single biggest playing field the technology industry has ever seen. The network is a commodity. We should all be celebrating.

In some respects, one could view commoditization as a bad thing, since it is difficult to differentiate one product from another when they are easily replaceable based on price alone. However, what Jonathan is saying, and what I agree with, is that it is what you do with the commodity bandwidth, standards, and platforms that separates the winners from the losers. Sure, companies are all on a level playing field due to advancing technology and platforms. For example, with standardization, building new software and technology products and integrating them with existing solutions takes much less time and costs far less than ever before. Despite that, we continue to see innovation and new business models; the value just resides in a different layer. While Jonathan might like Java to benefit Sun alone, his argument is that the creation and promotion of Java makes the market bigger for everyone, including Sun, and that is a great thing.

The one thought that could cause worry is that if you buy into Jonathan’s story of commoditization, the inevitable result is that the industry will consolidate, leaving only those with scale and monopoly power to survive. Just look at the examples from his post: GE, GM, Vodafone, and banks have all benefited from standardization, and those are all big guys. In my mind that’s OK, as consolidation will be a long time coming; we are at the very beginning of this commodity movement in the technology space. Sure, certain markets are in more advanced stages, but overall, as an entrepreneur and venture investor, you will have plenty of chances to make your impact. Remember, as markets commoditize, new opportunities will continue to arise, huge ones that we have not even thought of today.

Has the individual investor learned a lesson?

There have been a number of IPO filings recently, but the one that intrigues me most is the filing by Lindows. As many of you have read, Lindows/Linspire just filed an S-1 to raise $57 million in an IPO. WR Hambrecht is the lead underwriter and will utilize its Dutch auction methodology to raise money from individual investors. In my mind, what happens with Lindows will be a barometer of the psyche of the individual investor. It will tell us whether or not the individual investor learned a lesson from the bubble. It will tell us whether or not speculation will run rampant again. As you know, I do find Linux on the desktop intriguing. That does not mean I believe this is the year, or that you should go public now on $2.1 million of revenue in 2003 with a net loss of $4.1 million. On top of that, of the $57 million they are raising, $10 million is going to pay off Michael Robertson, the CEO, for a line of credit he extended to the company over the past couple of years. As per the filing,

The approximately $10,400,000 of net proceeds that we intend to use to repay outstanding debt obligations will be paid to Michael L. Robertson, our founder, Chairman and Chief Executive Officer, as payment in full of all remaining outstanding amounts under a revolving line of credit. Mr. Robertson has advanced us funds under the line of credit since July 2002, including advances of $5,600,000 during 2004. Amounts borrowed under this loan are used for our operating expenses. The loan bears interest at the rate of 10% simple interest per year and matures on June 30, 2005.

So not only is this a speculative offering, but also one where the largest shareholder gets paid back over $10 million off the top. Michael did pay $4.5 million for the shares that he currently owns, but roughly 2/3 of his total capital will be off the table: of the approximately $14.9 million he has put in between his shares and the line of credit, about $10.4 million comes right back to him. So how much skin in the game will Michael really have to make this company work? Does this sound like a good investment to you? I am not opposed to the Dutch auction and do believe the methodology has a place in some deals. My big fear is that if this deal does happen, it will only confirm my belief that the individual investor never learned a lesson from the bubble. For the individual investor to forget so quickly about all of the pain and suffering we just went through really scares me.

Linux on the desktop (Continued)

I have written about Linux on the desktop in the past (here and here). Today, my partners and I installed the latest version of Xandros 2.0, and I have to admit we were blown away. It installed in about 10-15 minutes with a couple of clicks of the mouse, and we had a full working version of a Linux desktop which looked and felt like a Windows machine. It partitioned our hard drive so Windows and Linux could run on the same machine (if you really want that) and allowed the Linux desktop to interoperate seamlessly with my Windows network. The file manager was just like Windows Explorer, and I could easily find, use, and set permissions on my old files. If you have not tried it yet, I encourage you to go to Xandros and buy a copy of the deluxe version ($89).

The great news is that we were able to take an old laptop with a 133 MHz Pentium chip and substantially improve the performance of the machine, extending its useful life. I am definitely going to install this on one of my old laptops at home. What is even more interesting is that with an integrated version of CodeWeavers’ CrossOver Office, you can run many Windows-based applications seamlessly on your Linux desktop. Unfortunately, iTunes does not work yet; go to the site if you want to learn more about which other applications work. So the Linux desktop is here and much improved, and what is important is that it interoperates with Windows from a networking and management perspective, all very necessary when any enterprise looks at TCO (total cost of ownership). While I do not anticipate huge enterprise adoption this year, I definitely see fewer barriers to adoption in the years to come.

Software co-op/software reuse

Lee Gomes from the Wall Street Journal wrote an interesting piece (sorry, not a free site) in his Portals column about Project Avalanche, which is essentially a software co-op for businesses to share their applications and code. Current members include Jostens, Best Buy, and Cargill. According to Lee, Project Avalanche was started because the founders kept asking themselves the following questions:

“Why were they writing such big checks to their software companies, but getting so little in return? Why were their in-house programming staffs writing the same sorts of custom programs written at thousands of other companies? If Detroit car makers can collaborate on research, why couldn’t U.S. technology users?”

The project is in its early stages but has grand ambitions. One of the founding members discusses what would happen if the group banded together to create their own CRM system or their own Linux-based desktop environment, saving all of the participants lots of dollars on licensing fees. While the idea of software reuse is not new, as developers have talked about this for years, the implementation via a co-op is what’s unique. In addition, most of the other companies or sites that I have seen specialize in sharing snippets of code versus full applications.

If you are interested in software reuse, I encourage you to read up on a company I met with early last year, Artifact Software. Artifact Software has a tool that allows developers to collaborate and create a code sharing community. Its initial target market will be enterprises, allowing their developers to collaborate internally to become more productive. However, its business model is to seed the target market with its tools by allowing users to download its product for free and share code via an open website at www.codejack.com. The website currently lists 33k artifacts of code and over 23k users. Leveraging the open source philosophy, Codejack is not only about searching and finding code, but also about testing, rating, and reviewing code. Other companies to keep an eye on include Component Source and Logic Library, which is more enterprise-focused. While developers have been talking about software reuse and its ancillary benefits for years, I have no doubt that, given a tough climate for IT spending and the acceptance of open source, software reuse and collaborative development will become a big topic again. In the long run, I am sure that the members of Project Avalanche will contribute and develop some interesting software.