This post is not about software but hardware. Well, companies that create software to generate hardware that is programmed with software…and why. And how they evolve back to just hardware over the lifetime of the company.
The clear vision is that the way ASICs are designed is too rigid and time-consuming for the rapid succession of products and standards in the market. Think of wireless standards and video compression standards. By the time a new tablet or smartphone hits the market, the standards have already evolved. What used to be MPEG-2 turned into H.264, then into Google VP8, and so on.
Hence software is key, both for flexibility and to reduce the hardware cost of supporting many different standards in one device. Yet the demands on cost, power consumption, and performance require the processor that runs this software to be very efficient. The way this is solved is by a custom instruction set that is tuned to the domain at hand. Think of Intel SSE- or AVX-like instructions for image processing, for example.
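To make the idea of a domain-tuned instruction concrete, here is a minimal sketch in C that brightens 8-bit pixels with SSE2's saturating add, which handles 16 pixels per instruction and clamps at 255 in hardware. The function names `brighten_scalar` and `brighten_simd` are made up for illustration, and the code falls back to the plain loop when SSE2 is not available.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics */
#endif

/* Scalar reference: add delta to each 8-bit pixel, clamping at 255. */
void brighten_scalar(uint8_t *px, size_t n, uint8_t delta) {
    for (size_t i = 0; i < n; i++) {
        unsigned sum = (unsigned)px[i] + delta;
        px[i] = (uint8_t)(sum > 255u ? 255u : sum);
    }
}

/* Same result, but 16 pixels at a time where SSE2 is available.
 * Illustrative only; not taken from any vendor's library. */
void brighten_simd(uint8_t *px, size_t n, uint8_t delta) {
    size_t i = 0;
#if defined(__SSE2__)
    const __m128i d = _mm_set1_epi8((char)delta);  /* broadcast delta to 16 lanes */
    for (; i + 16 <= n; i += 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)(px + i));
        v = _mm_adds_epu8(v, d);  /* unsigned saturating add, one instruction */
        _mm_storeu_si128((__m128i *)(px + i), v);
    }
#endif
    /* Handle the remaining pixels (or all of them without SSE2). */
    brighten_scalar(px + i, n - i, delta);
}
```

The scalar loop needs a widen, an add, a compare, and a select per pixel; the custom instruction folds all of that into one operation across 16 lanes. That is the kind of gain a domain-specific instruction set is after.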
These kinds of high-tech startups usually start in academia. Studying several of these companies in the Electronic System Level (ESL) design space, I came to realize that most go through similar stages and pivots over the course of their existence.
Some examples are EZchip’s NPS, Netronome, Tensilica, Kalray, and to some extent Xilinx and Altera. Brave entrepreneurs who looked at how sophisticated SoCs are made and decided to “make things better”.
With any programmable processor comes a toolchain, consisting of a compiler, assembler, debugger, profiler, and simulator to try out software on the host PC. The availability and quality of these tools define the efficiency of the processor and, more importantly, the adoption of the processor in the market.
But with a different custom processor for each domain you need to develop and maintain a plethora of solid tools. That doesn’t scale too well.
Enter the processor generation framework. Generate a custom toolchain and processor from a generic template with the click of a button. Great technology.
Why does this model not work in the market more often?
Unfortunately, customers typically are looking for short-term solutions, as opposed to visionary technology.
Nice technology, but can’t you just sell me the solution?
Customers want to buy a ready-made processor IP and toolchain for their specific domain. Why spend the effort of learning new tools if you can just buy the end result that solves your problem?
Few companies manage to attract a large-enough community or big brother in the ESL space to survive, with Tensilica as the exception to the rule. ESL startup LISATek pulled this off through an early exit to CoWare and subsequently to Synopsys.
Most will need to pivot. Use the generation technology internally and sell the generated IP. To have a 100% product-market fit, the startup needs to restrict itself to a particular domain and focus its sales and marketing. This follows Geoffrey Moore’s model of winning one niche at a time with a 100% product for each niche.
What’s the customer’s reaction? Programming the IP is generally nontrivial. Programmers need to tune the code to use custom operations to benefit from the performance potential. They also have to explicitly manage memory allocation, deciding which data structures to place local to the processor and which can be further away without sacrificing performance. Unless the hardware and compiler are designed to do this automatically, holy-grail style, this is a daunting task. Compare it to writing GPGPU code in Nvidia’s CUDA or OpenCL. It may take a few days or weeks to get working code, but practice shows it may take years to optimize the code to the underlying hardware architecture and reach the desired performance.
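A small sketch of what “explicitly managing memory” looks like in practice: the programmer splits the data into tiles, copies each tile into a small local buffer, and computes only on that buffer. Everything here is an assumption for illustration: the `TILE` size, the function name `sum_of_squares_tiled`, and the use of `memcpy` as a stand-in for whatever DMA or copy primitive a real processor IP would offer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TILE 256  /* hypothetical local-memory capacity, in elements */

/* Sum of squares over data in "far" memory, processed tile by tile
 * through a small buffer that stands in for on-chip local memory. */
int64_t sum_of_squares_tiled(const int16_t *remote, size_t n) {
    int16_t local[TILE];
    int64_t acc = 0;

    for (size_t base = 0; base < n; base += TILE) {
        size_t len = (n - base < TILE) ? (n - base) : TILE;

        /* Explicit transfer: on real hardware this would be a DMA call. */
        memcpy(local, remote + base, len * sizeof local[0]);

        /* Compute touches only the local copy. */
        for (size_t i = 0; i < len; i++) {
            acc += (int32_t)local[i] * local[i];
        }
    }
    return acc;
}
```

Even in this toy case the tile size, the transfer granularity, and the choice of which arrays deserve local storage are all decisions the programmer has to make and revisit for every new architecture.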
How to get out of this jungle? EZchip bought programmability with their acquisition of Tilera. Intel went through great efforts to keep their Xeon Phi programming model and tool support as close as possible to the known x86 model. The French manycore company Kalray took on the challenge to extend their dataflow programming model to OpenCL, OpenMP, and POSIX threads to cater to each customer’s wishes. Altera got on the same bandwagon in supporting OpenCL and OpenACC for the high-performance computing market. Xilinx took this on by providing OpenCV kernels fully tuned for their Vivado FPGA tools. Fully domain-specific as a necessary evil.
Bottom line, the customer will demand that you write the required software libraries and ship these as part of the product. How else to compete with the huge amount of available software of industry’s mogul ARM?
Competing with the world on porting all possible software libraries fully tuned to your hardware is reserved for major-league players only. So you pivot once more. You provide black-box solutions for one specific niche. You generate the IP internally and have a full software team somewhere in Asia create the accompanying software. You have now turned into a veritable software house.
This is usually the point where the startup gets acquired. With an immediate solution for a short-term need. Investors happy. Founders happy. But what happens after acquisition?
Now your startup is part of a large corporation, churning out HW/SW solutions at a regular pace. Beating the competition with its superior processor and custom, fully tuned software libraries to match.
Enter big-company dynamics. In the startup phase, it was all about getting to market quickly. Now it is about integrating with company policies, processes, and tools. What about design-for-test or integrating with standardized tools? How easy is it for a remote team to add features and debug the startup’s specific design flow? How long does it take a new hire to be productive in the custom technology?
Once the founders have moved on to new adventures, the big company will force the team to align their design flow with its big-company processes. Despite the superior technology, the managerial argument of “we should stick to what the rest of the world does” quickly prevails. So the company moves to established solutions from the likes of ARM. But as the cost, performance, or power-consumption budget is further tightened—before you know it—you are back to where it all started: custom hardware design. By hand.
In the end, the original quest for low-effort processing solutions remains. How to overcome the catch-22?
About Martijn Rutten
Fractional CTO & technology entrepreneur with a long history in challenging software projects. Former CTO of scale-up Insify, changing the insurance space for SMEs. Former CTO of fintech scale-up Othera, deep in the world of securitized digital assets. Coached many tech startups and corporate innovation teams at HighTechXL. Co-founded Vector Fabrics on parallelization of embedded software. PhD in hardware/software co-design at Philips Research & NXP Semiconductors.