RapidMiner is a well-known open source Data Mining tool from the company Rapid-I and is used in many thousands of installations all over the world. At CeBIT I had the opportunity to talk to co-founder Ralf Klinkenberg about his software and learn some interesting details, for example whether RapidMiner is ready for Cloud Mining.
RapidMiner, formerly known as YALE, was developed at the University of Dortmund in Germany, starting in 2001. Since then it has clearly proven its impressive functionality; I first used it myself for a Data Mining contest in 2006 (with quite some success). It is now hosted on the open source development platform SourceForge, where development continues. Version 5 is currently available.
Rapid-I provides Enterprise Editions
Because many companies need to reduce the usual risks of open source software and want a business partner that can provide support, the developers of YALE founded Rapid-I GmbH. Here CEO Dr. Ingo Mierswa, CBDO Ralf Klinkenberg and their colleagues distribute three Enterprise Editions of RapidMiner: SMALL, STANDARD and DEVELOPER. These certified versions of the open source product give customers the liability coverage they need to run it in their company's IT infrastructure. Version 5 has already been rolled out for these editions as well.
Data Mining with RapidMiner
RapidMiner is a complete Data Mining suite. That means it covers all steps of the KDD process, from the database interface and ETL to analytics and reporting. Thanks to its open source roots, the tool supports a dizzying number of more than 500 Data Mining methods. It has also proven its reliability in many tests; BARC, for example, rated it the best open source tool. An intuitive, modern graphical user interface lets the experienced data mining expert solve nearly all problems he faces in practice. The software works with virtual repositories, so the data can technically reside anywhere. Metadata can be accessed at every step of development. A remarkable feature is real-time validation: partial results can be obtained during the design process, which noticeably simplifies the usual trial-and-error approach. You can find some good video tutorials on the RapidMiner website.
More than Data Mining
RapidMiner has gained a lot of functionality over its years of development. The clear-cut analytical tool YALE of the past has given way to a modern enterprise tool with its own derived solutions for several typical hands-on problems:
- RapidAnalytics: an enterprise server architecture
- RapidDoc: Text Mining and document classification
- RapidSentilyser: market insight (how often, and in which context, is a company's name mentioned in the media?)
- RapidNet: explorer to discover connections between components in a network
The BuzzBoard, a dashboard for RapidSentilyser, lets companies track real-time feedback on their publicity measures. I saw an impressive example of these clearly arranged results: financial news from recent years was compared with news from the last week, all focused on one multinational company. The dashboard highlighted whether the company was mentioned in a “positive” or “negative” context. That made it easy to see, with the help of a single summary figure, whether the latest publicity activity had paid off. Of course, that is only possible with a high-end Data Mining foundation underneath.
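The post does not say how BuzzBoard condenses the mentions into a single figure. One common choice is a net sentiment score, (positive − negative) / all mentions, which ranges from −1 to +1. The minimal Python sketch below assumes every mention has already been labeled "positive", "negative" or "neutral" by some sentiment classifier; the function name `net_sentiment` and the sample counts are hypothetical and not part of RapidSentilyser.

```python
from collections import Counter
from typing import Iterable


def net_sentiment(labels: Iterable[str]) -> float:
    """Hypothetical net sentiment score in [-1, 1].

    Assumes each mention is pre-labeled "positive", "negative" or "neutral";
    this is an illustration, not how RapidSentilyser actually computes it.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # no mentions, no signal
    # Share of positive mentions minus share of negative mentions.
    return (counts["positive"] - counts["negative"]) / total


# Hypothetical sample data: a long-term baseline vs. the most recent week.
baseline = ["positive"] * 40 + ["negative"] * 40 + ["neutral"] * 20
last_week = ["positive"] * 12 + ["negative"] * 3 + ["neutral"] * 5

change = net_sentiment(last_week) - net_sentiment(baseline)
print(f"baseline={net_sentiment(baseline):+.2f} "
      f"last_week={net_sentiment(last_week):+.2f} change={change:+.2f}")
```

Comparing the recent score against a long-term baseline, as in the BuzzBoard example, turns "did the campaign pay off?" into the sign and size of a single difference.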
Cloud Mining with RapidMiner
The client-server architecture of RapidAnalytics makes it possible to place the repository anywhere, including in the Cloud. In principle, then, RapidMiner is ready for use in the Cloud. However, the core advantage of a Cloud Mining solution, the parallelization of the algorithms and the scoring engine, has not been an explicit focus so far. That makes RapidMiner with RapidAnalytics best suited to conventional big-company infrastructures.
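To illustrate what "parallelizing the scoring engine" means in general (this is not how RapidMiner or RapidAnalytics work internally), here is a minimal Python sketch: the records are split into chunks and each chunk is scored independently by a worker from a `concurrent.futures` executor. The linear "model" with its weights and bias, and the helper names `score_chunk`, `chunked` and `parallel_score`, are hypothetical placeholders for a trained model.

```python
import concurrent.futures as cf
from typing import List, Sequence

# Hypothetical model: a fixed linear scoring function (weights are made up).
WEIGHTS = (0.5, -0.25, 1.0)
BIAS = 0.25


def score_chunk(rows: List[Sequence[float]]) -> List[float]:
    """Score one chunk of records; chunks are independent of each other."""
    return [BIAS + sum(w * x for w, x in zip(WEIGHTS, row)) for row in rows]


def chunked(rows: List, size: int) -> List[List]:
    """Split rows into consecutive chunks of at most `size` records."""
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def parallel_score(rows: List[Sequence[float]], executor: cf.Executor,
                   chunk_size: int = 1000) -> List[float]:
    """Score all rows chunk by chunk on the given executor.

    Executor.map returns results in input order, so the flattened
    scores line up with the original rows.
    """
    per_chunk = executor.map(score_chunk, chunked(rows, chunk_size))
    return [s for chunk_scores in per_chunk for s in chunk_scores]
```

For example, `with cf.ProcessPoolExecutor(max_workers=4) as pool: scores = parallel_score(rows, pool)` spreads the work across four local processes (run it under an `if __name__ == "__main__":` guard). Passing the executor in keeps the scoring logic independent of where it runs: a `ThreadPoolExecutor` is handy for quick tests, and a cluster-backed executor would be the Cloud Mining analogue.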