Gartner recently defined Big data as high-volume, high-velocity, and/or high-variety information assets that require new forms of processing to enable enhanced decision making, insight discovery, and process optimization. In this talk, we first present key challenges in Big data programming that are distinct from those of conventional parallel processing. We then introduce several research projects dealing with large volumes of data in the data mining lab at POSTECH: a PubMed relevance feedback search engine, blackbox video search, novel recommendation, and recommendation timing (when to recommend).
Hwanjo Yu received his PhD in Computer Science from the University of Illinois at Urbana-Champaign in June 2004 under the supervision of Prof. Jiawei Han. From July 2004 to January 2008, he was an assistant professor at the University of Iowa. He is now an associate professor at POSTECH (Pohang University of Science and Technology). He has developed influential algorithms and systems in the areas of data mining, databases, and machine learning, including (1) algorithms for classifying without negative examples (PEBL, SVMC), (2) privacy-preserving SVM algorithms, (3) SVM-JAVA: an educational open-source Java implementation of SVM, (4) RefMed: a relevance feedback search engine for PubMed, and (5) TurboGraph: a fast parallel graph engine that handles billion-scale graphs on a single PC. His methods and algorithms have been published in prestigious journals and conferences including ACM SIGMOD, ACM SIGKDD, IEEE ICDE, IEEE ICDM, and ACM CIKM, where he also serves as a program committee member.