Workshop: Introduction to Apache Hadoop

CLOSED

Synopsis:

Apache Hadoop is one of the most widely used ecosystems for big data processing and data science, and its popularity continues to grow. Hadoop is an open-source Apache Software Foundation project written in Java that enables the distributed processing of large datasets across clusters of commodity hardware. Hadoop has two primary components: HDFS (the Hadoop Distributed File System) and the MapReduce programming framework.
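
To give a feel for the MapReduce programming model before the workshop, here is a minimal single-machine sketch in plain Python. It does not use Hadoop itself; the function names (map_phase, shuffle, reduce_phase, word_count) are illustrative assumptions, and a real job would run the same three stages distributed across a cluster.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one line of input."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all the counts emitted for one word."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence on an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big clusters"]))
```

In Hadoop, the input lines would typically be read from files stored in HDFS, and the map and reduce functions would run in parallel on different nodes.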

Workshop: Introduction to Sentiment Analysis using R Programming

CLOSED

Synopsis will be provided soon.

Workshop: Introduction to Interactive Data Visualization for the Web using D3

CLOSED

Synopsis:

This workshop is about programming data visualizations for nonprogrammers. If you’re an artist or graphic designer with visual skills but no prior experience working with data or code, or if you’re a journalist or researcher with lots of data but no prior experience working with visuals or code, this workshop might be an interesting start.

Industrial Talks by Petronas

CLOSED

Title:

Big Data Analytics, Techniques, Architectures & Machine Learning

Speaker profile:

Jagathesan Balakrishnan is the Manager of the DISA (Digital Innovation Strategy and Architecture) division at PETRONAS. His role includes working cross-functionally across groups within the organisation to spread applied Data Science knowledge, to harness the capabilities of big data, machine learning and AI, and to discover and deliver insights at scale that help drive operational efficiency. He has more than 20 years of IT-related software development experience, including 10 years specific to Data Analytics. Prior to joining PETRONAS in 2015, he worked with numerous companies, including Intel Technology (Penang, Malaysia). His specializations include Business Intelligence, Data Mining, Machine Learning, AI algorithms, Software Development, Python and R. Jagathesan has an MSc in Information Technology from The University of Nottingham.

Industrial Talks by Teradata

CLOSED

Title:

Hadoop = Big Data? Unleashing the Optimal Value of Data with Unified Data Architecture

Synopsis:

Organizations need a data-driven approach to running their business. The opportunity for organizations to benefit from big data and new analytic tools is greater than ever before, but so are the challenges of putting new insights into action. It is difficult for organizations to understand the role of, and make use of, all the new technologies emerging in the marketplace (e.g., Hadoop, MapReduce, Spark). Technologies and available data sources are changing so quickly that businesses need assistance in architecting a solution to integrate data and, more importantly, to integrate analytical insights. In this presentation, we are going to discuss some common misconceptions surrounding Big Data technologies. A series of data architectures will also be discussed to help APU understand how organizations can use the proposed data architectures to apply the right technology to the right analytical opportunities.

Industrial Talks by ChainSpirit Sdn. Bhd.

CLOSED

Title:

How Data Is Affecting the Business World Today

Synopsis will be provided soon.

Workshop: Machine Learning Fundamentals, Building Models and Applying Them to Real Data

CLOSED

Rubber planting started in Malaysia in 1877. It is the leading producer of natural rubber in the world, contributing around 46% of total world rubber production. A favorable climate for rubber plantations has a mean temperature of 27°C that never falls below 22°C, and heavy rainfall above 200 cm with no drought. Rubber also requires deep, rich soils with good drainage, preferably brittle, well-oxidized and acidic in reaction. A sufficient supply of labour is an important factor for planting and collecting rubber over large holdings. In Malaysia, rubber can grow almost anywhere because the climate and topsoil are suitable, but most of the rubber estates are located in the western coastal plains. Plantations in the coastal zone benefit from being close to a port for export. Very low-lying areas are nevertheless avoided so that the trees do not suffer from stagnant water. The greatest production is in the state of Johor in southern Malaysia. Here, rubber cultivation occupies about 4.2 million acres, or about 66% of the total cultivated area in the nation.

The objective of this study was to determine the factors that contribute to accurately predicting high rubber yield (in kg per hectare) based on historical rubber plantation data. The data comes from the Department of Statistics, Malaysia. We found that the predictor ProduceTonne is the most significant for predicting the response variable, YieldperHectKg, closely followed by the predictors TotalPaidEmployee and TapAreaHect. We also saw that the regression tree based approach gives 98% accuracy in predicting the response variable, while the Random Forest model (0.049%) does not even come close. We achieved such high predictive accuracy with the regression tree based model because there was a strong positive linear relationship between the predictors, which works best for regression tree accuracy. The random forest algorithm would have better served a response variable with a finite set of values.

A tip for beginners in data science is to look at the response variable when deciding which algorithm to use. In this case study, the response variable was continuous in nature, with a strong positive linear relationship among the predictors. Therefore, the choice of regression trees was ideal.

Pushing the Limits of Data Scientist’s Toolbox for Clinical Data Analytics

CLOSED

Synopsis:

Data analytics can be, and has been, widely used in the healthcare sector and services. With many institutions adopting Electronic Health Records, enormous amounts of data are becoming available that, if mined correctly, can reveal hidden knowledge and provide useful insights to many stakeholders. Medical data mining can tackle issues such as identification of high-risk patients, prediction of treatment outcomes, early diagnosis, etc. In this presentation we address the special challenges facing clinical data mining projects that are not typically faced in other types of data analytics projects. We present some findings from our work on the project "Understanding Cancer Therapy Effectiveness: Personalized Treatment Guidelines using Data Analytics", and propose methods and techniques that can help improve the output of available tools and algorithms, yielding more usable findings that can be adopted in the healthcare field.

Speaker #1:

Dr. Asem Kasem obtained his Doctor of Computer Science degree from the University of Tsukuba, Japan, in 2011. He has since worked in different higher education institutions in South East Asia and the Middle East. He is currently an Assistant Professor at Universiti Teknologi Brunei (UTB), and his main area of expertise is Intelligent Systems and Data Mining. Among his recent projects is the work on understanding cancer therapy effectiveness using data analytics, an FRGS project carried out in collaboration between APU, UTB, Tokushima University, and Malaysia’s Institute for Medical Research.

Speaker #2:

Hema Latha Krishna Nair is a Senior Lecturer at the Faculty of Information Technology & Science, INTI International University, and has been working in education since 2008. She has led and taught modules such as Database Programming, Artificial Intelligence Methods, and Knowledge Discovery and Data Analytics. From 2002 to 2008, she worked as a Software Engineer at Panasonic AVC (M) Sdn Bhd in the Research and Development department, working on embedded programming and actively involved in Software Quality Assurance for CMMI Level 3. She obtained her Bachelor’s Degree in Computer Science from Universiti Sains Malaysia in 2002 and her Master’s Degree in Information Technology Management from Universiti Teknologi Malaysia in 2004, submitting an industrial prototype dissertation on Data Analytics for a manufacturing company. She is currently pursuing her PhD in Software Engineering at Universiti Teknologi Malaysia. Hema Latha was actively involved in research on Flood Mitigation and Water Channeling, and in 2014 was named Project Lead for the Ministry of Education Prototype Research Grant, a special grant under Malaysian Disaster Management. She is currently a co-researcher on a Fundamental Research Grant from the Ministry of Higher Education for Cancer Data Analytics Malaysia. She is an active member of Big Data Malaysia and Persatuan Data Sains Malaysia.

Fuzzy Relationship Visualization in Bibliographic Big Data Recommendation System

CLOSED

Speaker profile:

Maslina Zolkepli obtained her Doctor of Engineering degree in Computational Intelligence from Tokyo Institute of Technology, Japan, in 2015. She is currently a senior lecturer at the Department of Computer Science and a member of the Intelligent Computing Research Group at the Faculty of Computer Science and Information Technology, Universiti Putra Malaysia. Her research interests are in the areas of Computational Intelligence, specifically fuzzy systems and data science. Among her recent projects are a bibliographic big data visualization approach and a fuzzy analytic hierarchy process for natural disaster forecasting. She is also currently engaged in a consultation project involving the design of fuzzy aggregation based data analytics for security threat profiling with Malaysia’s cyber threat agency.

Synopsis:

Fuzzy relationship visualization is proposed for a Bibliographic Big Data Recommendation System, combining the Newman-Girvan and fuzzy c-means algorithms to find fuzzy relationships among bibliographic big data. The combined algorithm and the visualization of the search results help users converge on a few target papers from more than 1.5 million papers in DBLP. An automatic switch is also introduced, which uses a fuzzy inference engine to select the fastest performing algorithm. The recommendation system has the potential to become a helpful tool in academic research for those who are interested in finding expert networks in specific areas.

Data Science Postgraduate Colloquium

The colloquium gives Data Science postgraduate students an opportunity to come together, speak about their work, and interact socially rather than only in an academic setting.

The Mind Map of a Data Scientist

CLOSED

Data scientists are big data wranglers. They take an enormous mass of messy data points (unstructured and structured) and use their formidable skills in math, statistics and programming to clean, massage and organize them.

What Can "Small Data" Scientists Bring on Their "Big Data" Journey?

CLOSED

Synopsis will be provided soon.

Workshop: Seasonal ARIMA Model

CLOSED

Synopsis:

ARIMA (AutoRegressive Integrated Moving Average) models are a class of linear models capable of representing stationary as well as non-stationary time series. This forecasting methodology uses an iterative approach to identify a possible model from a general class of models. An ARIMA model that incorporates seasonal characteristics is referred to as a seasonal ARIMA model. This workshop gives an overview of seasonal ARIMA and discusses its application.
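
As a small illustration of the "Integrated" part of the name, the sketch below shows differencing in plain Python. It is not workshop material and does not fit a model; the quarterly series, the season pattern and the function name difference are made up for illustration. Seasonal differencing at the season length removes a repeating pattern, which is one way a non-stationary series is prepared before the autoregressive and moving average terms are chosen.

```python
def difference(series, lag=1):
    """Return series[t] - series[t - lag] for every t where both values exist."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


# Hypothetical quarterly data: a linear trend plus a pattern repeating every 4 steps.
season = [10, -5, 3, -8]
series = [2 * t + season[t % 4] for t in range(12)]

# Seasonal differencing (lag 4) cancels the repeating pattern.
seasonal_diff = difference(series, lag=4)

# A further ordinary difference (lag 1) removes what is left of the trend.
stationary = difference(seasonal_diff, lag=1)

if __name__ == "__main__":
    print(seasonal_diff)
    print(stationary)
```

On this artificial series the seasonal difference is constant, because the only thing left after cancelling the seasonal pattern is the trend accumulated over four steps.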

Workshop: Operational Research: Linear Programming & Network Models

CLOSED

Linear programming is a problem-solving approach that has been developed to help us make decisions. It has been used extensively in business, economics and engineering. An example of an engineering application would be maximising profit in a factory that manufactures a number of different products from the same raw material using the same resources. The constraints would be determined by the amounts of raw materials available. In the field of business and management, linear programming is a method for solving complex problems in two main areas: product mix (where the technique may be used to decide how much of each variable to use in order to satisfy certain criteria such as maximising profits or minimising costs, subject to certain constraints) and distribution of goods. As the variety of applications suggests, linear programming is a flexible problem-solving tool and has proven to be one of the most successful quantitative approaches to managerial decision making.
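
The product mix idea can be sketched for a toy two-product factory. The sketch below assumes made-up profit figures and resource limits, and a hypothetical helper named solve_2d_lp; it relies on the fact that a bounded linear program attains its optimum at a corner of the feasible region, so for two variables it is enough to check every point where two constraint boundaries cross. Real problems would normally be handed to a dedicated solver.

```python
from itertools import combinations


def solve_2d_lp(profit, constraints):
    """Maximise profit[0]*x + profit[1]*y subject to a*x + b*y <= c rows.

    Each constraint is a tuple (a, b, c). Non-negativity of x and y is added
    automatically. Returns (best_value, x, y), or None if nothing is feasible.
    """
    rows = list(constraints) + [(-1.0, 0.0, 0.0), (0.0, -1.0, 0.0)]
    best = None
    for (a1, b1, c1), (a2, b2, c2) in combinations(rows, 2):
        det = a1 * b2 - a2 * b1
        if abs(det) < 1e-12:
            continue  # parallel boundaries never cross
        # Cramer's rule for the crossing point of the two boundaries.
        x = (c1 * b2 - c2 * b1) / det
        y = (a1 * c2 - a2 * c1) / det
        # Keep the point only if it satisfies every constraint.
        if all(a * x + b * y <= c + 1e-9 for a, b, c in rows):
            value = profit[0] * x + profit[1] * y
            if best is None or value > best[0]:
                best = (value, x, y)
    return best


if __name__ == "__main__":
    # Assumed data: profit of 3 and 5 per unit, three shared resource limits.
    limits = [(1, 0, 4), (0, 2, 12), (3, 2, 18)]
    print(solve_2d_lp((3, 5), limits))
```

With these assumed numbers the best mix uses the second and third resources fully and leaves spare capacity on the first.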

Problems in areas such as transportation design, information design and network design have been successfully solved with the aid of network analysis techniques. All these problems have mathematical structures that enable us to develop efficient procedures for solving them. In this workshop, some techniques will be shown to explain how a network model can be developed and how it provides an optimal solution to the problem.
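
One common network model is the shortest route problem. The sketch below is an illustration only, not the workshop's own material: it uses Dijkstra's algorithm on a small made-up delivery network, and the node names, costs and the function name shortest_path_cost are assumptions. Edge costs must be non-negative for this approach to be valid.

```python
import heapq


def shortest_path_cost(graph, source, target):
    """Return the cheapest total cost from source to target, or None if unreachable.

    graph maps each node to a dict of {neighbour: non-negative edge cost}.
    """
    best = {source: 0}
    queue = [(0, source)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node == target:
            return cost
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry; a cheaper route was already found
        for neighbour, weight in graph.get(node, {}).items():
            new_cost = cost + weight
            if new_cost < best.get(neighbour, float("inf")):
                best[neighbour] = new_cost
                heapq.heappush(queue, (new_cost, neighbour))
    return None


if __name__ == "__main__":
    # Hypothetical road network with travel costs on each link.
    network = {
        "depot": {"A": 4, "B": 2},
        "B": {"A": 1, "customer": 7},
        "A": {"customer": 3},
    }
    print(shortest_path_cost(network, "depot", "customer"))
```

In this made-up network the cheapest route goes from the depot through B and then A, rather than taking either direct-looking link.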

Workshop: Forensic Data Analysis


Forestpin runs statistical algorithms on structured data and displays the test results in a visual dashboard. These tests are based on hardcoded algorithms in the system, which can be tweaked to display anomalies and potentially dubious transactions. Forestpin lets users drill down into each of the dashboard results until they reach the transactions that cause the anomalies. These transactions give investigators insight into potentially fraudulent transactions, data manipulation, or even ordinary mistakes made during day-to-day operations.

Besides being used as an investigative tool to uncover fraudulent transactions and potential misconduct, Forestpin can also be used to unlock business opportunities. When a business uploads its transactional data, Forestpin can display potential areas that the business could either exploit or refrain from engaging in. In a retail business, Forestpin can show which products sell most at a certain point during the year and which products do not move as fast. This allows management to stock up on the right items to improve profitability and avoid losses. For other business processes such as logistics, Forestpin can help reduce the number of trips to similar areas by analysing transportation cost and the volume of goods transported. With this, stock keeping and warehousing become easier.

 
apca@apu.edu.my



30 September 2017 (SAT)

Workshops

3 October 2017 (TUE)

Industrial Talks, Workshops

4 October 2017 (WED)

Research Presentations

Sharing experiences, Shared Learning

7 October 2017 (SAT)

Workshops