About Me
Hello! I'm Saman Siadati, a qualified data scientist. With a background in ML, AI, and big data, I enjoy building meaningful projects and sharing knowledge through my work.
Skills
- Programming Languages: Python, C++
- ML/AI: Scikit-Learn, TensorFlow, PyTorch, MLOps
- Database Management: MySQL, MS SQL Server, PostgreSQL
- Database Management (Unstructured): MongoDB
- Big Data Language/Platforms: PySpark, Databricks, Teradata
- Data Analysis: Pandas, NumPy, Matplotlib, Seaborn, Plotly, Splunk
- Data Pipelines: AWS Glue, AWS EventBridge, Azure Synapse
- Dependency Management: Poetry
- Container: Docker
- BI/Visualization: Tableau, Power BI
- Microservices: FastAPI, AWS Lambda, AWS API Gateway, AWS S3 buckets
- Cloud Platforms: AWS, MS Azure, GCP
- Version Control: Git & GitHub, GitLab, Bitbucket, Azure DevOps
- Soft Skills: Statistical Thinking, Problem-solving, Teamwork
Experiences
AI Trainer
Outlier (May 2024 - Present), part-time
Evaluate and enhance AI-generated mathematical content for accuracy, clarity, and relevance.
LLM Engineer
AAK TELE-SCIENCE (Jul 2021 - Present), part-time
Built a recommendation system that selects the best investor for each combination of research group and research subject, and a question answering system for table-based data using the Google TAPAS model.
Data Scientist / Data Engineer
Football Australia (Dec 2023 - Present)
Created data pipelines for data ingestion, cleansing, manipulation, and transformation; built API endpoints, big data analytics, and data visualizations with Tableau.
Data Scientist / Data Engineer
7-Eleven (Jan 2023 - Dec 2023)
Created a risk management system that loads data files into tables across bronze, silver, and gold layers to extract insights.
AI/ML Engineer
DataGamz (Jun 2022 - Jan 2023)
Created a speech recognition and sentiment analysis tool with TensorFlow and FastAPI.
Data Engineer / Data Scientist
Accenture (Mar 2022 - Jun 2022)
Created a containerized environment with OpenShift and FastAPI that extracts data from Splunk and pushes it to an SFTP server, where Power BI uses it to extract insights.
ML Engineer
ANZ (Oct 2021 - Mar 2022)
Extracted insights from real bank transaction data, such as the best merchant, state, city, and suburb, the most valuable transactions, and the best weekday and time of day, using exploratory data analysis, data cleansing, feature engineering, and geospatial analysis.
AI/ML Engineer
SamurAIBI (Jul 2021 - Dec 2021), part-time
Automated mask detection in pictures and videos using TensorFlow and OpenCV.
ML Engineer
Kadree (Jan 2021 - Jul 2022), part-time
Created predictive models and an automated alarm mechanism, based on anomaly detection, to mitigate the risk of hospitalization and death for elderly people.
Data Scientist
isgood (Mar 2020 - Dec 2020), part-time
Web scraping for Gender Equity Victoria on women's health and the prevention of violence against women, using Python with Beautiful Soup and Scrapy.
Data Scientist
SirLab (Jan 2011 - Dec 2019)
- Text analysis to extract the most valuable insights from student and staff questionnaire comments about using remote technologies, with n-grams and word clouds.
- Sentiment analysis of open-question comments in student surveys, in Python with TextBlob.
- Exploratory data analysis (EDA) and visualisation of student enrolment in university departments with Seaborn, Matplotlib, and Plotly.
- Big data analysis with Google BigQuery.
Projects
Question Answering tool with an LLM
This project builds an intelligent Question Answering (QA) tool that combines Large Language Models (LLMs) with Google TAPAS, a transformer model designed specifically for structured data such as tables. The tool aims to provide accurate, contextual answers to user queries over tabular datasets, bridging the gap between natural language questions and complex data tables.
Recommendation System for investment in research projects
This project involves the development of a recommendation system designed to identify the best investors for optimal combinations of research groups and research subjects. The system leverages deep learning models built using TensorFlow to analyze and predict the most promising matches between investors and research initiatives. A PostgreSQL database serves as the backbone for efficient storage and retrieval of data related to research groups, subjects, and investor profiles. FastAPI is used to build a high-performance RESTful API, enabling seamless interaction between the recommendation engine and user-facing applications. The project aims to streamline investment decisions and foster innovation by aligning investors with research opportunities that best match their strategic interests.
Facial Expression Analysis tool
This project focuses on facial expression analysis using TensorFlow for deep learning and FastAPI for building a high-performance API. A convolutional neural network (CNN) model is trained on image datasets to detect and classify facial emotions such as happiness, sadness, anger, and surprise. The model processes input images and predicts emotional states with high accuracy. FastAPI is used to create a robust and scalable RESTful API, allowing users to upload facial images and receive real-time emotion predictions. The application is designed for integration into interactive systems, enhancing user experience in fields like marketing, gaming, and healthcare. The combination of TensorFlow and FastAPI ensures a powerful and efficient solution for emotion recognition tasks.
Mask Detection tool
This project focuses on developing an automation system for detecting face masks in images and videos using TensorFlow and OpenCV. The system leverages computer vision techniques to identify whether individuals are wearing masks correctly or not. TensorFlow was employed for building and training a deep learning model, while OpenCV was utilized for real-time image processing and video frame analysis. The solution is capable of detecting multiple faces simultaneously and classifying them based on mask presence. This automation can be integrated into surveillance systems for public safety and compliance monitoring. The project demonstrates the power of AI-driven tools in enhancing health and safety protocols.
Service Price Recommendation system
This project aims to develop a Best Price Recommendation System for computer services in Australia. The system uses Scrapy to efficiently scrape and aggregate pricing and service data from major market and sales companies. By continuously gathering real-time data, it captures market trends and pricing dynamics. The collected data is then processed and analyzed using TensorFlow to predict the best service price for the next day. The predictive model leverages machine learning algorithms to identify patterns and fluctuations in service costs. This solution empowers businesses and consumers to make informed decisions, optimizing their service expenses.
Aged-care hospitalisation alarm system
The project involved analyzing time-series data generated by sensors installed in home appliances to detect anomalies in daily behavior patterns. By converting device signals into 24-character strings representing hourly activity, I created a standardized model to quantify and assess each day's quality. Leveraging frequency metrics and the Levenshtein distance, I established a system for identifying abnormal days without the need for manual monitoring. This solution enables proactive rather than reactive healthcare interventions, helping reduce hospitalization costs and risks. Currently, I am optimizing predictive models using Keras and TensorFlow, focusing on enhancing activation functions and dropout strategies for better anomaly detection.
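The string-based comparison described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's actual code: the `'1'`/`'0'` encoding of active and inactive hours, the helper names `encode_day` and `abnormal_days`, the use of the most frequent day as the reference pattern, and the `max_distance` threshold are all hypothetical choices made for the example.

```python
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def encode_day(hourly_counts, threshold=1):
    """Turn 24 hourly sensor counts into a 24-character string.

    Assumed encoding: '1' if the hour had at least `threshold` events, else '0'.
    """
    if len(hourly_counts) != 24:
        raise ValueError("expected exactly 24 hourly values")
    return "".join("1" if c >= threshold else "0" for c in hourly_counts)


def abnormal_days(days, max_distance=4):
    """Return indices of days that are far from the most frequent day pattern."""
    typical, _ = Counter(days).most_common(1)[0]
    return [i for i, day in enumerate(days)
            if levenshtein(day, typical) > max_distance]


if __name__ == "__main__":
    usual = "000000011100110000111100"
    history = [usual, usual, usual, "0" * 24]  # last day: no activity at all
    print(abnormal_days(history))
```

Edit distance tolerates small shifts in routine (for example, activity starting an hour later) better than a position-by-position comparison, which is one plausible reason to prefer it for daily behaviour strings.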
Breast Cancer Prediction
This project focuses on predicting breast cancer by analyzing features computed from digitized images of fine needle aspirate (FNA) samples of breast masses. The study used the Wisconsin Diagnostic Breast Cancer (WDBC) dataset, which contains 569 samples (357 benign, 212 malignant). Each sample is characterized by 30 features computed from the cell nuclei in the images. A support vector machine (SVM) classifier was trained on these features, and the optimized SVM significantly improved diagnostic accuracy, reducing false negatives from 11 to 2.
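A rough sketch of this kind of experiment with scikit-learn is shown below. It uses the copy of the WDBC dataset bundled with scikit-learn, where label 0 is malignant and label 1 is benign. The train/test split, the hyperparameter grid, the scoring metric, and the function names are assumptions made for illustration; they are not the project's actual settings, and the false-negative counts it produces will not necessarily match the figures quoted above.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def false_negatives(model, X_test, y_test):
    """Count malignant samples (label 0) that the model predicts as benign (label 1)."""
    cm = confusion_matrix(y_test, model.predict(X_test), labels=[0, 1])
    return int(cm[0, 1])


def run_experiment(seed=0):
    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )

    # Baseline: an SVM with default settings on unscaled features.
    baseline = SVC().fit(X_train, y_train)

    # Tuned model: feature scaling plus a small, assumed grid over C and gamma.
    search = GridSearchCV(
        make_pipeline(StandardScaler(), SVC()),
        param_grid={
            "svc__C": [0.1, 1, 10, 100],
            "svc__gamma": ["scale", 0.01, 0.001],
        },
        scoring="balanced_accuracy",
        cv=5,
    )
    search.fit(X_train, y_train)

    return {
        "n_samples": X.shape[0],
        "n_features": X.shape[1],
        "n_malignant": int((y == 0).sum()),
        "baseline_fn": false_negatives(baseline, X_test, y_test),
        "tuned_fn": false_negatives(search.best_estimator_, X_test, y_test),
        "best_params": search.best_params_,
    }


if __name__ == "__main__":
    print(run_experiment())
```

Tracking false negatives separately from overall accuracy matters here because a missed malignant case is usually the costlier error in a diagnostic setting.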
Research & Publications
The First Step in the Computational Optimization, Mathematical programming
TowardsAI, Jan 2021
Genetic algorithm, Deep dive into natural selection method
Artificial Intelligence in Plain English, Feb 2021
Certificates
Databricks Large Language Models: Foundation Models from the Ground Up
Apr 2024, edX
Databricks Large Language Models: Application through Production
Oct 2023, edX
Databricks Lakehouse Fundamentals
Aug 2022, Databricks Academy
Oracle Machine Learning using Autonomous Database Specialist
Jan 2022, Oracle University
Certified Artificial Intelligence Professional
Oct 2021, Australian Institute of ICT
Statistical Thinking and Problem Solving
Jun 2020, SAS
Google Cloud Platform Big Data and Machine Learning Fundamentals
Apr 2020, Google
Advanced Data Science Specialist
Sep 2019, IBM
Scalable Machine Learning on Big Data using Apache Spark
Sep 2019, IBM
Deep Learning & Neural Networks with Keras
Sep 2019, IBM
Contact
If you'd like to get in touch, feel free to reach out via email at samansiadati@gmail.com.