Welcome to my Data Space

Hi, I'm Nivrutti Kolamkar

Data Analyst Expert

Skilled in Python, SQL, Pandas, NumPy, Power BI, and data visualization techniques. Completed a Data Analytics internship and worked on projects involving data cleaning, exploratory data analysis, dashboard development, and predictive modeling. Passionate about transforming raw data into meaningful insights to support business decision-making and continuously learning new technologies in the field of data science and analytics.

98.4%
Model F1-Score
5+
Production Models
10M+
Rows Cleaned
15%
Sales Cost Savings
0 Years
Analytics Experience
Who I Am

About & Core Expertise

An overview of my problem-solving approach and technical toolkit.

Professional Profile

I am a dedicated Data Analyst and Data Scientist passionate about bridging the gap between raw data sets and strategic business choices. Having completed a professional Data Analytics internship, I have built real-world experience in exploratory data analysis (EDA), data cleaning pipelines, and predictive modeling. With deep expertise in tools like Power BI, SQL, and Python, I excel at constructing highly interactive reports and dashboards that present complex metrics in simple, clear, and actionable narratives.

Data Science & Machine Learning

Python Programming, Pandas & NumPy, Exploratory Data Analysis (EDA), Predictive Modeling, Supervised Learning, Statistical Inference

Analytics & Business Intelligence

Power BI Dashboard Design, SQL Database Querying, Data Wrangling & Cleaning, Interactive Visualizations, KPI Tracking & Metric Mapping, Excel Data Analytics

Data Workflow & Collaboration

Git Version Control, GitHub Portfolio Management, Jupyter Notebooks, Data Pipeline Concepts, Continuous Learning Mindset, Analytical Problem Solving
Live Simulator

Machine Learning Playground

Adjust the hyperparameters and property features in real time to see how the predictive model re-estimates the house price.

Model Input Features

Algorithm: Random Forest Regressor
Living Area: 2,200 sq ft
Bedrooms: 3
Location Premium: 1.2x multiplier
Year Built / Renovated: 2010

Predictive Analysis Output

Model State: Active
Estimated House Value
$345,600
+$12,400 from average baseline
Model R²
0.932
RMSE
$8,430
Latency
1.4ms
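
As a rough illustration of what sits behind a panel like this, the sketch below trains a Random Forest regressor on the same four inputs and reports R² and RMSE. It assumes scikit-learn and NumPy are installed. The training data is synthetic and the pricing formula is invented for the example, so its numbers will not match the figures shown above.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 2000

# Synthetic stand-in features (assumption: not the data behind the live model)
living_area = rng.uniform(600, 4500, n)        # square feet
bedrooms = rng.integers(1, 7, n)               # 1 to 6 bedrooms
location_premium = rng.uniform(0.8, 1.6, n)    # price multiplier
year_built = rng.integers(1950, 2024, n)

# Invented pricing rule plus noise, used only to give the model something to learn
price = (
    (60_000 + 95 * living_area + 8_000 * bedrooms + 600 * (year_built - 1950))
    * location_premium
    + rng.normal(0, 10_000, n)
)

X = np.column_stack([living_area, bedrooms, location_premium, year_built])
X_train, X_test, y_train, y_test = train_test_split(
    X, price, test_size=0.2, random_state=42
)

model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Hold-out metrics, analogous to the R² and RMSE tiles in the panel
y_pred = model.predict(X_test)
r2 = r2_score(y_test, y_pred)
rmse = float(np.sqrt(mean_squared_error(y_test, y_pred)))

# Same inputs as the panel: 2,200 sq ft, 3 bedrooms, 1.2x premium, built 2010
sample = np.array([[2200, 3, 1.2, 2010]])
estimate = float(model.predict(sample)[0])
baseline = float(y_train.mean())

print(f"Estimated value: ${estimate:,.0f} ({estimate - baseline:+,.0f} vs. training mean)")
print(f"R² = {r2:.3f}, RMSE = ${rmse:,.0f}")
```

Comparing the estimate against the training-set mean is one simple way to produce a "from average baseline" figure; the live widget may define its baseline differently.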
Live Sandbox

Database SQL Sandbox

Recruiters can query a simulated retail dataset directly in the browser. Write standard SELECT statements or click one of the quick-run queries below.

Interactive SQL Console
online_store.db
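
For readers who want to try the same kind of query offline, here is a minimal sketch using Python's built-in sqlite3 module. The `orders` table, its columns, and the five rows are hypothetical stand-ins; the actual schema of online_store.db is not shown on this page.

```python
import sqlite3

# In-memory database, so nothing is written to disk
conn = sqlite3.connect(":memory:")

# Hypothetical schema and rows, invented for this example
conn.executescript("""
CREATE TABLE orders (
    order_id   INTEGER PRIMARY KEY,
    category   TEXT    NOT NULL,
    quantity   INTEGER NOT NULL,
    unit_price REAL    NOT NULL
);
INSERT INTO orders (category, quantity, unit_price) VALUES
    ('Electronics', 2, 199.99),
    ('Electronics', 1, 499.00),
    ('Home',        3,  25.50),
    ('Home',        1,  80.00),
    ('Toys',        5,  12.00);
""")

# A typical quick-run query: order count and revenue per category
query = """
SELECT category,
       COUNT(*)                             AS order_count,
       ROUND(SUM(quantity * unit_price), 2) AS revenue
FROM orders
GROUP BY category
ORDER BY revenue DESC;
"""
rows = conn.execute(query).fetchall()

for category, order_count, revenue in rows:
    print(f"{category:<12} {order_count:>3} {revenue:>10.2f}")

conn.close()
```

An in-memory SQLite database keeps the example self-contained, which is also a common way to back a browser-style sandbox with throwaway data.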
BI Visualization

Dynamic Analytics Center

Interactive sales metrics, segment growth indicators, and core marketing metrics that update in real time as the user changes filters.

Global Sales Forecasting

Interactive visualization of aggregate regional sales projections computed via ARIMA.

$1.24M
YTD REVENUE
+14.2%
PROFIT FORECAST
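
To make the ARIMA reference concrete, the sketch below fits the simplest non-trivial member of the family, ARIMA(1,1,0) with drift, by ordinary least squares using only NumPy. This is an illustrative simplification and not necessarily how the dashboard computes its projections; a real project would more likely use a dedicated time-series library. The sales series is synthetic, and the function names are hypothetical.

```python
import numpy as np


def fit_arima_110(series):
    """Fit ARIMA(1,1,0) with drift by conditional least squares.

    On the first differences d_t = y_t - y_{t-1} the model is
        d_t = c + phi * d_{t-1} + e_t
    Returns the estimates (c, phi).
    """
    d = np.diff(np.asarray(series, dtype=float))
    # Regress d_t on an intercept and the previous difference d_{t-1}
    design = np.column_stack([np.ones(len(d) - 1), d[:-1]])
    c, phi = np.linalg.lstsq(design, d[1:], rcond=None)[0]
    return float(c), float(phi)


def forecast_arima_110(series, c, phi, steps):
    """Iterate the fitted recursion forward and re-integrate to levels."""
    series = np.asarray(series, dtype=float)
    level = series[-1]
    diff = series[-1] - series[-2]
    out = []
    for _ in range(steps):
        diff = c + phi * diff      # next expected difference
        level = level + diff       # undo the differencing
        out.append(level)
    return np.array(out)


# Synthetic monthly sales (assumption: invented data for the example)
rng = np.random.default_rng(0)
n_months = 60
diffs = np.empty(n_months)
diffs[0] = 2.0
for t in range(1, n_months):
    diffs[t] = 1.0 + 0.5 * diffs[t - 1] + rng.normal(0, 0.5)
sales = 100.0 + np.cumsum(diffs)

c, phi = fit_arima_110(sales)
forecast = forecast_arima_110(sales, c, phi, steps=6)
print(f"c = {c:.3f}, phi = {phi:.3f}")
print("6-month forecast:", np.round(forecast, 1))
```

Differencing once (the "I" in ARIMA) removes the trend, the autoregressive term models momentum in month-to-month changes, and the forecast is obtained by summing predicted changes back onto the last observed level.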
Case Studies

End-To-End Data Projects

Explore production-level implementations showcasing pipelines, machine learning, and clean structured outputs.

Python, XGBoost, SQL

Telecom Customer Churn Pipeline

Designed an end-to-end churn prediction pipeline using XGBoost. Tuned hyperparameters, reducing total churn leakage by 18.2%, and integrated an automated SQL pipeline.

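import xgboost as xgb
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split

# Stand-in data so the snippet runs on its own (assumption: not the real churn data);
# replace with the prepared churn features and labels.
X, y = make_classification(n_samples=1000, n_features=20, weights=[0.8, 0.2], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)

# Hyperparameter grid searched for the churn model
params = {
    'max_depth': [3, 5, 7],
    'learning_rate': [0.01, 0.1, 0.2],
    'n_estimators': [100, 200, 300]
}

# use_label_encoder is deprecated in recent XGBoost releases, so it is omitted
xgb_model = xgb.XGBClassifier(eval_metric='logloss', random_state=42)
grid_search = GridSearchCV(xgb_model, params, cv=5, scoring='f1')
grid_search.fit(X_train, y_train)

# Best configuration, refit on the full training set by GridSearchCV
best_estimator = grid_search.best_estimator_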

Python, NLP, Transformers

BERT Financial Sentiment Scraper

Developed an automated financial news sentiment analyser using the Hugging Face FinBERT model (ProsusAI/finbert), a BERT-based classifier. It scrapes Wall Street columns to generate sentiment signals correlated with price jumps.

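from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("ProsusAI/finbert")
model = AutoModelForSequenceClassification.from_pretrained("ProsusAI/finbert")
model.eval()  # inference mode: disables dropout

# Perform financial inference
headlines = ["Company revenue surges 25% year-on-year, beating estimates"]
inputs = tokenizer(headlines, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():  # gradients are not needed for inference
    outputs = model(**inputs)
predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)

# Map each row of probabilities to the label names stored in the model config
for headline, probs in zip(headlines, predictions):
    label = model.config.id2label[int(probs.argmax())]
    print(f"{label} ({probs.max().item():.3f}): {headline}")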

Python, PySpark, Databricks

Spark Big Data Analytics ETL

Built a production-grade ETL architecture in PySpark that streams 1.5M transactions per hour. Configured dynamic Delta Lake schemas, reducing query latency for operations analysts.

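from pyspark.sql import SparkSession
from pyspark.sql.functions import col, window

spark = SparkSession.builder.appName("TransETL").getOrCreate()

# Structured streaming from Kafka.
# The broker address and topic name are placeholders; replace them with real values.
stream_df = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "transactions")
    .load()
)

# Keep the Kafka record timestamp next to the payload so it can be windowed.
# The watermark lets Spark finalize each hourly window and append it to the sink.
cleaned_stream = (
    stream_df.selectExpr("CAST(value AS STRING) AS value", "timestamp")
    .withWatermark("timestamp", "10 minutes")
    .groupBy(window(col("timestamp"), "1 hour"))
    .count()
)

query = (
    cleaned_stream.writeStream.format("delta")
    .outputMode("append")
    .option("checkpointLocation", "/delta/checkpoints")
    .start("/delta/transactions")
)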