Neoinsights
Knowledge Base

Frequently Asked Questions

Answers to common questions about working with Neoinsights - from how engagements work to what technologies are used.

Working With Me

We begin with a free 30-minute discovery call where I learn about your data challenges, current stack, and goals. After that, I put together a short proposal with a suggested approach, timeline, and fixed or time-based pricing - no obligation. Most projects move from first call to kick-off within one to two weeks.
Primarily remote. My clients are distributed across Germany and Europe, and modern collaboration tools make remote delivery seamless for data engineering work. On-site visits to your offices can be arranged when needed - for example for a kick-off workshop, architecture review, or team training session.
I've worked with SaaS companies, energy providers, and enterprise consulting clients. The common thread isn't the industry - it's the challenge: companies that have data scattered across disconnected systems, pipelines that break too often, or analytics that lag too far behind the real world. If that sounds familiar, the industry doesn't matter.
Both. Most engagements are project-based (a defined scope, deliverables, and timeline), but I also work on a monthly retainer for ongoing platform support, optimization, or part-time embedding in an existing data team. We figure out what structure fits your situation during the discovery call.

Technical Capabilities

Languages & Tools: Python, Golang, SQL, dbt, Apache Spark, Spark Structured Streaming, FastAPI
Table Formats: Delta Lake, Apache Iceberg
Data Platforms: Databricks
Cloud: AWS (ECS, EC2, Glue, S3, Athena, Kinesis, Lambda, Redshift) and Azure (Synapse, Databricks, Blob Storage, Functions)
Orchestration: Apache Airflow
AI/ML: OpenAI, MLflow, Qdrant, Chroma, LangChain, LangGraph, MCP
CI/CD: GitHub Actions, Azure DevOps, GitLab CI/CD
Containerization: Docker, Kubernetes
Infrastructure as Code: Terraform
A data lakehouse combines the low-cost, flexible storage of a data lake with the structured query performance of a data warehouse - all in one architecture. Technologies like Delta Lake (on Databricks) and Apache Iceberg make this possible. It's the right choice if you're currently maintaining both a data lake and a warehouse separately, or if your team is spending too much time moving data between systems. Two of my case studies (SaaS ERP and Energy sector) involved migrating clients from legacy data lakes to Lakehouse architectures - both resulted in 10-25% cost reductions and significantly faster pipelines.
Yes - FinOps (cloud cost optimization) is one of the most consistent wins I deliver. Typical savings range from 20-50% of current cloud spend. The main levers are: switching from full table scans to incremental processing, right-sizing compute clusters (especially Databricks), optimizing storage formats (columnar formats like Parquet/Delta instead of raw CSV/JSON), and identifying idle or overprovisioned resources. In one project, workload tuning and FinOps reduced a client's cloud costs by 25%. In another, incremental processing alone cut daily Databricks compute consumption by 20%.
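To make the first lever concrete, here is a minimal sketch of incremental processing with a high-water mark. It is an illustration under assumptions, not code from any client project: the `Row` type, the `updated_at` column, and the `select_incremental` helper are hypothetical names, and a real pipeline would push the filter down to the storage layer instead of filtering in Python.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Row:
    """Hypothetical source record with a last-modified timestamp."""
    id: int
    updated_at: datetime


def select_incremental(
    rows: List[Row], watermark: Optional[datetime]
) -> Tuple[List[Row], Optional[datetime]]:
    """Return only rows changed after the watermark, plus the new watermark.

    With watermark=None (first run) every row is selected. In a real system
    this predicate would be part of the query so unchanged data is never read.
    """
    changed = [r for r in rows if watermark is None or r.updated_at > watermark]
    # If nothing changed, keep the previous watermark.
    new_watermark = max((r.updated_at for r in changed), default=watermark)
    return changed, new_watermark


source = [
    Row(1, datetime(2024, 1, 1)),
    Row(2, datetime(2024, 1, 2)),
    Row(3, datetime(2024, 1, 3)),
]

# First run: no watermark yet, so everything is processed once.
first_batch, wm = select_incremental(source, None)

# A new row arrives; the second run touches only that row.
source.append(Row(4, datetime(2024, 1, 4)))
second_batch, wm = select_incremental(source, wm)
```

The saving comes from the second run: it handles one row instead of four, and the same ratio applies when the table has millions of rows and only a small share changes each day.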
Yes. My AI focus is agentic AI - systems that automate and simplify real business processes, not demos or one-off models. Think AI agents that handle customer support queues, automate document processing, or surface actionable recommendations from your data without human intervention. I've built a RAG-based AI agent for a SaaS company's support team that achieved 80% user satisfaction by combining LangChain, a vector database (Qdrant), and a well-orchestrated document ingestion pipeline. One important thing to understand: the likelihood of a successful AI implementation is directly tied to the quality of your data foundation. Companies with clean, well-structured pipelines and reliable data platforms see AI projects succeed far more often than those trying to layer AI on top of messy or inconsistent data. If your data isn't ready for AI yet, I can help build that foundation first - and then we move into the AI layer with a much higher chance of real results.
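For readers new to RAG, the retrieval step can be sketched in a few lines. This is a toy illustration under stated assumptions, not the support agent described above: it uses a bag-of-words stand-in for embeddings and an in-memory list instead of a vector database, and the names `embed`, `cosine`, and `retrieve` are hypothetical rather than LangChain or Qdrant APIs.

```python
import math
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Toy 'embedding': word counts. A real system would use a learned model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question: str, documents: List[str], k: int = 1) -> List[str]:
    """Return the k documents most similar to the question."""
    q = embed(question)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


docs = [
    "How to reset your password in the account settings",
    "Invoices are sent by email at the start of each month",
    "Export reports as CSV from the dashboard",
]

question = "how do I reset my password"
top = retrieve(question, docs)

# The retrieved text is placed in the prompt so the model answers from it.
prompt = f"Answer using this context:\n{top[0]}\n\nQuestion: {question}"
```

The sketch also shows why the data foundation matters: the model can only answer from what retrieval finds, so outdated or inconsistent documents lead directly to poor answers.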

Timelines

It depends on scope. A focused pipeline build or data warehouse setup runs 4-8 weeks. A full platform modernization - migrating from a legacy system to a modern Lakehouse architecture, for example - typically takes 3-6 months. I always define milestones and deliverables upfront so you know exactly what's being built and when. Most clients see measurable improvements (faster pipelines, reduced errors, lower costs) within the first 4-6 weeks.

Germany & Compliance

Yes. Data governance and privacy are built into how I work, not bolted on at the end. This means: data is processed and stored in EU regions by default, access controls and data masking are implemented from the start, pipelines include lineage tracking so you can answer "where did this data come from?", and sensitive fields are encrypted or pseudonymized as required. For clients in regulated industries (energy, finance, healthcare), I'm familiar with the additional requirements and factor them into the architecture design.
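As an illustration of field-level pseudonymization, the sketch below replaces a sensitive value with a keyed hash (HMAC-SHA256) so records can still be joined on the field without exposing the original. It is one possible approach under assumptions, not a description of a specific client setup: `SECRET_KEY`, `pseudonymize`, and `mask_record` are hypothetical names, and in practice the key would come from a secrets manager, not from source code.

```python
import hashlib
import hmac
from typing import Any, Dict, Set

# Hypothetical placeholder; load the real key from a secrets manager.
SECRET_KEY = b"replace-with-a-managed-secret"


def pseudonymize(value: str, key: bytes = SECRET_KEY) -> str:
    """Return a deterministic keyed hash of the value as a hex string."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def mask_record(record: Dict[str, Any], sensitive_fields: Set[str]) -> Dict[str, Any]:
    """Copy the record, pseudonymizing only the listed sensitive fields."""
    return {
        name: pseudonymize(str(value)) if name in sensitive_fields else value
        for name, value in record.items()
    }


record = {"customer_id": 42, "email": "anna@example.com", "kwh": 312.5}
masked = mask_record(record, {"email"})
```

Because the hash is deterministic for a given key, the same email always maps to the same token, which keeps joins and aggregations working on the masked data.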
Yes. I work fluently in both German and English. All documentation, workshops, code comments, and client communication can be in either language - your preference. My client base is primarily in the DACH region (Germany, Austria, Switzerland), with additional international clients.

Still have questions?

Book a free 30-minute discovery call and we'll talk through your specific situation.
