Vibe coding in practice: an honest experiment in building an app using AI

Vibe coding is a new term for creating software using AI. The term was introduced by Andrej Karpathy, a founding member of OpenAI and former Director of AI at Tesla. It describes a workflow where developers express tasks in natural language, and AI tools generate code, tests, and supporting components.

What started as a wave of interest around AI-assisted coding quickly turned into daily practice. Few expected the term itself to gain this level of traction. Yet in 2025, “vibe coding” was named word of the year by Collins Dictionary.

Adoption continues to grow. More teams include AI tools in their daily workflow. However, opinions still differ.

Some developers treat AI as a practical assistant. Others question the reliability of generated output. At the same time, AI adoption in software development is not a choice teams can delay for long. The market has already moved forward.

Usage is widespread and measurable: according to McKinsey, more than 90% of development teams use AI tools in their workflow, saving roughly 6 hours per developer each week by reducing repetitive tasks.

At Aristek, curiosity about emerging technology has always been a working principle. Since capable AI coding tools became available, our engineers have been testing them across different project types, team configurations, and technical environments, accumulating practical knowledge of where the tools perform reliably and where they require careful management.

AI is already embedded in how we work at Aristek. We have a defined policy for its use across the software development lifecycle, and our engineers apply it regularly on real projects.

Within that context, one engineer decided to take the experiment further: to test whether AI, given the right instructions and a human guiding the process, could produce a fully functioning application from start to finish. The goal was to find out exactly what that looks like in practice, and where the limits are.

Eugene, a software engineer at Aristek, spent a weekend building a real application using GitHub Copilot as the primary agent. The process wasn’t smooth, but the result worked. The final version included a working frontend, backend services, automated tests, and a generated design, and 99% of the code was produced by AI.

How much time does it actually save? What works as expected, and what breaks under pressure?

This article documents the full experiment in detail. Each stage also includes expert comments from Aristek engineers and specialists who work with these tools on live projects, so the conclusions reflect more than one session on a weekend.

We didn’t plan this. The industry did

AI did not enter software development quietly. It showed up, gained attention fast, and became part of daily work before most teams had time to form a clear position. We have seen this pattern before. jQuery, frameworks, and Agile methods followed a similar path. At first, they felt optional. Then they became expected.

Today, the same shift is happening with AI. Most teams already use AI tools in some part of the development process. The difference is usually not whether AI is used, but how deeply it is included in daily work. Some engineers rely on it heavily. Others use it selectively and review the output more cautiously. At the same time, the pace of delivery continues to increase.

Recent data reflects this shift:

Up to 55% faster task completion when developers use AI coding tools (GitHub)
39% increase in time spent in focused work when using AI-assisted tools (Microsoft)
3–5× productivity gains reported in some cases, depending on the task and setup (Docker)

These results do not apply evenly to every task. But they do show a clear direction: teams that include AI in their workflow reduce time spent on routine steps and move faster through implementation.

This changes how developers approach their work. AI does not replace engineers. It changes the way they write, test, and structure code. Those who adapt to this workflow gain an advantage in speed and output.

This context led to the experiment described in this article. The goal was simple. Take a real task and work through it with AI as part of the process. Observe how it behaves across different stages. Note where it helps, where it slows things down, and how much guidance it requires.

To make the process clear, we structured it around five stages of the development lifecycle: business analysis, design, development, testing, and deployment. Each stage reflects a typical step in real project work and shows how AI fits into it.

Business analysis stage: where the first decision is already delegated

I didn’t even come up with the project idea myself. I asked AI to suggest one, picked from the list, and went from there.

Eugene Ivanov, Software Engineer at Aristek

Before a single file existed, before a framework was chosen, before any technical decision was made, the project concept itself came from a conversation with ChatGPT.

Eugene opened ChatGPT and asked a simple question: suggest a few ideas for a demo project. The model returned three options. He selected one and asked a follow-up: describe how this product should work.

The response outlined user flows, basic functionality, and system behavior. Eugene saved it as a draft file. That document became the starting point for everything that followed.

From a rough idea to a structured brief (without a single meeting)

There was no detailed specification at this stage. No predefined architecture. Just a direction, and AI helped turn that direction into a structured description.

The model asked clarifying questions along the way: which backend framework, which frontend approach, what scale to plan for. Eugene answered, the AI incorporated the answers, and the result was a coherent project brief – not ideal, but with enough structure to begin implementation. The whole process took one conversation.

This matters because writing a proper brief is usually slow work. It requires aligning stakeholders, translating business intent into technical language, and resolving ambiguity before development begins. AI does not replace that process entirely, but it shortens the distance between “we have an idea” and “we have something to build from.”

Speed changes the scope of early analysis

This approach affects how early-stage analysis is done.

Tasks that usually take days can be reduced to a shorter cycle. Competitor research, for example, can be outlined in one session. AI can summarize existing products, highlight common patterns, and suggest positioning. This does not replace direct research, but it provides a starting point.

The same applies to technical decisions. Given enough context, models can suggest architecture options and technology stacks. These suggestions are based on patterns present in training data, not on project-specific knowledge. The quality of output depends on the clarity of input.

Where AI needs direction

AI can structure information, but it does not define priorities on its own.

If the input is vague, the output will be generic. If key constraints are missing, the model fills the gaps with assumptions. In practice, this leads to plausible but incorrect decisions.

Eugene’s role at this stage was to guide the process. He did not write the document from scratch, but he controlled the direction through answers and clarifications.

Verdict

AI works well in early analysis when the goal is to move from idea to structure.

It reduces the time needed to produce a working draft. It helps identify gaps and prompts the right questions. At the same time, it depends on human input to stay relevant.

In this experiment, the business analysis phase did not require formal documentation or extended planning activities that usually accompany production projects. A short interaction was enough to define the project at a level suitable for the next stage.

In real delivery environments, this phase includes deeper requirements analysis, stakeholder alignment, prioritization, risk review, and technical discussions. AI works well here as a support tool. It speeds up the creation of initial requirements, drafts user flows, and structures information quickly, but human review remains necessary before implementation starts.

AI has changed business analysis much more broadly than many people expect. Today, it supports almost every stage of my work, from discovery and research to ticket creation, prototyping, validation, and even testing support.

In practice, I mostly work with tools like Claude Code, GitHub Copilot, Codex, and MCP-based workflows connected to systems such as Jira or design platforms. For quick prototyping, tools like Lovable are also useful. The exact setup depends on the project and the client environment.

The biggest change is systemic. AI helps process incoming information, structure requirements, prepare drafts, generate tickets, and organize artifacts according to predefined formats. This removes a large amount of repetitive work and gives analysts more time for actual analysis and decision-making.

At the same time, human validation remains essential. AI performs well when there is enough context and when project documentation is clear. If context is missing, the model starts filling gaps with assumptions. That is one of the main risks in analysis work. AI is designed to provide answers, even when information is incomplete.

We also learned that automation works best when instructions are treated seriously. The quality of the result depends heavily on how clearly workflows, constraints, and expected behavior are described. Skills, prompts, and agent instructions already play an important role in how consistently these systems perform.

AI should not be treated as autonomous. It is a powerful assistant, but analysts still remain responsible for validation, accuracy, and final decisions. Right now, the most practical approach is simple: automate the tasks that consume time, observe where AI performs well, and keep human review where it matters most.

Ann Danilkovich, Business Analyst at Aristek

Design stage: 3 hours to set up, 10 minutes to build

The gap between what an agent can do and what it does is usually a gap in instructions, not a gap in capability.

Eugene Ivanov, Software Engineer at Aristek

With a project brief in hand and a clear implementation plan generated from it, the next step was design. This is the stage where most engineers would hand work to a designer and wait. Eugene decided to see how far an AI agent could take it.

The short answer: further than expected, but not without significant setup.

“It knew exactly what to do. It just couldn’t see the result.”

The agent had access to Figma through an MCP integration and could create components, set colors, define spacing, and build layouts. Technically, it understood design. The problem was that it had no way to verify its own output visually.

It placed a button, wrote text inside it, and had no mechanism to check whether the text actually fit. It set padding values that looked correct in the API response and were wrong on screen. The first iterations had overlapping elements, misaligned components, and text spilling outside its containers.

The fix was straightforward once identified. Eugene added an explicit instruction to the agent: after placing any element, query its dimensions through the Figma API and confirm they match the intended values. The agent already had access to this information. It simply had not been told to use it. Once that validation step was part of the instructions, the output quality improved substantially.
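The validation step described above can be sketched roughly as follows. This is a minimal illustration, not the instruction or code used in the experiment: the `Box` shape and the `fitsInside`, `overlaps`, and `validateLayout` functions are hypothetical names, and the coordinates stand in for whatever dimensions the Figma integration reports for a node.

```typescript
// Hypothetical geometry for a design node. The field names are assumptions,
// not the actual response format of the Figma API or any MCP server.
interface Box {
  id: string;
  x: number;
  y: number;
  width: number;
  height: number;
}

// True when `inner` lies entirely within `outer`.
function fitsInside(inner: Box, outer: Box): boolean {
  return (
    inner.x >= outer.x &&
    inner.y >= outer.y &&
    inner.x + inner.width <= outer.x + outer.width &&
    inner.y + inner.height <= outer.y + outer.height
  );
}

// True when two boxes share any area (touching edges do not count).
function overlaps(a: Box, b: Box): boolean {
  return (
    a.x < b.x + b.width &&
    b.x < a.x + a.width &&
    a.y < b.y + b.height &&
    b.y < a.y + a.height
  );
}

// Collects readable problems for a container and its direct children.
function validateLayout(container: Box, children: Box[]): string[] {
  const problems: string[] = [];
  for (const child of children) {
    if (!fitsInside(child, container)) {
      problems.push(`${child.id} spills outside ${container.id}`);
    }
  }
  for (let i = 0; i < children.length; i++) {
    for (let j = i + 1; j < children.length; j++) {
      if (overlaps(children[i], children[j])) {
        problems.push(`${children[i].id} overlaps ${children[j].id}`);
      }
    }
  }
  return problems;
}

// A label wider than its button: the kind of error the first iterations had.
const button: Box = { id: "button", x: 0, y: 0, width: 120, height: 40 };
const label: Box = { id: "label", x: 8, y: 10, width: 140, height: 20 };
console.log(validateLayout(button, [label]));
```

An agent instructed to run a check like this after placing each element gets a concrete list of problems to fix, which is a stronger signal than a successful API response.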

This is a recurring pattern in AI-assisted work. The model does not automatically apply every capability it has. It applies what it is told to apply. The gap between what an agent can do and what it does is usually a gap in instructions, not a gap in capability.

[Figure: design colors and components offered by AI]

The tooling situation is honest about where it stands

Setting up the design pipeline required navigating a fragmented tool ecosystem. Figma’s official MCP offered approximately five read-only actions at the time of the experiment. It was sufficient for reading an existing design and not useful for creating one.

Eugene switched to an open-source alternative with 93 available actions, which worked but crashed unpredictably and required a specific startup sequence each time.

This is the current state of MCP integrations across most design tools:

Official integrations from major vendors are conservative, stable, and limited in scope
Open-source alternatives offer broader functionality but vary significantly in reliability
Crashes and connection failures are common and require manual recovery steps
Tool behavior can differ depending on which agent system calls them
Setup time is real and should be budgeted separately from actual design time

None of this makes the approach unworkable. It does mean that the first use takes longer than subsequent ones, and that someone on the team needs to understand the integration well enough to debug it when it fails.

What came out at the end

The setup took two to three hours. Once the pipeline was stable, the agent produced a complete design system in approximately ten minutes: components, design tokens, color palette, spacing rules, and page layouts.

The result was not polished to a professional standard, but it was coherent, consistent, and ready to hand to a frontend implementation agent.

A senior designer working on the same scope would typically spend several days, sometimes a full week, depending on the complexity of the component library. The agent produced a working baseline in an afternoon, most of which Eugene spent configuring the tools rather than directing the design work itself.

Beyond this specific project, AI at the design stage can produce useful output in several areas:

Generating initial design systems with tokens, typography scales, and color palettes from a brief description
Producing multiple layout variations for the same component quickly
Translating design specifications into structured data for frontend implementation
Checking dimension consistency and spacing rules across components
Documenting design decisions in a format that can be referenced by development agents

One decision worth making early

Agent instructions for design tasks can be saved and reused. The Figma connection sequence, the validation steps, the dimension-checking rules, and the overlap prevention logic that Eugene built during this stage now exist as a reusable configuration.

The next project that needs a design agent does not start from zero. It starts from a working baseline and adjusts for specific requirements.

The cost of building a reliable design agent is paid once. After that, it is shared.

Verdict

The design stage required the most setup effort. Around two to three hours were spent configuring and stabilizing the workflow.

Once configured, the agent produced usable design output quickly. The final result was visually consistent and ready for implementation. The quality was closer to the work of a mid-level product designer preparing an internal MVP or early prototype than to a polished senior-level design prepared for a mature commercial product.

The layouts, spacing, and component structure were coherent enough for development work to continue without major redesign. At the same time, the result still lacked the detail, refinement, and product thinking that experienced designers usually add during later review stages.

The key takeaway is practical. AI can generate design structure at speed. Quality depends on validation rules and tool stability. Without both, errors accumulate even when the process appears correct.

AI is genuinely part of the design process now.

In my work, that looks like using ChatGPT for the structural groundwork: grouping interview insights, finding patterns in feedback, turning scattered notes into something that can actually be acted on. For prototyping, I rely on Lovable, Figma Make, and Claude’s tools when I need to test product logic rather than just visuals. Putting a working flow in front of a client before any frontend work starts catches problems that would otherwise surface much later and cost more to fix.

That said, the delegation line matters. Preparatory, well-scoped tasks go to AI: rough ideation, microcopy options, usability testing scenarios. Strategic decisions, design architecture, and anything requiring real understanding of business context or specific user behavior stay with a specialist. AI has no accountability for the outcome and no visibility into the project’s actual constraints.

What I think is worth paying attention to, though, is a subtler risk than the obvious ones. AI does not produce wrong output as often as it produces average output, confidently. It was trained on existing patterns, so it defaults to standard solutions.

That is a reasonable starting point. It becomes a problem when those suggestions move into a product without critical review, because they tend to flatten exactly what makes a product distinct.

In the end, what has actually changed in my day-to-day is where the effort goes: less time on initial structuring, more time on evaluating what comes back. The skill that matters most now is being able to tell a strong idea from one that is simply well-formulated. AI does not make that judgment. That still sits with the designer.

Natalia Lavrinchik, Product Designer at Aristek

Development stage: Can AI handle a full stack on its own?

I wasn’t watching it work most of the time. I’d write the instruction, press enter, and go do something else. The code was there when I came back.

Eugene Ivanov, Software Engineer at Aristek

After the design stage, the process moved into development. This was the most illustrative part of the experiment – not because the output was perfect, but because of how much was produced with limited direct input.

Starting from the draft document and staged instructions, GitHub Copilot generated a full-stack application.

The final codebase included a Next.js frontend, a Node/Express API gateway, four backend microservices structured as a monorepo, shared component libraries, common TypeScript interfaces, and shared contracts between services.

The frontend read design tokens directly from Figma through the MCP connection and implemented the UI without Eugene specifying a single color value or padding rule manually.

Each backend service was built in six to nine minutes, with the services running in parallel through Copilot’s FleetMod execution mode. Total agent-running time across all backend services was under two hours, and Eugene was present for only a fraction of that time.

Instructions define the system

The structure of the project depended on how instructions were organized.

Each service had its own instruction file with local context. A root instruction described the overall system. This separation reduced conflicts that appear when one agent tries to manage the entire codebase in a single context.

Without this separation, inconsistencies appeared early:

Different package managers used across services
Conflicting port configurations
Variations in file structure and conventions

Adjusting the instruction structure resolved most of these issues at the source.
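The two most mechanical inconsistencies listed above, mixed package managers and conflicting ports, are also easy to detect automatically. The sketch below is an illustration only: `ServiceConfig`, `findConflicts`, and the service names and ports are hypothetical and do not come from the experiment's repository.

```typescript
// Hypothetical summary of what each service's setup declares.
interface ServiceConfig {
  name: string;
  packageManager: string; // e.g. "npm", "pnpm", "yarn"
  port: number;
}

// Returns one message per inconsistency found across the services.
function findConflicts(services: ServiceConfig[]): string[] {
  const issues: string[] = [];

  // More than one package manager in the same monorepo is flagged.
  const managers = new Set(services.map((s) => s.packageManager));
  if (managers.size > 1) {
    issues.push(`mixed package managers: ${Array.from(managers).sort().join(", ")}`);
  }

  // Two or more services claiming the same port is flagged.
  const byPort = new Map<number, string[]>();
  for (const s of services) {
    const names = byPort.get(s.port) ?? [];
    names.push(s.name);
    byPort.set(s.port, names);
  }
  byPort.forEach((names, port) => {
    if (names.length > 1) {
      issues.push(`port ${port} used by ${names.join(", ")}`);
    }
  });

  return issues;
}

// Made-up example data showing both kinds of conflict.
const exampleServices: ServiceConfig[] = [
  { name: "gateway", packageManager: "npm", port: 3000 },
  { name: "users", packageManager: "pnpm", port: 3001 },
  { name: "orders", packageManager: "npm", port: 3001 },
];
console.log(findConflicts(exampleServices));
```

A check like this does not replace well-structured instructions, but it can confirm that a change to the instructions actually removed the conflict.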

Parallel work instead of sequence

Development did not follow a typical step-by-step process.

Using Copilot’s parallel execution mode, multiple parts of the system were generated at the same time. Backend services were created in minutes. The frontend consumed design data directly from Figma and implemented the interface without manual specification of styles.

This changed the pace of development:

Backend services generated in 6–9 minutes each
Frontend implemented using design tokens and references
Shared contracts created alongside services, not after

What would normally take sequential effort was handled as parallel tasks.

Mistakes are part of the process, not a sign it is failing

The agent produced working code, but not clean code on the first pass.

Typical issues included:

Mismatched dependencies
Incorrect service configuration
Deviations from defined architecture
Incomplete or inconsistent file structures

The response was not to fix each issue manually. Instead, Eugene updated the instructions and restarted the stage. This approach produced consistent results across iterations.

Traditional code review was not practical at this scale. Instead of reading 40,000 generated files line by line, the focus moved to running the application, verifying expected behavior, identifying visible issues, and requesting targeted fixes from the agent. This shifts part of quality control from manual inspection to execution and validation.

Where AI is effective in development

AI performs well when tasks are clearly defined and repeatable. Common strengths include:

Generating standard application structure
Creating boilerplate and service scaffolding
Implementing APIs based on defined contracts
Connecting frontend and backend through shared types
Following consistent patterns across multiple services
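To make the "shared types" item concrete, here is a minimal sketch of a contract used on both sides of an API boundary. The article does not list the project's real interfaces, so `TaskDto`, `isTaskDto`, `serializeTask`, and `parseTask` are hypothetical names chosen for illustration.

```typescript
// Hypothetical shared contract, imagined as living in a shared package
// that both the frontend and the backend services import.
interface TaskDto {
  id: string;
  title: string;
  done: boolean;
}

// Runtime guard so a consumer can reject payloads that drift from the contract.
function isTaskDto(value: unknown): value is TaskDto {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.id === "string" &&
    typeof v.title === "string" &&
    typeof v.done === "boolean"
  );
}

// "Backend" side: the compiler checks the response against the contract.
function serializeTask(task: TaskDto): string {
  return JSON.stringify(task);
}

// "Frontend" side: parses the response and validates it against the same contract.
function parseTask(json: string): TaskDto {
  const data: unknown = JSON.parse(json);
  if (!isTaskDto(data)) {
    throw new Error("payload does not match TaskDto");
  }
  return data;
}

const roundTrip = parseTask(
  serializeTask({ id: "t1", title: "Write brief", done: false }),
);
console.log(roundTrip.title);
```

Because both sides depend on one definition, a change to the contract surfaces as a compile error or a rejected payload, not as a silent mismatch between services.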

Where control is still required

AI does not maintain consistency without guidance. Human input is required for:

Defining architecture and boundaries
Enforcing standards across services
Resolving conflicts between generated components
Deciding when to restart versus patch existing output

Without this control, small inconsistencies accumulate into larger issues.

Verdict

The development stage demonstrated how quickly AI can generate large amounts of application code when the project structure, architecture, and instructions are already defined. The full backend was generated in under two hours of runtime, but the process still depended on continuous human oversight.

Eugene reviewed outputs, adjusted instructions, restarted stages when inconsistencies appeared, and verified whether the generated system behaved as expected.

The result was a working system with multiple services and a connected frontend. At the same time, the experiment also showed that generation speed alone is not enough. AI handled implementation tasks efficiently, but system consistency depended on architecture decisions, instruction quality, validation mechanisms, and human review throughout the process.

QA & testing stage: 42 tests written. None of them by a human.

I asked for positive flow, negative flow, and edge cases. It wrote 42 tests. And somewhere in there, it decided to try XSS injection on its own. I didn’t ask for that.

Eugene Ivanov, Software Engineer at Aristek

After development, the process moved into testing. Eugene gave the agent a single instruction: write end-to-end tests covering positive flow, negative flow, and edge cases. The agent produced 42 Playwright tests running across five parallel workers.

It launched a real browser, navigated the live application, and identified UI locators dynamically by inspecting the running interface rather than reading the source code. The tests ran and passed.

The biggest surprise was edge cases. The agent included XSS injection attempts and tests with extremely long input strings without being asked to. Given only the phrase “edge cases,” it inferred that security-relevant inputs belonged in scope.

Whether the specific tests it wrote represent thorough security coverage is a separate question, but the behavior itself reflects something useful: when given a broad instruction and enough context about the application, the agent applies general knowledge about what good testing looks like rather than taking the minimum interpretation of the request.
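For readers unfamiliar with this kind of input, the sketch below shows what an edge-case set along these lines might look like. These are not the payloads the agent generated. `EdgeCase`, `buildEdgeCases`, and `escapeHtml` are hypothetical helpers, and the escaping function is included only to show the property such a test would assert: markup in user input ends up as text, not as executable HTML.

```typescript
// Hypothetical edge-case inputs of the kind described above.
interface EdgeCase {
  name: string;
  input: string;
}

// Builds a small table of inputs; `maxLength` is an assumed field limit.
function buildEdgeCases(maxLength: number): EdgeCase[] {
  return [
    { name: "empty string", input: "" },
    { name: "whitespace only", input: "   " },
    { name: "script tag", input: "<script>alert(1)</script>" },
    { name: "event handler attribute", input: '<img src=x onerror="alert(1)">' },
    { name: "over-long string", input: "a".repeat(maxLength + 1) },
  ];
}

// Minimal HTML escaping for text content and quoted attribute values.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Report, for each case, whether a raw "<" survives escaping.
for (const edgeCase of buildEdgeCases(255)) {
  const rendered = escapeHtml(edgeCase.input);
  console.log(`${edgeCase.name}: raw "<" survives = ${rendered.includes("<")}`);
}
```

In an end-to-end suite, each input would be typed into a form field and the test would assert on what the page renders. Whether a table like this covers the application's real threat surface is exactly the question a human reviewer still has to answer.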

Hooks: the part of QA that runs before a mistake can travel far

Beyond test creation, the experiment introduced a structured validation mechanism through hooks.

Two hooks were configured:

PostGenerationLint
ValidateDTOSync

These hooks executed automatically during code generation.

PostGenerationLint checked each file immediately after creation. It returned structured feedback with error details and required fixes. The agent processed this feedback before moving to the next step.

ValidateDTOSync enforced consistency across services. If a data contract changed in one service, the hook detected mismatches in others and blocked further progress until alignment was restored.

The distinction between hooks and instructions is worth stating clearly. Instructions tell the agent what to do. Hooks enforce what the agent system will allow. The agent does not decide whether to comply with a hook.

The hook fires, the system parses the response, and execution either continues or stops. This makes hooks the most reliable form of quality control in an agentic workflow, because they operate independently of the model’s judgment.
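The logic of a contract-synchronization check can be sketched as below. This is not the hook from the experiment and does not use any real Copilot hook API. `DtoShape`, `HookResult`, and `validateDtoSync` are hypothetical, and the input format (field name mapped to a type name, per service) is an assumption made to keep the example self-contained.

```typescript
// Hypothetical description of a DTO as declared by one service:
// field name -> type name.
type DtoShape = Record<string, string>;

// Structured result the agent system could parse to continue or stop.
interface HookResult {
  decision: "continue" | "block";
  errors: string[];
}

// Compares one DTO's declaration across services, using the first service
// as the reference, and blocks on any mismatch.
function validateDtoSync(
  dtoName: string,
  byService: Record<string, DtoShape>,
): HookResult {
  const errors: string[] = [];
  const services = Object.keys(byService);
  if (services.length < 2) {
    return { decision: "continue", errors };
  }
  const reference = services[0];
  const refShape = byService[reference];

  for (const service of services.slice(1)) {
    const shape = byService[service];
    // Union of field names from both declarations.
    const fields = Object.keys(refShape).concat(
      Object.keys(shape).filter((f) => refShape[f] === undefined),
    );
    for (const field of fields) {
      if (refShape[field] !== shape[field]) {
        errors.push(
          `${dtoName}.${field}: ${reference} has ${refShape[field] ?? "nothing"}, ` +
            `${service} has ${shape[field] ?? "nothing"}`,
        );
      }
    }
  }
  return { decision: errors.length > 0 ? "block" : "continue", errors };
}

// Made-up example: one service changed a field type, the other did not.
const syncResult = validateDtoSync("UserDto", {
  users: { id: "string", email: "string" },
  orders: { id: "string", email: "string | null" },
});
console.log(syncResult.decision, syncResult.errors);
```

The point of the design is that the decision is computed outside the model. The agent receives the error list as feedback, but it cannot talk its way past a "block" result.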

Where AI handles testing well

Beyond what appeared in this specific experiment, AI contributes usefully to testing in several areas:

Writing end-to-end tests from a plain-language description of expected behavior
Generating test data sets that cover boundary conditions, empty states, and malformed inputs
Producing test scaffolding and boilerplate that engineers then refine
Running regression checks against an existing test suite after changes
Documenting what each test covers in a format that makes review faster

Where a human still needs to be in the room

Eugene was direct about the limits of what he verified. The 42 tests ran and passed. He did not audit whether they tested the right things at the right depth. This is the central risk of AI-generated test suites: coverage percentage is a metric the agent can optimize for, but coverage percentage does not measure whether the tests reflect the actual requirements of the system.

Specific areas that require human judgment in AI-assisted testing:

Evaluating whether test cases reflect real business logic rather than surface behavior
Identifying gaps in coverage that a metric would not reveal
Reviewing tests for redundancy, specifically multiple tests asserting the same condition in slightly different ways
Assessing whether security-relevant tests cover the actual threat surface or only obvious cases
Making the final call on what constitutes an acceptable failure threshold before release

Telling an agent to achieve 80% test coverage produces 80% coverage. Whether that 80% covers the parts of the code that matter most is a question only someone who understands the product can answer.

Verdict

The testing stage showed that AI can generate and run tests with minimal input. The system produced a working test suite and executed it successfully.

At the same time, trust in the results requires verification. AI can create tests quickly, but it does not determine their relevance without guidance.

As in the previous stages, the most reliable outcome comes from combining automated generation with targeted human review.

AI changed QA work less through raw speed and more through redistribution of effort. Earlier, a large part of the time went into preparing artifacts from scratch: checklists, test cases, documentation, automation scaffolding. Now the initial draft often appears in minutes. The harder part is deciding whether the result is actually reliable.

In practice, AI works especially well for structured and repeatable tasks. It can quickly generate test scenarios, prepare automation drafts, summarize requirements, or help navigate a large codebase. This removes much of the routine preparation work and allows QA engineers to focus more on logic, coverage gaps, integration risks, and unstable areas of the system.

At the same time, AI changes the bottleneck inside testing. Earlier, the main cost was producing test artifacts. Now the bigger challenge is validation. AI can generate large volumes of technically correct but shallow output. Coverage numbers may look impressive while important business risks remain untested.

Another important shift is that AI performs differently depending on project maturity. In projects with stable architecture, clear rules, and consistent patterns, results are usually strong. In systems with weak documentation, hidden dependencies, or inconsistent logic, the quality drops quickly because the model fills gaps with assumptions.

For me, the key lesson is that QA becomes more important, not less. AI increases development speed and output volume, which also increases the importance of review, verification, and system-level thinking. The role shifts away from writing everything manually toward validating whether the generated result is trustworthy enough to move forward.

Aleksandr Kiselev, QA Engineer at Aristek

Deployment stage: The stage that almost ran itself

I didn’t write a single line of Docker configuration. It generated everything. One file didn’t work, I fixed it in a few minutes, and we moved on. That’s roughly where things stand right now.

Eugene Ivanov, Software Engineer at Aristek

Deployment was the shortest stage in the experiment, and in some ways the most straightforward.

Deployment followed the same pattern as the earlier stages. The agent used the existing instructions to generate infrastructure configuration without manual setup.

It created Dockerfiles for each service and two Docker Compose configurations: one for development and one for production. The setup reflected the structure defined during development. No separate infrastructure design was introduced at this point.

From instructions to a running environment

The deployment setup was derived directly from the project context.

The agent:

Generated Dockerfiles for all services
Created a production Docker Compose configuration
Created a development Docker Compose configuration
Connected services based on previously defined ports and dependencies

The only manual input was related to port selection. Eugene specified non-default ports to avoid conflicts on his local machine. The agent applied these values without additional adjustments.
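The relationship between a service list, port overrides, and a Compose configuration can be sketched as follows. The article does not show the generated files, so everything here is illustrative: `ServiceSpec`, `toCompose`, the service names, build paths, and port numbers are hypothetical. The output object only mirrors the general shape of a Compose file's `services` section (`build`, `ports`, `depends_on`).

```typescript
// Hypothetical input: one entry per service in the monorepo.
interface ServiceSpec {
  name: string;
  context: string; // build directory containing the service's Dockerfile
  containerPort: number; // port the service listens on inside the container
  hostPort: number; // non-default port chosen to avoid local conflicts
  dependsOn?: string[];
}

// Subset of the fields a Compose service entry can carry.
interface ComposeService {
  build: { context: string };
  ports: string[];
  depends_on?: string[];
}

// Maps the service list to an object shaped like a Compose file.
function toCompose(
  specs: ServiceSpec[],
): { services: Record<string, ComposeService> } {
  const services: Record<string, ComposeService> = {};
  for (const spec of specs) {
    const service: ComposeService = {
      build: { context: spec.context },
      // Compose port mappings use the "HOST:CONTAINER" form.
      ports: [`${spec.hostPort}:${spec.containerPort}`],
    };
    if (spec.dependsOn && spec.dependsOn.length > 0) {
      service.depends_on = spec.dependsOn;
    }
    services[spec.name] = service;
  }
  return { services };
}

// Made-up services with host ports moved away from the defaults.
const compose = toCompose([
  { name: "users", context: "./services/users", containerPort: 3001, hostPort: 8181 },
  {
    name: "gateway",
    context: "./services/gateway",
    containerPort: 3000,
    hostPort: 8080,
    dependsOn: ["users"],
  },
]);
console.log(JSON.stringify(compose, null, 2));
```

Keeping host ports as explicit input reflects what happened in the experiment: the agent derived the structure from project context, while the environment-specific values came from the person who knew the machine.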

Where configuration needs correction

One issue required direct intervention. The development Docker Compose setup did not run as generated.

Eugene reviewed the configuration and fixed it manually. The correction took a few minutes. The rest of the setup worked as expected.

This reflects a common pattern. AI-generated infrastructure often reaches a near-complete state. Final adjustments are still required for edge cases.

Working with legacy systems

Most modern agent systems include an initialization stage. The agent scans the repository, maps dependencies, reviews available documentation, and generates a working instruction file describing the codebase structure. This reduces the time needed to navigate large projects and helps engineers start working faster.

However, this should not be interpreted as automatic understanding of a production system. Generating a feature inside a mature application still requires architectural awareness, validation, and engineering control. Existing systems contain business rules, undocumented dependencies, historical decisions, infrastructure constraints, and edge cases that are often distributed across teams rather than stored in a single source of truth.

Documentation is only one part of the problem. System consistency also depends on code quality, naming conventions, service boundaries, test coverage, release processes, and how predictable the existing architecture is. AI performs best in environments where these elements are already structured and maintained.

A monolith with clear patterns and stable documentation is easier for an agent to analyze. A fragmented microservices environment with inconsistent standards and missing ownership introduces more uncertainty, because the agent can only infer relationships from the information available in the repository.

In practice, AI shortens the onboarding and discovery phase. It does not replace the engineering work required to safely extend or modify a production system.

Where AI is effective in deployment tasks

AI performs well when infrastructure follows standard patterns.

It can:

Generate container configurations for services
Define service relationships in Docker Compose
Reuse configuration patterns across environments
Align infrastructure with application structure

Where human input is still required

Deployment still depends on validation and environment awareness.

Human input is required for:

Resolving configuration errors
Adjusting environment-specific parameters
Verifying service communication and dependencies
Extending setup to production-grade infrastructure
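
Verifying service communication can start with something as simple as checking that each container answers on its expected port. The sketch below assumes services are reachable over TCP from the host. The service names and port numbers in the usage note are hypothetical, not the values used in the experiment.

```python
import socket
from typing import Callable


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def unreachable_services(
    services: dict[str, tuple[str, int]],
    probe: Callable[[str, int], bool] = port_is_open,
) -> list[str]:
    """Return the names of services that did not respond to the probe."""
    return [name for name, (host, port) in services.items() if not probe(host, port)]
```

Calling `unreachable_services({"frontend": ("localhost", 3100), "api": ("localhost", 4100)})` after the stack starts returns the names that did not answer. An open port only shows that something is listening. It does not confirm that the service behaves correctly, so this complements manual verification rather than replacing it.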

CI/CD pipelines, monitoring, and alerting were not part of this experiment. These areas require additional configuration and validation.

Verdict

Deployment was the stage where AI required the least intervention and produced the most complete output relative to what was asked. The infrastructure configuration was generated from instructions, worked almost entirely on the first attempt, and needed one manual correction on a single file.

For new projects, this stage is where the time savings are clearest and the risks are most manageable.

What this experiment shows about AI-assisted development: results, time, and honest limits

Across five stages, the pattern was consistent. AI accelerates the execution of well-defined tasks significantly. But it does not replace the engineering judgment that makes those tasks well-defined in the first place.

In business analysis, AI compressed the distance from idea to structured brief from days to a single conversation.
In design, it produced a coherent component system in minutes once the tooling was configured correctly.
In development, it generated backend services, a frontend, shared contracts, and infrastructure in under two hours of agent runtime.
In testing, it wrote 42 Playwright tests and inferred that XSS injection belonged in scope without being asked.
In deployment, it produced a near-complete Docker configuration on the first pass.

At every stage, the quality of output depended on the quality of the instructions behind it. This is the central finding. AI does not drift because it lacks capability. It drifts because it lacks structure.

The engineer’s job shifted from writing code to defining the system that generates it: instruction architecture, validation rules, context management, and knowing when to restart rather than patch. These are engineering decisions, and they require experience.

The other consistent finding is that gains compound. The first project in this workflow required the most setup. The instruction files, validation hooks, and design pipeline configurations built during this experiment are reusable. The next project starts from a working baseline. Setup cost is paid once.

And at what cost? Time.

The full process, from the first prompt in ChatGPT to a working application, took around 10 hours. Not bad for a weekend, but that figure needs some context.

Those were not 10 focused hours at a desk. The engineer worked across a weekend, in the evenings, writing an instruction and leaving the agent to run while he went and did other things. He cooked dinner. He drove. He came back, checked the output, wrote the next instruction, and left again.

About half of the total effort went into setup. This included configuring the agent system, stabilizing the design pipeline, and restarting the project several times.

Once the setup was stable, individual steps were short:

Backend services generated in 6–9 minutes
Full test run completed in about 17 minutes
Design system produced in roughly 10 minutes per run
Infrastructure configuration generated within a single stage

What was built, and how difficult is this type of application?

The application built during the experiment was not a production-scale platform. It was a relatively compact service-based system created to test how far AI-assisted workflows could go across the full development lifecycle.

The final version included:

A Next.js frontend
Multiple backend services
Shared contracts between services
Automated testing
Containerized deployment
AI-generated design assets

For an experienced engineering team, this is a manageable scope. At the same time, it still represents enough moving parts to expose coordination problems, architectural inconsistencies, testing gaps, and workflow limitations.

That is what made the experiment useful.

Without AI assistance, a similar prototype would typically require more implementation time across design, backend development, frontend integration, testing, and infrastructure setup. In this case, AI reduced the amount of manual implementation work substantially, especially during scaffolding and repetitive development tasks.

However, the reduction came mostly from accelerating execution, not from removing engineering complexity.

The project still required:

Architectural decisions
Instruction management
Output validation
Workflow corrections
Review of generated behavior
Repeated refinement across stages

This distinction is important. AI reduced the time spent writing predictable code manually. It did not eliminate the need for experienced engineering oversight.

For teams already working with structured development processes, this is where the largest practical gains appear today.

How this compares to standard delivery

For context: building a backend with this number of services from scratch typically takes an experienced engineer around 20 hours for the data layer alone. Adding frontend, design, infrastructure, and testing brings total effort to 50–60 hours under normal delivery conditions.

The same scope was completed here in about 10 hours, roughly half of which was environment setup. The gap narrows on more complex systems, where domain knowledge and architectural judgment cannot be delegated to an agent. But for well-scoped greenfield work, the time difference is real.
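
To put those figures side by side, here is a short worked calculation. It uses only the numbers quoted above (about 10 hours against an estimated 50–60 hours). The helper name `compare` is ours, and the baseline is an estimate, so treat the result as an order of magnitude rather than a measurement.

```python
def compare(baseline_hours: float, actual_hours: float) -> tuple[float, float]:
    """Return (speedup factor, fraction of time saved) for one baseline estimate."""
    speedup = baseline_hours / actual_hours
    saved = 1 - actual_hours / baseline_hours
    return speedup, saved


ACTUAL_HOURS = 10          # total reported for the experiment
BASELINE_RANGE = (50, 60)  # estimated effort under standard delivery

for baseline in BASELINE_RANGE:
    speedup, saved = compare(baseline, ACTUAL_HOURS)
    print(f"{baseline} h baseline: {speedup:.1f}x faster, {saved:.0%} less time")
# 50 h baseline: 5.0x faster, 80% less time
# 60 h baseline: 6.0x faster, 83% less time
```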

The application ran. The services communicated. The tests passed. Eugene identified visible issues and fixed them; anything not surfaced during that review remained in the codebase. That is the trade-off stated plainly: AI-assisted development at this pace produces working software, not reviewed software. Closing that gap is a process question, not a capability question.

The result worked. The time savings are real. At the same time, there is one detail that should not be ignored – the generated code was not reviewed line by line.

I tested the application, checked outputs, and fixed visible issues. Anything not caught during this process remained in the codebase.

This does not invalidate the time savings. It does define the trade-off.

Eugene Ivanov, Software Engineer at Aristek

How to integrate AI into real development workflows

Every stage of Eugene’s experiment ended with the same observation: AI performs well when it has clear instructions, a feedback loop, and a human who knows what a correct result looks like.

Remove any of those three, and the output degrades in predictable ways. The design agent drew crooked layouts until it was told to validate dimensions. The development agent mixed package managers until the instructions enforced consistency. The test suite looked complete until someone noted that coverage numbers and coverage quality are different things.

This pattern points to something worth stating directly. AI does not fail because it lacks capability. It drifts because it lacks structure. And structure, in a real team environment, comes from a framework that defines how AI is used, how its outputs are checked, and how the whole thing fits into existing workflows.

At Aristek, integrating AI across the development lifecycle is structured around six operational layers. Each layer addresses a specific question about how AI fits into real team work, from deciding where to apply it, to tracking whether it is delivering value over time.

1. Use layer — apply AI where it adds value

AI is most effective when applied to specific, well-defined tasks across roles.

Typical use cases include:

Business analysis: drafting requirements, structuring scenarios
Design: prototyping, exploring UI options
Development: generating code, refactoring existing logic
QA: creating test cases, identifying edge conditions
Operations: analyzing logs, identifying anomalies

This layer defines scope. It answers a simple question: where does AI save time without reducing clarity?

A common risk appears here. AI can generate more output than a team can realistically review.

2. Control layer — validate outputs early

Speed without validation creates inconsistency.

Control mechanisms introduce checks at the moment output is produced:

Developer review during generation
Defined code review rules
Automated test validation
Output constraints and guardrails

These controls reduce the chance of incorrect logic moving forward.
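
As one example of an automated guardrail, the sketch below flags a repository that contains lockfiles from more than one JavaScript package manager, the kind of drift the development agent showed before the instructions enforced consistency. This is an illustrative check, not the hook used in the experiment. The function names are hypothetical.

```python
from pathlib import Path

# Well-known lockfile names and the package manager each one belongs to.
LOCKFILES = {
    "package-lock.json": "npm",
    "yarn.lock": "yarn",
    "pnpm-lock.yaml": "pnpm",
    "bun.lockb": "bun",
}


def package_managers_in(root: Path) -> set[str]:
    """Collect the package managers implied by lockfiles under root."""
    found = set()
    for path in root.rglob("*"):
        # Ignore lockfiles shipped inside installed dependencies.
        if "node_modules" in path.relative_to(root).parts:
            continue
        if path.name in LOCKFILES:
            found.add(LOCKFILES[path.name])
    return found


def check_single_package_manager(root: Path) -> list[str]:
    """Return a list of problems; an empty list means the check passed."""
    managers = sorted(package_managers_in(root))
    if len(managers) > 1:
        return [f"Mixed package managers detected: {', '.join(managers)}"]
    return []
```

A check like this can run after each agent step, so the inconsistency is caught before more code is generated on top of it.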

Without them, common issues include:

Inconsistent implementations
Incorrect assumptions in generated code
Tests that pass but do not validate real behavior
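
The last item is easy to miss, so here is a small illustration. `apply_discount` is a hypothetical function with a deliberate bug. Both tests execute it, but only one of them would notice that the result is wrong.

```python
def apply_discount(price: float, percent: float) -> float:
    """Hypothetical function under test. Deliberate bug: it adds instead of subtracting."""
    return price + price * percent / 100


def weak_test() -> bool:
    # Passes whenever the call returns a float; the value itself is never checked.
    result = apply_discount(100.0, 10.0)
    return isinstance(result, float)


def strong_test() -> bool:
    # Checks real behavior: 10% off 100 should be 90.
    return apply_discount(100.0, 10.0) == 90.0


print(weak_test())    # True: the test passes and the line counts as covered
print(strong_test())  # False: the bug is caught
```

Both tests produce the same coverage number. Only the second one says anything about correctness, which is why generated test suites still need a human to judge what they actually assert.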

3. Integration layer — make AI part of daily work

AI creates impact only when it is part of existing workflows.

This includes:

Integration into IDEs and development tools
Use within CI/CD pipelines
Inclusion in pull request workflows
Connection to documentation and knowledge bases

This shifts AI from individual use to team-level practice.

Without integration, usage remains fragmented. Results vary between developers, and outputs do not align.

4. Context layer — provide the system with real data

AI depends on context. The quality of output reflects the quality of input.

Relevant context includes:

Access to the codebase
Existing design systems
Project documentation
Data models and domain rules

When this context is available, outputs align with the system.

Without it, results become generic. Rework increases because decisions do not match the actual project.

5. Observability layer — track what is happening

AI usage needs to be visible.

This includes:

Tracking how often AI is used
Measuring output quality
Monitoring performance impact
Tracking associated costs

Visibility helps teams understand where AI adds value and where it introduces inefficiencies.

Without it, adoption is difficult to manage. Costs and results remain unclear.
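
The tracking items above could start as a simple per-stage summary of agent runs. The sketch below is a minimal illustration. The `AgentRun` fields and the idea of marking a run as accepted are assumptions, not a description of Aristek's internal tooling.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class AgentRun:
    """One recorded agent run (hypothetical schema)."""
    stage: str       # e.g. "development" or "testing"
    minutes: float   # agent runtime
    cost_usd: float  # model usage cost for the run
    accepted: bool   # whether the output passed human review


def summarize(runs: list[AgentRun]) -> dict[str, dict[str, float]]:
    """Aggregate run count, runtime, cost, and acceptance rate per stage."""
    grouped: dict[str, list[AgentRun]] = defaultdict(list)
    for run in runs:
        grouped[run.stage].append(run)

    report = {}
    for stage, items in grouped.items():
        report[stage] = {
            "runs": len(items),
            "minutes": sum(r.minutes for r in items),
            "cost_usd": sum(r.cost_usd for r in items),
            "acceptance_rate": sum(r.accepted for r in items) / len(items),
        }
    return report
```

Even this much is enough to see which stages consume the most runtime and where output is most often rejected.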

6. Evolution layer — improve the process over time

AI workflows do not stay static.

They require continuous adjustment:

Refining prompts and instructions
Updating workflows based on results
Optimizing cost and execution time
Adapting to new tools and models

Without this step, initial gains decrease over time.

What this experiment actually proves

The experiment produced a working application, an honest account of where AI helped and where it didn’t, and one conclusion that holds across every stage: AI in software development is not a future consideration. It is a present one.

The teams that treat it as such are already moving faster. The teams waiting for more certainty fall further behind with every sprint.

The point was never that AI replaces developers. The experiment clearly demonstrates the opposite. At every stage where the output was good, a developer had defined the structure, written the instructions, and reviewed the result. At every stage where it drifted, that oversight was missing. The tool is capable, but the judgment belongs to the engineer.

At Aristek, this experiment is not an isolated weekend project. It reflects how our engineers are approaching development today, on real projects, across different environments and constraints.
We have worked through the setup costs, the tooling gaps, the context management decisions, and the points where human review is non-negotiable. That experience is what we bring when we help teams integrate AI into their development process.

We work with engineering teams to define where AI fits into their specific workflow, how outputs are validated before they move forward, and how to maintain full visibility over what is being generated, reviewed, and shipped. The goal is a development process where AI handles execution and engineers remain responsible for every decision that matters.

If your team is already working with AI but results are inconsistent

… or if you are planning to introduce it into active development, book a free consultation to discuss this.
