Hardware Acceleration for Breast Cancer Detection Using FPGA

1. Introduction

What is Hardware Acceleration?

Hardware acceleration is the unsung hero of modern AI. It’s what gives machines the ability to crunch numbers faster, work smarter, and ultimately deliver results in record time. Instead of relying solely on general-purpose CPUs, hardware acceleration taps into specialized hardware like FPGAs (Field-Programmable Gate Arrays) or GPUs to boost performance. The result? A massive leap in speed, efficiency, and capability—perfect for complex, data-heavy tasks like medical imaging.

Why is it Essential for Healthcare?

In healthcare, speed and precision are non-negotiable. Breast cancer classification, for instance, benefits from real-time diagnostics that give patients the best chance at effective treatment. AI models running on general-purpose processors often struggle to meet these demands: they can consume too much power, take too long, and depend on cloud connectivity. Hardware acceleration changes the game, as highlighted in the study by Mhaouch et al. (2025). By using an FPGA-based system, the authors report faster processing and reduced power consumption, which could make advanced diagnostics accessible even in remote areas.

Diagram of Supervised Learning for Breast Cancer classification.

The FPGA-Based System: Pioneering Breast Cancer Detection

The study introduces a cutting-edge approach that merges software and hardware for breast cancer detection. Leveraging the PYNQ-Z2 platform, this hybrid system combines FPGA resources with the ARM Cortex-A9 processor to optimize diagnostic workflows. Its standout feature? Modular IP cores designed for CNN tasks like Conv2D, Average Pooling, and ReLU activation. This allows the system to achieve breakthroughs in speed, energy efficiency, and accuracy, redefining what’s possible in real-time medical diagnostics.

2. Understanding Hardware Acceleration for AI Models

Simplifying the Concept

Hardware acceleration isn’t just a fancy term—it’s the engine that powers AI at its best. Imagine trying to solve a thousand-piece puzzle. A CPU does it one piece at a time, methodically yet slowly. An FPGA, on the other hand, handles multiple pieces at once, cutting the time and effort dramatically. This parallel processing capability is why hardware accelerators outperform CPUs in tasks requiring heavy computation, like breast cancer classification.

Workflow of the proposed AI model for breast cancer classification.

Key Benefits of Hardware Acceleration

1. Speed Through Parallel Processing: Hardware acceleration shines when it comes to speed. Where a CPU core works through its instructions largely in sequence, accelerators like FPGAs perform many operations simultaneously. This is what allowed the FPGA-based system in the study by Mhaouch et al. (2025) to reduce execution time by 16.3%, enabling fast and seamless breast cancer detection.

2. Energy Efficiency: Power efficiency isn’t just a perk; it’s a necessity, especially for edge devices used in healthcare. The study demonstrated that the FPGA-based system consumes 63.15% less power than the ARM Cortex-A9 CPU baseline, operating at just 1.4 W compared to 3.8 W. This energy-saving feature makes it ideal for rural clinics and mobile healthcare units where resources are limited.

3. Reconfigurability and Flexibility: Unlike GPUs, whose hardware architecture is fixed, FPGAs can be reconfigured at the logic level for specific tasks, making them versatile tools for deploying various AI models. In the study, modular IP cores were developed to accelerate key CNN operations, ensuring that the system performs optimally while remaining scalable for future advancements (Ghani et al., 2024).

4. Scalability for Embedded and Edge Computing: Hardware acceleration allows AI models to function independently of cloud services, enabling localized processing. This scalability ensures that advanced diagnostics are accessible even in remote regions. The FPGA-powered system described by Mhaouch et al. (2025) is a perfect example of this, delivering real-time results in decentralized settings.

Detailed Architecture of the Proposed CNN Model for Breast Cancer Classification.

Why FPGA-Based Systems Stand Out

FPGAs take hardware acceleration a step further with their ability to be reprogrammed for different applications. This makes them ideal for AI tasks, where flexibility and efficiency are key. In the study, modular FPGA cores such as Conv2D, Average Pooling, and ReLU activation were implemented using loop unrolling and pipelining techniques to optimize speed and throughput.
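To make loop unrolling and pipelining a little more concrete, here is a minimal software sketch in Python. It is not the study’s implementation: the function names, the unroll factor of 4, and the pipeline depth of 5 cycles are all illustrative assumptions. The first pair of functions shows that an unrolled loop computes the same result as the plain loop, and the second pair uses the standard textbook cycle-count model for a pipelined loop.

```python
def dot_rolled(a, b):
    """Baseline: one multiply-accumulate per loop iteration."""
    acc = 0
    for i in range(len(a)):
        acc += a[i] * b[i]
    return acc


def dot_unrolled4(a, b):
    """Same result with the loop body unrolled by a factor of 4 (assumed factor).

    In hardware, the four products of one iteration could map to four
    parallel multipliers; in Python this only illustrates the structure.
    """
    n = len(a)
    acc0 = acc1 = acc2 = acc3 = 0
    i = 0
    while i + 4 <= n:
        acc0 += a[i] * b[i]
        acc1 += a[i + 1] * b[i + 1]
        acc2 += a[i + 2] * b[i + 2]
        acc3 += a[i + 3] * b[i + 3]
        i += 4
    # Handle leftover elements when n is not a multiple of 4.
    tail = sum(a[j] * b[j] for j in range(i, n))
    return acc0 + acc1 + acc2 + acc3 + tail


def cycles_sequential(iterations, depth):
    """Cycles when each iteration must finish before the next one starts."""
    return iterations * depth


def cycles_pipelined(iterations, depth, initiation_interval=1):
    """Cycles when a new iteration starts every `initiation_interval` cycles."""
    if iterations == 0:
        return 0
    return depth + (iterations - 1) * initiation_interval


a = list(range(10))
b = list(range(10, 20))
print(dot_rolled(a, b), dot_unrolled4(a, b))
print(cycles_sequential(1000, 5), cycles_pipelined(1000, 5))
```

Under these assumed numbers, 1,000 iterations of a 5-cycle operation take 5,000 cycles one after another but only 1,004 cycles when fully pipelined, which is the kind of throughput gain these techniques aim for.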

Overall hardware architecture of CNN model.

Table: FPGA Layer Performance

Layer	Slices	Latency (Cycles)	Frequency (MHz)
Conv2D	15,456	34,848,820	120.17
Average Pooling	438	1,041	114.28
ReLU	3,376	269	102.56

These innovations demonstrate the power and adaptability of FPGA-based systems, positioning them as game-changers in healthcare diagnostics.

3. FPGA-Based System for Breast Cancer Classification

Breaking Down the Design

The FPGA-based system, as described by Mhaouch et al. (2025), is an approach to improving the speed and energy efficiency of breast cancer detection while keeping accuracy competitive. It runs on the PYNQ-Z2 platform, pairing the board’s FPGA logic with its ARM Cortex-A9 processor to create a hybrid solution that optimizes the performance of convolutional neural networks (CNNs) for medical imaging tasks.

The unique strength of this system lies in its custom modular IP cores, which include:

Conv2D Layer: This handles convolution operations crucial for feature extraction in medical imaging.
Average Pooling Layer: Reduces data dimensions while retaining essential features, streamlining processing.
ReLU Layer: Introduces non-linearity to enhance the model’s learning ability and pattern recognition.

These IP cores leverage the FPGA’s flexibility to handle intensive computations in parallel, drastically boosting performance. The PYNQ-Z2 platform, with its combination of hardware and software components, ensures that CNN-based breast cancer classification models run efficiently, even in real-time settings.
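For readers who want to see what the three IP cores listed above compute, here is a plain-Python reference sketch of the operations. It is a software illustration only, not the FPGA design: the 5x5 input, the 2x2 kernel, the 2x2 pooling window, and the Conv2D, ReLU, Average Pooling ordering are assumptions chosen to keep the example small.

```python
def conv2d(image, kernel):
    """Stride-1, no-padding 2D convolution in the CNN sense (cross-correlation)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(fmap):
    """Replace negative activations with zero."""
    return [[max(0, v) for v in row] for row in fmap]


def average_pool(fmap, size=2):
    """Non-overlapping size x size average pooling."""
    out_h, out_w = len(fmap) // size, len(fmap[0]) // size
    return [
        [
            sum(fmap[r * size + i][c * size + j]
                for i in range(size) for j in range(size)) / (size * size)
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# Toy input and kernel (made-up values, not data from the study).
image = [
    [1, 2, 0, 3, 1],
    [4, 0, 2, 1, 0],
    [0, 3, 1, 2, 4],
    [2, 1, 0, 3, 1],
    [1, 0, 4, 2, 0],
]
kernel = [
    [1, 0],
    [0, -1],
]

features = average_pool(relu(conv2d(image, kernel)))
print(features)
```

Every output pixel of `conv2d` depends only on its own input window, which is why this operation lends itself to the parallel hardware described above.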

Key Optimizations for Real-Time Inference

To ensure the system met the demands of real-time diagnostics, the researchers implemented two critical optimizations:

8-Bit Fixed-Point Arithmetic: By switching from traditional floating-point to 8-bit fixed-point arithmetic, the system achieved significant reductions in computational complexity. This led to faster processing speeds while maintaining an accuracy of 89.87%. Although there is a slight trade-off in precision, the gains in performance and power efficiency make it a worthwhile compromise for real-world applications.
Loop Unrolling and Pipelining: Techniques like loop unrolling and pipelining were employed to improve data throughput. Loop unrolling replicates the body of a loop so that several iterations can execute at the same time on parallel hardware. Pipelining, meanwhile, overlaps successive operations so that a new input can enter the datapath before the previous one has finished, keeping data flowing continuously. Together, these methods reduced latency and enhanced execution speed.
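The fixed-point idea above can be sketched in a few lines of Python. The article does not state which 8-bit format the study used, so the choice of 6 fractional bits (values in [-2, 2) with a step of 1/64) is purely an assumption, as are the helper names and the sample weights and inputs.

```python
def quantize(values, frac_bits, total_bits=8):
    """Convert floats to signed fixed-point integers, saturating at the range limits."""
    scale = 1 << frac_bits
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    return [max(lo, min(hi, round(v * scale))) for v in values]


def fixed_point_dot(qa, qb, frac_bits):
    """Integer multiply-accumulate; each product carries 2 * frac_bits fractional bits."""
    acc = sum(x * y for x, y in zip(qa, qb))  # wide integer accumulator
    return acc / (1 << (2 * frac_bits))


FRAC_BITS = 6  # assumed format: 8-bit signed, 6 fractional bits

weights = [0.5, -0.25, 0.8, 0.1]   # made-up values
inputs = [0.9, 0.3, -0.6, 0.75]    # made-up values

q_weights = quantize(weights, FRAC_BITS)
q_inputs = quantize(inputs, FRAC_BITS)

exact = sum(w * x for w, x in zip(weights, inputs))
approx = fixed_point_dot(q_weights, q_inputs, FRAC_BITS)
print(q_weights, q_inputs)
print(exact, approx, abs(exact - approx))
```

All of the arithmetic inside `fixed_point_dot` is done on small integers, which is what makes this style of computation cheap in FPGA logic; the price is the small rounding error visible in the last printed value.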

Performance Metrics: Layer Analysis

Each layer of the CNN was optimized to utilize FPGA resources efficiently, as shown in the table below:

Layer	Slices	LUTs	Flip-Flops (FFs)	DSPs	Latency (Cycles)	Frequency (MHz)
Conv2D	15,456	6,665	2,234	20	34,848,820	120.17
Average Pooling	438	1,033	1,530	0	1,041	114.28
ReLU	3,376	5,561	9,637	0	269	102.56

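The Conv2D layer occupies the most slices, is the only core that uses DSP blocks, and accounts for almost all of the latency, as expected given its computational intensity. Average Pooling is the lightest core by slice count. The ReLU layer has the shortest latency at 269 cycles, although it uses the most flip-flops of the three, and it plays a critical role in improving the system’s learning capacity.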
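Cycle counts are easier to judge once they are converted to time. The sketch below divides each core’s latency by its frequency from the table. One assumption to flag: it treats the listed frequency as the clock the core actually runs at, which the table does not confirm, so the results are rough estimates rather than measured figures.

```python
# (latency in cycles, frequency in MHz), copied from the table above.
layers = {
    "Conv2D": (34_848_820, 120.17),
    "Average Pooling": (1_041, 114.28),
    "ReLU": (269, 102.56),
}


def latency_seconds(cycles, freq_mhz):
    """Time for `cycles` clock cycles at `freq_mhz` (assumes the core runs at that clock)."""
    return cycles / (freq_mhz * 1e6)


for name, (cycles, mhz) in layers.items():
    print(f"{name}: {latency_seconds(cycles, mhz) * 1e3:.4f} ms")
```

Under that assumption, Conv2D works out to roughly 0.29 seconds per invocation, while Average Pooling and ReLU each finish in a few microseconds, which shows where further optimization effort would matter most.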

4. Real-World Benefits of Hardware Acceleration

Faster Diagnostics That Save Lives

Speed is crucial in medical settings. In the study, hardware acceleration reduced the execution time from 0.981 seconds on an ARM Cortex-A9 to 0.821 seconds on FPGA. This 16.3% improvement, a saving of 0.16 seconds per run, lets clinicians analyze medical images faster. For breast cancer detection, where early diagnosis is vital, those savings add up across many images.

Energy Efficiency: Tailored for Remote Clinics

One of the standout achievements of this FPGA-based system is its 63.15% reduction in power consumption. While the CPU-based system consumed 3.8 watts, the FPGA setup required only 1.4 watts to operate effectively. This makes the system ideal for mobile healthcare units and clinics in underserved areas, where energy resources are limited. By consuming less power, the system bridges the gap in healthcare accessibility.

Balancing Accuracy and Speed

Despite the use of 8-bit fixed-point arithmetic, which slightly reduces precision, the system maintains a solid accuracy of 89.87%. The drop in accuracy is offset by the gains in speed and energy efficiency, suggesting that hardware acceleration need not compromise reliability.

Comparison of CPU and FPGA Performance

To highlight the benefits further, here’s how the FPGA implementation compares to its CPU counterpart:

Metric	ARM Cortex-A9	Zynq FPGA	Improvement
Execution Time (s)	0.981	0.821	16.3% faster
Power Consumption (W)	3.8	1.4	63.15% lower

These results demonstrate that FPGA-based hardware acceleration not only improves speed but also makes the system far more energy-efficient.
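The percentages in the table can be reproduced with a few lines of arithmetic. The sketch below also derives an energy-per-run figure by multiplying power by execution time. That last step is not reported in the article; it assumes the quoted power figures are averages drawn over the whole run, so treat it as a back-of-the-envelope estimate.

```python
cpu_time_s, fpga_time_s = 0.981, 0.821      # execution times from the table
cpu_power_w, fpga_power_w = 3.8, 1.4        # power figures from the table


def percent_reduction(baseline, improved):
    """Relative reduction of `improved` versus `baseline`, in percent."""
    return (baseline - improved) / baseline * 100


time_cut = percent_reduction(cpu_time_s, fpga_time_s)
power_cut = percent_reduction(cpu_power_w, fpga_power_w)

# Derived estimate (assumption: average power over the run): energy = power * time.
cpu_energy_j = cpu_power_w * cpu_time_s
fpga_energy_j = fpga_power_w * fpga_time_s
energy_cut = percent_reduction(cpu_energy_j, fpga_energy_j)

print(f"time: {time_cut:.2f}%  power: {power_cut:.2f}%")
print(f"energy per run: {cpu_energy_j:.3f} J vs {fpga_energy_j:.3f} J ({energy_cut:.2f}% lower)")
```

The power reduction evaluates to about 63.158%, which is 63.15% when truncated to two decimals and 63.16% when rounded. Because the FPGA is both faster and lower-power, the estimated energy per run falls by more than either figure alone, to roughly 1.15 J from about 3.73 J.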

This well-optimized design highlights the transformative potential of hardware acceleration in medical diagnostics, especially for breast cancer classification. By combining speed, accuracy, and energy efficiency, the study by Mhaouch et al. (2025) sets the stage for scalable healthcare innovations.

5. Comparison with Other Breast Cancer Detection Models

How Does It Stack Up?

When it comes to breast cancer detection, there’s no shortage of methods. Traditional approaches like SVM (Support Vector Machines) or more modern techniques like LWDCNN (Lightweight Deep Convolutional Neural Networks) have proven effective. But the FPGA-based CNN system introduced by Mhaouch et al. (2025) shakes things up by combining speed, efficiency, and portability in a way that’s hard to beat. Here’s how it compares:

Method	Accuracy (%)	Execution Time (s)	Power Consumption (W)
SVM	93.39	–	–
LWDCNN	91.89	6.44	–
CNN (Proposed FPGA)	89.87	0.821	1.4

Proposed hardware acceleration for the AI model in breast cancer detection.

In the proposed hardware acceleration design, three core IPs were implemented on the FPGA: 2D Convolution (Conv2D), ReLU activation, and Average Pooling. These IPs were specifically designed to optimize performance and enhance execution speed, making the system more efficient for AI workloads. By leveraging the parallel processing capabilities and reconfigurable nature of the FPGA, this design improves computational efficiency compared to traditional software-based implementations.

Why the FPGA CNN Steals the Show

Blazing Fast Speeds: Let’s talk time. This FPGA system completes a task in just 0.821 seconds, making it nearly 8 times faster than LWDCNN’s 6.44 seconds. Speed like this is crucial, especially in medical diagnostics where waiting even a few minutes can feel like an eternity.
Energy Efficiency That Matters: Efficiency isn’t just about speed; it’s also about power usage. The FPGA system operates at a modest 1.4 watts, compared with 3.8 watts for the ARM Cortex-A9 baseline. Imagine a system this capable that also runs on the kind of energy you’d expect from a small lightbulb. This is particularly useful in mobile clinics or areas with limited power resources.
Accuracy That Competes: Sure, 89.87% accuracy is a little lower than SVM’s 93.39%, but it’s a small trade-off when you consider how much faster and more efficient the FPGA system is. This balance makes it a standout choice for real-world applications where speed and reliability are both key.

The takeaway? The FPGA-based CNN isn’t just good on paper. It is much faster than LWDCNN, the only other method in the table with a reported execution time, and it runs on a low-power, portable board while holding its own on accuracy.
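The comparison above boils down to two numbers per method: how many accuracy points the proposed system gives up, and how much faster it runs. Here is a small sketch that computes both from the table. The dictionary layout and function name are illustrative, and the table does not say whether all timings were measured on comparable hardware or data, so the ratio is indicative only.

```python
# Figures copied from the comparison table; None marks values not reported there.
methods = {
    "SVM": {"accuracy": 93.39, "time_s": None},
    "LWDCNN": {"accuracy": 91.89, "time_s": 6.44},
    "CNN (Proposed FPGA)": {"accuracy": 89.87, "time_s": 0.821},
}


def compare_to_proposed(methods, proposed_name="CNN (Proposed FPGA)"):
    """Accuracy gap (percentage points) and time ratio of each method vs the proposed one."""
    proposed = methods[proposed_name]
    report = {}
    for name, m in methods.items():
        if name == proposed_name:
            continue
        report[name] = {
            "accuracy_gap_points": round(m["accuracy"] - proposed["accuracy"], 2),
            "time_ratio": (round(m["time_s"] / proposed["time_s"], 2)
                           if m["time_s"] is not None else None),
        }
    return report


report = compare_to_proposed(methods)
for name, row in report.items():
    print(name, row)
```

The proposed system trails SVM by 3.52 points and LWDCNN by 2.02 points in accuracy, while running about 7.84 times faster than LWDCNN.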

6. Applications and Implications

Changing the Game in Healthcare Diagnostics

Faster Results, Better Outcomes: Time matters in healthcare, especially with life-threatening diseases like breast cancer. By cutting execution time to 0.821 seconds, this system makes early detection faster and more accessible. Early detection saves lives, and with FPGA-based hardware acceleration, doctors have the tools to act quickly and effectively.
Bringing Cutting-Edge Tech to Rural Clinics: Unlike cloud-reliant systems, the FPGA setup can work independently, with no need for constant internet. It also uses just 1.4 watts, making it perfect for mobile health units and remote clinics. This means high-quality diagnostics can reach places that were previously left out, closing critical gaps in healthcare access.

A Blueprint for Broader Impact

The beauty of this system is its flexibility. It’s not locked into breast cancer detection—it can be adapted for other medical applications with ease:

Lung Cancer Detection: Analyzing CT scans for early-stage abnormalities.
Cardiac Imaging: Identifying heart conditions using echocardiograms.

And it doesn’t stop at healthcare. From autonomous vehicles to real-time video surveillance, FPGA-based acceleration is paving the way for faster and more efficient AI in countless industries. The possibilities are endless, but the foundation is already solid, as shown in this study by Mhaouch et al. (2025).

This isn’t just technology for today—it’s laying the groundwork for a smarter, faster future. By making advanced AI more accessible and scalable, FPGA-based systems are breaking barriers and ensuring everyone, no matter where they live, gets access to better care. This is innovation at its most impactful.

7. Challenges and Future Directions

Current Limitations: Areas to Tackle

Even the most promising technologies have their hurdles, and FPGA-based hardware acceleration is no exception. The study by Mhaouch et al. (2025) outlines a couple of key challenges that need addressing:

Accuracy Trade-Offs
The use of 8-bit fixed-point arithmetic is a fantastic optimization for speed and energy efficiency, but it does come with a slight trade-off in accuracy. While the system achieves a commendable 89.87% accuracy, it lags slightly behind traditional floating-point methods that allow for higher precision. This limitation, though acceptable in practical scenarios, needs to be addressed for applications requiring utmost precision.
Development Costs
Building FPGA-based systems isn’t cheap. The initial development costs for designing modular IP cores and setting up the hardware-software ecosystem can be steep. This cost barrier could deter adoption, especially in resource-strapped healthcare facilities or smaller organizations looking to deploy AI solutions.

Future Research Focus: Where to Go From Here

The study doesn’t just stop at identifying issues—it paves the way for future advancements:

Higher Precision Arithmetic
A natural next step is exploring 16-bit fixed-point arithmetic. This approach could strike a better balance between computational efficiency and accuracy, giving FPGA systems an edge in applications where precision is non-negotiable.
Expanding Hardware Acceleration
Imagine a world where hardware acceleration isn’t limited to the initial layers of CNNs. Future work could aim to extend FPGA optimization to more complex architectures and layers, allowing the system to handle even heavier computations with ease. This development would open doors for broader applications in AI, from analyzing 3D medical scans to powering sophisticated models for other diseases.

By focusing on these areas, researchers and developers can refine FPGA-based systems into an even more impactful solution, bringing us closer to the goal of scalable, real-time AI in healthcare.
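To get a feel for what moving from 8-bit to 16-bit fixed-point would buy, the sketch below measures the worst-case rounding error of each format over a range of sample values. The formats are assumptions for illustration (2 integer bits including the sign, with the rest fractional), and rounding error on individual values does not translate directly into classification accuracy, so this says nothing about how much the 89.87% figure would change.

```python
def max_rounding_error(values, total_bits, int_bits=2):
    """Largest absolute error when rounding `values` to a signed fixed-point format.

    `int_bits` counts the sign bit; the remaining bits are fractional (assumed layout).
    """
    frac_bits = total_bits - int_bits
    scale = 1 << frac_bits
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    worst = 0.0
    for v in values:
        q = max(lo, min(hi, round(v * scale)))  # quantize with saturation
        worst = max(worst, abs(v - q / scale))
    return worst


# Sample values spread across [-1, 1], well inside both formats' ranges.
samples = [i / 1000 for i in range(-1000, 1001, 7)]

err8 = max_rounding_error(samples, total_bits=8)
err16 = max_rounding_error(samples, total_bits=16)
print(err8, err16)
```

With these assumed formats the 8-bit step is 1/64 and the 16-bit step is 1/16384, so the worst-case rounding error shrinks by a factor of up to 256, at the cost of wider multipliers and more FPGA resources per operation.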

8. Conclusion

Recapping the Impact of Hardware Acceleration

Let’s rewind to why this matters. Hardware acceleration, especially through FPGA-based systems, has changed the game for medical imaging. It’s fast, efficient, and capable of handling the computational intensity of tasks like breast cancer detection. By cutting execution time to 0.821 seconds and reducing power consumption to a mere 1.4 watts, it sets a new benchmark for innovation in healthcare.

Sustainability and Scalability for the Future

This isn’t just a win for breast cancer detection—it’s a win for healthcare as a whole. FPGA-based AI models are scalable, capable of being adapted to other tasks like lung cancer diagnostics or cardiac imaging. Their energy efficiency and independence from cloud connectivity make them a sustainable choice for rural and mobile healthcare units.

The Bigger Picture: Revolutionizing Healthcare AI

The study by Mhaouch et al. (2025) isn’t just about showcasing a new technology—it’s about demonstrating what’s possible. These systems hold the power to make advanced diagnostics accessible to everyone, regardless of location or resources. They’re paving the way for equitable healthcare powered by cutting-edge AI.

The message is clear: hardware acceleration needs to be embraced on a wider scale. It’s time to invest in these technologies, refine them further, and ensure they reach their full potential. Together, we can use AI to change lives—and maybe even save them.


References

Mhaouch, A.; Gtifa, W.; Machhout, M. FPGA Hardware Acceleration of AI Models for Real-Time Breast Cancer Classification. AI 2025, 6, 76. Available online: https://doi.org/10.3390/ai6040076

License

This paper is distributed under the terms of the Creative Commons Attribution (CC BY) License. This license permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and source are credited. License details available here: https://creativecommons.org/licenses/by/4.0/.