Introduction to the Challenges of Classification
When using machine learning to classify militaria into three categories—conflict, nation, and item type—each category presents its own challenges. Each comes with its own ambiguities, patterns, and inconsistencies in the data, which can make classification difficult. After reviewing the results, I’ve identified key areas where accuracy is falling short and why that might be happening. Below is a breakdown of each category, what I discovered, and ideas for addressing the issues moving forward.
Conflict Classification: This category was the most challenging for the model to handle. A lack of clear labeling in product descriptions and overlapping historical contexts made it hard for the model to determine which conflict an item belonged to.
Nation Classification: While this category performed better, some confusion arose due to shared characteristics between nations, particularly for countries that worked together or shared equipment during historical conflicts.
Item Type Classification: This category fell somewhere in the middle. Inconsistent organization across the source websites and broad catch-all categories, like "other," made some item types hard to classify. More distinct categories, like helmets or edged weapons, were easier for the model to classify accurately.
By analyzing these challenges, I’ve started brainstorming ways to improve the classification process, including possible adjustments to how the models are trained and whether a unified model might perform better than separate ones.
(Above) WW2 German Instruction on how to label your items.
Nation Classification
Classifying products by nation is more straightforward than the other classification types because most product titles include the name of the country of origin. Sometimes, though, this can be misleading. For example, Ireland and the United Kingdom share similarities in many WWII contexts, and China and the United States often overlap due to the U.S. supplying substantial military aid and expertise to China during WWII.
The accuracy of each category may not be as high as desired, but the graph on the bottom right shows that for 94.4% of products, the model achieves accuracy rates above 90%.
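To make the title-based approach concrete, here is a minimal sketch of tagging nation from product titles with a keyword lookup. The keyword table, function name, and example titles are all illustrative assumptions, not the actual dataset or pipeline; the point is how zero or multiple matches surface the ambiguous cases described above.

```python
# Illustrative sketch: naive nation tagging by keyword lookup in titles.
# The keyword table and example titles are invented for demonstration.
NATION_KEYWORDS = {
    "german": "Germany",
    "british": "United Kingdom",
    "irish": "Ireland",
    "american": "United States",
    "chinese": "China",
}

def guess_nation(title: str) -> str:
    title_lower = title.lower()
    matches = {nation for kw, nation in NATION_KEYWORDS.items() if kw in title_lower}
    if len(matches) == 1:
        return matches.pop()
    # Multiple matches (e.g. U.S.-supplied Chinese equipment) flag the
    # overlapping cases mentioned above; zero matches means no title clue.
    return "ambiguous" if matches else "unknown"

print(guess_nation("German M35 Helmet"))  # Germany
print(guess_nation("Chinese American-supplied pistol"))  # ambiguous
```

A real pipeline would feed these title features into a trained classifier rather than relying on the lookup alone, but the lookup shows why titles make nation the easiest of the three categories.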
Conflict Classification
Why Conflict Classification Accuracy Is Low
Out of the three categories, conflict classification has been the most inaccurate. After looking into the results, I’ve come up with some ideas as to why. A lot of product listings don’t actually state which conflict an item is from. Sellers tend to assume the buyer already knows based on the item’s image, model, or general context. For example, a "Yugoslavian K98 bayonet" might make the model think it’s tied to Yugoslavia in 1898. But the K98 was a German rifle used by the German Army in WW2, so it’s easy for the model to get confused.
This makes classifying conflicts tricky because it often relies on knowledge that isn’t explicitly stated in the product description. A lot of conflicts overlap in terms of equipment or design, which doesn’t help either. The model ends up trying to guess based on limited information.
What Could Help
One thing that might work better is using a single model to classify all three categories—conflict, nation, and item type—instead of separate models. A unified model could:
Use the same context across all three categories to make better predictions.
Avoid contradictions between categories (like labeling a helmet’s conflict as "unknown" while labeling its nation as "Germany").
Pick up on patterns and context in the product title and description that separate models might miss.
If the model is trained to predict all three categories at once, it might have an easier time figuring out the relationships between conflicts, nations, and item types. This could help improve the accuracy for conflicts, which has been the hardest to classify so far.
Item Type Classification
The item type classification can also be tricky for machine learning to figure out. After reviewing the data, I noticed a few things. First, some of the sites I pulled the data from group similar things together. For example, insignia is often lumped into the same category as uniforms. Second, a lot of sites didn’t bother using more specific categories like "art" or "tinnie" and just threw those items into a broader "other" category.
The takeaway here is that many of these sites don’t break things down into specifics—they tend to simplify by putting items into larger, catch-all categories. That said, some categories like helmets, edged weapons, and belts/buckles were much easier to classify because they’re more distinct and straightforward.
The inconsistency in how different sites organize their categories is probably a big reason why this classification isn’t as accurate as I’d like.
I may end up creating a much broader list of categories moving forward to remove this ambiguity.
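A broader, consolidated taxonomy could be applied before training by normalizing each site's category names into shared buckets. The mapping below is a hypothetical fragment to show the shape of the idea; the real category list would come from the scraped sites.

```python
# Illustrative sketch: normalize site-specific categories into one broader
# shared taxonomy before training. The mapping entries are invented.
CATEGORY_MAP = {
    "insignia": "uniforms & insignia",
    "uniforms": "uniforms & insignia",
    "tinnies": "other",
    "art": "other",
    "helmets": "helmets",
    "bayonets": "edged weapons",
    "daggers": "edged weapons",
}

def normalize_category(site_category: str) -> str:
    """Map a site's raw category name to the shared taxonomy."""
    key = site_category.strip().lower()
    # Unmapped categories fall into the catch-all bucket, mirroring
    # what many of the source sites already do.
    return CATEGORY_MAP.get(key, "other")

print(normalize_category("Daggers"))  # edged weapons
```

Consolidating labels this way trades granularity for consistency: the model sees fewer, cleaner classes, which should reduce the disagreement between sites described above.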