Military Antiques Trends and Predictions
2024
NOTE: I HAVE UPGRADED AND ROLLED THIS PROJECT INTO A NEW PROGRAM ON THE CLOUD!
Project Overview
I had been collecting data on items I bought, sold, and traded for many years, almost all of it with pen and paper. Referencing that data was unreliable, biased, and cumbersome. By collecting product data from a handful of different websites, I can build a more objective database, make general predictions, and infer certain trends.
Step One : Identifying Data
Finding Good Sources of Data
As they say, “Garbage in, garbage out.” Finding sources initially looked like a straightforward process: find a site and scrape its data into my database. But after reviewing many sites, I realized that plenty of them cut corners by not organizing items into categories or using proper HTML elements. Once a suitable site had been found, the data was mined.
Good data :
https://virtualgrenadier.com/sale_item.php?iid=7665
Clearly labeled which country, conflict and item type.
Step Two : Collect and Store Data
Collecting the Data
Using Python, I created a program that scraped all the products from selected sites, capturing all useful information including, but not limited to:
Title / Description
Price
Dates (Scraped Date / Sold Date)
Product Categories (If applicable)
I ended up with over 70,000 products in my database to work with.
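As an illustration, the per-product scraping step might look like the sketch below. It parses a product page with BeautifulSoup; the tag names and CSS classes (`item-title`, `item-price`, `item-category`) are hypothetical placeholders, since every site needs its own selectors.

```python
from datetime import date

from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

def parse_product(html: str) -> dict:
    """Extract the fields listed above from one product page.

    The class names here are placeholders; each source site needs its
    own selectors, which is what made finding good sources hard.
    """
    soup = BeautifulSoup(html, "html.parser")
    title = soup.find("h1", class_="item-title")
    price = soup.find("span", class_="item-price")
    category = soup.find("div", class_="item-category")
    return {
        "title": title.get_text(strip=True) if title else None,
        "price": (float(price.get_text(strip=True).lstrip("$").replace(",", ""))
                  if price else None),
        "category": category.get_text(strip=True) if category else None,
        # Sold date usually isn't on the page; it has to be inferred later.
        "scraped_date": date.today().isoformat(),
    }

sample = """
<h1 class="item-title">WWII US M-1 Helmet</h1>
<span class="item-price">$1,250.00</span>
<div class="item-category">U.S. / World War Two / Helmets</div>
"""
print(parse_product(sample))
```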
(Left) There were many products from which I couldn’t easily scrape an item type. This will be addressed with the logistic regression model.
You can see in the table above the most common conflict, nation, and item type. Null is by far the most common value, followed by U.S. and German insignia from the Second World War.
Step Three : Transforming the Data
Categorizing the Data
Title / Description
The title and description were among the most consistent fields across sites, so mining them was relatively straightforward.
Price
Sold items were tricky, so I looked for sites that still displayed prices on sold listings. If a listing had no price, I scraped the information anyway for my logistic regression model.
Dates
Most sites lack sold dates, so I searched forums for dates when items were purchased, used my own purchase records, and extrapolated sold dates from the collected data. Determining when items were first listed for sale presented a similar challenge, for which I applied a similar solution.
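One simple way to extrapolate a missing sold date is to estimate a typical time-to-sale from the listings that have both dates, then add it to the listed date. This is a minimal sketch of that idea; the sample dates are made up for illustration.

```python
from datetime import date, timedelta
from statistics import median

# Listings with both a listed date and a sold date (illustrative sample).
known = [
    (date(2021, 3, 1), date(2021, 4, 15)),
    (date(2021, 6, 10), date(2021, 7, 1)),
    (date(2022, 1, 5), date(2022, 3, 20)),
]

# Median time-to-sale across the known pairs, used as a rough fill-in.
typical_days = median((sold - listed).days for listed, sold in known)

def estimate_sold_date(listed: date) -> date:
    """Extrapolate a sold date for a listing that lacks one."""
    return listed + timedelta(days=typical_days)

print(estimate_sold_date(date(2023, 2, 1)))
```

A median is used rather than a mean so a few items that sat unsold for years don’t skew the estimate.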
Categories
Getting the categories right was important for what I planned to do with my predictive model. Ideally, the categories would be as specific as possible: not just a helmet but a “French, World War One, Artillery Emblem Adrian Helmet,” broken down into categories within categories.
Bad data :
https://frontovik.com/product/soviet-ssh-36-helmet/
No clear categories in any respect.
On the right is a link to a simple program I made to help me figure out the right HTML elements to feed into BeautifulSoup more efficiently.
Step Four : Data Visualization and Analysis
Extracting Useful Data
The two graphs on the right show the average price of two specific item types. The trend is hard to see at such a small scope, but the average price of all items over the years shows a clearer trend.
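Computing that yearly average is a one-line groupby once the records are in pandas. The rows below are made-up stand-ins for the scraped database; the column names are assumptions.

```python
import pandas as pd

# Made-up records standing in for the scraped database.
df = pd.DataFrame({
    "sold_date": pd.to_datetime(
        ["2018-05-01", "2018-11-20", "2019-03-15", "2019-08-02"]),
    "price": [100.0, 140.0, 80.0, 90.0],
})

# Average sale price per year, as plotted in the graphs.
yearly_avg = df.groupby(df["sold_date"].dt.year)["price"].mean()
print(yearly_avg)
```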
Step Five: Data Categorization Through Logistic Regression
Creating a Machine Learning Logistic Regression Model
As soon as I had created my model and fed it data cleaned with pandas, I wanted to measure its accuracy. The chart on the right shows the three categories (conflict, nation, item type) I was trying to predict with logistic regression. Most predictions proved quite accurate, though some categories lacked enough data to produce reliable results.
Input:
https://www.therupturedduck.com/collections/recently-added-items/products/wwii-us-m-1-helmet-with-early-hawley-liner
Output:
(Graph: Price of Civil Defense ‘Gladiator’ Helmets)
Once I had gathered enough data, I created a logistic regression model in Python. After confirming the model was accurate (97% accuracy), I tested it with products I found online. The model did not disappoint: nearly all of the samples I input received the correct conflict, nation, and item type as output. (I realize my data is heavily skewed toward a few parameters and am currently working on rounding this out.)
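A minimal sketch of such a model, assuming scikit-learn and a TF-IDF bag-of-words representation of the titles; the training rows are toy stand-ins for the 70,000-item database, and I've shown only the item-type target (conflict and nation would each get their own classifier).

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy stand-ins for the scraped titles and their item-type labels.
titles = [
    "WWII US M-1 Helmet with liner",
    "German M35 steel helmet WWII",
    "Iron Cross 2nd Class medal 1939",
    "US Purple Heart medal WWII",
    "German eagle cap insignia",
    "US Army collar insignia pair",
]
item_types = ["Helmet", "Helmet", "Medal", "Medal", "Insignia", "Insignia"]

# TF-IDF features from the title text, then a logistic regression classifier.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(titles, item_types)

print(model.predict(["steel helmet"]))
```

In practice the model would be fit on the full database and its accuracy checked on a held-out split before trusting any predictions.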
(Graph: Price of Iron Crosses 2nd Class)
Yearly Trends
On the left, the graph illustrates the average price of items sold in a particular year (created using Tableau). The noticeable dip in average prices in 2019 is likely attributable to Covid-19. This data primarily originates from Stewart’s Militaria, focusing on the German section. I am continuing to extrapolate more dates and gather additional data to obtain a more accurate overview.
Bonus Project:
In order to make the process of getting elements and selectors easier, I decided to make a small program with tkinter.
I made this because it was a pain to rerun my entire program every time I wanted to check whether I had the correct elements or selector. Not anymore!
With this program you just:
Input the URL, tag, attribute (optional), and value (optional).
Hit Submit.
Read the matching element(s) from the output.
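A stripped-down version of that tool might look like the sketch below. The widget layout is a guess at the original; the core lookup is a small helper that the GUI simply wraps, so it can also be called directly.

```python
import tkinter as tk
import urllib.request

from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

def find_elements(html: str, tag: str, attribute: str = "", value: str = "") -> list[str]:
    """Return the text of every element matching the tag (and optional attribute=value)."""
    soup = BeautifulSoup(html, "html.parser")
    attrs = {attribute: value} if attribute else {}
    return [el.get_text(strip=True) for el in soup.find_all(tag, attrs=attrs)]

def on_submit():
    """Fetch the URL and show the matching elements in the output box."""
    with urllib.request.urlopen(url_var.get()) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    output.delete("1.0", tk.END)
    output.insert(tk.END, "\n".join(
        find_elements(html, tag_var.get(), attr_var.get(), val_var.get())))

if __name__ == "__main__":
    root = tk.Tk()
    root.title("Element Finder")
    url_var, tag_var, attr_var, val_var = (tk.StringVar() for _ in range(4))
    for label, var in [("URL", url_var), ("Tag", tag_var),
                       ("Attribute", attr_var), ("Value", val_var)]:
        tk.Label(root, text=label).pack()
        tk.Entry(root, textvariable=var, width=60).pack()
    tk.Button(root, text="Submit", command=on_submit).pack()
    output = tk.Text(root, height=15, width=80)
    output.pack()
    root.mainloop()
```

Because the lookup lives in `find_elements`, you can verify a selector against saved HTML without launching the window at all.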
Conclusion
With the data I have collected and the machine learning model I have developed, I can create a more concise data set. Items can be broken down into their most basic components and processed through various data analysis tools. The more data I collect, the more accurate my categorization model will become.
Right now I am working on…
Improving my Data
Getting more dates, more accurate category labels, and more source sites that provide better data.
Streamlining my Program
I want to condense my data so the program runs a bit more efficiently.
Moving Data to the Cloud
Ideally, my data would be accessible from anywhere in the world, either by hosting a server or by putting everything on the cloud.
Presenting my Data
Getting my data and presenting it in a way that is as useful as possible through Tableau.