Image for post
Photo by Karolina Grabowska from Pexels

Today is Cyber Monday, and marketers have already shifted into the highest gear to connect with their desired audiences. The battle between businesses has begun, and competition is intensifying as they fight for a share of the shrinking consumer pie.

COVID restrictions and decreased spending power have made online sales more important than ever before. Massive consumer appetite for discounts is resulting in serious competition among retailers, driving prices down even before holiday shopping begins.

At Oxylabs, we’ve already noticed a rise in web scraping (also known as data extraction) across all e-commerce sectors. This is no surprise, as ethical web scraping has been on the rise for years now. Since more data means better insights, this is likely to continue, especially as awareness of the power of web scraping keeps growing in the marketing community. …


Image for post
Source: Oxylabs’ design team

There is an invisible war taking place in the e-commerce world. Made up of numerous battles fought by soldiers, it is waged by major players vying for dominance in a highly competitive environment.

The purpose is clear: to post the lowest price and make the sale.

Most people don’t realize that this war is taking place, but it is there, and it is getting more brutal as time goes on. My company, Oxylabs, provides the proxies, or “soldiers”, plus the strategic tools that help businesses win the war. …


Are you approaching data gathering on a large scale in a traditional manner? If so, expect to invest a lot of time and effort into proxy infrastructure maintenance.

Data gathering consists of many time-consuming and complex activities. These include proxy management, data parsing, infrastructure management, getting past fingerprinting-based anti-bot measures, rendering JavaScript-heavy websites at scale, and much more. Is there a way to automate these processes? Absolutely.

Finding a more manageable solution for large-scale data gathering has been on the minds of many in the web scraping community. Specialists saw a lot of potential in applying AI (Artificial Intelligence) and ML (Machine Learning) to web scraping. However, only recently have concrete steps been taken toward automating data gathering with AI applications. …


As data validation is still one of the biggest challenges for data-driven companies today, here’s an effective improvement solution for tech teams around the globe.

Data validation is forecast to be one of the biggest challenges e-commerce websites face in 2020. In this article, we will go over key statistics highlighting the main data validation issues that currently impact big data companies. The article’s final aim is to propose a quality improvement solution for tech teams.

Retail e-commerce companies continue to see rapid sales growth on a global scale. In 2019, worldwide sales reached 3,535 billion U.S. dollars, and according to Statista, this figure will nearly double by 2023, reaching as much as 6,542 billion U.S. dollars.

As e-commerce continues to grow, so does the volume of data stored. Data volumes in e-commerce alone will grow exponentially. The Data Age 2025 report for Seagate forecasts that the global datasphere will reach 175 zettabytes by 2025.


Interested in building and implementing a large-scale web crawler? Learn from our mistakes, experience, and advice, which will help you to create a robust tool for web data acquisition.

Web coding
Photo by Markus Spiske on Unsplash

At Oxylabs, we happened to build a large-scale web crawler almost by accident. As with any development project that people just happen to stumble upon, we made a lot of mistakes and learned a lot of lessons along the way. We hope our story will be of use to anyone interested in creating tools for data acquisition.

We started out as proxy providers offering services to all who wanted to perform large-scale data gathering projects. …

About

Julius Cerniauskas

Julius Cerniauskas is Lithuania’s technology industry leader & the CEO of Oxylabs, covering topics on web scraping, big data, machine learning & tech trends.
