Data Engineers build pipelines and tools to derive insights from the data Crunchbase has. They also add more data from external sources to help grow our platform.
Eddie has worked at Crunchbase for about two and a half years and has recently been promoted to act as a technical lead on the Data Integrations team.
You’ve been with Crunchbase for almost 2.5 years now. What have you learned in that time?
At Crunchbase, managers really care about you and how you grow in your career. My manager and I set goals for me each quarter, which gives me concrete tasks to work on and helps me build my skills. I’m assigned projects that match my interests and allow me to grow.
I’ve also learned skills outside of coding and engineering. I’ve had chances to mentor other engineers with support from my manager. I can see the people I’ve mentored grow and become some of the best engineers on the team. It’s a great accomplishment, and a new one for me.
What are the biggest advantages and benefits of working for Crunchbase?
I believe the biggest advantage is that Crunchbase cares about your family. As a new father, I’ve received the support I need to be a good father. I’ve received good advice from colleagues who also have kids. They also gave me flowers when my baby was born. These little things help a newbie father face different situations.
Also, the work-life balance at Crunchbase is really good. On my team, we prioritize the quality of the product and believe people perform their best when they are not overworked. We make an effort to manage ongoing projects so that the whole team keeps a good work-life balance.
What exciting projects are you working on?
All of our projects are exciting. We have online and offline production projects. I’m most excited about the online production projects that bring in partner data. As a team, we bring in data from different sources to enrich what Crunchbase already has. We care not only about the quantity of data but about its quality as well. The hardest part is matching our entities with our partners’ data entities. It’s not always apples to apples.
The most exciting offline production project involves rebuilding our analytics pipeline to drive Crunchbase analysis and user experience. We’re building the pipeline for our team of data analysts so that it covers the different needs of various customer segments.
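To make the entity-matching problem mentioned above more concrete, here is a minimal Python sketch of one common approach: normalize company names, then compare them with a string-similarity score. This is an illustration only, not a description of how Crunchbase actually matches entities. The helper names (`normalize`, `best_match`), the suffix list, and the 0.85 threshold are all assumptions made for the example.

```python
import re
from difflib import SequenceMatcher

# Hypothetical set of legal suffixes to ignore when comparing names.
SUFFIXES = {"inc", "llc", "ltd", "corp", "co"}


def normalize(name: str) -> str:
    """Lowercase, strip punctuation, and drop common legal suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in SUFFIXES)


def best_match(partner_name, candidates, threshold=0.85):
    """Return the candidate most similar to partner_name, or None.

    The threshold is an assumed cutoff; a real system would tune it
    and would likely compare more fields than the name alone.
    """
    target = normalize(partner_name)
    best, best_score = None, 0.0
    for candidate in candidates:
        score = SequenceMatcher(None, target, normalize(candidate)).ratio()
        if score > best_score:
            best, best_score = candidate, score
    return best if best_score >= threshold else None
```

With these assumptions, a partner record named "Acme, Inc." would match an existing "ACME Corp" entry, while an unrelated name falls below the threshold and is left unmatched for review. Name similarity alone is rarely enough in practice, which is part of why the problem is "not always apples to apples".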
What challenges has your team been working to solve?
One challenge we face is ensuring data quality. We build tools that let both people and machine learning algorithms check the quality of our data.
We also have to keep stakeholders happy. We build pipelines that our internal analytics team uses to analyze data, as well as separate pipelines for our data scientists. Periodically, we gather feedback from stakeholders to make sure they’re happy.
Crunchbase is currently hiring Data Engineers. Can you tell me more about the technologies they’d be using, and for what?
We use several industry-standard technologies for different purposes. We build both streaming and batch pipelines. Our main codebase is in Python, and we use Kafka and Airflow to manage the streaming and batch pipelines, respectively. We recently brought in Snowflake as a data warehouse to better serve our analysts’ and data scientists’ needs. Our team welcomes new technologies and tools that will make our data engineering ecosystem better.