
Learning Goal: I'm working on a databases exercise and need an explanation and answer to help me learn.

Hey there, all the requirements are in here.

Answer these questions using Spark code. Submit your code (in a .py file) and the answers to the questions (in a text file). The answers should use the full dataset, not the small dataset. Start with the code shown below. (Hint: for any task that asks for a max/largest value, don't use sortByKey, because it is much slower than a better option.)

Which day had the largest number of installed drives, and what was this number?
How many distinct drives (by model+serial) are installed (i.e., exist in the data) in each year?
What's the max drive capacity per year?
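
A minimal sketch of how the two per-year questions might be set up with reduceByKey, assuming rows expose day, model, serial, and capacity as in the starting schema, and assuming capacity is never null. The names Drive, year_drive_key, year_capacity, distinct_drives_per_year, and max_capacity_per_year are hypothetical helpers for illustration; this is one possible approach, not the instructor's solution.

```python
from collections import namedtuple
from datetime import date

# Hypothetical stand-in for a Spark Row, used only to try the key functions locally.
Drive = namedtuple("Drive", ["day", "serial", "model", "capacity", "failure"])

def year_drive_key(row):
    # One entry per physical drive per year: (year, model, serial).
    return (row.day.year, row.model, row.serial)

def year_capacity(row):
    # Key is the year, value is the capacity reported on that row.
    return (row.day.year, row.capacity)

def distinct_drives_per_year(rdd):
    # Deduplicate (year, model, serial) triples, then count triples per year.
    return (rdd.map(year_drive_key)
               .distinct()
               .map(lambda key: (key[0], 1))
               .reduceByKey(lambda a, b: a + b))

def max_capacity_per_year(rdd):
    # Keep the larger of two capacities for each year; no sorting involved.
    # Assumption: capacity is never None, otherwise max() would fail.
    return rdd.map(year_capacity).reduceByKey(max)

# Local check of the key functions on a made-up row (not real Backblaze data).
sample = Drive(date(2015, 3, 1), "S1", "M1", 4000787030016, 0)
print(year_drive_key(sample))
print(year_capacity(sample))
```

With the RDD d from the starting code, distinct_drives_per_year(d).collect() and max_capacity_per_year(d).collect() would each return one (year, value) pair per year.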
Full dataset: change the file path to file:///ssd/data/backblaze.csv (146 million rows). My solution took 17 min.

Run Spark like this:

spark-submit --master=local[5] backblaze-spark.py

Or, to hide log messages:

spark-submit --master=local[5] backblaze-spark.py 2> /dev/null

Starting code, with some examples that you can remove:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("Backblaze").getOrCreate()
schema = "day DATE, serial STRING, model STRING, capacity LONG, failure INTEGER"
d = spark.read.schema(schema).load("file:///home/jeckroth/cinf201/spark/assignment/small-backblaze.csv", format="csv", sep=",", header="true")
d = d.rdd

# print first 10 rows
print(d.take(10))

## How many failures occurred each year?
# make key (year) & value (failure 0/1)
d2 = d.map(lambda row: (row.day.year, row.failure))
# add up failures per year
failureCounts = d2.reduceByKey(lambda cnt, rowcnt: cnt + rowcnt)
print(failureCounts.collect())

## Which model (not serial number) has the most failures overall?
# grab model & failure from data, model is the key
d3 = d.map(lambda row: (row.model, row.failure))
# count failures for that model; result so far: [(modelX, 55), (modelY, 2100)]
d3 = d3.reduceByKey(lambda cnt, rowcnt: cnt + rowcnt)
# flip keys and values; result so far: [(55, modelX), (2100, modelY)]
d3 = d3.map(lambda pair: (pair[1], pair[0]))
# sort by failure count (now the key), largest first
d3 = d3.sortByKey(ascending=False) ### NOT EFFICIENT TECHNIQUE
print(d3.collect())
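
The hint about avoiding sortByKey can be illustrated with a single-pass reduce that keeps only the largest pair instead of ordering every pair. The sketch below is one way to do that, assuming an RDD of (key, count) pairs such as the output of reduceByKey. The names larger_count and top_pair are hypothetical, and the sample counts are made up.

```python
from functools import reduce

def larger_count(a, b):
    # a and b are (key, count) pairs; keep the one with the larger count.
    return a if a[1] >= b[1] else b

def top_pair(pairs_rdd):
    # pairs_rdd is assumed to be an RDD of (key, count) pairs, such as the
    # output of reduceByKey. RDD.reduce makes one pass and avoids the full sort.
    return pairs_rdd.reduce(larger_count)

# Same combiner applied to a plain list with functools.reduce (made-up counts).
sample = [("modelX", 55), ("modelY", 2100), ("modelZ", 7)]
print(reduce(larger_count, sample))  # ('modelY', 2100)
```

Applied to the (model, count) pairs produced by reduceByKey, before the flip and sort, top_pair would return the model with the most failures. PySpark's built-in pairs_rdd.max(key=lambda pair: pair[1]) is an alternative with the same effect. The same pattern could serve the first question by counting rows per day with reduceByKey and then taking the top pair.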

