Q.17 Validating the machine learning model during the training and development stages is crucial for ensuring accurate predictions. Differentiate between the Train–Test Split and Cross-Validation methods, and support your answer with a neat, labelled diagram.
Ans:
| Train-Test Split | Cross-Validation |
|---|---|
| Normally applied on large datasets. | Normally applied on small datasets. |
| Divides the data into a training dataset and a testing dataset. | Divides the dataset into subsets (folds), trains the model on some folds, and evaluates its performance on the remaining data. |
| Clear demarcation between training data and testing data. | Every data point may, at some stage, be in either the testing or the training dataset. |
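The contrast above can be sketched in a few lines of code. This is a minimal illustration, assuming scikit-learn is available; the dataset is synthetic and the model choice (logistic regression) is arbitrary.

```python
# Sketch: Train-Test Split vs Cross-Validation on a synthetic dataset.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, cross_val_score

X, y = make_classification(n_samples=200, random_state=42)
model = LogisticRegression(max_iter=1000)

# Train-Test Split: one fixed partition (80% train, 20% test).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
model.fit(X_train, y_train)
holdout_acc = model.score(X_test, y_test)

# Cross-Validation: 5 folds; each fold serves once as the test set,
# so every data point is used for both training and testing.
cv_scores = cross_val_score(model, X, y, cv=5)

print(f"Hold-out accuracy:       {holdout_acc:.2f}")
print(f"5-fold CV mean accuracy: {cv_scores.mean():.2f}")
```

Note that the split gives a single score from one partition, while cross-validation averages over five partitions, which is why it is preferred when data is scarce.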
Q.18 A global e-commerce company handles millions of transactions every day. Identify the characteristics of Big Data you can relate to in the following scenarios and explain them in detail:
a) Customers place orders online, and the website records thousands of clicks, searches, and transactions every second.
Ans: Velocity refers to the speed at which data is generated, processed, and analyzed.
In today’s digital world, massive amounts of data are produced every second.
Example: Thousands of clicks, searches, and transactions happening every second show high-speed data generation and often require real-time processing.
b) The company’s servers store petabytes of customer orders, payment details, and product listings collected over years.
Ans: Volume refers to the huge amount of data generated and stored.
This data can range from terabytes to petabytes or even exabytes.
Example: Storing petabytes of customer orders, payment details, and product listings over time represents large data volume.
c) The stored data comes in different formats — structured (databases of products), semi-structured (XML, JSON order files), and unstructured (customer reviews, product photos, videos).
Ans: Variety refers to different types and formats of data.
Data can be structured (tables), semi-structured (XML/JSON), or unstructured (text, images, videos).
Example: Product databases, XML/JSON files, and customer reviews/photos/videos show data diversity.
d) Sometimes, the system collects incomplete or duplicate records, and before analysis, the team removes errors to ensure the information is accurate and trustworthy.
Ans: Veracity refers to the quality, accuracy, and reliability of data.
Data may contain errors, duplicates, or inconsistencies.
Example: Cleaning incomplete or duplicate records ensures trustworthy and accurate data.
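The cleaning step described above can be sketched with pandas (assumed available); the order records here are hypothetical examples, not real company data.

```python
# Sketch: ensuring veracity by removing duplicate and incomplete records.
import pandas as pd

# Hypothetical raw order data: one duplicate row, one incomplete row.
orders = pd.DataFrame({
    "order_id": [101, 102, 102, 103],
    "amount":   [25.0, 40.0, 40.0, None],
})

clean = (orders
         .drop_duplicates()            # remove exact duplicate records
         .dropna(subset=["amount"]))   # drop records missing the amount

print(clean)
```

After cleaning, only the two complete, unique records remain, so downstream analysis works on accurate and trustworthy data.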
Q.19 Explain the role of neural networks in the future of AI.
Ans: How Neural Networks are Helping AI Grow

• Advancing Deep Learning: Neural Networks with multiple layers power Deep Learning, helping machines understand complex patterns in large datasets.
• Improving Accuracy and Efficiency: They analyze vast data and make highly accurate predictions (e.g., detecting diseases from medical images), boosting efficiency in real-world tasks.
• Supporting Autonomous Systems: Neural networks provide the perception and decision-making capabilities needed for autonomous systems (e.g., self-driving cars, robots, drones), enabling real-time sensing, control, and safe automation.
• Personalizing Experiences: By analyzing user preferences, Neural Networks power recommendation systems (shopping, music, movies), creating tailored experiences.
Q.20 Explain how Meta’s LLaMA is unique compared to traditional LLMs with a neat diagram.
Ans: LLaMA is unique compared to traditional Large Language Models (LLMs) in the following ways:
1. Efficient and Open Training Approach:
LLaMA is trained primarily on publicly available data (text and code), unlike many traditional LLMs that rely heavily on proprietary datasets. This promotes transparency and research accessibility.
It also uses efficient training techniques, allowing strong performance with less computational power, making it more scalable.
2. Multiple Model Sizes (Flexibility):
Meta released LLaMA in different sizes (e.g., 7B, 13B, 33B, 65B parameters).
- Smaller models → suitable for devices with limited resources
- Larger models → better for complex NLP tasks
This makes LLaMA highly flexible and practical.
3. High Performance with Less Data/Compute:
Despite using public datasets and fewer resources, LLaMA achieves performance comparable to or better than larger proprietary models in tasks like:
- Text summarization
- Question answering
- Language understanding
This shows that a better training strategy can outperform sheer model size.

Q.21 The image illustrates Freytag’s Pyramid, a classic narrative structure.
While (1) Exposition introduces the story and characters, identify and briefly explain the stages labeled as 2, 3, 4, and 5 in the diagram.