
Project01

USF Grep

Due: Tuesday, September 12th at 11:59pm in your Github repo for Project01.

For this project you are going to implement a version of the grep UNIX command called usfgrep. Your implementation must be written in the C programming language and you need to use system calls for file I/O (open(), read(), and close()). You cannot use the buffered I/O library functions such as fopen(), fread(), etc. Here is the command line format for usfgrep:

usfgrep <string> <file1> [<file2> ...]

A user provides a string and one or more files, and usfgrep searches each file for lines that contain the string. If the user invokes usfgrep with fewer than 2 arguments (a string and at least one file), you can print the following:

$ usfgrep
Insufficient arguments
Usage: usfgrep <string> <file1> [<file2> ...]
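
The argument check above could be sketched as follows. This is only an illustration, not a required design: the helper names put_str and check_args are hypothetical, and sending the message to standard output (rather than standard error) is an assumption, since the assignment does not say which to use.

```c
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write a NUL-terminated string to a file descriptor. */
static void put_str(int fd, const char *s) {
    size_t len = strlen(s);

    while (len > 0) {
        ssize_t n = write(fd, s, len);
        if (n <= 0)
            return;             /* give up on error */
        s += n;
        len -= (size_t)n;
    }
}

/*
 * Hypothetical helper: returns 0 when argc covers the program name, a string,
 * and at least one file; otherwise prints the usage text and returns -1.
 */
int check_args(int argc) {
    if (argc < 3) {
        /* Assumption: the message goes to standard output. */
        put_str(STDOUT_FILENO, "Insufficient arguments\n");
        put_str(STDOUT_FILENO, "Usage: usfgrep <string> <file1> [<file2> ...]\n");
        return -1;
    }
    return 0;
}
```

A main() function could call check_args(argc) first and exit with a nonzero status when it returns -1.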

Your output should look like the following:

$ usfgrep main args.c
20: int main(int argc, char **argv) {
$


$ usfgrep main *.c
args.c:20: int main(int argc, char **argv) {
count.c:15: int main(int argc, char **argv) {
$

That is, for each file you find the lines that contain the given <string>. For each line where a match is found you need to output the file name (only when more than one file is given), the line number containing the <string>, and the line itself. See output format above.

Note the difference in output when grepping one file versus more than one file. Also note that line numbers start at 1.

Here are some additional requirements:
  • You must use UNIX system calls for file I/O (open(), read(), close())
  • You must be able to push/pull your solution using git and github directly on your Raspberry Pi
  • You must demonstrate usfgrep on your Raspberry Pi:
    • kermit/ssh into your RPi
    • clone your github repo for Project01
    • compile your program
    • demonstrate that it works and handles insufficient arguments
    • I will provide some test files
  • The string must occur entirely on a single line. For example, "ma\nin\n" is two lines and would not match the string "main".
  • You can assume that input files have a maximum line length of 511 bytes. If you find a line longer than 511 bytes, you can print an error message and exit.
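
The line-based requirements above could be approached as in the following minimal sketch. It reads one byte per read() call, which is the simple (not the efficient) approach, and it assumes the files are plain text with no embedded NUL bytes. The names read_line, line_matches, and MAX_LINE and the return codes are hypothetical choices for illustration.

```c
#include <string.h>
#include <unistd.h>

#define MAX_LINE 511    /* maximum line length allowed by the assignment */

/*
 * Read one line from fd into buf (capacity MAX_LINE + 1), dropping the '\n'.
 * Returns the line length, -1 at end of file with nothing left to return,
 * or -2 if the line is longer than MAX_LINE bytes or read() fails.
 */
int read_line(int fd, char *buf) {
    int len = 0;
    char c;
    ssize_t n;

    while ((n = read(fd, &c, 1)) == 1) {
        if (c == '\n') {
            buf[len] = '\0';
            return len;
        }
        if (len == MAX_LINE)
            return -2;
        buf[len++] = c;
    }
    if (n < 0)
        return -2;
    if (len == 0)
        return -1;
    buf[len] = '\0';            /* last line had no trailing newline */
    return len;
}

/* Nonzero when pattern occurs entirely within this single line. */
int line_matches(const char *line, const char *pattern) {
    return strstr(line, pattern) != NULL;
}
```

A caller would obtain fd from open(path, O_RDONLY), call read_line() in a loop while counting lines from 1, and call close(fd) when it returns -1.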
Extra Credit (1 point each)
  • Turn in your solution 24 hours early (Monday, September 11th before 11:59pm) and demonstrate it on Tuesday, September 12th to me or a TA.
  • Your solution reads BUFSIZE characters each time, where BUFSIZE >= 4096. That is, your solution does not read a single character at a time, which is easier to program but not as efficient.
  • Support reading characters from standard input (stdin).
  • Colorize the output like real grep on Raspbian Linux.
  • Support the -i "ignore case" command line option.
  • Support regular expressions for the <string> argument (see man re_format).
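
For the BUFSIZE extra credit item, one possible structure is sketched below: a small reader that calls read() once per BUFSIZE bytes and hands out lines from the in-memory buffer. The loop still examines bytes one at a time, but only in memory, so the number of system calls drops sharply. The struct reader type and the names reader_init, next_byte, and reader_line are hypothetical.

```c
#include <unistd.h>

#define BUFSIZE 4096    /* bytes requested per read() call */
#define MAX_LINE 511    /* maximum line length allowed by the assignment */

/* Hypothetical reader state: one block of file data plus a cursor. */
struct reader {
    int fd;
    char buf[BUFSIZE];
    ssize_t pos;        /* index of the next unread byte in buf */
    ssize_t end;        /* number of valid bytes in buf */
};

void reader_init(struct reader *r, int fd) {
    r->fd = fd;
    r->pos = 0;
    r->end = 0;
}

/* Returns the next byte (0..255), -1 at end of file, or -2 on a read error. */
static int next_byte(struct reader *r) {
    if (r->pos == r->end) {
        r->end = read(r->fd, r->buf, BUFSIZE);  /* refill the whole buffer */
        r->pos = 0;
        if (r->end < 0) {
            r->end = 0;
            return -2;
        }
        if (r->end == 0)
            return -1;
    }
    return (unsigned char)r->buf[r->pos++];
}

/*
 * Copy the next line into line (capacity MAX_LINE + 1), dropping the '\n'.
 * Returns the line length, -1 at end of file, or -2 on an overlong line
 * or a read error.
 */
int reader_line(struct reader *r, char *line) {
    int len = 0;
    int c;

    while ((c = next_byte(r)) >= 0) {
        if (c == '\n') {
            line[len] = '\0';
            return len;
        }
        if (len == MAX_LINE)
            return -2;
        line[len++] = (char)c;
    }
    if (c == -2)
        return -2;
    if (len == 0)
        return -1;
    line[len] = '\0';           /* last line had no trailing newline */
    return len;
}
```

Because the reader only needs a file descriptor, passing STDIN_FILENO to reader_init() is one way the standard input item could reuse the same code.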

Test Files (see the three test files attached below).

Put these into a directory, then type:

$ usfgrep USF *.txt
a.txt:1:USFaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 
a.txt:6:aaaaaaaaaaaaaaaaaaUSFaaaaaaaaaaaaaaaaaaa 
a.txt:10:aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaUSF 
c.txt:5:cccccccccccccccccccccccccccccccccccccUSF


Also, try taots.txt (The Adventures of Tom Sawyer, attached below):

$ usfgrep candle-grease taots.txt 
7609:candle-grease, smeared with clay, and almost worn out. He found Huck 
8406:“Lookyhere, Huck, there’s footprints and some candle-grease on the clay 
8524:were covered with clay and candle-grease. Aunt Polly blushed crimson

Rubric

  • (10 points) Clone GitHub Repo on RPi
  • (20 points) Compile and run usfgrep on RPi
  • (20 points) Handle single file as input and correctly locate substring on simple data. Correct output.
  • (20 points) Handle multiple files as input. Correct output.
  • (10 points) Test with taots.txt
  • (20 points) Code quality: consistent formatting, consistent variable and function name formats, no extra vertical space, no commented out code, no redundant code, no extra long functions (including main), comments for tricky code.

  • a.txt (0k), Greg Benson, Aug 24, 2017, 2:17 PM
  • b.txt (0k), Greg Benson, Aug 24, 2017, 2:17 PM
  • c.txt (0k), Greg Benson, Aug 24, 2017, 2:16 PM
  • taots.txt (412k), Greg Benson, Aug 24, 2017, 2:16 PM