All source code listed below is under the MIT license unless a LICENSE file states otherwise.

Isspam

Fast-as-light evaluator that summarizes specific details of text files, such as word, sentence, number and forbidden-word counts.

This repository contains two versions of the same algorithm.

Versions:

  • Rust (risspam) written by 12bitfloat.
  • C (isspam) written by retoor.

Building

make build

Build isspam with a memory check (requires valgrind to be installed):

make valgrind

Running

Using files as parameters

./(r)isspam ./spam/*.txt
./(r)isspam ./not_spam/*.txt

Using stdin

Useful for automation. Only supported by the C version (isspam).

cat ./spam/example_spam1.txt | ./isspam
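A minimal sketch of the usual pattern behind this, assuming the tool falls back to standard input when no file arguments are given. `process_inputs` and `stream_handler` are hypothetical names, not taken from the isspam source.

```c
#include <stdio.h>

/* Hypothetical callback: receives a label (file path or "stdin") and an open stream. */
typedef void (*stream_handler)(const char *label, FILE *stream);

/* Process every path in argv[1..argc-1]; with no paths, process stdin instead.
 * Returns the number of files that could not be opened. */
int process_inputs(int argc, char **argv, stream_handler handle) {
    int failures = 0;

    if (argc < 2) {
        /* No file arguments: read piped input, e.g. `cat file | ./tool`. */
        handle("stdin", stdin);
        return 0;
    }

    for (int i = 1; i < argc; i++) {
        FILE *f = fopen(argv[i], "r");
        if (f == NULL) {
            perror(argv[i]); /* report the failing path and keep going */
            failures++;
            continue;
        }
        handle(argv[i], f);
        fclose(f);
    }
    return failures;
}
```

In a `main`, this would be called as `process_inputs(argc, argv, analyze_stream)`, where `analyze_stream` stands for whatever per-file analysis the tool performs (also a hypothetical name).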

Example output

Example output produced by isspam:

File: ./spam/example_spam3.txt
Capitalized words: 39
Sentences: 20
Words: 420
Numbers: 1
Forbidden words: 15
<0:recovery>
<1:techie>
<2:https>
<3:digital>
<4:hack>
<5://>
<6:com>
<7:@>
<8:crypto>
<9:bitcoin>
<10:whatsapp>
<11:cryptocurrency>
<12:stolen>
<13:contact>
<14:understanding>
Word count per sentence: 21
Memory usage: 1 MB, 6.460 (re)allocated, 4.222 unqiue free'd, 0 in use.
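The counters above can be approximated with a single pass over the text. This is a simplified sketch under stated assumptions, not isspam's tokenizer: words are runs of ASCII letters and digits, a word is capitalized if it starts with an uppercase letter, a number is a word made only of digits, every `.`, `!` or `?` ends a sentence, and forbidden words are matched case-insensitively against a small hypothetical list (`FORBIDDEN`). The sketch only counts forbidden-word occurrences; it does not produce the indexed list shown above. Judging by the example (420 words / 20 sentences = 21), "Word count per sentence" appears to be the integer average, which `words_per_sentence` mirrors.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result type; field names are illustrative only. */
typedef struct {
    size_t words;
    size_t capitalized;
    size_t numbers;
    size_t sentences;
    size_t forbidden;
} text_stats;

/* Assumed example list (lowercase); isspam's real list is not reproduced here. */
static const char *FORBIDDEN[] = {"bitcoin", "crypto", "hack", "stolen"};

/* Case-insensitive comparison of a non-terminated word against a lowercase target. */
static int word_equals_ci(const char *word, size_t len, const char *target) {
    if (strlen(target) != len) return 0;
    for (size_t i = 0; i < len; i++)
        if (tolower((unsigned char)word[i]) != target[i]) return 0;
    return 1;
}

static int is_forbidden(const char *word, size_t len) {
    for (size_t i = 0; i < sizeof FORBIDDEN / sizeof FORBIDDEN[0]; i++)
        if (word_equals_ci(word, len, FORBIDDEN[i])) return 1;
    return 0;
}

text_stats analyze(const char *text) {
    text_stats s = {0};
    for (const char *p = text; *p;) {
        unsigned char c = (unsigned char)*p;
        if (c == '.' || c == '!' || c == '?') {
            s.sentences++; /* simplification: "..." counts as three */
            p++;
        } else if (isalnum(c)) {
            const char *start = p;
            int all_digits = 1;
            while (*p && isalnum((unsigned char)*p)) {
                if (!isdigit((unsigned char)*p)) all_digits = 0;
                p++;
            }
            s.words++;
            if (isupper(c)) s.capitalized++;
            if (all_digits) s.numbers++;
            if (is_forbidden(start, (size_t)(p - start))) s.forbidden++;
        } else {
            p++; /* whitespace and other punctuation separate words */
        }
    }
    return s;
}

/* Integer average of words per sentence (0 when no sentence terminator was seen). */
size_t words_per_sentence(const text_stats *s) {
    return s->sentences ? s->words / s->sentences : 0;
}
```

For `"Send Bitcoin now. I have 2 wallets!"` this yields 7 words, 3 capitalized words, 1 number, 2 sentences, 1 forbidden word and 3 words per sentence.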

Valgrind status

Valgrind output for the isspam version.

The Rust variant thinks it's too cool for memory checks.

Date: 2024-11-30

==58062== 
==58062== HEAP SUMMARY:
==58062==     in use at exit: 0 bytes in 0 blocks
==58062==   total heap usage: 6,490 allocs, 6,490 frees, 2,343,156 bytes allocated
==58062== 
==58062== All heap blocks were freed -- no leaks are possible
==58062== 
==58062== For lists of detected and suppressed errors, rerun with: -s
==58062== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
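The "Memory usage" line in the example output suggests that isspam counts its own allocations, which complements the Valgrind check above. One way to get similar counters is to route every allocation through small wrappers. This is a hypothetical sketch (`tracked_malloc`, `tracked_realloc`, `tracked_free` and `g_mem` are invented names), not isspam's code. Counting each successful realloc as an allocation while frees are counted separately could explain why a "(re)allocated" total exceeds a "unique free'd" total even though nothing is left in use.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical global counters. */
typedef struct {
    size_t allocated; /* successful malloc and realloc calls */
    size_t freed;     /* free calls on non-NULL pointers */
    long in_use;      /* blocks currently live */
} mem_stats;

static mem_stats g_mem = {0, 0, 0};

void *tracked_malloc(size_t size) {
    void *p = malloc(size);
    if (p) {
        g_mem.allocated++;
        g_mem.in_use++;
    }
    return p;
}

/* Note: size 0 is not handled specially; realloc(ptr, 0) is implementation-defined. */
void *tracked_realloc(void *old, size_t size) {
    void *p = realloc(old, size);
    if (p) {
        g_mem.allocated++;
        if (old == NULL) g_mem.in_use++; /* realloc(NULL, n) acts like malloc(n) */
    }
    /* On failure realloc returns NULL and leaves `old` valid; counters stay unchanged. */
    return p;
}

void tracked_free(void *p) {
    if (p) {
        g_mem.freed++;
        g_mem.in_use--;
    }
    free(p);
}
```

With one malloc, one realloc of that block and a second malloc, freeing both blocks leaves 3 (re)allocations, 2 frees and 0 blocks in use.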