Sitemap

A list of all the posts and pages found on the site. For you robots out there, there is an XML version available for digesting as well.

Pages

Posts

Future Blog Post

less than 1 minute read

Published:

This post will show up by default. To hide future-dated posts until their publication date, edit _config.yml and set future: false.

Blog Post number 4

less than 1 minute read

Published:

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Blog Post number 3

less than 1 minute read

Published:

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Blog Post number 2

less than 1 minute read

Published:

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Blog Post number 1

less than 1 minute read

Published:

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Portfolio

Publications

JudgeRail: Harnessing Open-Source LLMs for Fast Harmful Text Detection with Judicial Prompting and Logit Rectification

Published in ICLR 2025 (Reject), 2024

Large language models (LLMs) simultaneously facilitate the generation and detection of harmful text. Leading LLM developers, such as OpenAI, Meta, and Google, are driving a paradigm shift in the detection of harmful text, moving from conventional detectors to fine-tuned LLMs. However, these newly released models, which require substantial computational and data resources, have not yet been thoroughly investigated for their effectiveness in this new paradigm. In this work, we propose JudgeRail, a novel and generic framework that guides open-source LLMs to adhere to judicial principles during text moderation. Additionally, we introduce a new logit rectification method that extracts an LLM’s classification intent, effectively controls its output format, and accelerates detection. By integrating several top-performing open-source LLMs into JudgeRail without any fine-tuning and evaluating them against the OpenAI Moderation API, LlamaGuard3, ShieldGemma, and other conventional moderation solutions across various datasets, including those specifically designed for jailbreaking LLMs, we demonstrate that JudgeRail can adapt these LLMs to be competitive with fine-tuned moderation models and significantly outperform conventional solutions. Moreover, we evaluate all models for detection latency, a critical yet rarely examined practical aspect, and show that LLMs with JudgeRail require only 46% to 55% of the time needed by LlamaGuard3 and ShieldGemma. The generic nature and competitive performance of JudgeRail highlight its potential for promoting the practicality of LLM-based harmful text detectors. Warning: some text examples presented in this paper may be offensive to some readers.


Paper Title Number 4

Published as a course assignment, 2025

This article is an assignment for the course “History of Computer Science Thought 2025” at Zhejiang University.


Talks

Teaching

Teaching experience 1

Undergraduate course, University 1, Department, 2014

This is a description of a teaching experience. You can use Markdown as in any other post.

Teaching experience 2

Workshop, University 1, Department, 2015

This is a description of a teaching experience. You can use Markdown as in any other post.